Pose sentences : a new representation for understanding human actions
Item Usage Stats
In this thesis we address the problem of human action recognition from video sequences. Our main contribution to the literature is the compact use of poses while representing videos and most importantly considering actions as pose-sentences and exploit string matching approaches for classification. We focus on single actions, where the actor performs one simple action through the video sequence. We represent actions as documents consisting of words, where a word refers to a pose in a frame. We think pose information is a powerful source for describing actions. In search of a robust pose descriptor, we make use of four well-known techniques to extract pose information, Histogram of Oriented Gradients, k-Adjacent Segments, Shape Context and Optical Flow Histograms. To represent actions, first we generate a codebook which will act as a dictionary for our action dataset. Action sequences are then represented using a sequence of pose-words, as posesentences. The similarity between two actions are obtained using string matching techniques. We also apply a bag-of-poses approach for comparison purposes and show the superiority of pose-sentences. We test the efficiency of our method with two widely used benchmark datasets, Weizmann and KTH. We show that pose is indeed very descriptive while representing actions, and without having to examine complex dynamic characteristics of actions, one can apply simple techniques with equally successful results.