Browsing by Subject "Action recognition"
Now showing 1 - 13 of 13
Item Open Access: 3D human pose search using oriented cylinders (IEEE, 2009-09-10)
Pehlivan, Selen; Duygulu, Pınar
In this study, we present a representation based on a new 3D search technique for volumetric human poses, which is then used to recognize actions in three-dimensional video sequences. We generate a set of cylinder-like 3D kernels in various sizes and orientations. These kernels are searched over 3D volumes to find high-response regions. The distribution of these responses is then used to represent a 3D pose. We use the proposed representation for (i) pose retrieval using Nearest Neighbor (NN) and Support Vector Machine (SVM) based classification methods, and for (ii) action recognition on a set of actions using Dynamic Time Warping (DTW) and Hidden Markov Model (HMM) based classification methods. Evaluations on the IXMAS dataset support the effectiveness of such a robust pose representation. ©2009 IEEE.

Item Open Access: Histogram of oriented rectangles: a new pose descriptor for human action recognition (Elsevier BV, 2009-09-02)
İkizler, N.; Duygulu, P.
Most approaches to human action recognition tend to form complex models that require extensive parameter estimation and computation time. In this study, we show that human actions can be represented simply by pose, without dealing with a complex representation of dynamics. Based on this idea, we propose a novel pose descriptor, which we name Histogram-of-Oriented-Rectangles (HOR), for representing and recognizing human actions in videos. We represent each human pose in an action sequence by oriented rectangular patches extracted over the human silhouette. We then form spatial oriented histograms to represent the distribution of these rectangular patches. We make use of several matching strategies to carry the information described by the HOR descriptor from the spatial domain to the temporal domain.
These are (i) nearest neighbor classification, which recognizes actions by matching the descriptors of each frame; (ii) global histogramming, which extends the Motion Energy Image idea of Bobick and Davis to rectangular patches; (iii) a classifier-based approach using Support Vector Machines; and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the HOR descriptor. For cases where the pose descriptor alone is not sufficiently strong, such as differentiating the actions "jogging" and "running", we also incorporate a simple velocity descriptor as a prior to the pose-based classification step. We test our system with different configurations and experiment on two commonly used action datasets: the Weizmann dataset and the KTH dataset. Results show that our method is superior to other methods on the Weizmann dataset, with a perfect accuracy rate of 100%, and is comparable to other methods on the KTH dataset, with a success rate close to 90%. These results show that, with a simple and compact representation rather than a complex one, robust recognition of human actions can be achieved. © 2009 Elsevier B.V. All rights reserved.

Item Open Access: Human action recognition with line and flow histograms (IEEE, 2008-12)
İkizler, Nazlı; Cinbiş, R. Gökberk; Duygulu, Pınar
We present a compact representation for human action recognition in videos using line and optical flow histograms. We introduce a new shape descriptor based on the distribution of lines fitted to the boundaries of human figures. Using an entropy-based approach, we apply feature selection to densify our feature representation, thus minimizing classification time without degrading accuracy. We also use a compact representation of optical flow for motion information. Using line and flow histograms together with global velocity information, we show that high-accuracy action recognition is possible, even in challenging recording conditions.
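Several of the entries above match sequences of per-frame pose descriptors with Dynamic Time Warping. As a rough illustration of that idea only (not the specific descriptors or frame distance used in any of these papers), a minimal DTW over Euclidean frame distances might look like this:

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two sequences of
    per-frame descriptors, each an array of shape (T, D)."""
    seq_a, seq_b = np.asarray(seq_a, float), np.asarray(seq_b, float)
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch sequence B
                                 cost[i, j - 1],      # stretch sequence A
                                 cost[i - 1, j - 1])  # match frames
    return cost[n, m]
```

Two clips of the same action performed at different speeds then receive a small distance, because the warping path can stretch or compress one sequence against the other.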
© 2008 IEEE.

Item Open Access: A key-pose based representation for human action recognition (2011)
Kurt, Mehmet Can
This thesis utilizes a key-pose based representation to recognize human actions in videos. We believe that the pose of the human figure is a powerful source for describing the nature of the ongoing action in a frame. Each action can be represented by a unique set of frames that includes all the possible spatial configurations of the human body parts over the time the action is performed. Such a set of frames for each action, referred to as "key poses", uniquely distinguishes that action from the rest. To extract key poses, we define a similarity value between the poses in a pair of frames using the lines forming the human figure together with a shape matching method. With the help of a clustering algorithm, we group the similar frames of each action into a number of clusters and use the centroids as key poses for that action. Moreover, in order to exploit the motion information present in the action, we include simple line displacement vectors for each frame in the key-pose selection process. Experiments on the Weizmann and KTH datasets show the effectiveness of our key-pose based approach in representing and recognizing human actions.

Item Open Access: Knives are picked before slices are cut: recognition through activity sequence analysis (ACM, 2013-10)
İşcen, Ahmet; Duygulu, Pınar
In this paper, we introduce a model to classify cooking activities using their visual and temporal coherence information. We fuse multiple feature descriptors for fine-grained activity recognition, since every detail is needed to catch even subtle differences between classes with low inter-class variance. Based on the observation that daily activities such as cooking are likely to be performed in sequential patterns, we also model the temporal coherence of activities.
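The temporal-coherence idea ("knives are picked before slices are cut") can be illustrated with standard Viterbi decoding over per-step class scores and a transition prior. This is a generic sketch of sequence smoothing, not the paper's actual model:

```python
import numpy as np

def viterbi_smooth(step_scores, transition, prior):
    """Most likely activity sequence given per-step class scores of
    shape (T, K), a (K, K) transition prior, and a (K,) initial
    prior, all expressed as log-probabilities."""
    T, K = step_scores.shape
    dp = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    dp[0] = prior + step_scores[0]
    for t in range(1, T):
        trans = dp[t - 1][:, None] + transition   # (from, to) scores
        back[t] = trans.argmax(axis=0)            # best predecessor per class
        dp[t] = trans.max(axis=0) + step_scores[t]
    path = [int(dp[-1].argmax())]                 # backtrack best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

A transition prior that makes "cut" unlikely before "pick knife" can then overrule a noisy per-step classifier for ambiguous steps.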
By combining both aspects, we show that we can improve the overall accuracy of cooking recognition tasks. © Copyright 2013 ACM.

Item Open Access: A line based pose representation for human action recognition (2013)
Baysal, S.; Duygulu, P.
In this paper, we utilize a line based pose representation to recognize human actions in videos. We represent the pose in each frame by a collection of line-pairs, so that limb and joint movements are better described and the geometrical relationships among the lines forming the human figure are captured. We contribute to the literature by proposing a new method that matches the line-pairs of two poses to compute the similarity between them. Moreover, to encapsulate the global motion information of a pose sequence, we introduce line-flow histograms, which are extracted by matching line segments in consecutive frames. Experimental results on the Weizmann and KTH datasets emphasize the power of our pose representation, and show the effectiveness of using pose ordering and line-flow histograms together in grasping the nature of an action and distinguishing one action from another. © 2013 Elsevier B.V. All rights reserved.

Item Open Access: A new pose-based representation for recognizing actions from multiple cameras (Academic Press, 2011-02)
Pehlivan, S.; Duygulu, P.
We address the problem of recognizing actions from arbitrary views in a multi-camera system. We argue that poses are important for understanding human actions, and that the strength of the pose representation affects the overall performance of the action recognition system. Based on this idea, we present a new view-independent representation for human poses. Assuming that the data is initially provided in volumetric form, the volume of the human body is first divided into a sequence of horizontal layers, and then the intersections of the body segments with each layer are coded with enclosing circles.
The circular features in all layers, namely (i) the number of circles, (ii) the area of the outer circle, and (iii) the area of the inner circle, are then used to generate a pose descriptor. The pose descriptors of all frames in an action sequence are further combined to generate corresponding motion descriptors. Action recognition is then performed with a simple nearest neighbor classifier. Experiments performed on the benchmark IXMAS multi-view dataset demonstrate that the performance of our method is comparable to other methods in the literature. © 2010 Elsevier Inc. All rights reserved.

Item Open Access: Pose sentences: a new representation for understanding human actions (2008)
Hatun, Kardelen
In this thesis, we address the problem of human action recognition from video sequences. Our main contribution to the literature is the compact use of poses for representing videos and, most importantly, treating actions as pose sentences and exploiting string matching approaches for classification. We focus on single actions, where the actor performs one simple action throughout the video sequence. We represent actions as documents consisting of words, where a word refers to a pose in a frame. We believe pose information is a powerful source for describing actions. In search of a robust pose descriptor, we make use of four well-known techniques to extract pose information: Histogram of Oriented Gradients, k-Adjacent Segments, Shape Context, and Optical Flow Histograms. To represent actions, we first generate a codebook that acts as a dictionary for our action dataset. Action sequences are then represented as sequences of pose-words, that is, as pose sentences. The similarity between two actions is obtained using string matching techniques. We also apply a bag-of-poses approach for comparison purposes and show the superiority of pose sentences. We test the efficiency of our method on two widely used benchmark datasets, Weizmann and KTH.
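The pose-sentence pipeline described above (vector-quantize per-frame pose descriptors against a codebook, then compare the resulting word sequences with string matching) can be sketched roughly as follows. The descriptors and codebook here are placeholders, not the actual features used in the thesis:

```python
import numpy as np

def quantize_poses(frames, codebook):
    """Map per-frame pose descriptors (T, D) to their nearest
    codebook entries (K, D), yielding a 'pose sentence' of word
    indices."""
    frames = np.asarray(frames, float)
    codebook = np.asarray(codebook, float)
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1).tolist()

def edit_distance(s, t):
    """Levenshtein distance between two pose sentences, usable as a
    string-matching dissimilarity between actions."""
    dp = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        prev, dp[0] = dp[0], i
        for j, b in enumerate(t, 1):
            # deletion, insertion, or substitution/match
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (a != b))
    return dp[-1]
```

Unlike a bag-of-poses histogram, the edit distance is sensitive to the order in which poses occur, which is exactly the temporal information the pose-sentence representation aims to keep.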
We show that pose is indeed very descriptive for representing actions, and that, without having to examine the complex dynamic characteristics of actions, one can apply simple techniques with equally successful results.

Item Open Access: Pose sentences: a new representation for action recognition using sequence of pose words (IEEE, 2008-12)
Hatun, Kardelen; Duygulu, Pınar
We propose a method for recognizing human actions in videos. Inspired by recent bag-of-words approaches, we represent actions as documents consisting of words, where a word refers to the pose in a frame. Histogram of oriented gradients (HOG) features are used to describe poses, which are then vector quantized to obtain pose-words. As an alternative to bag-of-words approaches, which represent actions only as a collection of words and discard their temporal characteristics, we represent videos as ordered sequences of pose-words, that is, as pose sentences. String matching techniques are then exploited to find the similarity of two action sequences. In experiments performed on the dataset of Blank et al., 92% performance is obtained. © 2008 IEEE.

Item Open Access: Prototypes: exemplar based video representation (2016-06)
Yalçınkaya, Özge
Recognition of actions from videos is a widely studied problem, and many solutions have been introduced over the years. Labeling the training data required for classification has been an important bottleneck for the scalability of these methods. On the other hand, utilizing large amounts of weakly-labeled web data remains a challenge due to the noisy content of the videos. In this study, we tackle the problem of eliminating irrelevant videos by pruning the collection and discovering the most representative elements. Motivated by the success of methods that discover discriminative parts for image classification, we propose a novel video representation method based on selected distinctive exemplars.
We call these discriminative exemplars "prototypes"; they are chosen from each action class separately to be representative of the class of interest. We then use these prototypes to describe the entire dataset. Following traditional supervised classification methods and utilizing available state-of-the-art low-level and deep features, we show that even with simple selection and representation methods, the use of prototypes can increase recognition performance. Moreover, by reducing the training data to the selected prototypes only, we show that a smaller number of carefully selected examples can achieve the performance of a larger training set. In addition to prototypes, we explore the effect of irrelevant-data elimination in action recognition and give experimental results that are comparable to or better than state-of-the-art studies on the benchmark video datasets UCF-101 and ActivityNet.

Item Open Access: Snippet based trajectory statistics histograms for assistive technologies (Springer, 2014-09)
İscen, Ahmet; Wang, Y.; Duygulu, Pınar; Hauptmann, A.
Due to increasing hospital costs and traveling time, more and more patients decide to use medical devices at home without traveling to the hospital. However, these devices are not always straightforward to use, and recent reports show that there are many injuries, and even deaths, caused by their incorrect use. Since human supervision during every use is impractical, there is a need for computer vision systems that recognize actions and detect whether the patient has done something wrong. In this paper, we propose to use the Snippet Based Trajectory Statistics Histograms descriptor to recognize actions in two medical device usage problems: inhaler device usage and infusion pump usage. Snippet Based Trajectory Statistics Histograms encode the motion and position statistics of densely extracted trajectories from a video.
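A toy version of the snippet idea (the actual descriptor encodes richer motion and position statistics than this): cut each dense point trajectory into fixed-length snippets, compute a per-snippet motion statistic, here simply the mean motion direction, and histogram it over the whole video:

```python
import numpy as np

def snippet_stats_histogram(trajectories, snippet_len=5, bins=8):
    """Illustrative snippet-based statistics: cut each (x, y) point
    track into fixed-length snippets, take the mean motion direction
    of each snippet, and histogram the directions over the video."""
    angles = []
    for track in trajectories:
        track = np.asarray(track, float)
        for s in range(0, len(track) - snippet_len + 1, snippet_len):
            disp = np.diff(track[s:s + snippet_len], axis=0)  # frame-to-frame motion
            mean = disp.mean(axis=0)
            angles.append(np.arctan2(mean[1], mean[0]))
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)  # normalized direction histogram
```

Because the statistics are pooled per snippet rather than per frame, the descriptor stays cheap to compute, which matches the paper's claim of suitability for real-time systems.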
Our experiments show that by using the Snippet Based Trajectory Statistics Histograms technique, we improve the overall performance for both tasks. Additionally, this method does not require heavy computation and is suitable for real-time systems. © Springer International Publishing Switzerland 2015.

Item Open Access: Vision based behavior recognition of laboratory animals for drug analysis and testing (2009)
Sandıkcı, Selçuk
In pharmacological experiments, a popular method for discovering the effects of psychotherapeutic drugs is to monitor the behavior of laboratory mice subjected to the drugs via vision sensors. Such surveillance is currently performed by human observers for practical reasons. Automating the behavior analysis of laboratory mice with vision-based methods saves both time and human labor. In this study, we focus on automated action recognition of laboratory mice from short video clips in which only one action is performed. A two-stage hierarchical recognition method is designed to address the problem. In the first stage, still actions such as sleeping are separated from the other action classes based on the amount of the motion area. The remaining action classes are discriminated by the second stage, for which we propose four alternative methods. In the first method, we project the 3D action volume onto 2D images by encoding the temporal variation of each pixel using the discrete wavelet transform (DWT). The resulting images are modeled and classified by hidden Markov models in the maximum likelihood sense. The second method transforms the action recognition problem into a sequence matching problem by explicitly describing the pose of the subject in each frame. Instead of segmenting the subject from the background, we take only the temporally active portions of the subject into consideration in the pose description. Histograms of oriented gradients are employed to describe poses in frames.
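The first method's per-pixel temporal DWT encoding can be illustrated with a one-level Haar transform along the time axis; this is a simplification, since the thesis does not specify the wavelet or decomposition level in the abstract:

```python
import numpy as np

def temporal_haar_energy(video):
    """One-level Haar transform along time for each pixel of a
    (T, H, W) clip; returns an (H, W) image of detail-coefficient
    energy, highlighting pixels with strong temporal variation."""
    v = np.asarray(video, float)
    if v.shape[0] % 2:                            # Haar pairs frames: need even T
        v = v[:-1]
    detail = (v[0::2] - v[1::2]) / np.sqrt(2.0)   # Haar detail coefficients
    return (detail ** 2).sum(axis=0)
```

The resulting 2D energy image is the kind of compact projection of the 3D action volume that can then be fed to a frame-level classifier such as an HMM observation model.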
In the third method, actions are represented by a set of histograms of normalized spatio-temporal gradients computed from the entire action volume at different temporal resolutions. The last method assumes that actions are collections of known spatio-temporal templates and can be described by histograms of these. To locate and describe such templates in actions, a multi-scale 3D Harris corner detector and histograms of oriented gradients and optical flow vectors are employed, respectively. We test the proposed action recognition framework on a publicly available mice action dataset. In addition, we compare each method with well-known studies in the literature. We find that the second and fourth methods outperform both the related studies and the other two methods in our framework in overall recognition rates. However, the more successful methods suffer from heavy computational cost. This study shows that representing actions as an ordered sequence of pose descriptors is quite effective for action recognition. In addition, the success of the fourth method reveals that sparse spatio-temporal templates characterize the content of actions quite well.

Item Open Access: Yüksek boyutlu öznitelik uzayında hareket tanıma [Action recognition in a high-dimensional feature space] (IEEE, 2013-04)
Adıgüzel, Hande; Erdem, Hayrettin; Ferhatosmanoǧlu, Hakan; Duygulu, Pınar
Analyzing and interpreting human actions is an important and challenging area of computer vision. Different solutions are used for representing human actions; we prefer spatio-temporal interest points as motion descriptors. However, the space-time interest point feature space is considerably high-dimensional, and it is hard to overcome the curse of dimensionality with traditional similarity functions. We apply a matching-based approach for the high-dimensional feature space that matches sequences to classify actions. © 2013 IEEE.
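One minimal example of a matching-based similarity between two high-dimensional descriptor sets (illustrative only; the abstract above does not specify the paper's actual matching scheme): score a pair of videos by the fraction of mutual nearest-neighbor matches between their interest-point descriptors, sidestepping a global distance function over the whole feature space:

```python
import numpy as np

def match_score(desc_a, desc_b):
    """Similarity of two videos given their interest-point descriptor
    sets of shape (Na, D) and (Nb, D): the fraction of descriptors in
    A whose nearest neighbor in B points back at them (mutual NN)."""
    a, b = np.asarray(desc_a, float), np.asarray(desc_b, float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nn_ab = d.argmin(axis=1)             # nearest point in B for each point of A
    nn_ba = d.argmin(axis=0)             # nearest point in A for each point of B
    mutual = sum(nn_ba[j] == i for i, j in enumerate(nn_ab))
    return mutual / len(a)
```

Mutual nearest-neighbor checks like this are a common way to keep matching meaningful in high dimensions, where raw distances between arbitrary pairs of descriptors become nearly indistinguishable.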