Histogram of oriented rectangles: a new pose descriptor for human action recognition
Author
İkizler, N.
Duygulu, P.
Date
2009-09-02
Source Title
Image and Vision Computing
Print ISSN
0262-8856
Publisher
Elsevier BV
Volume
27
Issue
10
Pages
1515 - 1526
Language
English
Type
Article
Abstract
Most approaches to human action recognition build complex models that require extensive parameter estimation and computation time. In this study, we show that human actions can be represented simply by pose, without modeling the complex dynamics. Based on this idea, we propose a novel pose descriptor, which we name Histogram-of-Oriented-Rectangles (HOR), for representing and recognizing human actions in videos. We represent each human pose in an action sequence by oriented rectangular patches extracted over the human silhouette, and then form spatial oriented histograms to represent the distribution of these rectangular patches. We use several matching strategies to carry the information captured by the HOR descriptor from the spatial domain to the temporal domain: (i) nearest neighbor classification, which recognizes actions by matching the descriptors of each frame; (ii) global histogramming, which extends the Motion Energy Image idea of Bobick and Davis to rectangular patches; (iii) a classifier-based approach using Support Vector Machines; and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the HOR descriptor. For cases in which the pose descriptor alone is not sufficiently discriminative, such as differentiating the actions "jogging" and "running", we also incorporate a simple velocity descriptor as a prior to the pose-based classification step. We test our system with different configurations on two commonly used action datasets: the Weizmann dataset and the KTH dataset. Results show that our method is superior to other methods on the Weizmann dataset, with a perfect accuracy of 100%, and is comparable to other methods on the KTH dataset, with a success rate close to 90%. These results show that, with a simple and compact representation, robust recognition of human actions can be achieved compared to more complex representations. © 2009 Elsevier B.V. All rights reserved.
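To make the descriptor construction concrete, the Python sketch below illustrates one possible HOR-style frame descriptor. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name hor_descriptor, the 3x3 grid, the number of orientation bins, the fixed rectangle size, the stride, and the fill threshold are all illustrative choices; the paper's actual rectangle extraction and parameter settings should be taken from the published text.

import numpy as np

def hor_descriptor(silhouette, grid=(3, 3), n_bins=12,
                   rect_size=(12, 6), stride=4, fill_thresh=0.75):
    """Simplified HOR-style pose descriptor for one binary silhouette frame.

    A fixed-size rectangle is swept over the silhouette at n_bins candidate
    orientations; placements whose rotated support is mostly covered by
    silhouette pixels vote into a spatial grid of orientation histograms.
    All parameter values here are illustrative, not taken from the paper.
    """
    sil = np.asarray(silhouette, dtype=bool)
    ys, xs = np.nonzero(sil)
    if xs.size == 0:
        return np.zeros(grid[0] * grid[1] * n_bins)

    # Work inside the person's bounding box so the spatial grid is person-centric.
    sil = sil[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = sil.shape
    hist = np.zeros((grid[0], grid[1], n_bins))
    half_h, half_w = rect_size[0] / 2.0, rect_size[1] / 2.0
    angles = np.pi * np.arange(n_bins) / n_bins            # orientations in [0, pi)
    uu, vv = np.meshgrid(np.linspace(-half_h, half_h, 7),
                         np.linspace(-half_w, half_w, 5))  # rectangle sample points

    for cy in range(0, h, stride):
        for cx in range(0, w, stride):
            if not sil[cy, cx]:
                continue
            for b, theta in enumerate(angles):
                # Rotate the rectangle's sample points around its centre (cy, cx).
                ry = np.round(cy + uu * np.cos(theta) - vv * np.sin(theta)).astype(int)
                rx = np.round(cx + uu * np.sin(theta) + vv * np.cos(theta)).astype(int)
                if (ry < 0).any() or (ry >= h).any() or (rx < 0).any() or (rx >= w).any():
                    continue
                # Accept the oriented rectangle if it is mostly filled by the silhouette.
                if sil[ry, rx].mean() >= fill_thresh:
                    gy = min(cy * grid[0] // h, grid[0] - 1)
                    gx = min(cx * grid[1] // w, grid[1] - 1)
                    hist[gy, gx, b] += 1

    total = hist.sum()
    return (hist / total).ravel() if total > 0 else hist.ravel()

Per-frame vectors produced this way could then be matched frame by frame (nearest neighbor), pooled over the whole sequence (global histogramming), classified with an SVM, or aligned in time with Dynamic Time Warping, mirroring the four strategies listed in the abstract.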
Keywords
Action recognition
Human motion understanding
Pose descriptor
Accuracy rate
Action sequences
Compact representation
Complex model
Computation time
Data sets
Descriptor
Dynamic time warping
Histogramming
Human pose
Human silhouette
Human-action recognition
Motion energy
Nearest neighbor classification
Rectangular patch
Robust recognition
Spatial domains
Temporal domain
Temporal representations
Gesture recognition
Parameter estimation
Statistical tests
Human form models
Permalink
http://hdl.handle.net/11693/22633
Published Version (Please cite this version)
http://dx.doi.org/10.1016/j.imavis.2009.02.002
Related items
Showing items related by title, author, creator and subject.
- Two-person interaction recognition via spatial multiple instance embedding
  Sener, F.; Ikizler-Cinbis, N. (Academic Press Inc., 2015). In this work, we look into the problem of recognizing two-person interactions in videos. Our method integrates multiple visual features in a weakly supervised manner by utilizing an embedding-based multiple instance ...
- Analysis of wearable motion sensor signals for activity recognition using the mutual information measure [Karşılıklı bilgi ölçütü kullanılarak giyilebilir hareket duyucu sinyallerinin aktivite tanıma amaçlı analizi]
  Dobrucalı, Oğuzcan; Barshan, Billur (IEEE, 2014-04). In detecting human activities with wearable motion sensors, the choice of a suitable sensor configuration is an important issue. This issue concerns the number and type of the sensors to be used, as well as the positions and orientations at which they will be fixed ...
- Recognizing human actions from noisy videos via multiple instance learning
  Şener, Fadime; Samet, Nermin; Duygulu, Pınar; Ikizler-Cinbis, N. (IEEE, 2013). In this work, we study the task of recognizing human actions from noisy videos and the effects of noise on recognition performance, and propose a possible solution. Datasets available in the computer vision literature are relatively ...