Multiple view human activity recognition
This thesis explores the human activity recognition problem when multiple views are available. We follow two main directions: we first present a system that performs volume matching on 3D volumes reconstructed from calibrated cameras, and then present a flexible system that performs frame matching directly on the multiple views. We compare multiple-view systems with single-view systems and measure, through various experiments, how recognition performance improves as more views are used. The first part of the thesis introduces compact representations for volumetric data obtained through reconstruction. The video frames recorded by many cameras with significant overlap are fused by reconstruction, and the reconstructed volumes are used as substitutes for action poses. We propose new pose descriptors over these three-dimensional volumes. Our first descriptor is based on histograms of oriented cylinders of various sizes and orientations. We then propose a second descriptor that is view-independent and does not require pose alignment. We show the importance of discriminative pose representations within simpler activity classification schemes. The activity recognition framework based on volume matching achieves promising results compared to the state of the art. Volume reconstruction is one natural approach to multi-camera data fusion, but in practice there may be few cameras with overlapping views. In the second part of the thesis, we introduce an architecture that adapts to varying numbers of cameras and features. The system collects activity judgments from the individual cameras and fuses them using a voting scheme. The architecture requires no camera calibration. Performance generally improves when there are more cameras and more features; training and test cameras do not need to overlap; and cameras dropping in or out are handled easily with little penalty. Experiments support these claims about the performance penalties and about the advantages of using multiple views over a single view.
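The voting-based fusion described in the second part can be illustrated with a minimal sketch. The abstract does not specify the exact voting rule, so this example assumes a simple plurality vote over per-camera labels. The function name `fuse_votes`, the camera identifiers, and the tie-breaking rule are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from typing import Mapping, Optional


def fuse_votes(judgments: Mapping[str, Optional[str]]) -> Optional[str]:
    """Fuse per-camera activity labels by plurality vote (assumed rule).

    `judgments` maps a camera id to the label that camera predicted,
    or to None if the camera dropped out for this sequence.
    """
    # Cameras that dropped out simply do not vote.
    votes = Counter(label for label in judgments.values() if label is not None)
    if not votes:
        return None  # no camera produced a judgment
    # Highest vote count wins; ties are broken alphabetically so the
    # result is deterministic (an arbitrary choice made for this sketch).
    label, _count = min(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return label


if __name__ == "__main__":
    # Three cameras vote; one camera has dropped out.
    print(fuse_votes({"cam0": "walk", "cam1": "run", "cam2": "walk", "cam3": None}))
```

Because each camera contributes only a label, adding or removing a camera changes the number of votes but not the fusion logic, which is consistent with the abstract's remark that cameras dropping in or out are handled easily.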
Human activity recognition