Prototypes: exemplar-based video representation
Recognition of actions from videos is a widely studied problem, and many solutions have been introduced over the years. The labeling of training data required for classification has been an important bottleneck for the scalability of these methods. On the other hand, utilizing large amounts of weakly labeled web data remains a challenge due to the noisy content of the videos. In this study, we tackle the problem of eliminating irrelevant videos by pruning the collection and discovering its most representative elements. Motivated by the success of methods that discover discriminative parts for image classification, we propose a novel video representation method based on selected distinctive exemplars. We call these discriminative exemplars "prototypes"; they are chosen from each action class separately to be representative of the class of interest. We then use these prototypes to describe the entire dataset. Following traditional supervised classification methods and utilizing available state-of-the-art low-level and deep features, we show that even with simple selection and representation methods, the use of prototypes can increase recognition performance. Moreover, by reducing the training data to the selected prototypes only, we show that a smaller number of carefully selected examples can match the performance of a larger training set. In addition to prototypes, we explore the effect of irrelevant data elimination on action recognition and report experimental results that are comparable to or better than state-of-the-art studies on the benchmark video datasets UCF-101 and ActivityNet.
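The pipeline described above (select representative exemplars per class, then describe every video by its relation to those exemplars) can be sketched in a minimal form. The abstract does not specify the selection criterion or similarity measure, so the medoid-style scoring and cosine similarity below are illustrative assumptions, not the paper's actual method; `select_prototypes` and `prototype_representation` are hypothetical helper names.

```python
import numpy as np

def l2_normalize(x):
    # Row-wise L2 normalisation so that dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def select_prototypes(class_features, k):
    """Pick k exemplars with the highest summed similarity to their
    classmates -- a simple medoid-style proxy for 'representative',
    standing in for the paper's unspecified selection method."""
    f = l2_normalize(class_features)
    centrality = (f @ f.T).sum(axis=1)   # how central each video is in its class
    return np.argsort(-centrality)[:k]   # indices of the k most central videos

def prototype_representation(features, prototype_features):
    """Describe every video by its cosine similarity to each prototype,
    yielding a compact prototype-based feature vector."""
    return l2_normalize(features) @ l2_normalize(prototype_features).T

# Toy usage: two classes of random 'video features', 3 prototypes per class.
rng = np.random.default_rng(0)
class_a = rng.normal(size=(20, 64))
class_b = rng.normal(size=(20, 64)) + 2.0   # shifted so the classes differ
protos = np.vstack([class_a[select_prototypes(class_a, 3)],
                    class_b[select_prototypes(class_b, 3)]])
rep = prototype_representation(np.vstack([class_a, class_b]), protos)
print(rep.shape)  # each of the 40 videos is now a 6-dimensional vector
```

The resulting `rep` matrix can be fed to any standard supervised classifier; alternatively, restricting the training set to the prototype videos themselves mirrors the reduced-training-data experiment mentioned above.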