Utilizing multiple instance learning for computer vision tasks
Şahin, Pınar Duygulu
Please cite this item using this persistent URLhttp://hdl.handle.net/11693/15890
The Multiple Instance Learning (MIL) paradigm arises to be useful in many application domains, whereas it is particularly suitable for computer vision problems due to the difficulty of obtaining manual labeling. Multiple Instance Learning methods have large applicability to a variety of challenging learning problems in computer vision, including object recognition and detection, tracking, image classification, scene classification and more. As opposed to working with single instances as in standard supervised learning, Multiple Instance Learning operates over bags of instances. A bag is labeled as positive if it is known to contain at least one positive instance; otherwise it is labeled as negative. The overall learning task is to learn a model for some concept using a training set that is formed of bags. A vital component of using Multiple Instance Learning in computer vision is its design for abstracting the visual problem to multi-instance representation, which involves determining what the bag is and what are the instances in the bag. In this context, we consider three different computer vision problems and propose solutions for each of them via novel representations. The first problem is image retrieval and re-ranking; we propose a method that automatically constructs multiple candidate Multi-instance bags, which are likely to contain relevant images. The second problem we look into is recognizing actions from still images, where we extract several candidate object regions and approach the problem of identifying related objects from a weakly supervised point of view. Finally, we address the recognition of human interactions in videos within a MIL framework. In human interaction recognition, videos may be composed of frames of different activities, and the task is to identify the interaction in spite of irrelevant activities that are scattered through the video. To overcome this problem, we use the idea of Multiple Instance Learning to tackle irrelevant actions in the whole video sequence classification. Each of the outlined problems are tested on benchmark datasets of the problems and compared with the state-of-the-art. The experimental results verify the advantages of the proposed MIL approaches to these vision problems.