Utilizing multiple instance learning for computer vision tasks

Date

2013

Advisor

Şahin, Pınar Duygulu

Language

English

Abstract

The Multiple Instance Learning (MIL) paradigm has proven useful in many application domains and is particularly suitable for computer vision problems, where manual labels are difficult to obtain. MIL methods apply to a variety of challenging learning problems in computer vision, including object recognition and detection, tracking, image classification, and scene classification. In contrast to standard supervised learning, which works with single instances, MIL operates over bags of instances. A bag is labeled positive if it is known to contain at least one positive instance; otherwise it is labeled negative. The overall task is to learn a model for some concept from a training set formed of bags. A vital step in applying MIL to computer vision is abstracting the visual problem into a multi-instance representation, which involves determining what the bag is and what the instances in the bag are. In this context, we consider three different computer vision problems and propose solutions for each of them via novel representations. The first problem is image retrieval and re-ranking; we propose a method that automatically constructs multiple candidate multi-instance bags, which are likely to contain relevant images. The second problem is recognizing actions from still images, where we extract several candidate object regions and approach the problem of identifying related objects from a weakly supervised point of view. Finally, we address the recognition of human interactions in videos within a MIL framework. In human interaction recognition, videos may be composed of frames of different activities, and the task is to identify the interaction despite irrelevant activities scattered through the video. To overcome this problem, we use MIL to handle irrelevant actions when classifying the whole video sequence. Each proposed approach is tested on benchmark datasets for its problem and compared with the state of the art. The experimental results verify the advantages of the proposed MIL approaches to these vision problems.
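
The bag-labeling rule stated in the abstract can be illustrated with a minimal Python sketch. The function names `bag_label` and `predict_bag`, the `threshold` parameter, and the max-over-instances prediction rule are assumptions made for illustration; the max rule is a common way to turn the "at least one positive instance" definition into a bag-level classifier and is not necessarily the method used in the thesis.

```python
from typing import Callable, Iterable, Sequence, TypeVar

T = TypeVar("T")


def bag_label(instance_labels: Iterable[int]) -> int:
    """Return 1 if the bag contains at least one positive instance, else 0."""
    return int(any(label == 1 for label in instance_labels))


def predict_bag(
    instances: Sequence[T],
    score_instance: Callable[[T], float],
    threshold: float = 0.5,
) -> int:
    """Predict a bag label from instance scores (hypothetical helper).

    Assumption: the bag score is the maximum instance score, so a single
    confident instance is enough to make the whole bag positive.
    An empty bag has no positive instance and is predicted negative.
    """
    best = max((score_instance(x) for x in instances), default=float("-inf"))
    return int(best >= threshold)
```

In a vision setting, a bag might be an image and its instances candidate regions, or a video and its frames; only the bag-level label is needed during training.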

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)
