Browsing by Subject "Multiple instance learning"

Item Open Access
Ensemble of multiple instance classifiers for image re-ranking
(Elsevier Ltd, 2014) Sener, F.; Ikizler-Cinbis, N.
Text-based image retrieval may perform poorly due to the irrelevant and/or incomplete text surrounding the images in web pages. In such situations, the visual content of the images can be leveraged to improve the image ranking performance. In this paper, we look into this problem of image re-ranking and propose a system that automatically constructs multiple candidate "multi-instance bags (MI-bags)", which are likely to contain relevant images. These automatically constructed bags are then utilized by ensembles of Multiple Instance Learning (MIL) classifiers, and the images are re-ranked according to the final classification responses. Our method is unsupervised in the sense that the only input to the system is the text query itself, without any user feedback or annotation. The experimental results demonstrate that constructing multiple instance bags based on the retrieval order and utilizing ensembles of MIL classifiers greatly enhance the retrieval performance, achieving results on par with or better than the state-of-the-art. © 2014 Elsevier B.V.

Item Open Access
Facial analysis of dyadic interactions using multiple instance learning
(Bilkent University, 2021-09) Giritlioğlu, Dersu
Interpretation of nonverbal behavior is vital for a reliable analysis of social interactions. To this end, we automatically analyze facial expressions of romantic couples during their dyadic interactions, for the first time in the literature. We use a recently collected romantic relationship dataset, including videos of 167 couples while talking about a conflicting case and a positive experience they share. To distinguish between interactions during positive experiences and conflicting discussions, we model facial expressions employing a deep multiple instance learning (MIL) framework, adapted from the anomaly detection literature.
A spatio-temporal representation of facial behavior is obtained from short video segments through a 3D residual network and used as the instances in MIL bag formations. The goal is to detect conflicting sessions by revealing distinctive facial cues that are displayed in short periods. To this end, instance representations of positive experience and conflict sessions are further optimized, so as to be more separable, using deep metric learning. In addition, for a more reliable analysis of dyadic interaction, facial expressions of both subjects in the interaction are analyzed jointly. Our experiments show that the proposed approach reaches an accuracy of 71%. In addition to providing comparisons to several baseline models, we have also conducted a human evaluation study for the same task, employing 6 participants. The proposed approach is 5% more accurate than humans and outperforms all baseline models. As suggested by the experimental results, reliable modeling of facial behavior can greatly contribute to the analysis of dyadic interactions, yielding a better performance than that of humans.

Item Open Access
Görsel arama sonuçlarının çoklu örnekle öğrenme yöntemiyle yeniden sıralanması [Re-ranking image search results with multiple instance learning]
(IEEE, 2012-04) Şener, Fadime; Cinbiş, N. I.; Duygulu-Şahin, Pınar
In this study, we present a weakly supervised method, based on multiple instance learning, developed to improve the image query results obtained from text-based search engines. In this method, the results returned by the search engine are treated as weak positives and, together with negative images that contain no images from the query category, are used to construct bags for multiple instance learning. Using a bag-instance similarity established between these bags and the instances in the dataset, the bags are mapped into a new instance space and the problem is turned into a classical supervised learning problem.
Then, a classification model is built for each query using a linear support vector machine (SVM). The images are re-ranked according to the resulting classification scores, and the search engine results are observed to improve. In this context, we also present the experiments we conducted to find a pattern among bag sizes.
In this study, we propose a weakly-supervised multiple instance learning (MIL) method to improve the results of text-based image search engines. In this approach, the ranked image list returned by a search engine for a keyword query is treated as weak-positive input data, and with additional negative input data, multiple instance learning bags are constructed. Then, the multiple instance problem is converted to a standard supervised learning problem by mapping each bag into a feature space defined by the instances in the training bags using a bag-instance similarity measure. Finally, a linear SVM is used to construct a classifier to re-rank keyword-based image search data. Based on the classification scores, we re-rank the images and improve precision over the search engine results. In this respect, we also present our experiments conducted to find a pattern for multiple instance bag sizes to obtain better average precision. © 2012 IEEE.

Item Open Access
Multi-channel TDMA scheduling in wireless sensor networks
(Bilkent University, 2013) Uyanık, Özge
The Multiple Instance Learning (MIL) paradigm has proven useful in many application domains, and it is particularly suitable for computer vision problems due to the difficulty of obtaining manual labeling. Multiple Instance Learning methods are applicable to a variety of challenging learning problems in computer vision, including object recognition and detection, tracking, image classification, scene classification and more. As opposed to working with single instances as in standard supervised learning, Multiple Instance Learning operates over bags of instances.
A bag is labeled as positive if it is known to contain at least one positive instance; otherwise it is labeled as negative. The overall learning task is to learn a model for some concept using a training set that is formed of bags. A vital component of using Multiple Instance Learning in computer vision is its design for abstracting the visual problem to a multi-instance representation, which involves determining what the bag is and what the instances in the bag are. In this context, we consider three different computer vision problems and propose solutions for each of them via novel representations. The first problem is image retrieval and re-ranking; we propose a method that automatically constructs multiple candidate multi-instance bags, which are likely to contain relevant images. The second problem we look into is recognizing actions from still images, where we extract several candidate object regions and approach the problem of identifying related objects from a weakly supervised point of view. Finally, we address the recognition of human interactions in videos within a MIL framework. In human interaction recognition, videos may be composed of frames of different activities, and the task is to identify the interaction in spite of irrelevant activities that are scattered through the video. To overcome this problem, we use the idea of Multiple Instance Learning to tackle irrelevant actions in the whole video sequence classification. Each of the outlined problems is tested on benchmark datasets and compared with the state-of-the-art.
The experimental results verify the advantages of the proposed MIL approaches to these vision problems.

Item Open Access
On recognizing actions in still images via multiple features
(Springer, Berlin, Heidelberg, 2012) Şener, Fadime; Bas, C.; Ikizler-Cinbis, N.
We propose a multi-cue based approach for recognizing human actions in still images, where relevant object regions are discovered and utilized in a weakly supervised manner. Our approach does not require any explicitly trained object detector or part/attribute annotation. Instead, a multiple instance learning approach is used over sets of object hypotheses in order to represent objects relevant to the actions. We test our method on the extensive Stanford 40 Actions dataset [1] and achieve a significant performance gain compared to the state-of-the-art. Our results show that using multiple object hypotheses within multiple instance learning is effective for human action recognition in still images and that such an object representation is suitable for use in conjunction with other visual features. © 2012 Springer-Verlag.

Item Open Access
Recognizing human actions from noisy videos via multiple instance learning
(IEEE, 2013) Şener, Fadime; Samet, Nermin; Duygulu, Pınar; Ikizler-Cinbis, N.
In this work, we study the task of recognizing human actions from noisy videos and the effects of noise on recognition performance, and propose a possible solution. Datasets available in the computer vision literature are relatively small and could include noise due to the labeling source. For new and relatively big datasets, the amount of noise would possibly increase and the performance of traditional instance-based learning methods is likely to decrease. In this work, we propose a multiple instance learning-based solution for the case of an increase in noise. For this purpose, each video is represented with spatio-temporal features, and then the bag-of-words method is applied.
Then, using support vector machines (SVM), both instance-based learning and multiple instance learning classifiers are constructed and compared. The classification results show that multiple instance learning classifiers perform better than their instance-based learning counterparts on noisy videos. © 2013 IEEE.

Item Open Access
Two-person interaction recognition via spatial multiple instance embedding
(Academic Press Inc., 2015) Sener, F.; Ikizler-Cinbis, N.
In this work, we look into the problem of recognizing two-person interactions in videos. Our method integrates multiple visual features in a weakly supervised manner by utilizing an embedding-based multiple instance learning framework. In our proposed method, first, several visual features that capture the shape and motion of the interacting people are extracted from each detected person region in a video. Then, two-person visual descriptors are formed. Since the relative spatial locations of interacting people are likely to complement the visual descriptors, we propose to use spatial multiple instance embedding, which implicitly incorporates the distances between people into the multiple instance learning process. Experimental results on two benchmark datasets validate that using two-person visual descriptors together with spatial multiple instance learning offers an effective way of inferring the type of the interaction. © 2015 Elsevier Inc.

Item Open Access
Utilizing multiple instance learning for computer vision tasks
(Bilkent University, 2013) Şener, Fadime
The Multiple Instance Learning (MIL) paradigm has proven useful in many application domains, and it is particularly suitable for computer vision problems due to the difficulty of obtaining manual labeling. Multiple Instance Learning methods are applicable to a variety of challenging learning problems in computer vision, including object recognition and detection, tracking, image classification, scene classification and more.
As opposed to working with single instances as in standard supervised learning, Multiple Instance Learning operates over bags of instances. A bag is labeled as positive if it is known to contain at least one positive instance; otherwise it is labeled as negative. The overall learning task is to learn a model for some concept using a training set that is formed of bags. A vital component of using Multiple Instance Learning in computer vision is its design for abstracting the visual problem to a multi-instance representation, which involves determining what the bag is and what the instances in the bag are. In this context, we consider three different computer vision problems and propose solutions for each of them via novel representations. The first problem is image retrieval and re-ranking; we propose a method that automatically constructs multiple candidate multi-instance bags, which are likely to contain relevant images. The second problem we look into is recognizing actions from still images, where we extract several candidate object regions and approach the problem of identifying related objects from a weakly supervised point of view. Finally, we address the recognition of human interactions in videos within a MIL framework. In human interaction recognition, videos may be composed of frames of different activities, and the task is to identify the interaction in spite of irrelevant activities that are scattered through the video. To overcome this problem, we use the idea of Multiple Instance Learning to tackle irrelevant actions in the whole video sequence classification. Each of the outlined problems is tested on benchmark datasets and compared with the state-of-the-art. The experimental results verify the advantages of the proposed MIL approaches to these vision problems.

Item Open Access
Weakly supervised object localization with multi-fold multiple instance learning
(IEEE Computer Society, 2017) Cinbis, R. G.; Verbeek, J.; Schmid, C.
Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves the localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach. © 2016 IEEE.