Browsing by Subject "Image retrieval"
Now showing 1 - 20 of 36
Authorship attribution: performance of various features and classification methods (IEEE, 2007-11) [Open Access]
Bozkurt, İlker Nadi; Bağlıoğlu, Özgür; Uyar, Erkan
Authorship attribution is the process of determining the writer of a document. The literature offers many classification techniques for this task. In this paper we explore information retrieval methods, such as the tf-idf representation with support vector machines, and parametric and nonparametric methods with supervised and unsupervised (clustering) classification techniques for authorship attribution. We performed various experiments on articles gathered from the Turkish newspaper Milliyet, applying different classifiers to different features extracted from these texts, and combined the results to improve our success rates. We identified which classifiers give satisfactory results on which feature sets. According to the experiments, the success rates change dramatically across combinations; the best among them are the support vector classifier with bag of words, and the Gaussian classifier with function words. ©2007 IEEE.

Bilkent University at TRECVID 2005 (National Institute of Standards and Technology, 2005-11) [Open Access]
Aksoy, Selim; Avcı, Akın; Balçık, Erman; Çavuş, Özge; Duygulu, Pınar; Karaman, Zeynep; Kavak, Pınar; Kaynak, Cihan; Küçükayvaz, Emre; Öcalan, Çağdaş; Yıldız, Pınar
We describe our second-time participation in the TRECVID video retrieval evaluation, which includes one high-level feature extraction run, three manual search runs, and one interactive search run. All of these runs used a system trained on the common development collection. Only visual and textual information were used, where the visual information consisted of color, texture, and edge-based low-level features, and the textual information consisted of the speech transcript provided in the collection.
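The tf-idf representation named in the authorship-attribution abstract above can be sketched in a few lines of plain Python. The toy documents in the test are illustrative, and cosine similarity between tf-idf vectors stands in for the study's support-vector classifier:

```python
import math
from collections import Counter

def tfidf(docs):
    """Map each tokenized document to a sparse tf-idf vector.

    docs: list of token lists. Returns a list of {term: weight} dicts.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    # Smoothed idf so terms present in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

A query document would then be attributed to the author whose training documents score highest under `cosine`.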
With the experience gained from our second-time participation, we are in the process of building a system for automatic classification and indexing of video archives.

Bilkent University at TRECVID 2006 (National Institute of Standards and Technology, 2006-11) [Open Access]
Aksoy, Selim; Duygulu, Pınar; Akçay, Hüseyin Gökhan; Ataer, Esra; Baştan, Muhammet; Can, Tolga; Çavuş, Özge; Doğrusöz, Emel; Gökalp, Demir; Akaydın, Ateş; Akoğlu, Leman; Angın, Pelin; Cinbiş, R. Gökberk; Gür, Tunay; Ünlü, Mehmet
We describe our third participation in the TRECVID video retrieval evaluation, which includes one high-level feature extraction run, two manual search runs, and one interactive search run. All of these runs used a system trained on the common development collection. Only visual and textual information were used, where the visual information consisted of color, texture, and edge-based low-level features, and the textual information consisted of the speech transcript provided in the collection.

Bilkent University at TRECVID 2007 (National Institute of Standards and Technology, 2007) [Open Access]
Aksoy, Selim; Duygulu, Pınar; Aksoy, C.; Aydin, E.; Gunaydin, D.; Hadimli, K.; Koç, L.; Olgun, Y.; Orhan, C.; Yakin, G.
We describe our fourth participation in the TRECVID video retrieval evaluation, which includes two high-level feature extraction runs and one manual search run. All of these runs used a system trained on the common development collection.
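The color component of the low-level features used in these TRECVID runs could be as simple as a quantized RGB histogram; the following is a hedged sketch of that kind of descriptor, not the group's actual feature:

```python
def rgb_histogram(pixels, bins_per_channel=4):
    """Quantized, L1-normalized RGB histogram of an iterable of (r, g, b)
    pixels with channel values in 0..255."""
    k = bins_per_channel
    hist = [0.0] * (k ** 3)
    step = 256 // k  # width of each quantization bin

    def q(v):
        # Clamp so values like 255 never index past the last bin.
        return min(v // step, k - 1)

    pixels = list(pixels)
    for r, g, b in pixels:
        hist[q(r) * k * k + q(g) * k + q(b)] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]
```

Histograms like this are typically compared with L1 or chi-squared distance during retrieval.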
Only visual information, consisting of color, texture, and edge-based low-level features, was used.

COST292 experimental framework for TRECVID 2006 (National Institute of Standards and Technology, 2006) [Open Access]
Ćalić, J.; Krämer, P.; Naci, U.; Vrochidis, S.; Aksoy, S.; Zhang, Q.; Benois-Pineau, J.; Saracoglu, A.; Doulaverakis, C.; Jarina, R.; Campbell, N.; Mezaris, V.; Kompatsiaris, I.; Spyrou, E.; Koumoulos, G.; Avrithis, Y.; Dalkilic, A.; Alatan, A.; Hanjalic, A.; Izquierdo, E.
In this paper we give an overview of the four TRECVID tasks submitted by COST292, a European network of institutions in the area of semantic multimodal analysis and retrieval of digital video media. Initially, we present a shot-boundary evaluation method based on results merged using a confidence measure. The two SB detectors used here, one from the Technical University of Delft and one from LaBRI, University of Bordeaux 1, are presented, followed by a description of the merging algorithm. The high-level feature extraction task comprises three separate systems. The first system, developed by the National Technical University of Athens (NTUA), utilises a set of MPEG-7 low-level descriptors and Latent Semantic Analysis to detect the features. The second system, developed by Bilkent University, uses a Bayesian classifier trained with a "bag of subregions" for each keyframe. The third system, by the Middle East Technical University (METU), exploits textual information in the video using character recognition methodology. The system submitted to the search task is an interactive retrieval application developed by Queen Mary, University of London, the University of Zilina and ITI from Thessaloniki, combining basic retrieval functionalities in various modalities (i.e. visual, audio, textual) with a user interface supporting the submission of queries using any combination of the available retrieval tools and the accumulation of relevant retrieval results over all queries submitted by a single user during a specified time interval. Finally, the rushes task submission comprises a video summarisation and browsing system specifically designed to present rushes material intuitively and efficiently in a video production environment. This system is the result of joint work by the University of Bristol, the Technical University of Delft and LaBRI, University of Bordeaux 1.

The COST292 experimental framework for TRECVID 2007 (National Institute of Standards and Technology, 2007) [Open Access]
Zhang, Q.; Corvaglia, M.; Aksoy, Selim; Naci, U.; Adami, N.; Aginako, N.; Alatan, A.; Alexandre, L. A.; Almeida, P.; Avrithis, Y.; Benois-Pineau, J.; Chandramouli, K.; Damnjanovic, U.; Esen, E.; Goya, J.; Grzegorzek, M.; Hanjalic, A.; Izquierdo, E.; Jarina, R.; Kapsalas, P.; Kompatsiaris, I.; Kuba, M.; Leonardi, R.; Makris, L.; Mansencal, B.; Mezaris, V.; Moumtzidou, A.; Mylonas, P.; Nikolopoulos, S.; Piatrik, T.; Pinheiro, A. M. G.; Reljin, B.; Spyrou, E.; Tolias, G.; Vrochidis, S.; Yakın, G.; Zajic, G.
In this paper, we give an overview of the four tasks submitted to TRECVID 2007 by COST292. In the shot boundary (SB) detection task, four SB detectors have been developed and their results are merged using two merging algorithms. The framework developed for the high-level feature extraction task comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a Bayesian classifier trained with a "bag of subregions". The third system uses a multi-modal classifier based on SVMs and several descriptors. The fourth system uses two image classifiers based on ant colony optimisation and particle swarm optimisation, respectively.
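A minimal cut detector in the spirit of the shot-boundary task above can be written as a thresholded histogram difference between consecutive frames. This is a sketch of the basic idea only; the COST292 detectors and their confidence-based merging are far more elaborate:

```python
def detect_cuts(frame_hists, threshold=0.5):
    """Flag a cut between consecutive frames when the L1 distance of their
    normalized color histograms exceeds `threshold`.

    Returns the indices i such that a cut lies between frame i-1 and frame i.
    """
    cuts = []
    for i in range(1, len(frame_hists)):
        d = sum(abs(a - b) for a, b in zip(frame_hists[i - 1], frame_hists[i]))
        if d > threshold:
            cuts.append(i)
    return cuts
```

Gradual transitions (fades, dissolves) need a windowed variant of this test, since their frame-to-frame differences stay below any single-cut threshold.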
The system submitted to the search task is an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. Finally, the rushes task submission is based on a video summarisation and browsing system comprising two different interest curve algorithms and three features.Item Open Access El yazısı belgelerde kelime tabanlı arama(IEEE, 2008-04) Can, Ethem F.; Duygulu, PınarBu çalışmada el yazısı belgelerde arama yapabilmek için yeni yöntemler önerilmiştir. Bu çalışmadaki en temel varsayım ve yola çıkış noktası; her bir kelimenin resim gibi ele alınabileceği ve dolayısıyla resim arama teknikleri ile sorgulama yapılabileceğidir. Özel olarak resim üzerindeki kenar noktalarının eğimlerinin yönlerinin dağılımı ve korelasyon katsayısı tabanlı iki yöntem önerilmiş, ayrıca bu iki yöntemin nasıl birleştirilebileceği anlatılmıştır. Deneyler George Washington'un el yazmaları veri kümesi üzerinde yapılmıştır. We present new methods to retrieve words in historical handwritten documents. With the assumption that the words can be seen as images, we used the word spotting idea and search for the words in the documents using image retrieval techniques. Specifically, we proposed two methods, one based on the histogram of gradient orientations and one based on the correlation coefficient. We also proposed a new method by combining these two methods. In the experiments the data set consisting of George Washington's handwritings is used. ©2008 IEEE.Item Open Access Ensemble of multiple instance classifiers for image re-ranking(Elsevier Ltd, 2014) Sener F.; Ikizler-Cinbis, N.Text-based image retrieval may perform poorly due to the irrelevant and/or incomplete text surrounding the images in the web pages. In such situations, visual content of the images can be leveraged to improve the image ranking performance. 
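The correlation-coefficient matcher in the word-spotting paper above reduces to the Pearson correlation between two equally sized word images; this sketch operates on flattened intensity lists:

```python
import math

def correlation_coefficient(a, b):
    """Pearson correlation of two equal-length intensity sequences.

    Returns a value in [-1, 1]; 1 means the two word images vary together
    perfectly, which makes it usable as a matching score."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0
```

In a word-spotting pipeline, the query word image is compared against each candidate word image and the candidates are ranked by this score.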
In this paper, we look into this problem of image re-ranking and propose a system that automatically constructs multiple candidate "multi-instance bags (MI-bags)", which are likely to contain relevant images. These automatically constructed bags are then utilized by ensembles of Multiple Instance Learning (MIL) classifiers, and the images are re-ranked according to the final classification responses. Our method is unsupervised in the sense that the only input to the system is the text query itself, without any user feedback or annotation. The experimental results demonstrate that constructing multiple instance bags based on the retrieval order and utilizing ensembles of MIL classifiers greatly enhance the retrieval performance, achieving results on par with or better than the state-of-the-art. © 2014 Elsevier B.V.

Fuzzy color histogram-based video segmentation (Academic Press, 2010) [Open Access]
Küçüktunç, O.; Güdükbay, Uğur; Ulusoy, Özgür
We present a fuzzy color histogram-based shot-boundary detection algorithm specialized for content-based copy detection applications. The proposed method aims to detect both cuts and gradual transitions (fade, dissolve) effectively in videos where heavy transformations (such as cam-cording, insertion of patterns, and strong re-encoding) occur. Along with the color histogram generated with the fuzzy linking method on the L*a*b* color space, the system extracts a mask for still regions and the window of the picture-in-picture transformation for each detected shot, which is useful in a content-based copy detection system. Experimental results show that our method effectively detects shot boundaries and reduces false alarms compared to state-of-the-art shot-boundary detection algorithms. © 2009 Elsevier Inc. All rights reserved.

GCap: Graph-based automatic image captioning (IEEE, 2004) [Open Access]
Pan, J.-Y.; Yang, H.-J.; Faloutsos, C.; Duygulu, Pınar
Given an image, how do we automatically assign keywords to it?
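The fuzzy-histogram idea in the shot-boundary paper above spreads each pixel's mass over neighbouring bins rather than assigning it to exactly one. Here is a one-dimensional sketch with triangular memberships; the paper itself works with fuzzy linking in the L*a*b* space, which is omitted here:

```python
def fuzzy_histogram(values, n_bins=8, vmax=255.0):
    """1-D fuzzy histogram: each value in [0, vmax] contributes to the two
    nearest bin centres with triangular (linear) memberships summing to 1."""
    hist = [0.0] * n_bins
    width = vmax / (n_bins - 1)  # spacing between bin centres
    for v in values:
        pos = v / width               # fractional bin position
        lo = min(int(pos), n_bins - 2)
        frac = pos - lo               # membership of the upper neighbour
        hist[lo] += 1.0 - frac
        hist[lo + 1] += frac
    total = len(values) or 1
    return [h / total for h in hist]
```

Because neighbouring intensities share bin mass, small lighting shifts change the histogram smoothly, which is what makes the representation robust to the heavy transformations the paper targets.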
In this paper, we propose a novel, graph-based approach (GCap) which outperforms previously reported methods for automatic image captioning. Moreover, it is fast and scales well, with its training and testing time linear in the data set size. We report auto-captioning experiments on the "standard" Corel image database of 680 MBytes, where GCap outperforms recent, successful auto-captioning methods by up to 10 percentage points in captioning accuracy (50% relative improvement). © 2004 IEEE.

A histogram-based approach for object-based query-by-shape-and-color in image and video databases (Elsevier, 2005) [Open Access]
Şaykol, E.; Güdükbay, Uğur; Ulusoy, Özgür
Considering the fact that querying by low-level object features is essential in image and video data, an efficient approach for querying and retrieval by shape and color is proposed. The approach employs three specialized histograms (i.e., distance, angle, and color histograms) to store feature-based information extracted from objects. The objects can be extracted from images or video frames. The proposed histogram-based approach is used as a component in the query-by-feature subsystem of a video database management system. The color and shape information is handled together to enrich the querying capabilities for content-based retrieval. The evaluation of the retrieval effectiveness and the robustness of the proposed approach is presented via performance experiments. © 2005 Elsevier Ltd. All rights reserved.

İçerik tabanlı görüntü erişimi için sahne sınıflandırması [Scene classification for content-based image retrieval] (IEEE, 2008-04) [Open Access]
Çavuş, Özge; Aksoy, Selim
Content-based image indexing and retrieval have become important research problems with the use of large databases in a wide range of areas. In this study, a content-based image retrieval system that is based on scene classification for image indexing is proposed. Instead of using low-level features directly, semantic class information obtained as a result of scene classification is used during indexing. The traditional "bag of words" approach is modified for classifying the scenes. In order to minimize the semantic gap, a relevance feedback approach based on one-class classification is also integrated. The support vector data description is used for learning during feedback iterations. Experiments using the Corel data set show good results for both classification and retrieval. ©2008 IEEE.

İki durumlu bir beyin bilgisayar arayüzünde özellik çıkarımı ve sınıflandırma [Feature extraction and classification in a two-state brain-computer interface] (IEEE, 2017-10) [Open Access]
Altındiş, Fatih; Yılmaz, B.
Brain-computer interface (BCI) technology is used to enable communication with the outside world for the many people with restricted mobility, such as ALS and paralysis patients, who have lost motor neuron function. In this study, the goal was to classify motor imagery through a real-time EEG processing simulation, using an EEG data set recorded at Graz University in Austria. The data set contains two-channel EEG signals acquired from 8 subjects during imagined movement of the right or left hand. For each participant, 120 motor imagery trial signals of approximately 9 seconds each (60 right and 60 left) were recorded. These signals were filtered, and feature vectors of 24, 32, and 40 elements were formed from the relative power change values obtained using band-pass filters. Classification was performed with linear discriminant analysis (LDA), k-nearest neighbors (KNN), and support vector machines (SVM), and the best classification performance was obtained with the 24-element feature vector and the LDA classifier.

Image information mining using spatial relationship constraints (Bilkent University, 2012) [Open Access]
Karakuş, Fatih
A huge amount of data is collected from Earth observation satellites, which continuously send data to Earth receiving stations day by day. Therefore, mining those data becomes more important for effective processing of the collected multi-spectral images. The most popular approaches for this problem use the meta-data of the images, such as geographical coordinates. However, these approaches do not offer a good solution for determining what those images contain.
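The "bag of words" scene representation mentioned above assigns each local descriptor to its nearest visual word and histograms the assignments. A sketch with made-up 2-D descriptors and a two-word vocabulary (real systems use clustered high-dimensional descriptors):

```python
def bag_of_words(descriptors, vocabulary):
    """L1-normalized histogram of nearest-visual-word assignments.

    descriptors, vocabulary: lists of equal-length numeric tuples.
    """
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda i: sqdist(d, vocabulary[i]))
        hist[nearest] += 1
    total = len(descriptors) or 1
    return [h / total for h in hist]
```

The resulting histogram is what the scene classifier consumes in place of the raw low-level features.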
Some studies take a big step beyond the meta-data based approaches in this area by shifting the focus to content-based approaches, such as utilizing the region information of the sensed images. In this thesis, we propose a novel, generic, and extensible image information mining system that uses spatial relationship constraints. In this system, we use not only the region content, but also the relationships of those regions. First, we extract the region information of the images and then extract pairwise relationship information of those regions, such as left, right, above, below, near, far, and distance. This feature extraction process is defined as a generic process that is independent of how the region segmentation is obtained. In addition, since new features and new approaches are continuously being developed by image information mining researchers, the extensibility of the system played a big role in its design. In this thesis, we also propose a novel feature vector structure in which a feature vector consists of several sub-feature vectors. In the proposed structure, each sub-feature vector can be exclusively selected for the search process, and each can have a different distance metric for comparisons with the corresponding sub-feature vector of other feature vectors. Therefore, the system gives users the ability to choose which information about a region and its pairwise relationships with other regions is used when they perform a search. The proposed system is illustrated by region-based retrieval scenarios on very high spatial resolution satellite images.

Image mining using directional spatial constraints (Institute of Electrical and Electronics Engineers, 2010-01) [Open Access]
Aksoy, S.; Cinbiş, R. G.
Spatial information plays a fundamental role in building high-level content models for supporting analysts' interpretations and automating geospatial intelligence. We describe a framework for modeling directional spatial relationships among objects and using this information for contextual classification and retrieval. The proposed model first identifies image areas that have a high degree of satisfaction of a spatial relation with respect to several reference objects. Then, this information is incorporated into the Bayesian decision rule as spatial priors for contextual classification. The model also supports dynamic queries by using directional relationships as spatial constraints to enable object detection based on the properties of individual objects as well as their spatial relationships to other objects. Comparative experiments using high-resolution satellite imagery illustrate the flexibility and effectiveness of the proposed framework in image mining, with significant improvements in both classification and retrieval performance.

Interactive training of advanced classifiers for mining remote sensing image archives (ACM, 2004) [Open Access]
Aksoy, Selim; Koperski, K.; Tusk, C.; Marchisio, G.
Advances in satellite technology and the availability of downloaded images constantly increase the sizes of remote sensing image archives. Automatic content extraction, classification and content-based retrieval have become highly desired goals for the development of intelligent remote sensing databases. The common approach for mining these databases uses rules created by analysts. However, incorporating GIS information and human expert knowledge with digital image processing improves remote sensing image analysis. We developed a system that uses decision tree classifiers for interactive learning of land cover models and mining of image archives.
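A "degree of satisfaction" for a directional relation such as east-of, as described in the directional-constraints paper above, can be scored from the angle between object centroids. This is a fuzzy-membership sketch of the idea, not the paper's exact model:

```python
import math

def degree_east_of(obj, ref):
    """Fuzzy degree to which centroid `obj` lies east of centroid `ref`:
    1.0 for due east, falling linearly to 0.0 at and beyond +/-90 degrees.

    Centroids are (x, y) with x increasing eastward."""
    dx = obj[0] - ref[0]
    dy = obj[1] - ref[1]
    if dx == 0 and dy == 0:
        return 0.0  # coincident centroids: relation undefined
    angle = math.atan2(dy, dx)  # 0 rad = due east
    return max(0.0, 1.0 - abs(angle) / (math.pi / 2))
```

Thresholding such scores over a grid of candidate locations yields the "image areas with a high degree of satisfaction" that the abstract describes, usable as spatial priors or query constraints.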
Decision trees provide a promising solution for this problem because they can operate on both numerical (continuous) and categorical (discrete) data sources, and they do not require any assumptions about either the distributions or the independence of attribute values. This is especially important for the fusion of measurements from different sources like spectral data, DEM data, and other ancillary GIS data. Furthermore, using surrogate splits provides the capability of dealing with missing data during both training and classification, and enables handling instrument malfunctions or cases where one or more measurements do not exist for some locations. Quantitative and qualitative performance evaluation showed that decision trees provide powerful tools for modeling both pixel and region contents of images and for mining remote sensing image archives.

Learning Bayesian classifiers for scene classification with a visual grammar (IEEE, 2005) [Open Access]
Aksoy, Selim; Koperski, K.; Tusk, C.; Marchisio, G.; Tilton, J. C.
A challenging problem in image content extraction and classification is building a system that automatically learns high-level semantic interpretations of images. We describe a Bayesian framework for a visual grammar that aims to reduce the gap between low-level features and high-level user semantics. Our approach includes modeling image pixels using automatic fusion of their spectral, textural, and other ancillary attributes; segmentation of image regions using an iterative split-and-merge algorithm; and representing scenes by decomposing them into prototype regions and modeling the interactions between these regions in terms of their spatial relationships. Naive Bayes classifiers are used in the learning of models for region segmentation and classification using positive and negative examples for user-defined semantic land cover labels.
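Surrogate splits, mentioned above as the mechanism for handling missing measurements, let a tree route a sample on a second, correlated feature when the primary split feature is absent. A toy single-split sketch (the feature names and thresholds are illustrative, not from the paper):

```python
def stump_predict(sample, primary, surrogate, default):
    """Route a sample through a single decision split.

    primary / surrogate: (feature_name, threshold, left_label, right_label).
    The surrogate split is consulted only when the primary feature is
    missing (None or absent), and `default` is returned when both are.
    """
    for feature, threshold, left, right in (primary, surrogate):
        value = sample.get(feature)
        if value is not None:
            return left if value <= threshold else right
    return default
```

A full CART implementation chooses the surrogate automatically as the split that best mimics the primary one on the training data; here it is simply given.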
The system also automatically learns representative region groups that can distinguish different scenes and builds visual grammar models. Experiments using Landsat scenes show that the visual grammar enables the creation of high-level classes that cannot be modeled by individual pixels or regions. Furthermore, learning of the classifiers requires only a few training examples. © 2005 IEEE.

Learning Bayesian classifiers for scene classification with a visual grammar (IEEE, 2005-03) [Open Access]
Aksoy, Selim; Koperski, K.; Tusk, C.; Marchisio, G.; Tilton, J. C.
A challenging problem in image content extraction and classification is building a system that automatically learns high-level semantic interpretations of images. We describe a Bayesian framework for a visual grammar that aims to reduce the gap between low-level features and high-level user semantics. Our approach includes modeling image pixels using automatic fusion of their spectral, textural, and other ancillary attributes; segmentation of image regions using an iterative split-and-merge algorithm; and representing scenes by decomposing them into prototype regions and modeling the interactions between these regions in terms of their spatial relationships. Naive Bayes classifiers are used in the learning of models for region segmentation and classification using positive and negative examples for user-defined semantic land cover labels. The system also automatically learns representative region groups that can distinguish different scenes and builds visual grammar models. Experiments using Landsat scenes show that the visual grammar enables the creation of high-level classes that cannot be modeled by individual pixels or regions.
Furthermore, learning of the classifiers requires only a few training examples.

Learning visual similarity for image retrieval with global descriptors and capsule networks (Springer, 2023-07-31) [Open Access]
Durmuş, Duygu; Güdükbay, Uğur; Ulusoy, Özgür
Finding matching images across large and unstructured datasets is vital in many computer vision applications. With the emergence of deep learning-based solutions, various visual tasks, such as image retrieval, have been successfully addressed. Learning visual similarity is crucial for image matching and retrieval tasks. Capsule Networks enable learning richer information that describes the object without losing the essential spatial relationship between the object and its parts. Besides, global descriptors are widely used for representing images. We propose a framework that combines the power of global descriptors and Capsule Networks by benefiting from the information of multiple views of images to enhance image retrieval performance. The Spatial Grouping Enhance strategy, which enhances sub-features in parallel, and self-attention layers, which explore global dependencies within internal representations of images, are utilized to empower the image representations. The approach captures resemblances between similar images and differences between non-similar images using triplet loss and cost-sensitive regularized cross-entropy loss. The results are superior to the state-of-the-art approaches for the Stanford Online Products Database, with Recall@K of 85.0, 94.4, 97.8, and 99.3, where K is 1, 10, 100, and 1000, respectively.

Learning visual similarity for image retrieval with global descriptors and capsule networks (Bilkent University, 2021-07) [Open Access]
Durmuş, Duygu
Finding matching images across large and unstructured datasets plays an important role in many computer vision applications.
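The triplet loss mentioned in the capsule-network work above pulls an anchor embedding toward a positive (same product) and pushes it away from a negative (different product) by at least a margin. A plain-Python sketch with Euclidean distance:

```python
import math

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d.

    Zero loss means the positive is already closer than the negative
    by more than the margin, so the triplet contributes no gradient."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)
```

During training, this scalar is averaged over mined triplets and minimized, shaping the embedding space so that nearest-neighbour search retrieves same-product images.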
With the emergence of deep learning-based solutions, various visual tasks such as image retrieval have been successfully addressed. Learning visual similarity is crucial for image matching and retrieval tasks. An alternative deep learning architecture, named capsule networks, enables learning richer information that describes the object without losing the essential spatial relationship between the object and its parts. Besides, global descriptors are widely used for representing images. The proposed architecture combines the power of global descriptors and revised capsule networks to enhance image retrieval performance. It benefits from multiple views of object images and highlights the spatial relationship between objects and their parts. The Spatial Grouping Enhance strategy, which enhances sub-features in parallel, and self-attention layers, which explore global dependencies within internal representations of images, are utilized to empower the image representations. The approach captures resemblances between similar images and differences between non-similar images using both triplet loss and cost-sensitive regularized cross-entropy loss instead of learning classification for individual images. Based on the experiments, the results are superior to the state-of-the-art approaches for Stanford Online Products.
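Recall@K, the metric reported for both versions of this work, checks whether at least one relevant item appears among the top K retrieved results, averaged over queries:

```python
def recall_at_k(ranked_lists, relevant_sets, k):
    """Fraction of queries for which at least one relevant item appears
    in the top-k of that query's ranked result list."""
    hits = 0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        if any(item in relevant for item in ranked[:k]):
            hits += 1
    return hits / len(ranked_lists)
```

With this definition, the reported Recall@1 of 85.0 means a same-product image is the single top result for 85% of the queries.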