Bilkent Repository :: Browsing by Subject "Indexing (of information)"

Open Access
Automatic image captioning
(2004) Pan, J.-Y.; Yang, H.-J.; Duygulu, Pınar; Faloutsos, C.
In this paper, we examine the problem of automatic image captioning. Given a training set of captioned images, we want to discover correlations between image features and keywords, so that we can automatically find good keywords for a new image. We experiment thoroughly with multiple design alternatives on large datasets of various content styles, and our proposed methods achieve up to a 45% relative improvement on captioning accuracy over the state of the art.
Open Access
Automatic performance evaluation of Web search engines
(Elsevier, 2004) Can, F.; Nuray, R.; Sevdik, A. B.
Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, it is important for both business enterprises and individuals to know which Web search engines are most effective, since such search engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used for several practical purposes. In this study we introduce an automatic Web search engine evaluation method as an efficient and effective assessment tool for such systems. Experiments based on eight Web search engines, 25 queries, and binary user relevance judgments show that our method provides results consistent with human-based evaluations. The observed consistencies are shown to be statistically significant. This indicates that the new method can be used successfully in the evaluation of Web search engines. © 2003 Elsevier Ltd. All rights reserved.
Open Access
Effect of inverted index partitioning schemes on performance of query processing in parallel text retrieval systems
(Springer, 2006-11) Cambazoğlu, B. Barla; Çatal, A.; Aykanat, Cevdet
Shared-nothing, parallel text retrieval systems require an inverted index, representing a document collection, to be partitioned among a number of processors. In general, the index can be partitioned based on either the terms or the documents in the collection, and the way the partitioning is done greatly affects the query processing performance of the parallel system. In this work, we investigate the effect of these two index partitioning schemes on query processing. We conduct experiments on a 32-node PC cluster, considering the case where the index is stored entirely on disk. Performance results are reported for a large (30 GB) document collection using an MPI-based parallel query processing implementation. © Springer-Verlag Berlin Heidelberg 2006.
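The two partitioning schemes named in this abstract can be illustrated with a minimal sketch. It is not the authors' MPI implementation: the round-robin assignment, the toy documents, and the function names (`build_index`, `partition_by_term`, `partition_by_document`) are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def partition_by_term(index, k):
    """Term-based: each processor holds the complete posting lists of some terms."""
    parts = [dict() for _ in range(k)]
    for i, term in enumerate(sorted(index)):
        parts[i % k][term] = index[term]  # round-robin is an assumption
    return parts

def partition_by_document(docs, k):
    """Document-based: each processor indexes only its own subset of documents."""
    subsets = [dict() for _ in range(k)]
    for i, doc_id in enumerate(sorted(docs)):
        subsets[i % k][doc_id] = docs[doc_id]
    return [build_index(subset) for subset in subsets]

# Toy collection (hypothetical data).
docs = {
    0: "parallel text retrieval",
    1: "inverted index",
    2: "parallel inverted index",
    3: "text index",
}
index = build_index(docs)
term_parts = partition_by_term(index, 2)
doc_parts = partition_by_document(docs, 2)

# Term-based: a query term is answered by exactly one processor.
owners = [i for i, part in enumerate(term_parts) if "index" in part]

# Document-based: every processor may hold a partial list that must be merged.
merged = sorted(d for part in doc_parts for d in part.get("index", []))
```

Under term-based partitioning a single-term query touches one processor, while under document-based partitioning every processor contributes a partial posting list that has to be merged.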
Open Access
Effective use of space for pivot-based metric indexing structures
(IEEE, 2008-04) Çelik, Cengiz
Among the metric space indexing methods, AESA is known to produce the lowest query costs in terms of the number of distance computations. However, its quadratic construction cost and space consumption make it infeasible for large datasets. There has been some work on reducing the space requirements of AESA. Instead of keeping all the distances between objects, LAESA appoints a subset of the database as pivots, keeping only the distances between objects and pivots. Kvp uses the idea of prioritizing the pivots based on their distances to objects, keeping only the pivot distances that it evaluates as promising. FQA discretizes the distances using a fixed number of bits per distance instead of using the system's floating-point types. Varying the number of bits to produce a performance-space trade-off was also studied in Kvp. Recently, BAESA has been proposed based on the same idea, but using different distance ranges for each pivot. The t-spanner based indexing structure compacts the distance matrix by introducing an approximation factor that makes the pivots less effective. In this work, we show that the Kvp prioritization is oriented toward symmetric distance distributions. We offer a new method that evaluates the effectiveness of pivots more accurately by making use of the overall distance distribution. We also simulate the performance of our method combined with distance discretization. Our results show that our approach offers very good space-performance trade-offs compared to AESA and tree-based methods. © 2008 IEEE.
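The pivot idea that LAESA-style structures rely on can be sketched as follows. This is a generic illustration of pivot filtering with the triangle inequality, not the method proposed in the paper; the Euclidean metric, the random data, the choice of the first four objects as pivots, and all function names are assumptions.

```python
import random

def dist(a, b):
    """Euclidean distance; any metric satisfying the triangle inequality works."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def build_pivot_table(objects, pivots):
    """Store only object-to-pivot distances instead of the full distance matrix."""
    return [[dist(o, p) for p in pivots] for o in objects]

def range_query(objects, pivots, table, q, radius):
    """Return objects within `radius` of `q` and the number of distance computations."""
    q_to_pivots = [dist(q, p) for p in pivots]
    results, computed = [], len(pivots)
    for obj, row in zip(objects, table):
        # Triangle inequality: |d(q, p) - d(o, p)| <= d(q, o) for every pivot p.
        lower_bound = max(abs(dq - do) for dq, do in zip(q_to_pivots, row))
        if lower_bound > radius:
            continue  # discarded without computing d(q, o)
        computed += 1
        if dist(q, obj) <= radius:
            results.append(obj)
    return results, computed

rng = random.Random(0)
objects = [(rng.random(), rng.random()) for _ in range(200)]
pivots = objects[:4]  # pivot selection strategy is an assumption
table = build_pivot_table(objects, pivots)

q, radius = (0.5, 0.5), 0.1
results, computed = range_query(objects, pivots, table, q, radius)
brute_force = [o for o in objects if dist(q, o) <= radius]
```

The table holds one distance per object-pivot pair, so its size grows linearly with the number of objects rather than quadratically as in AESA; the cost is that fewer candidates are filtered out.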
Open Access
Efficiency and effectiveness of query processing in cluster-based retrieval
(Elsevier, 2004) Can, F.; Altingövde, I. S.; Demir, E.
Our research shows that for large databases, without considerable additional storage overhead, cluster-based retrieval (CBR) can compete with the time efficiency and effectiveness of the inverted index-based full search (FS). The proposed CBR method employs a storage structure that blends the cluster membership information into the inverted file posting lists. This approach significantly reduces the cost of similarity calculations for document ranking during query processing and improves efficiency. For example, in terms of in-memory computations, our new approach can reduce query processing time to 39% of FS. The experiments confirm that the approach is scalable and that system performance improves with increasing database size. In the experiments, we use the cover coefficient-based clustering methodology (C3M) and the Financial Times database of TREC, which contains 210,158 documents (564 MB) defined by 229,748 terms, with a total of 29,545,234 inverted index elements. This study provides CBR efficiency and effectiveness experiments using the largest corpus in an environment that employs no user interaction or user behavior assumption for clustering. © 2003 Elsevier Ltd. All rights reserved.
Open Access
Exploiting interclass rules for focused crawling
(IEEE, 2004) Altingövde, I. S.; Ulusoy, Özgür
A baseline crawler was developed at Bilkent University based on a focused-crawling approach. A focused crawler is an agent that targets a particular topic and visits and gathers only a relevant, narrow Web segment while trying not to waste resources on irrelevant material. The rule-based Web-crawling approach uses linkage statistics among topics to improve the baseline focused crawler's harvest rate and coverage. The crawler also employs a canonical topic taxonomy to train a naïve-Bayesian classifier, which then helps determine the relevancy of crawled pages.
Open Access
Scene classification for content-based image retrieval (İçerik tabanlı görüntü erişimi için sahne sınıflandırması)
(IEEE, 2008-04) Çavuş, Özge; Aksoy, Selim
Content-based image indexing and retrieval have become important research problems as large databases have come into use in a wide range of areas. In this study, a content-based image retrieval system that is based on scene classification for image indexing is proposed. Instead of using low-level features directly for indexing, the features are used for scene classification, and images are indexed with the resulting semantic class information. An adaptation of the "bag of words" technique from document analysis is used to classify the scenes. To minimize the semantic gap, relevance feedback based on one-class classification is integrated into the retrieval scenario. For this purpose, the support vector data description is used, which builds a hypersphere that models the relevant images closely while staying away from the non-relevant ones. Experiments on the Corel data set show good results for both classification and retrieval. ©2008 IEEE.
Open Access
Integration of structural and semantic models for multimedia metadata management
(IEEE, 2007-06) Little, S.; Martinelli, M.; Salvetti, O.; Güdükbay, Uğur; Ulusoy, Özgür; De Chalendar, G.; Grefenstette, G.
The management and exchange of multimedia data is challenging due to the variety of formats, standards and intended applications. In addition, production of multimedia data is rapidly increasing due to the availability of off-the-shelf, modern digital devices that can be used by even inexperienced users. It is likely that this volume of information will only increase in the future. A key goal of the MUSCLE (Multimedia Understanding through Semantics, Computation and Learning) network is to develop tools, technologies and standards to facilitate the interoperability of multimedia content and support the exchange of such data. One approach for achieving this was the creation of a specific "E-Team", composed of the authors, to discuss core questions and practical issues based on the participants' individual work. In this paper, we present the relevant points of view with regard to sharing experiences and to extracting and integrating multimedia data and metadata from different modes (text, images, video). © 2007 IEEE.
Open Access
Key frame selection from MPEG video data
(SPIE, 1997-02) Gerek, Ömer N.; Altunbaşak, Y.
This paper describes a method for selecting key frames by using a number of parameters extracted from the MPEG video stream. The parameters are extracted directly from the compressed video stream without decompression. A combination of these parameters is then used in a rule-based decision system. The computational complexity of extracting the parameters and of the key frame decision rule is very small. As a result, the overall operation is performed very quickly, which makes our algorithm suitable for practical purposes. The experimental results show that this method can successfully select the distinctive frames of video streams.
Open Access
Matching Ottoman words: an image retrieval approach to historical document indexing
(ACM, 2007-07) Ataer, Esra; Duygulu, Pınar
Large archives of Ottoman documents are challenging to many historians all over the world. However, these archives remain inaccessible since manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, character recognition based systems may not yield satisfactory results. It is also desirable to store the documents in image form since the documents may contain important drawings, especially the signatures. For these reasons, in this study we treat the problem as an image retrieval problem with the view that Ottoman words are images, and we propose a solution based on image matching techniques. The bag-of-visterms approach, which has been shown to be successful in classifying objects and scenes, is adapted for matching word images. Each word image is represented by a set of visual terms, which are obtained by vector quantization of SIFT descriptors extracted from salient points. Similar words are then matched based on the similarity of the distributions of the visual terms. The experiments are carried out on printed and handwritten documents that include over 10,000 words. The results show that the proposed system is able to retrieve words with high accuracy and to capture the semantic similarities between words. Copyright 2007 ACM.
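The representation described in this abstract, a distribution over quantized descriptors, can be sketched in a few lines. This is a toy illustration only: the two-dimensional vectors stand in for SIFT descriptors, the fixed `codebook` stands in for one learned by vector quantization, and histogram intersection is an assumed similarity measure, since the abstract does not name the one used.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[i])),
    )

def visterm_histogram(descriptors, codebook):
    """Normalized count of how often each visual term occurs in one word image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms (assumed measure)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical codebook and descriptors; real ones would come from SIFT features.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
word_a = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.0)]
word_b = [(0.0, 0.1), (1.1, 0.0), (0.9, -0.1)]
word_c = [(0.0, 0.9), (0.1, 1.0), (0.0, 1.1)]

hist_a = visterm_histogram(word_a, codebook)
hist_b = visterm_histogram(word_b, codebook)
hist_c = visterm_histogram(word_c, codebook)

sim_ab = histogram_intersection(hist_a, hist_b)  # same visual terms, high similarity
sim_ac = histogram_intersection(hist_a, hist_c)  # disjoint visual terms, low similarity
```

Because only the distribution of visual terms is compared, two word images can match even when their descriptors differ slightly, which is the property that makes the approach tolerant of writing variations.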
Open Access
Automatic detection of important objects for MPEG-7 compliant video databases (MPEG-7 uyumlu video veri tabanları için önemli nesnelerin otomatik olarak bulunması)
(IEEE, 2008-04) Baştan, Muhammet; Güdükbay, Uğur; Ulusoy, Özgür
We describe a method to automatically extract video objects that are important for object-based indexing of videos in an MPEG-7 compliant video database system. Most of the existing salient object detection approaches work on still images and detect only visually conspicuous image structures, while our method is designed for video and aims to find regions that may be important for indexing in a video database system. Our method works on a shot basis. We first segment each frame to obtain homogeneous regions in terms of color and texture. Then, we extract a set of local and global color, shape, texture and motion features for each region. Finally, the regions are classified as salient or non-salient using support vector machines (SVMs) trained on a few hundred example regions. Initial experimental results from news video segments show that the proposed method is more effective in extracting the important regions in terms of human visual perception. ©2008 IEEE.
Open Access
Retrieval of Ottoman documents
(ACM, 2006-10) Ataer, Esra; Duygulu, Pınar
There is a growing need to access historical Ottoman documents stored in large archives, and therefore tools for automatic searching, indexing and transcription of these documents are required. In this paper, we present a method for the retrieval of Ottoman documents based on word matching. The method first segments the documents into word images and then uses a hierarchical matching technique to find similar instances of the word images. The experiments show that promising results can be achieved even with simple features. Copyright 2006 ACM.
Open Access
Segmentation-based extraction of important objects from video for object-based indexing
(IEEE, 2008-06) Baştan, Muhammet; Güdükbay, Uğur; Ulusoy, Özgür
We describe a method to automatically extract important video objects for object-based indexing. Most of the existing salient object detection approaches detect visually conspicuous structures in images, while our method aims to find regions that may be important for indexing in a video database system. Our method works on a shot basis. We first segment each frame to obtain homogeneous regions in terms of color and texture. Then, we extract a set of regional and inter-regional color, shape, texture and motion features for all regions, which are classified as important or not using SVMs trained on a few hundred example regions. Finally, each important region is tracked within each shot for trajectory generation and consistency checking. Experimental results from news video sequences show that the proposed approach is effective. © 2008 IEEE.