Browsing by Subject "Document retrieval"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
Item Open Access Keyphrase extraction through query performance prediction(Sage Publications Ltd., 2012) Ercan, G.; Cicekli, I.Previous research shows that keyphrases are useful tools in document retrieval and navigation. While these point to a relation between keyphrases and document retrieval performance, no other work uses this relationship to identify keyphrases of a given document. This work aims to establish a link between the problems of query performance prediction (QPP) and keyphrase extraction. To this end, features used in QPP are evaluated in keyphrase extraction using a naïve Bayes classifier. Our experiments indicate that these features improve the effectiveness of keyphrase extraction in documents of different length. More importantly, commonly used features of frequency and first position in text perform poorly on shorter documents, whereas QPP features are more robust and achieve better results. © 2012 The Author(s).Item Open Access A new representation for matching words(Bilkent University, 2007) Ataer, EsraLarge archives of historical documents are challenging to many researchers all over the world. However, these archives remain inaccessible since manual indexing and transcription of such a huge volume is difficult. In addition, electronic imaging tools and image processing techniques gain importance with the rapid increase in digitalization of materials in libraries and archives. In this thesis, a language independent method is proposed for representation of word images, which leads to retrieval and indexing of documents. While character recognition methods suffer from preprocessing and overtraining, we make use of another method, which is based on extracting words from documents and representing each word image with the features of invariant regions. The bag-of-words approach, which is shown to be successful to classify objects and scenes, is adapted for matching words. Since the curvature or connection points, or the dots are important visual features to distinct two words from each other, we make use of the salient points which are shown to be successful in representing such distinctive areas and heavily used for matching. Difference of Gaussian (DoG) detector, which is able to find scale invariant regions, and Harris Affine detector, which detects affine invariant regions, are used for detection of such areas and detected keypoints are described with Scale Invariant Feature Transform (SIFT) features. Then, each word image is represented by a set of visual terms which are obtained by vector quantization of SIFT descriptors and similar words are matched based on the similarity of these representations by using different distance measures. These representations are used both for document retrieval and word spotting. The experiments are carried out on Arabic, Latin and Ottoman datasets, which included different writing styles and different writers. The results show that the proposed method is successful on retrieval and indexing of documents even if with different scripts and different writers and since it is language independent, it can be easily adapted to other languages as well. Retrieval performance of the system is comparable to the state of the art methods in this field. In addition, the system is succesfull on capturing semantic similarities, which is useful for indexing, and it does not include any supervising step.Item Open Access Stylistic document retrieval for Turkish(IEEE, 2009-09) Zamalieva, Daniya; Kalaycılar, Fırat; Kale, Aslı; Pehlivan, Selen; Can, FazlıIn information retrieval (IR) systems, there are a query and a collection of documents compared with this query and ranked according to a particular similarity measure. Since texts with the same content can be written by different authors, the writing styles of the documents change as well accordingly. This observation brings the idea of investigating text by means of style. In this paper, we analyze text documents in terms of stylistic features of the written text and measure effectiveness of these features in an IR system. Our main focus is on Turkish text documents. Although there are many studies about broadening IR systems with style based enhancement, there is no similar application for Turkish which performs retrieval depending purely on style. © 2009 IEEE.