First large-scale information retrieval experiments on Turkish texts
Author(s)
Date
2006-08
Source Title
Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher
ACM
Pages
627 - 628
Language
English
Type
Conference Paper
Abstract
We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which was created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and has a size of about 800 MB. All documents come from the Turkish newspaper Milliyet. We implement and apply stemmers ranging from simple to sophisticated, together with various query-document matching functions, and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
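The prefix-truncation approach mentioned in the abstract can be illustrated with a minimal sketch. The function names `truncate_stem` and `stem_tokens` are hypothetical and not taken from the paper, and the input is assumed to be already tokenized and lowercased (Turkish-aware case folding of İ/ı is deliberately left out).

```python
from typing import Iterable, List


def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first `prefix_len` characters of `word`.

    Words shorter than `prefix_len` are returned unchanged.
    Hypothetical helper; the default of 5 follows the abstract.
    """
    return word[:prefix_len]


def stem_tokens(tokens: Iterable[str], prefix_len: int = 5) -> List[str]:
    """Apply fixed-prefix truncation to every token.

    Assumes tokens are already lowercased and split into words.
    """
    return [truncate_stem(tok, prefix_len) for tok in tokens]


if __name__ == "__main__":
    # Inflected forms of "kitap" (book) collapse to the same index term.
    print(stem_tokens(["kitaplar", "kitaplarımız", "ev"]))
```

Under this sketch, a query term and a document term match whenever their first five characters agree, which is why differently inflected forms of the same Turkish root tend to map to one index term.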
Keywords
IR test collection creation
Lemmatizer
Stemming
Turkish
Data acquisition
Data mining
Information technology
Query languages
Ad hoc networks
Query processing
Text processing
Large-scale information retrieval
Matching functions
Information retrieval
Information retrieval systems
Permalink
http://hdl.handle.net/11693/27221
Published Version (Please cite this version)
https://doi.org/10.1145/1148170.1148288