First large-scale information retrieval experiments on Turkish texts
Öcalan, H. Çağdaş
Vursavaş, Onur M.
Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
627 - 628
We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and is about 800 MB in size. All documents come from the Turkish newspaper Milliyet. We implement and apply stemmers ranging from simple to sophisticated, together with various query-document matching functions, and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
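The prefix-truncation stemmer mentioned in the abstract can be illustrated with a minimal sketch. The function names, the whitespace tokenization, and the sample words below are assumptions made for illustration; they are not taken from the paper, and the set-overlap check stands in for a real matching function.

```python
def truncate_stem(token: str, prefix_len: int = 5) -> str:
    """Return the first prefix_len characters of a token.

    Tokens shorter than prefix_len are returned unchanged.
    Assumes the token is already lowercased; Turkish casing
    (dotted and dotless i) needs locale-aware handling that
    this sketch does not attempt.
    """
    return token[:prefix_len]


def stem_tokens(text: str, prefix_len: int = 5) -> list[str]:
    """Split on whitespace (an assumed tokenizer) and truncate each token."""
    return [truncate_stem(tok, prefix_len) for tok in text.split()]


# Hypothetical query and document text, already lowercased.
query_terms = set(stem_tokens("kitaplar"))
doc_terms = set(stem_tokens("kitaplarımız yeni kitaplıkta duruyor"))

# Inflected forms that share the same first five letters now match.
shared_terms = query_terms & doc_terms
```

Here "kitaplar", "kitaplarımız" and "kitaplıkta" all reduce to "kitap", so the query term matches the document even though the surface forms differ. A lemmatizer-based stemmer, which the abstract reports as more effective, would instead need a morphological analyzer and is not shown.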
Keywords
IR test collection creation
Ad hoc networks
Large-scale information retrieval
Information retrieval systems
Published Version (Please cite this version): https://doi.org/10.1145/1148170.1148288