First large-scale information retrieval experiments on Turkish texts
Date
2006-08
Abstract
We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and is about 800 MB in size. All documents come from the Turkish newspaper Milliyet. We implement and apply stemmers ranging from simple to sophisticated, together with various query-document matching functions, and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
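As an illustration of the simplest approach mentioned in the abstract, prefix truncation can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation: the function names `truncate_stem` and `stem_tokens` are hypothetical, the input is assumed to be already tokenized and lowercased, and Turkish-specific case folding (dotted and dotless i) is deliberately left out.

```python
from typing import Iterable, List


def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first `prefix_len` characters of `word`.

    Words shorter than `prefix_len` are returned unchanged.
    The default of 5 follows the prefix length named in the abstract.
    """
    return word[:prefix_len]


def stem_tokens(tokens: Iterable[str], prefix_len: int = 5) -> List[str]:
    """Apply prefix truncation to every token (hypothetical helper)."""
    return [truncate_stem(tok, prefix_len) for tok in tokens]


# Inflected forms of "kitap" (book) collapse to the same index term.
stems = stem_tokens(["kitap", "kitaplar", "kitaplarımız"])
print(stems)
```

Because Turkish is agglutinative and attaches suffixes to the end of a word, a fixed-length prefix often keeps the root and drops the inflection, which is why such a simple rule can serve as a stemmer here. The same rule is applied to both documents and queries so that their terms match.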
Source Title
Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher
ACM
Keywords
IR test collection creation, Lemmatizer, Stemming, Turkish, Data acquisition, Data mining, Information technology, Query languages, Ad hoc networks, Query processing, Text processing, Large-scale information retrieval, Matching functions, Information retrieval, Information retrieval systems
Language
English