First large-scale information retrieval experiments on Turkish texts

Authors: Can, Fazlı; Koçberber, Seyit; Balcık, Erman; Kaynak, Cihan; Öcalan, H. Çağdaş; Vursavaş, Onur M.
Dates: 2016-02-08; 2016-02-08; 2006-08
URI: http://hdl.handle.net/11693/27221
Date of Conference: 06-11 August, 2006
Conference name: SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which was created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and has a size of about 800 MB. All documents come from the Turkish newspaper Milliyet. We implement and apply stemmers ranging from simple to sophisticated, together with various query-document matching functions, and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.

Language: English
Keywords: IR test collection creation; Lemmatizer; Stemming; Turkish; Data acquisition; Data mining; Information technology; Query languages; Ad hoc networks; Query processing; Text processing; Large-scale information retrieval; Matching functions; Information retrieval; Information retrieval systems
Type: Conference Paper
DOI: 10.1145/1148170.1148288
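
The prefix-truncation stemming mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `truncate_stem` and `stem_tokens`, the whitespace tokenization, and the sample words are assumptions made for illustration only. Only the prefix length of 5 comes from the abstract.

```python
def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first `prefix_len` characters of `word`.

    Words shorter than `prefix_len` are returned unchanged.
    Assumes the text is in composed (NFC) form, so that Turkish
    letters such as 'ı' or 'ş' each count as one character.
    """
    return word[:prefix_len]


def stem_tokens(text: str, prefix_len: int = 5) -> list[str]:
    """Split `text` on whitespace (an assumption, not the paper's
    tokenizer) and truncate every token to its prefix."""
    return [truncate_stem(token, prefix_len) for token in text.split()]


# Hypothetical example: two inflected forms of "kitap" (book)
# collapse to the same index term.
print(stem_tokens("kitaplar kitaplarımızdan"))
```

Case folding is deliberately left out of the sketch, because Python's default `str.lower()` does not apply the Turkish dotted and dotless I rules; a real system would need locale-aware handling.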