First large-scale information retrieval experiments on Turkish texts

Can, Fazlı; Koçberber, Seyit; Balcık, Erman; Kaynak, Cihan; Öcalan, H. Çağdaş; Vursavaş, Onur M.

dc.citation.epage	628	en_US
dc.citation.spage	627	en_US
dc.contributor.author	Can, Fazlı	en_US
dc.contributor.author	Koçberber, Seyit	en_US
dc.contributor.author	Balcık, Erman	en_US
dc.contributor.author	Kaynak, Cihan	en_US
dc.contributor.author	Öcalan, H. Çağdaş	en_US
dc.contributor.author	Vursavaş, Onur M.	en_US
dc.coverage.spatial	Seattle, Washington, USA
dc.date.accessioned	2016-02-08T11:47:53Z
dc.date.available	2016-02-08T11:47:53Z
dc.date.issued	2006-08	en_US
dc.department	Department of Computer Engineering	en_US
dc.description	Date of Conference: 06-11 August, 2006
dc.description	Conference name: SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
dc.description.abstract	We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which has been created for this study, contains 95.5 million words, 408,305 documents, 72 ad hoc queries and has a size of about 800MB. All documents come from the Turkish newspaper Milliyet. We implement and apply simple to sophisticated stemmers and various query-document matching functions and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.	en_US
dc.identifier.doi	10.1145/1148170.1148288	en_US
dc.identifier.uri	http://hdl.handle.net/11693/27221	en_US
dc.language.iso	English	en_US
dc.publisher	ACM	en_US
dc.relation.isversionof	https://doi.org/10.1145/1148170.1148288
dc.source.title	Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval	en_US
dc.subject	IR test collection creation	en_US
dc.subject	Lemmatizer	en_US
dc.subject	Stemming	en_US
dc.subject	Turkish	en_US
dc.subject	Data acquisition	en_US
dc.subject	Data mining	en_US
dc.subject	Information technology	en_US
dc.subject	Query languages	en_US
dc.subject	Ad hoc networks	en_US
dc.subject	Query processing	en_US
dc.subject	Text processing	en_US
dc.subject	Large-scale information retrieval	en_US
dc.subject	Matching functions	en_US
dc.subject	Information retrieval	en_US
dc.subject	Information retrieval systems	en_US
dc.title	First large-scale information retrieval experiments on Turkish texts	en_US
dc.type	Conference Paper	en_US

Files

Original bundle

Name: First large-scale information retrieval experiments on Turkish texts.pdf
Size: 117.54 KB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Scholarly Publications - Computer Engineering