First large-scale information retrieval experiments on Turkish texts
dc.citation.epage | 628 | en_US |
dc.citation.spage | 627 | en_US |
dc.contributor.author | Can, Fazlı | en_US |
dc.contributor.author | Koçberber, Seyit | en_US |
dc.contributor.author | Balcık, Erman | en_US |
dc.contributor.author | Kaynak, Cihan | en_US |
dc.contributor.author | Öcalan, H. Çağdaş | en_US |
dc.contributor.author | Vursavaş, Onur M. | en_US |
dc.coverage.spatial | Seattle, Washington, USA | |
dc.date.accessioned | 2016-02-08T11:47:53Z | |
dc.date.available | 2016-02-08T11:47:53Z | |
dc.date.issued | 2006-08 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description | Date of Conference: 06-11 August, 2006 | |
dc.description | Conference name: SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval | |
dc.description.abstract | We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which has been created for this study, contains 95.5 million words, 408,305 documents, 72 ad hoc queries and has a size of about 800MB. All documents come from the Turkish newspaper Milliyet. We implement and apply simple to sophisticated stemmers and various query-document matching functions and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions. | en_US |
dc.description.provenance | Made available in DSpace on 2016-02-08T11:47:53Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2006 | en |
dc.identifier.doi | 10.1145/1148170.1148288 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/27221 | en_US |
dc.language.iso | English | en_US |
dc.publisher | ACM | en_US |
dc.relation.isversionof | https://doi.org/10.1145/1148170.1148288 | |
dc.source.title | Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval | en_US |
dc.subject | IR test collection creation | en_US |
dc.subject | Lemmatizer | en_US |
dc.subject | Stemming | en_US |
dc.subject | Turkish | en_US |
dc.subject | Data acquisition | en_US |
dc.subject | Data mining | en_US |
dc.subject | Information technology | en_US |
dc.subject | Query languages | en_US |
dc.subject | Ad hoc networks | en_US |
dc.subject | Query processing | en_US |
dc.subject | Text processing | en_US |
dc.subject | Large-scale information retrieval | en_US |
dc.subject | Matching functions | en_US |
dc.subject | Information retrieval | en_US |
dc.subject | Information retrieval systems | en_US |
dc.title | First large-scale information retrieval experiments on Turkish texts | en_US |
dc.type | Conference Paper | en_US |
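
The abstract reports that truncating words at a prefix length of 5 already gives an effective retrieval environment for Turkish. The snippet below is a minimal sketch of that idea only, not the authors' implementation: the function names `truncate_stem` and `stem_tokens`, the whitespace tokenization, and the NFC normalization step are assumptions made for illustration, and the record does not describe how the paper tokenizes or normalizes text.

```python
import unicodedata


def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first `prefix_len` characters of `word`.

    Words shorter than `prefix_len` are returned unchanged.
    NFC normalization (an assumption, not from the paper) keeps
    letters such as 'ö' or 'ş' as single code points so that the
    slice counts letters rather than base-plus-accent pairs.
    """
    normalized = unicodedata.normalize("NFC", word)
    return normalized[:prefix_len]


def stem_tokens(text: str, prefix_len: int = 5) -> list[str]:
    """Split on whitespace (hypothetical tokenizer) and truncate each token."""
    return [truncate_stem(token, prefix_len) for token in text.split()]


if __name__ == "__main__":
    # Two inflected forms of "kitap" (book) collapse to the same index term.
    print(stem_tokens("kitaplar kitaplarımız ev"))
```

Under this sketch, inflected forms that share their first five letters map to one index term, which is how a fixed-length prefix can approximate stemming in an agglutinative language. The lemmatizer-based stemmer that the abstract reports as significantly more effective is not shown here.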
Files
Original bundle
- Name: First large-scale information retrieval experiments on Turkish texts.pdf
- Size: 117.54 KB
- Format: Adobe Portable Document Format
- Description: Full printable version