First large-scale information retrieval experiments on Turkish texts

We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, created for this study, contains 95.5 million words, 408,305 documents, and 72 ad hoc queries, and is about 800 MB in size. All documents come from the Turkish newspaper Milliyet. We implement and apply stemmers ranging from simple to sophisticated, along with various query-document matching functions, and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions.
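The prefix-truncation approach mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`turkish_lower`, `truncate_stem`, `index_terms`), the regex tokenizer, and the casing rules are assumptions made for illustration, and the abstract does not describe how the original system tokenized or case-folded text.

```python
import re

def turkish_lower(text: str) -> str:
    """Lowercase with Turkish casing rules (assumed preprocessing step).

    Python's default lower() maps "I" to "i", but Turkish pairs
    "I" with dotless "ı" and dotted "İ" with "i".
    """
    return text.replace("İ", "i").replace("I", "ı").lower()

def truncate_stem(word: str, prefix_len: int = 5) -> str:
    """Return the first prefix_len characters of the lowercased word.

    Words shorter than prefix_len are returned unchanged.
    """
    return turkish_lower(word)[:prefix_len]

def index_terms(text: str, prefix_len: int = 5) -> list[str]:
    """Tokenize on word characters and truncate each token."""
    return [truncate_stem(tok, prefix_len) for tok in re.findall(r"\w+", text)]

if __name__ == "__main__":
    # "kitap" (book), "kitaplar" (books), and "kitaplarımızdan"
    # (from our books) all share the same 5-character prefix.
    print(index_terms("Kitap kitaplar kitaplarımızdan"))
```

Because Turkish is agglutinative, many inflected forms of a word share a common prefix, so a fixed-length truncation conflates them without any dictionary or morphological analysis. A lemmatizer-based stemmer, which the abstract reports as more effective, would require language resources that this sketch does not attempt to model.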

Source Title

Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

Publisher

ACM

Keywords

IR test collection creation, Lemmatizer, Stemming, Turkish, Data acquisition, Data mining, Information technology, Query languages, Ad hoc networks, Query processing, Text processing, Large-scale information retrieval, Matching functions, Information retrieval, Information retrieval systems

Permalink

http://hdl.handle.net/11693/27221

Published Version (Please cite this version)

https://doi.org/10.1145/1148170.1148288

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Conference Paper
