First large-scale information retrieval experiments on Turkish texts

dc.citation.epage: 628 [en_US]
dc.citation.spage: 627 [en_US]
dc.contributor.author: Can, Fazlı [en_US]
dc.contributor.author: Koçberber, Seyit [en_US]
dc.contributor.author: Balcık, Erman [en_US]
dc.contributor.author: Kaynak, Cihan [en_US]
dc.contributor.author: Öcalan, H. Çağdaş [en_US]
dc.contributor.author: Vursavaş, Onur M. [en_US]
dc.coverage.spatial: Seattle, Washington, USA
dc.date.accessioned: 2016-02-08T11:47:53Z
dc.date.available: 2016-02-08T11:47:53Z
dc.date.issued: 2006-08 [en_US]
dc.department: Department of Computer Engineering [en_US]
dc.description: Date of Conference: 06-11 August, 2006
dc.description: Conference name: SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
dc.description.abstract: We present the results of the first large-scale Turkish information retrieval experiments performed on a TREC-like test collection. The test bed, which has been created for this study, contains 95.5 million words, 408,305 documents, 72 ad hoc queries and has a size of about 800 MB. All documents come from the Turkish newspaper Milliyet. We implement and apply simple to sophisticated stemmers and various query-document matching functions and show that truncating words at a prefix length of 5 creates an effective retrieval environment in Turkish. However, a lemmatizer-based stemmer provides significantly better effectiveness over a variety of matching functions. [en_US]
dc.identifier.doi: 10.1145/1148170.1148288
dc.identifier.uri: http://hdl.handle.net/11693/27221
dc.language.iso: English [en_US]
dc.publisher: ACM
dc.relation.isversionof: https://doi.org/10.1145/1148170.1148288
dc.source.title: Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval [en_US]
dc.subject: IR test collection creation [en_US]
dc.subject: Lemmatizer [en_US]
dc.subject: Stemming [en_US]
dc.subject: Turkish [en_US]
dc.subject: Data acquisition [en_US]
dc.subject: Data mining [en_US]
dc.subject: Information technology [en_US]
dc.subject: Query languages [en_US]
dc.subject: Ad hoc networks [en_US]
dc.subject: Query processing [en_US]
dc.subject: Text processing [en_US]
dc.subject: Large-scale information retrieval [en_US]
dc.subject: Matching functions [en_US]
dc.subject: Information retrieval [en_US]
dc.subject: Information retrieval systems [en_US]
dc.title: First large-scale information retrieval experiments on Turkish texts [en_US]
dc.type: Conference Paper [en_US]

Files

Original bundle
Name: First large-scale information retrieval experiments on Turkish texts.pdf
Size: 117.54 KB
Format: Adobe Portable Document Format
Description: Full printable version