dc.contributor.advisor | Can, Fazlı | |
dc.contributor.author | Kardaş, Süleyman | |
dc.date.accessioned | 2016-01-08T18:10:16Z | |
dc.date.available | 2016-01-08T18:10:16Z | |
dc.date.issued | 2009 | |
dc.identifier.uri | http://hdl.handle.net/11693/14880 | |
dc.description | Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. | en_US |
dc.description | Thesis (Master's) -- Bilkent University, 2009. | en_US |
dc.description | Includes bibliographical references (leaves 66-73). | en_US |
dc.description.abstract | The amount of information and the number of information resources on the Internet
have been growing rapidly for over a decade. This is also true for on-line
news and news providers. To overcome information overload, news consumers
prefer to track the topics that they are interested in. Topic detection and tracking
(TDT) applications aim to organize the temporally ordered stories of a news
stream according to the events. Two major problems in TDT are new event
detection (NED) and topic tracking (TT). These problems focus, respectively, on
finding the first stories of previously unseen events and on finding all subsequent stories
on a certain topic defined by a small number of initial stories. In this thesis,
the NED and TT problems are investigated in detail using the first large-scale
test collection (BilCol2005) developed by the Bilkent Information Retrieval Group.
The collection contains 209,305 documents from the entire year of 2005 and includes
several events, eighty of which are annotated by humans. The
experimental results show that a simple word truncation stemming method can
statistically compete with a sophisticated stemming approach that pays attention
to the morphological structure of the language. Our statistical findings show
that word stopping and the contents of the associated stopword list are important,
and that removing stopwords from the content can significantly improve system
performance. We demonstrate that the confidence scores of two different similarity
measures can be combined in a straightforward manner to improve
effectiveness. | en_US |
dc.description.statementofresponsibility | Kardaş, Süleyman | en_US |
dc.format.extent | xv, 77 leaves, graphics | en_US |
dc.language.iso | English | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | New Event Detection | en_US |
dc.subject | Topic Detection and Tracking | en_US |
dc.subject | Turkish | en_US |
dc.subject.lcc | Z699 .K37 2009 | en_US |
dc.subject.lcsh | Information storage and retrieval systems. | en_US |
dc.subject.lcsh | Information retrieval. | en_US |
dc.subject.lcsh | Automatic tracking. | en_US |
dc.title | New event detection and tracking in Turkish | en_US |
dc.type | Thesis | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.publisher | Bilkent University | en_US |
dc.description.degree | M.S. | en_US |
dc.identifier.itemid | BILKUTUPB116269 | |