Novelty detection in topic tracking

Date

2010

Editor(s)

Advisor

Can, Fazlı

Supervisor

Co-Advisor

Co-Supervisor

Instructor

Source Title

Print ISSN

Electronic ISSN

Publisher

Volume

Issue

Pages

Language

English

Journal Title

Journal ISSN

Volume Title

Series

Abstract

News portals provide many services to the news consumers such as information retrieval, personalized information filtering, summarization and news clustering. Additionally, many news portals using multiple sources enable their users to evaluate developments from different perspectives by richening the content. However, increasing number of sources and incoming news makes it difficult for news consumers to find news of their interest in news portals. Different types of organizational operations are applied to ease browsing over the news for this reason. New event detection and tracking (NEDT) is one of these operations which aims to organize news with respect to the events that they report. NEDT may not also be enough by itself to satisfy the news consumers’ needs because of the repetitions of information that may occur in the tracking news of a topic due to usage of multiple sources. In this thesis, we investigate usage of novelty detection (ND) in tracking news of a topic. For this aim, we built a Turkish ND experimental collection, BilNov, consisting of 59 topics with an average of 51 tracking news. We propose usage of three methods; cosine similarity-based ND method, language model-based ND method and cover coefficient-based ND method. Additionally, we experiment on category-based threshold learning which has not been worked on previously in ND literature. We also provide some experimental pointers for ND in Turkish such as restriction of document vector lengths and smoothing methods. Finally, we experiment on TREC Novelty Track 2004 dataset. Experiments conducted by using BilNov show that language model-based ND method outperforms other two methods significantly and category-based threshold learning has promising results when compared to general threshold learning.

Course

Other identifiers

Book Title

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Citation

Published Version (Please cite this version)