Novelty detection in topic tracking
Please cite this item using this persistent URLhttp://hdl.handle.net/11693/15148
News portals provide many services to the news consumers such as information retrieval, personalized information filtering, summarization and news clustering. Additionally, many news portals using multiple sources enable their users to evaluate developments from different perspectives by richening the content. However, increasing number of sources and incoming news makes it difficult for news consumers to find news of their interest in news portals. Different types of organizational operations are applied to ease browsing over the news for this reason. New event detection and tracking (NEDT) is one of these operations which aims to organize news with respect to the events that they report. NEDT may not also be enough by itself to satisfy the news consumers’ needs because of the repetitions of information that may occur in the tracking news of a topic due to usage of multiple sources. In this thesis, we investigate usage of novelty detection (ND) in tracking news of a topic. For this aim, we built a Turkish ND experimental collection, BilNov, consisting of 59 topics with an average of 51 tracking news. We propose usage of three methods; cosine similarity-based ND method, language model-based ND method and cover coefficient-based ND method. Additionally, we experiment on category-based threshold learning which has not been worked on previously in ND literature. We also provide some experimental pointers for ND in Turkish such as restriction of document vector lengths and smoothing methods. Finally, we experiment on TREC Novelty Track 2004 dataset. Experiments conducted by using BilNov show that language model-based ND method outperforms other two methods significantly and category-based threshold learning has promising results when compared to general threshold learning.