Bilkent News Portal : a system with new event detection and tracking capabilities
News portal services such as browsing, retrieving, and filtering have become an important research and application area as a result of information explosion on the Internet. In this work, we give implementation details of Bilkent News Portal that contains various novel features ranging from personalization to new event detection and tracking capabilities aiming at addressing the needs of news-consumers. The thesis presents the architecture, data and file structures, and experimental foundations of the news portal. For the implementation and evaluation of the new event detection and tracking component, we developed a test collection: BilCol2005. The collection contains 209,305 documents from the entire year of 2005 and involves several events in which eighty of them are annotated by humans. It enables empirical assessment of new event detection and tracking algorithms on Turkish. For the construction of our test collection, a web application, ETracker, is developed by following the guidelines of the TDT research initiative. Furthermore, we experimentally evaluated the impact of various parameters in information retrieval (IR) that has to be decided during the implementation of a news portal that provides filtering and retrieval capabilities. For this purpose, we investigated the effects of stemming, document length, query length, and scalability issues.