Browsing by Subject "News portal"
Now showing 1 - 3 of 3
Results Per Page
Sort Options
Item Open Access Bilkent News Portal : a system with new event detection and tracking capabilities(Bilkent University, 2009) Öcalan, Hüseyin ÇağdaşNews portal services such as browsing, retrieving, and filtering have become an important research and application area as a result of information explosion on the Internet. In this work, we give implementation details of Bilkent News Portal that contains various novel features ranging from personalization to new event detection and tracking capabilities aiming at addressing the needs of news-consumers. The thesis presents the architecture, data and file structures, and experimental foundations of the news portal. For the implementation and evaluation of the new event detection and tracking component, we developed a test collection: BilCol2005. The collection contains 209,305 documents from the entire year of 2005 and involves several events in which eighty of them are annotated by humans. It enables empirical assessment of new event detection and tracking algorithms on Turkish. For the construction of our test collection, a web application, ETracker, is developed by following the guidelines of the TDT research initiative. Furthermore, we experimentally evaluated the impact of various parameters in information retrieval (IR) that has to be decided during the implementation of a news portal that provides filtering and retrieval capabilities. For this purpose, we investigated the effects of stemming, document length, query length, and scalability issues.Item Open Access Bilkent news portal: A personalizaba system with new event detection and tracking capabilities(ACM, 2008) Can, Fazlı; Koçberber, Seyit; Bağlıoğlu, Özgür; Kardaş, Süleyman; Öcalan, Hüseyin Çağdaş; Uyar, ErkanItem Open Access A template-independent content extraction approach for new web pages(Bilkent University, 2012) Yeniçağ, AhmetNews web pages contain additional elements such as advertisements, hyperlinks, and reader comments. These elements make the extraction of news contents a challenging task. Current news content extraction (NCE) methods are usually template-dependent. They require regular maintenance, since news providers frequently change their web page templates. Therefore, there is a need for NCE methods that extract news contents accurately without depending on web page templates. In this thesis, a template-independent News content EXTraction approach, called N-EXT, is introduced. It rst parses a web page into its blocks according to the HTML tags. Then, it examines all blocks to detect the one that contains the major part of the news content. For this purpose, it assigns weights to the blocks by considering both their textual sizes and similarities to the news title. For quantifying the importance of these two weight components, we use the k-fold cross validation approach; and for assessing the impact of di erent possible similarity measures, we use a one-way Analysis of Variance (ANOVA) with a Sche e comparison. The block with the highest weight is considered as the news block. Our approach eliminates the sentences in the news block that are not related to the news content by considering similarities of sentences to the news block. Finally, it also examines other blocks to detect the rest of the news content. The experimental results show the accuracy and robustness of our method by using two test collections whose web pages are obtained from several di erent news websites.