Past, present, and future on news streams: discovering story chains, selecting public front-pages, and filtering microblogs for predicting public reactions to news
Please cite this item using this persistent URL: http://hdl.handle.net/11693/33809
News streams offer several research opportunities concerning the past, present, and future of events. The past hides relations among events and actors; the present reflects the needs of news readers; and the future waits to be predicted. The thesis presents three studies, one for each of these time periods: we discover news chains using zigzagged search in the past, select front-page news for the public from current news, and filter microblogs for predicting future public reactions to events. In the first part, given an input document, we develop a framework for discovering story chains in a text collection. A story chain is a set of related news articles that reveal how different events are connected. The framework has three complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search, which reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles. We conduct two user studies in terms of four effectiveness measures: relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, obtained by varying parameters, to set a guideline for use. The second compares the framework with three baselines. The results show that our method provides a statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method. In the second part, we select news articles for public front pages using raw text, without any meta-attributes such as click counts. Front-page news selection is the task of finding important news articles in news aggregators.
We introduce a novel algorithm that jointly considers the importance and diversity of the selected news articles and the length of front pages. We estimate the importance of news based on topic modelling, to provide the required diversity. Then, we select important documents from important topics using a priority-based method that helps in fitting news content into the length of the front page. A user study is conducted to measure effectiveness and diversity. Annotation results show that up to 7 of 10 news articles are important, and up to 9 of them are from different topics. Challenges in selecting public front-page news are addressed with an emphasis on future research. In the third part, we filter microblog texts, specifically tweets, related to news events for predicting future public reactions. Microblog environments like Twitter are becoming increasingly important for leveraging people's opinions on news events. We create a new collection, called BilPredict-2017, that includes events, including terrorist attacks, in Turkey from 2015 to 2017, as well as Turkish tweets published during these events. We filter tweets using important keywords and analyze them in terms of several features. Results show that there is a high correlation between time and frequency of tweets. Sentiment and spatial features also reflect the nature of events; thus, all of these features can be utilized in predicting the future.
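As an illustration of the second part, the idea of selecting important documents from important topics under a front-page length limit can be sketched as follows. This is a minimal sketch and not the thesis's algorithm: it assumes that each article already carries a topic label and an importance score (for example, derived from a topic model), that topics are ranked by their summed importance, and that length is measured in arbitrary units. The function name `select_front_page` and the field names `id`, `topic`, `importance`, and `length` are hypothetical.

```python
from collections import defaultdict


def select_front_page(articles, max_length):
    """Pick article ids topic by topic until the length budget is used up.

    Assumed input (hypothetical schema): a list of dicts with the keys
    'id', 'topic', 'importance', and 'length'.
    """
    # Group articles by topic; within a topic, most important first.
    by_topic = defaultdict(list)
    for article in articles:
        by_topic[article["topic"]].append(article)
    for queue in by_topic.values():
        queue.sort(key=lambda a: a["importance"], reverse=True)

    # Assumption: a topic's priority is the summed importance of its articles.
    topics = sorted(
        by_topic,
        key=lambda t: sum(a["importance"] for a in by_topic[t]),
        reverse=True,
    )

    selected, used = [], 0
    progressed = True
    while progressed:
        progressed = False
        # Visit topics in priority order, taking one article per topic per
        # round, so that the selection stays diverse.
        for topic in topics:
            queue = by_topic[topic]
            # Drop articles that can no longer fit into the remaining space.
            while queue and used + queue[0]["length"] > max_length:
                queue.pop(0)
            if queue:
                article = queue.pop(0)
                selected.append(article["id"])
                used += article["length"]
                progressed = True
    return selected


articles = [
    {"id": "a1", "topic": "politics", "importance": 0.9, "length": 3},
    {"id": "a2", "topic": "politics", "importance": 0.8, "length": 2},
    {"id": "a3", "topic": "sports", "importance": 0.7, "length": 2},
    {"id": "a4", "topic": "economy", "importance": 0.4, "length": 4},
    {"id": "a5", "topic": "sports", "importance": 0.2, "length": 1},
]
print(select_front_page(articles, max_length=8))  # ['a1', 'a3', 'a2', 'a5']
```

Taking one article per topic in each round is one simple way to trade importance against diversity; other priority schemes would fit the same description.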