Past, present, and future on news streams: discovering story chains, selecting public front-pages, and filtering microblogs for predicting public reactions to news

Toraman, Çağrı

buir.advisor	Can, Fazlı
dc.contributor.author	Toraman, Çağrı
dc.date.accessioned	2017-10-17T07:48:33Z
dc.date.available	2017-10-17T07:48:33Z
dc.date.copyright	2017-09
dc.date.issued	2017-09
dc.date.submitted	2017-10-16
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 85-97).	en_US
dc.description.abstract	News streams have several research opportunities for the past, present, and future of events. The past hides relations among events and actors; the present reflects needs of news readers; and the future waits to be predicted. The thesis has three studies regarding these time periods: We discover news chains using zigzagged search in the past, select front-page of current news for the public, and filter microblogs for predicting future public reactions to events. In the first part, given an input document, we develop a framework for discovering story chains in a text collection. A story chain is a set of related news articles that reveal how different events are connected. The framework has three complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles. We conduct two user studies in terms of four effectiveness measures: relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to set a guideline for use. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method. In the second part, we select news articles for public front pages using raw text, without any meta-attributes such as click counts. Front-page news selection is the task of finding important news articles in news aggregators.
A novel algorithm is introduced by jointly considering the importance and diversity of selected news articles and the length of front pages. We estimate the importance of news, based on topic modelling, to provide the required diversity. Then, we select important documents from important topics using a priority-based method that helps in fitting news content into the length of the front page. A user study is conducted to measure effectiveness and diversity. Annotation results show that up to 7 of 10 news articles are important, and up to 9 of them are from different topics. Challenges in selecting public front-page news are addressed with an emphasis on future research. In the third part, we filter microblog texts, specifically tweets, to news events for predicting future public reactions. Microblog environments like Twitter are increasingly becoming more important to leverage people's opinion on news events. We create a new collection, called BilPredict-2017, that includes events, including terrorist attacks in Turkey from 2015 to 2017, and Turkish tweets that were published during these events. We filter tweets by using important keywords and analyze them in terms of several features. Results show that there is a high correlation between time and frequency of tweets. Sentiment and spatial features also reflect the nature of events; thus, all of these features can be utilized in predicting the future.	en_US
dc.description.statementofresponsibility	by Çağrı Toraman.	en_US
dc.format.extent	xiv, 106 leaves : charts ; 30 cm	en_US
dc.identifier.itemid	B156908
dc.identifier.uri	http://hdl.handle.net/11693/33809
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Filtering	en_US
dc.subject	Front-page	en_US
dc.subject	Microblog	en_US
dc.subject	News actor	en_US
dc.subject	News chain	en_US
dc.subject	News selection	en_US
dc.subject	Public reaction	en_US
dc.subject	Text mining	en_US
dc.subject	Topic modeling	en_US
dc.subject	Zigzagged search	en_US
dc.title	Past, present, and future on news streams: discovering story chains, selecting public front-pages, and filtering microblogs for predicting public reactions to news	en_US
dc.title.alternative	Haber akışlarında geçmiş, günümüz ve gelecek: haber zincirlerinin keşfi, anasayfaların haber seçimi, habere karşı toplumsal tepkinin tahmini için mikroblog filtrelenmesi	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Doctoral
thesis.degree.name	Ph.D. (Doctor of Philosophy)

Files

Original bundle

Name: 10167629.pdf
Size: 10.86 MB
Format: Adobe Portable Document Format
Description: Full printable version

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description:

Collections

Graduate School of Engineering and Science