Past, present, and future on news streams: discovering story chains, selecting public front-pages, and filtering microblogs for predicting public reactions to news

buir.advisorCan, Fazlı
dc.contributor.authorToraman, Çağrı
dc.date.accessioned2017-10-17T07:48:33Z
dc.date.available2017-10-17T07:48:33Z
dc.date.copyright2017-09
dc.date.issued2017-09
dc.date.submitted2017-10-16
dc.descriptionCataloged from PDF version of article.en_US
dc.descriptionThesis (Ph.D.): Bilkent University, Department of Computer Engineering, İhsan Doğramacı Bilkent University, 2017.en_US
dc.descriptionIncludes bibliographical references (leaves 85-97).en_US
dc.description.abstractNews streams have several research opportunities for the past, present, and future of events. The past hides relations among events and actors; the present reflects needs of news readers; and the future waits to be predicted. The thesis has three studies regarding these time periods: We discover news chains using zigzagged search in the past, select front-page of current news for the public, and filter microblogs for predicting future public reactions to events. In the first part, given an input document, we develop a framework for discovering story chains in a text collection. A story chain is a set of related news articles that reveal how different events are connected. The framework has three complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles. We conduct two user studies in terms of four effectiveness measures: relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to set a guideline for use. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method. In the second part, we select news articles for public front pages using raw text, without any meta-attributes such as click counts. Front-page news selection is the task of finding important news articles in news aggregators.
A novel algorithm is introduced by jointly considering the importance and diversity of selected news articles and the length of front pages. We estimate the importance of news, based on topic modelling, to provide the required diversity. Then, we select important documents from important topics using a priority-based method that helps in fitting news content into the length of the front page. A user study is conducted to measure effectiveness and diversity. Annotation results show that up to 7 of 10 news articles are important, and up to 9 of them are from different topics. Challenges in selecting public front-page news are addressed with an emphasis on future research. In the third part, we filter microblog texts, specifically tweets, to news events for predicting future public reactions. Microblog environments like Twitter are increasingly becoming more important to leverage people's opinion on news events. We create a new collection, called BilPredict-2017 that includes events including terrorist attacks in Turkey from 2015 to 2017, and also Turkish tweets that are published during these events. We filter tweets by using important keywords, analyze them in terms of several features. Results show that there is a high correlation between time and frequency of tweets. Sentiment and spatial features also reflect the nature of events, thus all of these features can be utilized in predicting the future.en_US
dc.description.provenanceSubmitted by Betül Özen (ozen@bilkent.edu.tr) on 2017-10-17T07:48:33Z No. of bitstreams: 1 10167629.pdf: 11383337 bytes, checksum: 04391505b97b565dd1c82f6357d25180 (MD5)en
dc.description.provenanceMade available in DSpace on 2017-10-17T07:48:33Z (GMT). No. of bitstreams: 1 10167629.pdf: 11383337 bytes, checksum: 04391505b97b565dd1c82f6357d25180 (MD5) Previous issue date: 2017-09en
dc.description.statementofresponsibilityby Çağrı Toraman.en_US
dc.format.extentxiv, 106 leaves : charts ; 30 cmen_US
dc.identifier.itemidB156908
dc.identifier.urihttp://hdl.handle.net/11693/33809
dc.language.isoEnglishen_US
dc.rightsinfo:eu-repo/semantics/openAccessen_US
dc.subjectFilteringen_US
dc.subjectFront-pageen_US
dc.subjectMicroblogen_US
dc.subjectNews actoren_US
dc.subjectNews chainen_US
dc.subjectNews selectionen_US
dc.subjectPublic reactionen_US
dc.subjectText miningen_US
dc.subjectTopic modelingen_US
dc.subjectZigzagged searchen_US
dc.titlePast, present, and future on news streams: discovering story chains, selecting public front-pages, and filtering microblogs for predicting public reactions to newsen_US
dc.title.alternativeHaber akışlarında geçmiş, günümüz ve gelecek: haber zincirlerinin keşfi, anasayfaların haber seçimi, habere karşı toplumsal tepkinin tahmini için mikroblog filtrelenmesien_US
dc.typeThesisen_US
thesis.degree.disciplineComputer Engineering
thesis.degree.grantorBilkent University
thesis.degree.levelDoctoral
thesis.degree.namePh.D. (Doctor of Philosophy)

Files

Original bundle

Name:
10167629.pdf
Size:
10.86 MB
Format:
Adobe Portable Document Format
Description:
Full printable version

License bundle

Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: