Past, present, and future on news streams: discovering story chains, selecting public front-pages, and filtering microblogs for predicting public reactions to news
Author(s)
Advisor
Can, Fazlı
Date
2017-09
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
News streams offer research opportunities concerning the past, present, and future of
events. The past hides relations among events and actors; the present reflects the needs
of news readers; and the future waits to be predicted. This thesis presents three studies
regarding these time periods: we discover news chains in the past using zigzagged search,
select front-page news from current news for the public, and filter microblogs for
predicting future public reactions to events.
In the first part, given an input document, we develop a framework for discovering
story chains in a text collection. A story chain is a set of related news articles that
reveals how different events are connected. The framework has three complementary
parts that (i) scan the collection, (ii) measure the similarity between chain-member
candidates and the chain, and (iii) measure the similarity among news articles. For
scanning, we apply a novel text-mining method that uses a zigzagged search, which
reinvestigates past documents based on the updated chain. We also utilize social networks
of news actors to reveal connections among news articles. We conduct two user
studies in terms of four effectiveness measures: relevance, coverage, coherence, and
ability to disclose relations. The first user study compares several versions of the
framework with varying parameters to establish usage guidelines. The second compares
the framework with three baselines. The results show that our method provides a
statistically significant improvement in effectiveness in 61% of pairwise comparisons,
with medium or large effect size; in the remaining comparisons, none of the baselines
significantly outperforms our method.
In the second part, we select news articles for public front pages using raw text,
without any meta-attributes such as click counts. Front-page news selection is the
task of finding important news articles in news aggregators. We introduce a novel
algorithm that jointly considers the importance and diversity of the selected news
articles and the length of front pages. We estimate the importance of news based on
topic modeling, which provides the required diversity. Then, we select important
documents from important topics using a priority-based method that helps fit news
content into the length of the front page. A user study is conducted to measure
effectiveness and diversity. Annotation results show that up to 7 of 10 news articles
are important, and up to 9 of them are from different topics. Challenges in selecting
public front-page news are discussed, with an emphasis on future research.
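The selection step above (important documents from important topics, fitted into a fixed page length) can be illustrated with a greedy, priority-based procedure. This is a minimal sketch under stated assumptions, not the thesis's algorithm: topic assignments, per-topic importance, and per-article importance are assumed to be precomputed (for example, by a topic model), article length is an abstract unit such as word count, and the names `Article`, `select_front_page`, and `topic_importance` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record: topic and importance are assumed to be precomputed
    # (e.g. from a topic model); length is the space the article takes on the page.
    title: str
    topic: int
    importance: float
    length: int


def select_front_page(articles, topic_importance, page_length):
    """Greedy sketch: visit topics from most to least important, round-robin,
    and take the most important remaining article of each topic that still
    fits into the remaining page length. Stops when no article fits."""
    # Group articles by topic, most important first within each topic.
    by_topic = {}
    for art in sorted(articles, key=lambda a: a.importance, reverse=True):
        by_topic.setdefault(art.topic, []).append(art)

    # Topic priority order (hypothetical per-topic importance scores).
    topic_order = sorted(by_topic, key=lambda t: topic_importance.get(t, 0.0), reverse=True)

    selected, remaining = [], page_length
    progress = True
    while progress:
        progress = False
        for topic in topic_order:
            queue = by_topic[topic]
            # Pick the first (most important) article of this topic that fits.
            for i, art in enumerate(queue):
                if art.length <= remaining:
                    selected.append(art)
                    remaining -= art.length
                    del queue[i]
                    progress = True
                    break
    return selected
```

Visiting topics round-robin before taking a second article from any topic is one simple way to trade importance for diversity: a less important article from another topic can win a slot over a second article from the top topic, and articles that do not fit are skipped rather than ending the selection.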
In the third part, we filter microblog texts, specifically tweets, related to news events for
predicting future public reactions. Microblog environments such as Twitter are becoming
increasingly important for capturing people's opinions on news events. We create a new
collection, called BilPredict-2017, that includes events such as terrorist attacks in
Turkey from 2015 to 2017, together with the Turkish tweets published during these
events. We filter tweets using important keywords and analyze them in terms of several
features. The results show a high correlation between time and tweet frequency.
Sentiment and spatial features also reflect the nature of events; thus, all of these
features can be utilized in predicting future public reactions.
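The keyword filtering and the time-frequency analysis described above can be sketched as follows. The abstract does not specify the keyword-matching rule, the time granularity, or the correlation measure, so this sketch assumes whole-token, case-insensitive keyword matching, hourly buckets counted from a given event start time, and the Pearson coefficient; the function names and the example keyword are hypothetical.

```python
import math
import re
from collections import Counter
from datetime import datetime


def filter_tweets(tweets, keywords):
    """Keep (created_at, text) tweets containing at least one keyword as a whole
    token, case-insensitively. Assumption: str.casefold() is used, which does not
    apply Turkish dotted/dotless i rules that real Turkish data may need."""
    kws = {k.casefold() for k in keywords}
    kept = []
    for created_at, text in tweets:
        tokens = set(re.findall(r"\w+", text.casefold()))
        if tokens & kws:
            kept.append((created_at, text))
    return kept


def hourly_counts(tweets, event_start: datetime, hours: int):
    """Number of tweets in each hour after event_start (index 0 = first hour)."""
    counts = Counter()
    for created_at, _ in tweets:
        h = int((created_at - event_start).total_seconds() // 3600)
        if 0 <= h < hours:
            counts[h] += 1
    return [counts[h] for h in range(hours)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

For example, with a hypothetical keyword `"patlama"`, one would call `filter_tweets`, bucket the result with `hourly_counts`, and pass `list(range(hours))` and the counts to `pearson`. A strongly negative value would indicate tweet volume decaying as time since the event grows; in practice a library routine such as `statistics.correlation` (Python 3.10+) could replace the hand-written `pearson`.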
Keywords
Filtering
Front-page
Microblog
News actor
News chain
News selection
Public reaction
Text mining
Topic modeling
Zigzagged search