A front-page news-selection algorithm based on topic modelling using raw text

Toroman, C.; Can, F.

Files

A_front_page_news_selection_algorithm_based_on_topic_modelling_using_raw_text.pdf (508.69 KB)

Date

2015

Authors

Toroman, C.

Can, F.

Abstract

Front-page news selection is the task of finding important news articles in news aggregators. In this study, we examine news selection for public front pages using raw text, without any meta-attributes such as click counts. A novel algorithm is introduced by jointly considering the importance and diversity of selected news articles and the length of front pages. We estimate the importance of news, based on topic modelling, to provide the required diversity. Then we select important documents from important topics using a priority-based method that helps in fitting news content into the length of the front page. A user study is subsequently conducted to measure effectiveness and diversity, using our newly-generated annotation program. Annotation results show that up to seven of 10 news articles are important and up to nine of them are from different topics. Challenges in selecting public front-page news are addressed with an emphasis on future research.
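The selection step summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the round-robin visiting order, the function name `select_front_page`, the dictionary keys and all scores are assumptions made for illustration, and the topic and article importance values are taken as given (for example, from an LDA model) rather than computed here.

```python
from collections import defaultdict, deque


def select_front_page(articles, topic_importance, page_length):
    """Pick article ids for a length-limited front page.

    articles: list of dicts with hypothetical keys
        "id", "topic", "importance" and "length".
    topic_importance: dict mapping topic -> score (assumed to come
        from a topic model; not computed in this sketch).
    page_length: total length budget of the front page.
    """
    # Queue each topic's articles from most to least important.
    queues = defaultdict(deque)
    for art in sorted(articles, key=lambda a: -a["importance"]):
        queues[art["topic"]].append(art)

    # Visit topics in priority order, most important topic first.
    topics = sorted(queues, key=lambda t: -topic_importance.get(t, 0.0))

    selected, used = [], 0
    while any(queues[t] for t in topics):
        for t in topics:
            if not queues[t]:
                continue
            art = queues[t].popleft()
            # Drop articles that do not fit in the remaining space.
            if used + art["length"] <= page_length:
                selected.append(art["id"])
                used += art["length"]
    return selected


# Made-up example data; none of these values come from the paper.
articles = [
    {"id": "a1", "topic": "politics", "importance": 0.9, "length": 4},
    {"id": "a2", "topic": "politics", "importance": 0.8, "length": 3},
    {"id": "a3", "topic": "sports", "importance": 0.7, "length": 3},
    {"id": "a4", "topic": "economy", "importance": 0.6, "length": 5},
    {"id": "a5", "topic": "sports", "importance": 0.2, "length": 2},
]
topic_importance = {"politics": 0.5, "sports": 0.3, "economy": 0.2}
front_page = select_front_page(articles, topic_importance, page_length=10)
```

Taking one article per topic on each pass favours diversity, while the topic ordering and the per-topic queues favour importance; the length check keeps the selection within the page budget. The paper's actual priority-based method may differ in all of these respects.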

Source Title

Journal of Information Science

Publisher

Sage Publications Ltd.

Keywords

Diversity, Document importance, Front page, LDA, News selection, Priority scheduling, Topic importance, Topic modelling

Permalink

http://hdl.handle.net/11693/48286

Published Version (Please cite this version)

https://journals.sagepub.com/doi/pdf/10.1177/0165551515589069

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Article
