Sentence based topic modeling

Date

2014

Editor(s)

Advisor

Ulusoy, Özgür

Supervisor

Co-Advisor

Co-Supervisor

Instructor

Source Title

Print ISSN

Electronic ISSN

Publisher

Volume

Issue

Pages

Language

English

Type

Journal Title

Journal ISSN

Volume Title

Attention Stats
Usage Stats
0
views
7
downloads

Series

Abstract

Fast augmentation of large text collections in digital world makes inevitable to automatically extract short descriptions of those texts. Even if a lot of studies have been done on detecting hidden topics in text corpora, almost all models follow the bag-of-words assumption. This study presents a new unsupervised learning method that reveals topics in a text corpora and the topic distribution of each text in the corpora. The texts in the corpora are described by a generative graphical model, in which each sentence is generated by a single topic and the topics of consecutive sentences follow a hidden Markov chain. In contrast to bagof-words paradigm, the model assumes each sentence as a unit block and builds on a memory of topics slowly changing in a meaningful way as the text flows. The results are evaluated both qualitatively by examining topic keywords from particular text collections and quantitatively by means of perplexity, a measure of generalization of the model.

Course

Other identifiers

Book Title

Degree Discipline

Computer Engineering

Degree Level

Master's

Degree Name

MS (Master of Science)

Citation

Published Version (Please cite this version)