Sentence based topic modeling
Date
Authors
Editor(s)
Advisor
Supervisor
Co-Advisor
Co-Supervisor
Instructor
Source Title
Print ISSN
Electronic ISSN
Publisher
Volume
Issue
Pages
Language
Type
Journal Title
Journal ISSN
Volume Title
Attention Stats
Usage Stats
views
downloads
Series
Abstract
Fast augmentation of large text collections in digital world makes inevitable to automatically extract short descriptions of those texts. Even if a lot of studies have been done on detecting hidden topics in text corpora, almost all models follow the bag-of-words assumption. This study presents a new unsupervised learning method that reveals topics in a text corpora and the topic distribution of each text in the corpora. The texts in the corpora are described by a generative graphical model, in which each sentence is generated by a single topic and the topics of consecutive sentences follow a hidden Markov chain. In contrast to bagof-words paradigm, the model assumes each sentence as a unit block and builds on a memory of topics slowly changing in a meaningful way as the text flows. The results are evaluated both qualitatively by examining topic keywords from particular text collections and quantitatively by means of perplexity, a measure of generalization of the model.