Author: Sarı, Can Taylan
Dates: 2016-07-01; 2016-07-01; 2014
URI: http://hdl.handle.net/11693/30000
Note: Cataloged from PDF version of article.
Abstract: The rapid growth of large text collections in the digital world makes it necessary to extract short descriptions of those texts automatically. Although many studies have addressed the detection of hidden topics in text corpora, almost all models follow the bag-of-words assumption. This study presents a new unsupervised learning method that reveals the topics in a text corpus and the topic distribution of each text in the corpus. The texts in the corpus are described by a generative graphical model in which each sentence is generated by a single topic and the topics of consecutive sentences follow a hidden Markov chain. In contrast to the bag-of-words paradigm, the model treats each sentence as a unit block and builds on a memory of topics that change slowly and meaningfully as the text flows. The results are evaluated both qualitatively, by examining topic keywords from particular text collections, and quantitatively, by means of perplexity, a measure of how well the model generalizes.
Extent: ix, 67 leaves, tables, graphics
Language: English
Rights: info:eu-repo/semantics/openAccess
Keywords: probabilistic graphical model; topic model; hidden Markov model; Markov chain Monte Carlo
Call number: QA279 .S27 2014
Subject: Graphical modeling (Statistics)
Title: Sentence based topic modeling
Type: Thesis
Identifier: B138031
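
The generative process summarized in the abstract (one topic per sentence, with topics of consecutive sentences following a Markov chain) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `generate_document`, the two toy topics, and all probability values are hypothetical, chosen only to show the sampling order. Inference of the topics from data, which the record's keywords associate with Markov chain Monte Carlo, is not covered here.

```python
import random


def generate_document(initial, transition, topic_words, sentence_lengths, rng):
    """Sample one document: a topic per sentence, then words from that topic."""
    topics, sentences = [], []
    n_topics = len(initial)
    for i, length in enumerate(sentence_lengths):
        # The first sentence draws its topic from the initial distribution;
        # later sentences use the transition row of the previous topic.
        weights = initial if i == 0 else transition[topics[-1]]
        topic = rng.choices(range(n_topics), weights=weights)[0]
        vocab = list(topic_words[topic])
        probs = [topic_words[topic][w] for w in vocab]
        # Every word of the sentence is drawn from the same single topic.
        words = rng.choices(vocab, weights=probs, k=length)
        topics.append(topic)
        sentences.append(words)
    return topics, sentences


# Hypothetical toy parameters (assumptions, not values from the thesis).
initial = [0.5, 0.5]
transition = [[0.9, 0.1], [0.1, 0.9]]  # large diagonal: topics change slowly
topic_words = [
    {"goal": 0.5, "match": 0.3, "team": 0.2},
    {"vote": 0.5, "party": 0.3, "law": 0.2},
]

topics, sentences = generate_document(
    initial, transition, topic_words, [4, 3, 5], random.Random(0)
)
for topic, words in zip(topics, sentences):
    print(topic, " ".join(words))
```

The large diagonal entries of `transition` are one way to express the "slowly changing" topic memory: a sentence most often keeps the topic of the sentence before it, whereas a bag-of-words model would ignore sentence order entirely.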