Sentence based topic modeling

buir.advisor: Ulusoy, Özgür
dc.contributor.author: Sarı, Can Taylan
dc.date.accessioned: 2016-07-01T11:10:22Z
dc.date.available: 2016-07-01T11:10:22Z
dc.date.issued: 2014
dc.description: Cataloged from PDF version of article. [en_US]
dc.description.abstract: The rapid growth of large text collections in the digital world makes it necessary to extract short descriptions of those texts automatically. Although many studies have addressed the detection of hidden topics in text corpora, almost all models follow the bag-of-words assumption. This study presents a new unsupervised learning method that reveals the topics in a text corpus and the topic distribution of each text in the corpus. The texts in the corpus are described by a generative graphical model in which each sentence is generated by a single topic and the topics of consecutive sentences follow a hidden Markov chain. In contrast to the bag-of-words paradigm, the model treats each sentence as a unit block and builds on a memory of topics that change slowly and meaningfully as the text flows. The results are evaluated both qualitatively, by examining topic keywords from particular text collections, and quantitatively, by means of perplexity, a measure of how well the model generalizes. [en_US]
dc.description.provenance: Made available in DSpace on 2016-07-01T11:10:22Z (GMT). No. of bitstreams: 1. 0006635.pdf: 766827 bytes, checksum: 463a8876500e99c2e96eb74540f28bf5 (MD5). Previous issue date: 2014 [en]
dc.description.statementofresponsibility: Sarı, Can Taylan [en_US]
dc.format.extent: ix, 67 leaves, tables, graphics [en_US]
dc.identifier.itemid: B138031
dc.identifier.uri: http://hdl.handle.net/11693/30000
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: probabilistic graphical model [en_US]
dc.subject: topic model [en_US]
dc.subject: hidden Markov model [en_US]
dc.subject: Markov chain Monte Carlo [en_US]
dc.subject.lcc: QA279 .S27 2014 [en_US]
dc.subject.lcsh: Graphical modeling (Statistics) [en_US]
dc.title: Sentence based topic modeling [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)
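
The generative process summarized in the abstract (one topic per sentence, with the topics of consecutive sentences following a Markov chain) can be illustrated with a small sketch. This is not the thesis's implementation: the function names, the two toy topics, the fixed sentence length, and the "sticky" transition matrix are all hypothetical values chosen for illustration, and the perplexity helper uses the common definition exp(-log-likelihood / number of words), which may differ in detail from the measure used in the thesis.

```python
import math
import random

# Hypothetical toy parameters (not taken from the thesis).
INITIAL = [0.5, 0.5]                      # distribution of the first sentence's topic
TRANSITION = [[0.9, 0.1], [0.1, 0.9]]     # "sticky" rows: topics change slowly
TOPIC_WORDS = [                           # per-topic word distributions
    {"gene": 0.5, "cell": 0.3, "protein": 0.2},
    {"court": 0.5, "law": 0.3, "judge": 0.2},
]


def sample_document(num_sentences, words_per_sentence, initial, transition,
                    topic_words, rng):
    """Sample a document as a list of (topic, sentence) pairs.

    Each sentence draws all of its words from a single topic, and the
    topic of the next sentence is drawn from the transition row of the
    current topic (a first-order Markov chain over sentence topics).
    """
    topics = list(range(len(initial)))
    topic = rng.choices(topics, weights=initial)[0]
    document = []
    for _ in range(num_sentences):
        vocab = list(topic_words[topic].keys())
        weights = list(topic_words[topic].values())
        sentence = rng.choices(vocab, weights=weights, k=words_per_sentence)
        document.append((topic, sentence))
        topic = rng.choices(topics, weights=transition[topic])[0]
    return document


def perplexity(log_likelihood, num_words):
    """Generic perplexity: exp of the negative average per-word log-likelihood."""
    return math.exp(-log_likelihood / num_words)


if __name__ == "__main__":
    rng = random.Random(0)
    for topic, sentence in sample_document(5, 4, INITIAL, TRANSITION,
                                           TOPIC_WORDS, rng):
        print(topic, " ".join(sentence))
```

In this sketch, a bag-of-words model would correspond to drawing a topic independently for every word; tying the topic to the sentence and chaining sentences through the transition matrix is what gives the sampled text its slowly drifting topics.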

Files

Original bundle

Name: 0006635.pdf
Size: 748.85 KB
Format: Adobe Portable Document Format
Description: Full printable version