Text categorization using syllables and recurrent neural networks

Yar, Ersin

buir.advisor	Kozat, Süleyman Serdar
dc.contributor.author	Yar, Ersin
dc.date.accessioned	2017-07-25T12:27:12Z
dc.date.available	2017-07-25T12:27:12Z
dc.date.copyright	2017-07
dc.date.issued	2017-07
dc.date.submitted	2017-07-24
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Thesis (M.S.): Bilkent University, Department of Electrical and Electronics Engineering, İhsan Doğramacı Bilkent University, 2017.	en_US
dc.description	Includes bibliographical references (leaves 48-54).	en_US
dc.description.abstract	We investigate multi-class categorization of short texts. To this end, in the third chapter, we introduce highly efficient dimensionality reduction techniques suitable for online processing of high-dimensional feature vectors generated from freely worded text. Although text processing and classification are highly important for many applications, such as emotion recognition and advertisement selection, online classification and regression algorithms over text are limited by the need for high-dimensional vectors to represent natural text inputs. We overcome such limitations by showing that randomized projections and piecewise linear models can be efficiently leveraged to significantly reduce the computational cost of feature vector extraction from tweets. We demonstrate our results on tweets collected in a real-life case study, where the tweets are freely worded and unstructured. We implement several well-known machine learning algorithms as well as novel regression methods, and demonstrate that we can significantly reduce the computational complexity with an insignificant change in classification and regression performance. Furthermore, in the fourth chapter, we introduce a simple and novel technique for short text classification based on LSTM neural networks. Our algorithm obtains two distributed representations of a short text for use in the classification task. We derive one representation by feeding the vector embeddings of the words consecutively into an LSTM network and averaging the outputs produced at each time step. We obtain the other representation by averaging the distributed representations of the words in the short text. For classification, a weighted combination of both representations is calculated. Moreover, for the first time in the literature, we propose to use syllables to better exploit the sequential nature of the data.
We derive distributed representations of the syllables and feed them to an LSTM network to obtain the distributed representation of the short text. A softmax layer is used at the end to calculate the categorical distribution. Classification performance is evaluated in terms of the AUC measure. Experiments show that utilizing two distributed representations improves classification performance by 2%. Furthermore, we demonstrate that using distributed representations of syllables in short text categorization also provides performance improvements.	en_US
dc.description.provenance	Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2017-07-25T12:27:12Z No. of bitstreams: 1 10157445.pdf: 602870 bytes, checksum: e73d9904ae823cec967f223fd22cef3d (MD5)	en
dc.description.provenance	Made available in DSpace on 2017-07-25T12:27:12Z (GMT). No. of bitstreams: 1 10157445.pdf: 602870 bytes, checksum: e73d9904ae823cec967f223fd22cef3d (MD5) Previous issue date: 2017-07	en
dc.description.statementofresponsibility	by Ersin Yar.	en_US
dc.embargo.release	2018-07-24
dc.format.extent	xiii, 59 leaves : charts ; 29 cm	en_US
dc.identifier.itemid	B156052
dc.identifier.uri	http://hdl.handle.net/11693/33508
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Sentiment analysis	en_US
dc.subject	Text categorization	en_US
dc.subject	Distributed representation	en_US
dc.subject	Long short term memory	en_US
dc.subject	Fully connected layer	en_US
dc.title	Text categorization using syllables and recurrent neural networks	en_US
dc.title.alternative	Tekrarlamalı sinir ağları ve heceleri kullanarak metin sınıflandırma [English: Text classification using recurrent neural networks and syllables]	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Electrical and Electronic Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)
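
The chapter-four classifier described in the abstract (an LSTM run over token embeddings with its per-step outputs averaged, a plain average of the same embeddings, a weighted combination of the two, then a softmax layer) can be sketched as a forward pass in pure Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the weights are random and untrained, the LSTM hidden size is assumed equal to the embedding size so the two representations can be mixed element-wise, the weighted combination is assumed to be a single scalar `alpha`, and all names (`LSTMCell`, `classify`, `alpha`) as well as the syllable tokens are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Matrix (list of rows) times vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vectors):
    # Element-wise sum of equally sized vectors.
    return [sum(xs) for xs in zip(*vectors)]


def mean(vectors):
    # Element-wise average over a non-empty list of vectors.
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LSTMCell:
    """Standard LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # Per gate: W acts on x_t, U acts on h_{t-1}, and b is the bias.
        self.params = {
            gate: (rand_matrix(hidden_dim, input_dim, rng),
                   rand_matrix(hidden_dim, hidden_dim, rng),
                   [0.0] * hidden_dim)
            for gate in ("i", "f", "o", "g")
        }

    def step(self, x, h, c):
        def pre(gate):
            W, U, b = self.params[gate]
            return add(matvec(W, x), matvec(U, h), b)

        i = [sigmoid(z) for z in pre("i")]
        f = [sigmoid(z) for z in pre("f")]
        o = [sigmoid(z) for z in pre("o")]
        cand = [math.tanh(z) for z in pre("g")]
        c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, cand)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h, c


def classify(tokens, emb, cell, W_out, b_out, alpha=0.5):
    """Class probabilities from a weighted mix of two text representations."""
    xs = [emb[t] for t in tokens]
    h = [0.0] * cell.hidden_dim
    c = [0.0] * cell.hidden_dim
    outputs = []
    for x in xs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    r_lstm = mean(outputs)  # average of LSTM outputs over all time steps
    r_avg = mean(xs)        # average of the token embeddings
    r = [alpha * a + (1.0 - alpha) * b for a, b in zip(r_lstm, r_avg)]
    return softmax(add(matvec(W_out, r), b_out))


# Usage with hypothetical syllable tokens and random, untrained weights.
rng = random.Random(0)
dim, n_classes = 4, 3
tokens = ["text", "ca", "te", "go", "ri", "za", "tion"]
emb = {t: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for t in tokens}
cell = LSTMCell(dim, dim, rng)  # hidden size = embedding size (assumed)
W_out = rand_matrix(n_classes, dim, rng)
b_out = [0.0] * n_classes
probs = classify(tokens, emb, cell, W_out, b_out)
print(probs)
```

Setting `alpha=1.0` uses only the averaged LSTM outputs and `alpha=0.0` only the averaged embeddings; in a trained model the embeddings, LSTM weights, output layer, and possibly the mixing weight would be learned jointly, and the thesis may parameterize the combination differently.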

Files

Original bundle

Name:: 10157445.pdf
Size:: 588.74 KB
Format:: Adobe Portable Document Format
Description:: Full printable version

License bundle

Name:: license.txt
Size:: 1.71 KB
Format:: Item-specific license agreed upon to submission
Description:

Collections

Graduate School of Engineering and Science