Aspect based opinion mining on Turkish tweets
Please cite this item using this persistent URLhttp://hdl.handle.net/11693/15687
Understanding opinions about entities or brands is instrumental in reputation management and decision making. With the advent of social media, more people are willing to publicly share their recommendations and opinions. As the type and amount of such venues increase, automated analysis of sentiment on textual resources has become an essential data mining task. Sentiment classification aims to identify the polarity of sentiment in text. The polarity is predicted on either a binary (positive, negative) or a multi-variant scale as the strength of sentiment expressed. Text often contains a mix of positive and negative sentiments, hence it is often necessary to detect both simultaneously. While classifying text based on sentiment polarity is a major task, analyzing sentiments separately for each aspect can be more useful in many applications. In this thesis, we investigate the problem of mining opinions by extracting aspects of entities/topics on collection of short texts. We focus on Turkish tweets that contain informal short messages. Most of the available resources such as lexicons and labeled corpus in the literature of opinion mining are for the English language. Our approach would help enhance the sentiment analyses to other languages where such rich sources do not exist. After a set of preprocessing steps, we extract the aspects of the product(s) from the data and group the tweets based on the extracted aspects. In addition to our manually constructed Turkish opinion word list, an automated generation of the words with their sentiment strengths is proposed using a word selection algorithm. Then, we propose a new representation of the text according to sentiment strength of the words, which we refer to as sentiment based text representation. The feature vectors of the text are constructed according to this new representation. We adapt machine learning methods to generate classifiers based on the multi-variant scale feature vectors to detect mixture of positive and negative sentiments and to test their performance on Turkish tweets.