Browsing by Subject "Text mining"
Now showing 1 - 6 of 6
Item Open Access
Aspect based opinion mining on Turkish tweets (Bilkent University, 2012)
Akbaş, Esra
Understanding opinions about entities or brands is instrumental in reputation management and decision making. With the advent of social media, more people are willing to publicly share their recommendations and opinions. As the type and number of such venues increase, automated analysis of sentiment in textual resources has become an essential data mining task. Sentiment classification aims to identify the polarity of sentiment in text. The polarity is predicted on either a binary (positive, negative) scale or a multi-variant scale that expresses the strength of the sentiment. Text often contains a mix of positive and negative sentiments, so it is often necessary to detect both simultaneously. While classifying text based on sentiment polarity is a major task, analyzing sentiments separately for each aspect can be more useful in many applications. In this thesis, we investigate the problem of mining opinions by extracting aspects of entities/topics from a collection of short texts. We focus on Turkish tweets, which are informal short messages. Most of the resources available in the opinion mining literature, such as lexicons and labeled corpora, are for the English language. Our approach would help extend sentiment analysis to other languages where such rich resources do not exist. After a set of preprocessing steps, we extract the aspects of the product(s) from the data and group the tweets based on the extracted aspects. In addition to our manually constructed Turkish opinion word list, we propose automated generation of words with their sentiment strengths using a word selection algorithm. Then, we propose a new representation of the text according to the sentiment strength of the words, which we refer to as sentiment based text representation. The feature vectors of the text are constructed according to this new representation.
We adapt machine learning methods to generate classifiers based on the multi-variant scale feature vectors, in order to detect mixtures of positive and negative sentiments, and we test their performance on Turkish tweets.

Item Open Access
Çağrı merkezi metin madenciliği yaklaşımı [A text mining approach for call centers] (IEEE, 2017-05)
Yiğit, İ. O.; Ateş, A. F.; Güvercin, Mehmet; Ferhatosmanoğlu, Hakan; Gedik, Buğra
Today, the ability to convert call center conversation recordings from speech to text makes it possible to apply text mining methods to the resulting transcripts. This study aims to use call transcripts to evaluate the content of a call in terms of sentiment (positive/negative) and to measure customer satisfaction and customer representative performance. New features were extracted from the call transcripts using text mining methods. Using these features, prediction models were built with classification and regression methods to enable evaluation of the content of call recordings. The prediction models resulting from this study are intended for use in call centers operated by Türk Telekom.

Item Open Access
Discovering story chains: a framework based on zigzagged search and news actors (John Wiley and Sons Inc., 2017)
Toraman, C.; Can, F.
A story chain is a set of related news articles that reveal how different events are connected. This study presents a framework for discovering story chains, given an input document, in a text collection. The framework has 3 complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles.
We conduct 2 user studies in terms of 4 effectiveness measures: relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to set a guideline for use. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method. © 2017 ASIS&T.

Item Open Access
Past, present, and future on news streams: discovering story chains, selecting public front-pages, and filtering microblogs for predicting public reactions to news (Bilkent University, 2017-09)
Toraman, Çağrı
News streams offer several research opportunities concerning the past, present, and future of events. The past hides relations among events and actors; the present reflects the needs of news readers; and the future waits to be predicted. The thesis comprises three studies, one for each of these time periods: we discover news chains using zigzagged search in the past, select the front page of current news for the public, and filter microblogs for predicting future public reactions to events. In the first part, given an input document, we develop a framework for discovering story chains in a text collection. A story chain is a set of related news articles that reveal how different events are connected. The framework has three complementary parts that i) scan the collection, ii) measure the similarity between chain-member candidates and the chain, and iii) measure similarity among news articles. For scanning, we apply a novel text-mining method that uses a zigzagged search that reinvestigates past documents based on the updated chain. We also utilize social networks of news actors to reveal connections among news articles.
We conduct two user studies in terms of four effectiveness measures: relevance, coverage, coherence, and ability to disclose relations. The first user study compares several versions of the framework, by varying parameters, to set a guideline for use. The second compares the framework with 3 baselines. The results show that our method provides statistically significant improvement in effectiveness in 61% of pairwise comparisons, with medium or large effect size; in the remainder, none of the baselines significantly outperforms our method. In the second part, we select news articles for public front pages using raw text, without any meta-attributes such as click counts. Front-page news selection is the task of finding important news articles in news aggregators. A novel algorithm is introduced that jointly considers the importance and diversity of the selected news articles and the length of front pages. We estimate the importance of news based on topic modelling, which provides the required diversity. Then, we select important documents from important topics using a priority-based method that helps in fitting news content into the length of the front page. A user study is conducted to measure effectiveness and diversity. Annotation results show that up to 7 of 10 news articles are important, and up to 9 of them are from different topics. Challenges in selecting public front-page news are addressed with an emphasis on future research. In the third part, we filter microblog texts, specifically tweets, related to news events for predicting future public reactions. Microblog environments like Twitter are becoming increasingly important for leveraging people's opinions on news events. We create a new collection, called BilPredict-2017, that contains events in Turkey from 2015 to 2017, including terrorist attacks, along with Turkish tweets published during these events. We filter tweets using important keywords and analyze them in terms of several features.
Results show that there is a high correlation between time and frequency of tweets. Sentiment and spatial features also reflect the nature of events; thus, all of these features can be utilized in predicting the future.

Item Open Access
Text mining analysis of translation, social communication and literary writing for Turkish (Bilkent University, 2020-12)
Çalışkan, Sevil
Text mining is an important research area given the growth in text generation and the need for its analysis. Compared with other languages, text mining in Turkish is still an under-invested research area. In this thesis, we analyze different types of Turkish text from different points of view and conclude with an overall review of text mining in Turkish. First, we analyze the quality of the translations of a Turkish novel, My Name is Red, into English, French, and Spanish, using features generated for each chapter. With the proposed method, the fidelity of a translation to the original text can be quantified without any parallel comparison. Then, we analyze the Turkish spoken texts of 98 people in different age groups in terms of the gender and age attributes of the speakers. We also analyze the difference between written and spoken texts in Turkish. Results show that it is possible to predict the attributes of the speaker from the spoken text, and that written and spoken texts differ significantly in terms of stylometric measures. Next, we assess the cross-lingual transfer performance of multilingual networks from English to Turkish. We see that transfer is possible; however, zero-shot cross-lingual transfer still has some way to go before it is competitive with monolingual networks for Turkish. Lastly, we conduct a time-based stylometric analysis of Ahmet Hamdi Tanpınar’s works.
We see that Ahmet Hamdi Tanpınar shows some differences compared to his contemporaries.

Item Open Access
Türkçe metinler üzerine yapılan sayısal üslup araştırmalarını inceleyen ve Benim Adım Kırmızı çevirilerinin aslına alan sadakatini ölçen bir çalışma [A study reviewing quantitative stylistic research on Turkish texts and measuring the fidelity of the translations of Benim Adım Kırmızı to the original] (Türk Kütüphaneciler Derneği, 2018)
Çalışkan, Sevil; Can, Fazlı
This article aims to introduce quantitative stylistic analysis, an important application of computing in the humanities, and presents an original study that measures the fidelity of translations to the original. Quantitative stylistic analysis consists of approaches that perform various classification tasks in information and records management and that provide observations in literary research which cannot be made during close reading. For researchers who wish to work on Turkish texts, the article first explains how stylistic analysis can be adapted to Turkish and presents a comprehensive literature survey of studies conducted on Turkish texts. The application purposes of stylistic analysis are examined with examples, and the topics of preprocessing and feature extraction, classification approaches, evaluation of success rates, and supporting computing tools are covered. The original study on stylistic consistency between Orhan Pamuk’s novel Benim Adım Kırmızı and its translations uses a new approach that examines the distribution of the novel’s characters in the principal components plane. The statistically significant observations indicate that the author’s style is preserved in the translations.