Text mining analysis of translation, social communication and literary writing for Turkish
Author(s)
Advisor
Can, Fazlı
Date
2020-12
Publisher
Bilkent University
Language
English
Type
Thesis
Abstract
Text mining is an important research area, given the growth in text
generation and the need to analyze it. Compared with other languages, text
mining in Turkish is still not a well-invested research area. In this thesis,
we analyze different types of Turkish text from different points of view, and
conclude with an overall review of text mining in Turkish. First, we analyze
the quality of the English, French, and Spanish translations of a Turkish
novel, My Name is Red, using features generated for each chapter. With the
proposed method, a translation's loyalty to the original text can be
quantified without any parallel comparisons. Then, we analyze the Turkish
spoken texts of 98 people in different age groups in terms of the gender and
age attributes of the speakers. We also analyze the difference between written
and spoken texts in Turkish. Results show that it is possible to predict the
attributes of the speaker from the spoken text, and that written and spoken
texts differ significantly in terms of stylometric measures. Later on, we
assess the cross-lingual transfer performance of multilingual networks from
English to Turkish. We see that transfer is possible; however, zero-shot
cross-lingual transfer still has a way to go before it is competitive with
monolingual networks for Turkish. Lastly, we conduct a time-based stylometric
analysis of Ahmet Hamdi Tanpınar’s works. We see that Ahmet Hamdi Tanpınar
shows some differences compared to his contemporaries.
Keywords
Text mining
Stylometric analysis
Spoken text analysis
Discourse analysis
Cross-lingual learning
Transfer learning
Multi-lingual data