Change of word characteristics in 20th-century Turkish literature: a statistical analysis

Date
2010
Authors
Can, F.
Patton, J. M.
Source Title
Journal of Quantitative Linguistics
Print ISSN
0929-6174
Publisher
Routledge
Volume
17
Issue
3
Pages
167-190
Language
English
Abstract

This article provides a century-wide quantitative analysis of Turkish literature using 40 novels by 40 authors. We divide the century into four eras (quarter-centuries), allocate 10 novels to each era, and partition each novel into equal-sized blocks. Using cross-validation-based discriminant analysis with the most frequent words as discriminators, we classify the novel blocks by era with relatively high accuracy. We show that statistical stylistic methods can accurately identify the author gender of Turkish texts. We also study gender differences in the use of the most frequent words. Using weighted least squares regression and a sliding window approach, we show that words have become longer over time, both in terms of tokens (in text) and types (in vocabulary). The findings of this work have implications for the historical linguistic analysis of the Turkish language. © 2010 Taylor & Francis.
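
The word-length part of the analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the regular-expression tokenizer, the choice to drop a trailing partial block, and the weighting scheme are all assumptions made for illustration, and the discriminant analysis step is not covered.

```python
import re


def split_into_blocks(text, block_size):
    """Split a text into consecutive blocks of block_size word tokens.

    Assumption: tokens are runs of word characters, and a trailing
    partial block is dropped so that all blocks are equal-sized.
    Note: str.lower() does not apply Turkish dotted/dotless-i casing,
    so this is a simplification.
    """
    tokens = re.findall(r"\w+", text.lower())
    n_full = len(tokens) // block_size
    return [tokens[i * block_size:(i + 1) * block_size] for i in range(n_full)]


def mean_token_length(tokens):
    """Average word length over all tokens (words as they occur in text)."""
    return sum(len(t) for t in tokens) / len(tokens)


def mean_type_length(tokens):
    """Average word length over types (distinct words in the vocabulary)."""
    types = set(tokens)
    return sum(len(t) for t in types) / len(types)


def sliding_window_means(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def weighted_least_squares(xs, ys, ws):
    """Fit y = a + b*x by minimizing sum(w * (y - a - b*x)**2).

    Returns (a, b). The weights ws are hypothetical here; in practice
    they might reflect, for example, the reliability of each observation.
    """
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b
```

Under these assumptions, one would compute `mean_token_length` and `mean_type_length` for each block, smooth the per-year values with `sliding_window_means`, and regress the result on publication year with `weighted_least_squares`; a positive slope `b` would correspond to words becoming longer over time.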
