Change of word characteristics in 20th-century Turkish literature: a statistical analysis
Patton, J. M.
Journal of Quantitative Linguistics
167 - 190
This article provides a century-wide quantitative analysis of Turkish literature using 40 novels by 40 authors. We divide the century into four eras (quarter-centuries), allocate 10 novels to each era, and partition each novel into equal-sized blocks. Using cross-validation-based discriminant analysis, with the most frequent words as discriminators, we classify the novel blocks by era with relatively high accuracy. We show that, by using statistical stylistic methods, the author gender of Turkish texts can be accurately identified. We also study gender differences in the use of the most frequent words. Using weighted least squares regression and a sliding window approach, we show that words have become longer over time, both in terms of tokens (in text) and types (in vocabulary). The findings of this work have implications for the historical linguistic analysis of the Turkish language. © 2010 Taylor & Francis.
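The word-length part of the analysis can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tokenization rule, and the choice of weights are all assumptions made for illustration. It computes mean token and type lengths for a text block, groups items into sliding windows, and fits a weighted least squares line whose slope would indicate a change in word length over time.

```python
import re
from statistics import mean


def turkish_lower(text):
    """Lowercase with Turkish casing rules (dotted and dotless i)."""
    # Python's default str.lower() maps "I" to "i" and expands "İ" into two
    # code points, so both letters are handled explicitly first.
    return text.replace("İ", "i").replace("I", "ı").lower()


def token_and_type_lengths(text):
    """Return (mean token length, mean type length) for one text block.

    Tokens are all word occurrences; types are the distinct words.
    The regex tokenizer is an assumption, not the paper's procedure.
    """
    tokens = re.findall(r"\w+", turkish_lower(text))
    types = set(tokens)
    return mean(len(t) for t in tokens), mean(len(t) for t in types)


def sliding_windows(items, size, step=1):
    """Yield consecutive windows of `size` items, advancing by `step`."""
    for start in range(0, len(items) - size + 1, step):
        yield items[start:start + size]


def weighted_least_squares(xs, ys, ws):
    """Fit y = a + b*x minimizing sum(w * (y - a - b*x)**2).

    Returns (intercept a, slope b). How observations are weighted is an
    assumption left to the caller.
    """
    total_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / total_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

In such a setup, each window of chronologically ordered novels would contribute one point (for example, the window's mean publication year and its mean token length), and a positive fitted slope would be consistent with words becoming longer over time.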