Browsing by Author "Can, Ethem F."


Open Access
Automatic categorization of Ottoman literary texts by poet and time period
(Springer, London, 2012) Can, Ethem F.; Can, Fazlı; Duygulu, Pınar; Kalpaklı, Mehmet
Millions of manuscripts and printed texts are available in the Ottoman language. The automatic categorization of Ottoman texts would make these documents much more accessible in applications ranging from historical investigations to literary analyses. In this work, we use transcribed versions of Ottoman literary texts in the Latin alphabet and show that it is possible to develop effective Automatic Text Categorization techniques that can be applied to the Ottoman language. For this purpose, we use two fundamentally different machine learning methods, Naïve Bayes and Support Vector Machines, and employ four style markers: most frequent words, token lengths, two-word collocations, and type lengths. In the experiments, we use the collected works (divans) of ten different poets: two poets from each of five hundred-year periods ranging from the 15th to the 19th century. The experimental results show that it is possible to obtain highly accurate classifications in terms of poet and time period. Using statistical analysis, we are able to recommend which style marker and machine learning method should be used in future studies. © 2012 Springer-Verlag London Limited.
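As a rough illustration of two of the style markers named in the abstract (token lengths and type lengths), the following minimal Python sketch computes length profiles for a short text. The function names, the whitespace tokenization, the `max_len` cut-off, and the sample string are assumptions made here for illustration; the abstract does not specify how the authors computed these features.

```python
from collections import Counter


def length_profile(words, max_len=10):
    """Relative frequency of word lengths 1..max_len.

    Words longer than max_len are counted in the last bin.
    """
    counts = Counter(min(len(w), max_len) for w in words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]


def token_length_profile(text, max_len=10):
    """Profile over every word occurrence (tokens)."""
    return length_profile(text.lower().split(), max_len)


def type_length_profile(text, max_len=10):
    """Profile over distinct words only (types)."""
    return length_profile(sorted(set(text.lower().split())), max_len)


# Hypothetical toy input, not taken from any divan.
sample = "ah gul gul bulbul"
print(token_length_profile(sample, max_len=6))
print(type_length_profile(sample, max_len=6))
```

Vectors like these could then serve as input features for a classifier such as Naïve Bayes or a Support Vector Machine, which is the general setup the abstract describes.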
Open Access
El yazısı belgelerde kelime tabanlı arama [Word-based search in handwritten documents]
(IEEE, 2008-04) Can, Ethem F.; Duygulu, Pınar
We present new methods for retrieving words in historical handwritten documents. Starting from the assumption that each word can be treated as an image, we adopt the word spotting idea and search for words in the documents using image retrieval techniques. Specifically, we propose two methods, one based on the histogram of gradient orientations at edge points and one based on the correlation coefficient. We also describe how these two methods can be combined. The experiments are carried out on a data set of George Washington's handwritten manuscripts. © 2008 IEEE.
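The two descriptors mentioned in the abstract can be sketched in plain Python as follows. This is only an illustrative sketch: the central-difference gradient, the number of bins, the magnitude weighting, and the use of equally sized images stored as nested lists are all assumptions made here, not details taken from the paper.

```python
import math


def orientation_histogram(img, bins=8):
    """Normalized histogram of gradient orientations over interior pixels.

    img is a list of equal-length rows of numeric intensities.
    Each pixel votes with its gradient magnitude.
    """
    hist = [0.0] * bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c - 1]  # horizontal central difference
            gy = img[r + 1][c] - img[r - 1][c]  # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat region, no orientation
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def correlation(img_a, img_b):
    """Pearson correlation coefficient between two same-sized images."""
    xs = [v for row in img_a for v in row]
    ys = [v for row in img_b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


# Hypothetical 4x4 "word image" with a single vertical edge.
edge = [[0, 0, 1, 1] for _ in range(4)]
print(orientation_histogram(edge))
print(correlation(edge, edge))
```

In a word spotting setting, a query word image would be compared against candidate word images with scores like these, and the candidates ranked by similarity.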
Open Access
Redif extraction in handwritten Ottoman literary texts
(IEEE, 2010) Can, Ethem F.; Duygulu, Pınar; Can, Fazlı; Kalpaklı, Mehmet
Repeated patterns, namely rhymes and redifs, are among the fundamental building blocks of Ottoman Divan poetry. They give a poem integrity by connecting its parts and bring a melody to its voice. In Ottoman literature, poets wrote their works using the rhymes and redifs of previous poems, following the nazire (creative imitation) tradition, either to prove their expertise or to show respect towards old masters. Automatic recognition of redifs would provide important data mining opportunities in literary analyses of Ottoman poetry, the majority of which is in handwritten form. In this study, we propose a matching criterion and a method based on it, Redif Extraction using Contour Segments (RECS), that detects redifs in handwritten Ottoman literary texts using only visual analysis. Our method provides a success rate of 0.682 in a test collection of 100 poems. © 2010 IEEE.
Open Access
Translation relationship quantification: A cluster-based approach and its application to Shakespeare's sonnets
(Springer, Dordrecht, 2010) Can, Fazlı; Can, Ethem F.; Karbeyaz, Ceyhun
We introduce a method for quantifying the translation relationship between source and target texts. In this method, we partition source and target texts into corresponding blocks and cluster them separately using word phrases extracted by a suffix tree approach. We quantify the translation relationship by examining the similarity between source and target clustering structures. In this comparison we aim to observe that their similarity is meaningful, i.e., it is significantly different from random. The method is based on the hypothesis that similarities and dissimilarities among the source blocks will not be lost in translation and will reappear among the target blocks. For testing we use Shakespeare's sonnets and their translation into Turkish. The results show that our method successfully quantifies translation relationships. © 2011 Springer Science+Business Media B.V.
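One simple way to compare two clustering structures over the same set of blocks is the Rand index, sketched below in Python. The Rand index is used here only as a stand-in for illustration; the abstract does not name the similarity measure the authors used, and the label lists are made-up examples.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of block pairs on which two clusterings agree.

    labels_a[i] and labels_b[i] are the cluster labels of block i in the
    source and target clusterings. A pair agrees when both clusterings
    put the two blocks together, or both keep them apart.
    Requires at least two blocks.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels for four corresponding blocks.
source_clusters = [0, 0, 1, 1]
target_clusters = [1, 1, 0, 0]  # same grouping, different label names
print(rand_index(source_clusters, target_clusters))
```

Because the index depends only on which blocks are grouped together, it is unaffected by how the clusters are named. To judge whether an observed value differs from random, it could be compared against values obtained from randomly shuffled target labels.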