Browsing by Subject "Historical documents"


Open Access
Cross-document word matching for segmentation and retrieval of Ottoman divans
(Springer UK, 2016) Duygulu, P.; Arifoglu, D.; Kalpakli, M.
Motivated by the need for automatic indexing and analysis of the huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, in this study we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the words. Divans, however, exist in multiple copies (versions) produced by different writers in different writing styles, and word segmentation may be relatively easy in some versions and much harder in others. We therefore segment the harder versions (which are difficult, if not impossible, to segment with traditional techniques) using information carried over from a simpler version. One version of a document is used as the source dataset and another version of the same document as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset, which reveals the word boundaries there. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words, and we improve the performance of simple features by considering words in context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising both for finding word boundaries and for retrieving words across documents. © 2014, Springer-Verlag London.
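The idea of matching a source word against combinations of consecutive sub-words in the target can be illustrated with a minimal sketch. It assumes, hypothetically, that each sub-word (connected component) has already been described by a fixed-length histogram feature, that a candidate word is formed by merging up to `max_len` consecutive target sub-words, and that Euclidean distance is the matching cost. The names `combine`, `match_query` and `max_len` are illustrative only, not the authors' implementation or features.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a non-negative histogram so it sums to 1."""
    total = sum(v)
    return [x / total for x in v] if total > 0 else list(v)


def combine(subwords: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical feature for a run of sub-words: summed, then renormalized histograms."""
    summed = [sum(vals) for vals in zip(*subwords)]
    return normalize(summed)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_query(query_subwords: Sequence[Sequence[float]],
                target_subwords: Sequence[Sequence[float]],
                max_len: int = 4) -> Tuple[int, int, float]:
    """Return (start, length, cost) of the best run of consecutive target sub-words.

    The winning run marks a word boundary candidate in the target version.
    """
    query = combine(query_subwords)
    best = (0, 0, math.inf)
    for start in range(len(target_subwords)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(target_subwords):
                break  # run would extend past the end of the target line
            cost = distance(query, combine(target_subwords[start:end]))
            if cost < best[2]:
                best = (start, length, cost)
    return best


target = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
query = [[0, 1, 0], [0, 0, 1]]
result = match_query(query, target)
```

Here the query written as two sub-words matches target sub-words 1 and 2, so a word boundary would be placed before index 1 and after index 2.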
Open Access
Word-based search in handwritten documents (El yazısı belgelerde kelime tabanlı arama)
(IEEE, 2008-04) Can, Ethem F.; Duygulu, Pınar
We present new methods for retrieving words in historical handwritten documents. Assuming that each word can be treated as an image, we follow the word spotting approach and search for words in the documents using image retrieval techniques. Specifically, we propose two methods, one based on the histogram of gradient orientations at edge points and one based on the correlation coefficient, and we also propose a method that combines the two. Experiments are conducted on a dataset of George Washington's handwritten manuscripts. ©2008 IEEE.
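The two similarity measures and their combination can be sketched as follows. This is a minimal illustration under stated assumptions, not the published method: word images are assumed to be grayscale grids already resized to the same size, gradients use central differences, an "edge point" is any interior pixel whose gradient magnitude exceeds a hypothetical `edge_threshold`, histograms are compared by intersection, and the combination is a hypothetical linear blend with weight `alpha`.

```python
import math
from typing import List, Sequence

Image = List[List[float]]  # grayscale word image: rows of pixel intensities


def orientation_histogram(img: Image, bins: int = 8,
                          edge_threshold: float = 0.1) -> List[float]:
    """Normalized histogram of gradient directions at edge pixels (interior pixels only)."""
    hist = [0.0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # central difference in x
            gy = img[y + 1][x] - img[y - 1][x]  # central difference in y
            if math.hypot(gx, gy) < edge_threshold:
                continue  # not an edge point
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def correlation_coefficient(img1: Image, img2: Image) -> float:
    """Pearson correlation of pixel intensities of two same-sized images."""
    a = [p for row in img1 for p in row]
    b = [p for row in img2 for p in row]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant image has no defined correlation
    return cov / math.sqrt(va * vb)


def combined_score(query: Image, candidate: Image, alpha: float = 0.5) -> float:
    """Hypothetical weighted combination of both similarities (higher is better)."""
    hs = histogram_similarity(orientation_histogram(query),
                              orientation_histogram(candidate))
    cc = correlation_coefficient(query, candidate)
    return alpha * hs + (1 - alpha) * cc


stroke = [[0, 0, 1, 0, 0] for _ in range(5)]   # vertical stroke in column 2
shifted = [[0, 1, 0, 0, 0] for _ in range(5)]  # same stroke moved to column 1
```

Ranking candidate word images by `combined_score` against a query image then gives a retrieval list; the weight `alpha` would have to be tuned on held-out data.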
Open Access
Ottoman archives explorer: a retrieval system for digital Ottoman archives
(Association for Computing Machinery, 2009-12) Yalniz, I. Z.; Altingovde, I. S.; Güdükbay, Uğur; Ulusoy, Özgür
This article presents Ottoman Archives Explorer, a content-based retrieval (CBR) system based on character recognition for printed and handwritten historical documents. Several methods for the character segmentation and recognition stages are investigated. In particular, sliding-window and histogram segmentation methods are coupled with recognition approaches using spatial features, neural networks, and a graph-based model. The prototype system provides CBR of document images using both example-based queries and a virtual keyboard for constructing query words. © 2009 ACM.
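The two segmentation families named above can be sketched on a binarized text-line image. This sketch assumes, without confirmation from the article, that "histogram segmentation" means splitting at columns of a vertical projection profile whose ink count falls below a hypothetical `min_ink` threshold, and that sliding-window segmentation means fixed-width windows advanced by a hypothetical `step`. It is an illustration of the general techniques, not the system's code.

```python
from typing import List, Tuple

Binary = List[List[int]]  # binarized line image: 1 = ink, 0 = background


def vertical_projection(binary: Binary) -> List[int]:
    """Number of ink pixels in each column (the projection histogram)."""
    return [sum(col) for col in zip(*binary)]


def segment_by_projection(binary: Binary, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split at columns with fewer than min_ink ink pixels; returns [start, end) ranges."""
    profile = vertical_projection(binary)
    segments: List[Tuple[int, int]] = []
    start = None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x  # a run of ink columns begins
        elif ink < min_ink and start is not None:
            segments.append((start, x))  # a gap column closes the run
            start = None
    if start is not None:
        segments.append((start, len(profile)))  # run reaches the right edge
    return segments


def sliding_windows(width: int, window: int, step: int) -> List[Tuple[int, int]]:
    """Fixed-width [start, end) windows; empty if the line is narrower than one window."""
    return [(x, x + window) for x in range(0, width - window + 1, step)]


line = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
]
```

Projection splitting is cheap but fails where characters touch, which is one reason a recognizer might instead score every sliding window and keep the most confident ones.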