Cross-document word matching for segmentation and retrieval of Ottoman divans

Duygulu, P.; Arifoglu, D.; Kalpakli, M.

dc.citation.epage	663	en_US
dc.citation.issueNumber	3	en_US
dc.citation.spage	647	en_US
dc.citation.volumeNumber	19	en_US
dc.contributor.author	Duygulu, P.	en_US
dc.contributor.author	Arifoglu, D.	en_US
dc.contributor.author	Kalpakli, M.	en_US
dc.date.accessioned	2018-04-12T10:57:38Z
dc.date.available	2018-04-12T10:57:38Z
dc.date.issued	2016	en_US
dc.department	Department of Computer Engineering	en_US
dc.department	Department of History	en_US
dc.description.abstract	Motivated by the need for automatic indexing and analysis of the huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, in this study we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the words. Divans exist in multiple copies (versions) written by different writers in different writing styles, and word segmentation may be relatively easier to achieve in some of those versions than in others. Using this idea, the harder versions (which are difficult, if not impossible, to segment with traditional techniques) are segmented using information carried over from a simpler version. One version of a document is used as the source dataset and another version of the same document is used as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset in order to detect word boundaries. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words. We improve the performance of simple features by considering words in their context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising for finding word boundaries and for retrieving words across documents. © 2014, Springer-Verlag London.	en_US
dc.identifier.doi	10.1007/s10044-014-0420-8	en_US
dc.identifier.issn	1433-7541	en_US
dc.identifier.uri	http://hdl.handle.net/11693/36930	en_US
dc.language.iso	English	en_US
dc.publisher	Springer UK	en_US
dc.relation.isversionof	http://dx.doi.org/10.1007/s10044-014-0420-8	en_US
dc.source.title	Pattern Analysis and Applications	en_US
dc.subject	Historical documents	en_US
dc.subject	Matching	en_US
dc.subject	Ottoman divans	en_US
dc.subject	Retrieval	en_US
dc.subject	Segmentation	en_US
dc.title	Cross-document word matching for segmentation and retrieval of Ottoman divans	en_US
dc.type	Article	en_US
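The abstract describes spotting each source word in the target version by matching it against possible combinations of consecutive sub-words. The sketch below illustrates that idea under loudly stated assumptions: each sub-word image is represented only by a hypothetical 1-D vertical projection profile (ink pixels per column), a candidate word is any run of at most `max_subwords` consecutive sub-words, and similarity is plain Euclidean distance after resampling both profiles to a fixed length. The feature, the function names (`resample`, `word_descriptor`, `spot_word`) and the `max_subwords` limit are illustrative choices, not the features or matching scheme used in the paper, and the context-based refinement the abstract mentions is not shown.

```python
import numpy as np


def resample(profile, length=32):
    """Linearly resample a 1-D profile to a fixed length so that words of
    different widths can be compared (hypothetical normalisation step)."""
    profile = np.asarray(profile, dtype=float)
    x_old = np.linspace(0.0, 1.0, num=len(profile))
    x_new = np.linspace(0.0, 1.0, num=length)
    return np.interp(x_new, x_old, profile)


def word_descriptor(subword_profiles, length=32):
    """Build a candidate word's descriptor by concatenating its sub-word
    profiles in list order (assumed to be left-to-right image order)."""
    joined = np.concatenate([np.asarray(p, dtype=float) for p in subword_profiles])
    return resample(joined, length)


def spot_word(query_profile, target_subwords, max_subwords=3, length=32):
    """Score every run of 1..max_subwords consecutive target sub-words
    against the query word and return (distance, start, end) tuples,
    best match first. A run covers target_subwords[start:end]."""
    q = resample(query_profile, length)
    n = len(target_subwords)
    candidates = []
    for start in range(n):
        for size in range(1, max_subwords + 1):
            end = start + size
            if end > n:
                break  # run would extend past the end of the line
            d = word_descriptor(target_subwords[start:end], length)
            candidates.append((float(np.linalg.norm(q - d)), start, end))
    candidates.sort()
    return candidates


if __name__ == "__main__":
    # Toy target line of four sub-words (made-up profiles).
    target = [[1, 1, 1], [0, 5, 0], [2, 2], [4, 0, 4, 0]]
    # Toy query word whose image spans sub-words 1 and 2 of the target.
    query = [0, 5, 0, 2, 2]
    print(spot_word(query, target)[0])  # → (0.0, 1, 3)
```

The best-scoring run proposes a word boundary in the target line (here, between sub-words 0 and 1 and after sub-word 2). Enumerating runs of consecutive sub-words, rather than trusting inter-sub-word gaps, is what lets a word split into several disconnected pieces still be matched as one unit.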

Files

Original bundle

Name:: Cross-document word matching for segmentation and retrieval.pdf
Size:: 5 MB
Format:: Adobe Portable Document Format
Description:: Full Printable Version

Collections

Scholarly Publications - Computer Engineering
Scholarly Publications - History