dc.contributor.author: Duygulu, P. [en_US]
dc.contributor.author: Arifoglu, D. [en_US]
dc.contributor.author: Kalpakli, M. [en_US]
dc.date.accessioned: 2015/07/28 [en_US]
dc.date.accessioned: 2015-07-28T12:01:49Z
dc.date.available: 2015-07-28T12:01:49Z
dc.date.issued: 2014 [en_US]
dc.identifier.citation: Duygulu, P., Arifoglu, D., & Kalpakli, M. (2014). Cross-document word matching for segmentation and retrieval of Ottoman divans. Pattern Analysis and Applications, 1-17. [en_US]
dc.identifier.issn: (print) 1433-7541 [en_US]
dc.identifier.issn: (online) 1433-755X [en_US]
dc.identifier.uri: http://hdl.handle.net/11693/12537
dc.description: Cataloged from PDF version of article. [en_US]
dc.description.abstract: Motivated by the need for the automatic indexing and analysis of a huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, in this study we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the word. In this study, using the idea that divans have multiple copies (versions) by different writers in different writing styles, and that word segmentation in some of those versions may be relatively easier to achieve than in other versions, segmentation of the versions (which are difficult, if not impossible, to segment with traditional techniques) is performed using information carried over from the simpler version. One version of a document is used as the source dataset and the other version of the same document is used as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset for detecting word boundaries. We present the idea of cross-document word matching for a novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words. We improve the performance of simple features by considering the words in context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising in finding the word boundaries and in retrieving the words across documents. [en_US]
dc.language.iso: English [en_US]
dc.source.title: Pattern Analysis and Applications [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1007/s10044-014-0420-8 [en_US]
dc.rights: Copyright © Springer-Verlag London 2014. [en_US]
dc.subject: Segmentation [en_US]
dc.subject: Retrieval [en_US]
dc.subject: Matching [en_US]
dc.subject: Historical Documents [en_US]
dc.subject: Ottoman Divans [en_US]
dc.title: Cross-document word matching for segmentation and retrieval of Ottoman divans [en_US]
dc.type: Article [en_US]
dc.department: Department of History [en_US]
dc.citation.spage: 1 [en_US]
dc.citation.epage: 17 [en_US]
dc.identifier.doi: 10.1007/s10044-014-0420-8 [en_US]
dc.publisher: Springer London [en_US]