Matching Ottoman words: an image retrieval approach to historical document indexing

dc.citation.epage: 347 (en_US)
dc.citation.spage: 341 (en_US)
dc.contributor.author: Ataer, Esra (en_US)
dc.contributor.author: Duygulu, Pınar (en_US)
dc.coverage.spatial: Amsterdam, The Netherlands
dc.date.accessioned: 2016-02-08T11:39:25Z
dc.date.available: 2016-02-08T11:39:25Z
dc.date.issued: 2007-07 (en_US)
dc.department: Department of Computer Engineering (en_US)
dc.description: Date of Conference: 09 - 11 July, 2007
dc.description: Conference name: CIVR '07 Proceedings of the 6th ACM international conference on Image and video retrieval
dc.description.abstract: Large archives of Ottoman documents pose a challenge to many historians all over the world. These archives remain inaccessible because manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, character-recognition-based systems may not yield satisfactory results. It is also desirable to store the documents in image form, since they may contain important drawings, especially signatures. For these reasons, in this study we treat the problem as an image retrieval problem, with the view that Ottoman words are images, and we propose a solution based on image matching techniques. The bag-of-visterms approach, which has been shown to be successful at classifying objects and scenes, is adapted for matching word images. Each word image is represented by a set of visual terms, which are obtained by vector quantization of SIFT descriptors extracted from salient points. Similar words are then matched based on the similarity of the distributions of the visual terms. The experiments are carried out on printed and handwritten documents containing over 10,000 words. The results show that the proposed system is able to retrieve words with high accuracy and to capture the semantic similarities between words. Copyright 2007 ACM. (en_US)
dc.identifier.doi: 10.1145/1282280.1282332 (en_US)
dc.identifier.uri: http://hdl.handle.net/11693/26913
dc.language.iso: English (en_US)
dc.publisher: ACM
dc.relation.isversionof: http://dx.doi.org/10.1145/1282280.1282332 (en_US)
dc.source.title: Proceedings of the 6th ACM International Conference on Image and Video Retrieval, CIVR 2007 (en_US)
dc.subject: Bag-of-features (en_US)
dc.subject: Indexing (en_US)
dc.subject: Word-image matching (en_US)
dc.subject: Character recognition equipment (en_US)
dc.subject: Historic preservation (en_US)
dc.subject: Image matching (en_US)
dc.subject: Indexing (of information) (en_US)
dc.subject: Semantics (en_US)
dc.subject: Vector quantization (en_US)
dc.subject: Automatic transcription (en_US)
dc.subject: Historical document indexing (en_US)
dc.subject: Manual transcription (en_US)
dc.subject: Ottoman documents (en_US)
dc.subject: Recognition based systems (en_US)
dc.subject: Image retrieval (en_US)
dc.title: Matching Ottoman words: an image retrieval approach to historical document indexing (en_US)
dc.type: Conference Paper (en_US)
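
The abstract in this record describes a pipeline: local descriptors per word image, vector quantization into visual terms, and matching by comparing term distributions. The following is a minimal sketch of that idea, not the authors' implementation. Several things are assumptions made for illustration: random 128-dimensional vectors stand in for real SIFT descriptors, a plain k-means loop serves as the vector quantizer, histogram intersection serves as the distribution similarity (the record does not name a measure), and all function names are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k cluster centers (the visual vocabulary)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # keep the old center if a cluster goes empty
                centers[j] = members.mean(axis=0)
    return centers


def visterm_histogram(descriptors, centers):
    """Quantize descriptors to their nearest center and return a normalized histogram."""
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()


def histogram_intersection(p, q):
    """Similarity in [0, 1] between two normalized histograms (assumed measure)."""
    return float(np.minimum(p, q).sum())


def synthetic_word(prototypes, ids, rng, per_proto=20, noise=0.1):
    """Stand-in for SIFT extraction: noisy copies of the chosen prototype vectors."""
    dim = prototypes.shape[1]
    return np.vstack(
        [prototypes[i] + rng.normal(scale=noise, size=(per_proto, dim)) for i in ids]
    )


def demo():
    rng = np.random.default_rng(1)
    prototypes = rng.normal(scale=10.0, size=(6, 128))
    word_a1 = synthetic_word(prototypes, [0, 1, 2], rng)  # two instances of one word
    word_a2 = synthetic_word(prototypes, [0, 1, 2], rng)
    word_b = synthetic_word(prototypes, [3, 4, 5], rng)   # a different word
    centers = kmeans(np.vstack([word_a1, word_a2, word_b]), k=6)
    h_a1, h_a2, h_b = (visterm_histogram(w, centers) for w in (word_a1, word_a2, word_b))
    return histogram_intersection(h_a1, h_a2), histogram_intersection(h_a1, h_b)


if __name__ == "__main__":
    same, different = demo()
    print(f"same word: {same:.2f}, different word: {different:.2f}")
```

In a real setting the descriptors would come from a SIFT extractor run on each segmented word image, and the vocabulary size would be tuned on the collection; both are outside what this record specifies.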

Files

Original bundle
Name: Matching ottoman words An image retrieval approach to historical document indexing.pdf
Size: 612.68 KB
Format: Adobe Portable Document Format
Description: Full printable version