Matching Ottoman words: an image retrieval approach to historical document indexing

Ataer, Esra; Duygulu, Pınar

Date

2007-07

Abstract

Large archives of Ottoman documents challenge many historians all over the world. These archives remain inaccessible because manual transcription of such a huge volume is difficult. Automatic transcription is required, but due to the characteristics of Ottoman documents, systems based on character recognition may not yield satisfactory results. It is also desirable to store the documents in image form, since they may contain important drawings, especially signatures. For these reasons, in this study we treat the problem as an image retrieval problem, viewing Ottoman words as images, and we propose a solution based on image matching techniques. The bag-of-visterms approach, which has been shown to be successful in classifying objects and scenes, is adapted for matching word images. Each word image is represented by a set of visual terms obtained by vector quantization of SIFT descriptors extracted from salient points. Similar words are then matched based on the similarity of the distributions of their visual terms. The experiments are carried out on printed and handwritten documents that include over 10,000 words. The results show that the proposed system is able to retrieve words with high accuracy and to capture the semantic similarities between words. Copyright 2007 ACM.
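
The pipeline described in the abstract (quantize local descriptors into visual terms, then compare the distributions of those terms) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the clustering algorithm, the vocabulary size, or the similarity measure, so plain k-means, k = 8, and histogram intersection are assumptions made here for illustration only. Random 128-dimensional vectors stand in for real SIFT descriptors, and all function names are hypothetical.

```python
import numpy as np


def quantize(descriptors, centres):
    """Assign each descriptor to the index of its nearest centre (visual term)."""
    d2 = ((descriptors[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Plain k-means (assumed here): returns k cluster centres as the vocabulary."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centres = descriptors[start].astype(float)
    for _ in range(iters):
        labels = quantize(descriptors, centres)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # an empty cluster keeps its previous centre
                centres[j] = members.mean(axis=0)
    return centres


def visterm_histogram(descriptors, centres):
    """L1-normalised histogram of visual-term occurrences for one word image."""
    counts = np.bincount(quantize(descriptors, centres), minlength=len(centres))
    return counts / counts.sum()


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] (assumed measure); 1 means identical distributions."""
    return float(np.minimum(h1, h2).sum())


# Synthetic stand-ins for SIFT descriptors of three word images (assumption):
# word_a and word_a_again mimic two instances of the same word,
# word_b mimics a different word.
rng = np.random.default_rng(0)
word_a = rng.normal(0.0, 1.0, size=(40, 128))
word_a_again = word_a + rng.normal(0.0, 0.05, size=(40, 128))
word_b = rng.normal(5.0, 1.0, size=(40, 128))

vocab = build_vocabulary(np.vstack([word_a, word_b]), k=8)
h_a = visterm_histogram(word_a, vocab)
h_a_again = visterm_histogram(word_a_again, vocab)
h_b = visterm_histogram(word_b, vocab)

print("same word     :", histogram_intersection(h_a, h_a_again))
print("different word:", histogram_intersection(h_a, h_b))
```

In a retrieval setting, the histogram of a query word image would be compared against the histograms of all indexed word images and the results ranked by similarity; the measure used in the paper may differ from the one assumed here.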

Source Title

Proceedings of the 6th ACM International Conference on Image and Video Retrieval, CIVR 2007

Publisher

ACM

Keywords

Bag-of-features, Indexing, Word-image matching, Character recognition equipment, Historic preservation, Image matching, Indexing (of information), Semantics, Vector quantization, Automatic transcription, Historical document indexing, Manual transcription, Ottoman documents, Recognition based systems, Image retrieval

Permalink

http://hdl.handle.net/11693/26913

Published Version (Please cite this version)

http://dx.doi.org/10.1145/1282280.1282332

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Conference Paper
