Content-based retrieval of historical Ottoman documents stored as textual images

Demand is accelerating for access to the visual content of documents held in historical and cultural archives. Electronic imaging tools and effective image processing techniques now make it feasible to process the multimedia data in large databases. This paper presents a framework for content-based retrieval of historical documents in the Ottoman Empire archives. The documents are stored as textual images and compressed by constructing a library (codebook) of the symbols that occur in a document; each symbol in the original image is then replaced with a pointer into the codebook, which yields a compressed representation of the image. Symbols are extracted using features in the wavelet and spatial domains that are based on the angular and distance span of shapes. To perform content-based retrieval in historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. Queries are processed on the codebooks of the documents, and the query images are located in the resulting documents through the pointers in the textual images. Querying does not require decompressing the images. The framework is also applicable to many other document archives that use different scripts.
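The codebook-and-pointer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `compress` and `query` are hypothetical, symbols are assumed to be already segmented into small binary bitmaps, and exact bitmap equality stands in for the wavelet and angular/distance span features the paper uses for matching.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # a binary symbol image (assumed pre-segmented)
Placed = Tuple[Bitmap, int, int]      # (symbol bitmap, x, y) as found on a page
Pointer = Tuple[int, int, int]        # (codebook index, x, y)


def compress(symbols: List[Placed]) -> Tuple[List[Bitmap], List[Pointer]]:
    """Build a codebook of distinct symbols and replace each symbol by a pointer."""
    codebook: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    pointers: List[Pointer] = []
    for bitmap, x, y in symbols:
        # Assumption: exact equality replaces the paper's feature-based matching.
        if bitmap not in index_of:
            index_of[bitmap] = len(codebook)
            codebook.append(bitmap)
        pointers.append((index_of[bitmap], x, y))
    return codebook, pointers


def query(codebook: List[Bitmap], pointers: List[Pointer],
          query_symbols: List[Bitmap]) -> List[Tuple[int, int]]:
    """Find where a sequence of query symbols occurs, using only codebook indices.

    The page image is never reconstructed; matching runs on the pointer list.
    """
    if not query_symbols:
        return []
    try:
        wanted = [codebook.index(b) for b in query_symbols]
    except ValueError:
        return []  # some query symbol is absent from this document's codebook
    ids = [p[0] for p in pointers]
    hits: List[Tuple[int, int]] = []
    for i in range(len(ids) - len(wanted) + 1):
        if ids[i:i + len(wanted)] == wanted:
            hits.append((pointers[i][1], pointers[i][2]))  # position of first symbol
    return hits


# Toy page: two distinct 2x2 symbols repeated along one text line.
A: Bitmap = ((1, 0), (0, 1))
B: Bitmap = ((1, 1), (0, 0))
page = [(A, 0, 0), (B, 2, 0), (A, 4, 0), (B, 6, 0), (A, 8, 0)]
codebook, pointers = compress(page)
matches = query(codebook, pointers, [A, B])
```

In this toy page, five placed symbols reduce to a two-entry codebook plus five pointers, and the query for the sequence A, B is answered from the pointer list alone. A real system would replace the equality test with a tolerance-based comparison of shape features so that noisy instances of the same symbol map to one codebook entry.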

Source Title

IEEE Transactions on Image Processing

Publisher

IEEE

Keywords

Angular and Distance Span, Binary Wavelet Decomposition, Content-Based Retrieval, Historical Document Compression, Partial Symbol-Wise Matching, Database Systems, Feature Extraction, Image Analysis, Image Compression, Imaging Techniques, Multimedia Systems, Wavelet Transforms, Textual Image Compression, Algorithm, Archeology, Art, Article, Automated Pattern Recognition, Comparative Study, Computer Assisted Diagnosis, Computer Graphics, Computer Interface, Computer Program, Cultural Anthropology, Database, Documentation, Evaluation, Factual Database, Hypermedia, Image Enhancement, Information Center, Information Dissemination, Information Processing, Internet, Methodology, Natural Language Processing, Reproducibility, Sensitivity and Specificity, Signal Processing, Validation Study, Abstracting and Indexing, Algorithms, Archaeology, Archives, Automatic Data Processing, Culture, Data Compression, Database Management Systems, Databases, Factual, Image Interpretation, Computer-Assisted, Pattern Recognition, Automated, Reproducibility of Results, Signal Processing, Computer-Assisted, Software, User-Computer Interface

Permalink

http://hdl.handle.net/11693/24313

Published Version (Please cite this version)

http://dx.doi.org/10.1109/TIP.2003.821114

Collections

Scholarly Publications - Computer Engineering
Scholarly Publications - Electrical and Electronics Engineering

Language

English

Type

Article
