Content-based retrieval of historical Ottoman documents stored as textual images

Şaykol, E.; Sinop, A. K.; Güdükbay, Uğur; Ulusoy, Özgür; Çetin, A. Enis

buir.contributor.author	Ulusoy, Özgür
buir.contributor.author	Güdükbay, Uğur
buir.contributor.author	Çetin, A. Enis
buir.contributor.orcid	Çetin, A. Enis|0000-0002-3449-1958
dc.citation.epage	325	en_US
dc.citation.issueNumber	3	en_US
dc.citation.spage	314	en_US
dc.citation.volumeNumber	13	en_US
dc.contributor.author	Şaykol, E.	en_US
dc.contributor.author	Sinop, A. K.	en_US
dc.contributor.author	Güdükbay, Uğur	en_US
dc.contributor.author	Ulusoy, Özgür	en_US
dc.contributor.author	Çetin, A. Enis	en_US
dc.date.accessioned	2016-02-08T10:27:26Z
dc.date.available	2016-02-08T10:27:26Z	en_US
dc.date.issued	2004	en_US
dc.department	Department of Computer Engineering	en_US
dc.department	Department of Electrical and Electronics Engineering	en_US
dc.description.abstract	There is an accelerating demand to access the visual content of documents stored in historical and cultural archives. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library (codebook) of the symbols occurring in a document; the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. Features in the wavelet and spatial domains, based on the angular and distance span of shapes, are used to extract the symbols. To perform content-based retrieval in historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. Queries are processed on the codebooks of the documents, and the query images are located in the resulting documents using the pointers in the textual images. The querying process does not require decompressing the images. The new content-based retrieval framework is also applicable to many other document archives in different scripts.	en_US
dc.identifier.doi	10.1109/TIP.2003.821114	en_US
dc.identifier.issn	1057-7149	en_US
dc.identifier.issn	1941-0042	en_US
dc.identifier.uri	http://hdl.handle.net/11693/24313	en_US
dc.language.iso	English	en_US
dc.publisher	IEEE	en_US
dc.relation.isversionof	http://dx.doi.org/10.1109/TIP.2003.821114	en_US
dc.source.title	IEEE Transactions on Image Processing	en_US
dc.subject	Angular and Distance Span	en_US
dc.subject	Binary Wavelet Decomposition	en_US
dc.subject	Content-Based Retrieval	en_US
dc.subject	Historical Document Compression	en_US
dc.subject	Partial Symbol-Wise Matching	en_US
dc.subject	Database Systems	en_US
dc.subject	Feature Extraction	en_US
dc.subject	Image Analysis	en_US
dc.subject	Image Compression	en_US
dc.subject	Imaging Techniques	en_US
dc.subject	Multimedia Systems	en_US
dc.subject	Wavelet Transforms	en_US
dc.subject	Textual Image Compression	en_US
dc.subject	Algorithm	en_US
dc.subject	Archeology	en_US
dc.subject	Art	en_US
dc.subject	Article	en_US
dc.subject	Automated Pattern Recognition	en_US
dc.subject	Comparative Study	en_US
dc.subject	Computer Assisted Diagnosis	en_US
dc.subject	Computer Graphics	en_US
dc.subject	Computer Interface	en_US
dc.subject	Computer Program	en_US
dc.subject	Cultural Anthropology	en_US
dc.subject	Database	en_US
dc.subject	Documentation	en_US
dc.subject	Evaluation	en_US
dc.subject	Factual Database	en_US
dc.subject	Hypermedia	en_US
dc.subject	Image Enhancement	en_US
dc.subject	Information Center	en_US
dc.subject	Information Dissemination	en_US
dc.subject	Information Processing	en_US
dc.subject	Internet	en_US
dc.subject	Methodology	en_US
dc.subject	Natural Language Processing	en_US
dc.subject	Reproducibility	en_US
dc.subject	Sensitivity and Specificity	en_US
dc.subject	Signal Processing	en_US
dc.subject	Validation Study	en_US
dc.subject	Abstracting and Indexing	en_US
dc.subject	Algorithms	en_US
dc.subject	Archaeology	en_US
dc.subject	Archives	en_US
dc.subject	Automatic Data Processing	en_US
dc.subject	Culture	en_US
dc.subject	Data Compression	en_US
dc.subject	Database Management Systems	en_US
dc.subject	Databases, Factual	en_US
dc.subject	Image Interpretation, Computer-Assisted	en_US
dc.subject	Pattern Recognition, Automated	en_US
dc.subject	Reproducibility of Results	en_US
dc.subject	Signal Processing, Computer-Assisted	en_US
dc.subject	Software	en_US
dc.subject	User-Computer Interface	en_US
dc.title	Content-based retrieval of historical Ottoman documents stored as textual images	en_US
dc.type	Article	en_US
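The pipeline in the abstract (build a symbol library, replace each symbol with a pointer into it, then answer queries by matching against the library alone) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy symbol format (lists of (row, col) foreground pixels), the simplified angular and distance span histograms, the L1 distance, the match threshold of 0.2, and all function names are assumptions made for illustration, and the paper's binary wavelet features, rectangular query regions, and partial symbol-wise matching are left out.

```python
import math

# Hypothetical toy representation: a symbol is a list of (row, col)
# foreground-pixel coordinates relative to the symbol's own origin.


def span_features(pixels, n_angles=8, n_rings=4):
    """Angular-span and distance-span histograms of a binary symbol.

    Assumed, simplified reading of the descriptor: pixels are binned into
    angular sectors and concentric rings around the centroid; each histogram
    is normalized by the pixel count, so it does not depend on position.
    """
    n = len(pixels)
    cy = sum(r for r, _ in pixels) / n
    cx = sum(c for _, c in pixels) / n
    dists = [math.hypot(r - cy, c - cx) for r, c in pixels]
    max_d = max(dists) or 1.0  # single-pixel symbol: avoid dividing by zero
    angular = [0.0] * n_angles
    radial = [0.0] * n_rings
    for (r, c), d in zip(pixels, dists):
        theta = math.atan2(r - cy, c - cx) % (2 * math.pi)
        angular[min(int(theta / (2 * math.pi) * n_angles), n_angles - 1)] += 1
        radial[min(int(d / max_d * n_rings), n_rings - 1)] += 1
    return [v / n for v in angular] + [v / n for v in radial]


def l1(a, b):
    """L1 distance between two feature vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def compress(symbols, threshold=0.2):
    """Build a symbol library and replace each symbol by a pointer into it.

    `symbols` is a list of (position, pixels). A symbol whose features lie
    within `threshold` of an existing entry reuses that entry; otherwise it
    becomes a new entry. Each entry stores (bitmap, features).
    Returns (codebook, pointers) with pointers as (position, entry_index).
    """
    codebook, pointers = [], []
    for pos, pix in symbols:
        f = span_features(pix)
        best = min(range(len(codebook)),
                   key=lambda i: l1(f, codebook[i][1]), default=None)
        if best is None or l1(f, codebook[best][1]) > threshold:
            codebook.append((pix, f))
            best = len(codebook) - 1
        pointers.append((pos, best))
    return codebook, pointers


def decompress(codebook, pointers):
    """Paint each library bitmap at its pointer position.

    Lossy whenever a symbol was matched to a slightly different entry.
    """
    page = set()
    for (row, col), idx in pointers:
        page |= {(row + r, col + c) for r, c in codebook[idx][0]}
    return page


def query(codebook, pointers, query_symbols, threshold=0.2):
    """Positions of document symbols that match any query symbol.

    Only codebook entries are compared; the pointers then locate every
    occurrence, so the page is never decompressed.
    """
    hits = set()
    for pix in query_symbols:
        f = span_features(pix)
        hits |= {i for i, (_, cf) in enumerate(codebook)
                 if l1(f, cf) <= threshold}
    return [pos for pos, idx in pointers if idx in hits]
```

With two five-pixel horizontal bars at (0, 0) and (7, 3) and one vertical bar at (0, 10), compress returns a two-entry library, and query(codebook, pointers, [hbar]) returns [(0, 0), (7, 3)] without calling decompress. Storing the features next to each library bitmap is a choice of this sketch; it lets queries skip recomputing features for stored symbols.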

Files

Original bundle

Name: Content-based retrieval of historical Ottoman documents stored as textual images.pdf
Size: 844.89 KB
Format: Adobe Portable Document Format
Description: Full printable version

Collections

Scholarly Publications - Computer Engineering
Scholarly Publications - Electrical and Electronics Engineering