Content-based retrieval of historical Ottoman documents stored as textual images
Sinop, A. K.
Çetin, A. E.
IEEE Transactions on Image Processing
pp. 314-325
Please cite this item using this persistent URL: http://hdl.handle.net/11693/24313
Demand for access to the visual content of documents stored in historical and cultural archives is accelerating. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. This paper presents a framework for content-based retrieval of historical documents in the Ottoman Empire archives. The documents are stored as textual images, which are compressed by constructing a library (codebook) of the symbols occurring in a document; the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. Features in the wavelet and spatial domains, based on the angular and distance span of shapes, are used to extract the symbols. To perform content-based retrieval in historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. Queries are processed on the codebooks of the documents, and the query images are located in the resulting documents using the pointers in the textual images. The querying process does not require decompression of the images. The framework is also applicable to many other document archives that use different scripts.
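The codebook-and-pointer idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the symbols have already been extracted and are given as small bitmaps with page positions, and it substitutes exact bitmap equality for the wavelet and spatial-domain shape features the paper uses for matching. The names `CompressedDocument`, `compress`, and `query` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: a symbol is a tiny binary bitmap stored as rows of '0'/'1'.
Symbol = Tuple[str, ...]
# A pointer records which codebook entry occurs at which (x, y) position.
Pointer = Tuple[int, int, int]


@dataclass
class CompressedDocument:
    codebook: List[Symbol] = field(default_factory=list)
    pointers: List[Pointer] = field(default_factory=list)


def compress(symbols: Iterable[Tuple[Symbol, int, int]]) -> CompressedDocument:
    """Store each distinct symbol once and replace occurrences with pointers."""
    doc = CompressedDocument()
    index_of: Dict[Symbol, int] = {}
    for bitmap, x, y in symbols:
        if bitmap not in index_of:
            # First time this symbol is seen: add it to the codebook.
            index_of[bitmap] = len(doc.codebook)
            doc.codebook.append(bitmap)
        doc.pointers.append((index_of[bitmap], x, y))
    return doc


def query(doc: CompressedDocument, query_symbols: Set[Symbol]) -> List[Tuple[int, int]]:
    """Locate query symbols via the codebook and pointers only.

    The page image is never rebuilt; exact equality stands in for the
    feature-based matching described in the abstract.
    """
    wanted = {i for i, entry in enumerate(doc.codebook) if entry in query_symbols}
    return [(x, y) for idx, x, y in doc.pointers if idx in wanted]


# Usage: two made-up 2x2 symbols placed on one line of a page.
A: Symbol = ("10", "01")
B: Symbol = ("11", "00")
doc = compress([(A, 0, 0), (B, 10, 0), (A, 20, 0)])
hits = query(doc, {A})
```

In this toy example the codebook holds two entries while the page has three symbol occurrences, and the query for `A` is answered by scanning the pointer list rather than the pixels.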
Showing items related by title, author, creator and subject.
Gerek O.N.; Altunbasak, Y. (1997) This paper describes a method for selecting key frames by using a number of parameters extracted from the MPEG video stream. The parameters are directly extracted from the compressed video stream without decompression. A ...
Dönderler M.E.; Ulusoy Ö.; Güdükbay U. (2004) In our earlier work, we proposed an architecture for a Web-based video database management system (VDBMS) providing an integrated support for spatiotemporal and semantic queries. In this paper, we focus on the task of ...
PATIKA: An integrated visual environment for collaborative construction and analysis of cellular pathways. Demir, E.; Babur, O.; Dogrusoz, U.; Gursoy, A.; Nisanci, G.; Cetin-Atalay, R.; Ozturk, M. (Oxford University Press, 2002) Motivation: Availability of the sequences of entire genomes shifts the scientific curiosity towards the identification of function of the genomes in large scale as in genome studies. In the near future, data produced about ...