Content-based retrieval of historical Ottoman documents stored as textual images

buir.contributor.authorUlusoy, Özgür
buir.contributor.authorGüdükbay, Uğur
buir.contributor.authorÇetin, A. Enis
buir.contributor.orcidÇetin, A. Enis|0000-0002-3449-1958
dc.citation.epage325en_US
dc.citation.issueNumber3en_US
dc.citation.spage314en_US
dc.citation.volumeNumber13en_US
dc.contributor.authorŞaykol, E.en_US
dc.contributor.authorSinop, A. K.en_US
dc.contributor.authorGüdükbay, Uğuren_US
dc.contributor.authorUlusoy, Özgüren_US
dc.contributor.authorÇetin, A. Enisen_US
dc.date.accessioned2016-02-08T10:27:26Z
dc.date.available2016-02-08T10:27:26Zen_US
dc.date.issued2004en_US
dc.departmentDepartment of Computer Engineeringen_US
dc.departmentDepartment of Electrical and Electronics Engineeringen_US
dc.description.abstractThere is an accelerating demand to access the visual content of documents stored in historical and cultural archives. The availability of electronic imaging tools and effective image processing techniques makes it feasible to process the multimedia data in large databases. In this paper, a framework for content-based retrieval of historical documents in the Ottoman Empire archives is presented. The documents are stored as textual images, which are compressed by constructing a library (codebook) of the symbols occurring in a document; the symbols in the original image are then replaced with pointers into the codebook to obtain a compressed representation of the image. Features in the wavelet and spatial domains, based on the angular and distance span of shapes, are used to extract the symbols. To perform content-based retrieval in historical archives, a query is specified as a rectangular region in an input image, and the same symbol-extraction process is applied to the query region. Queries are processed on the codebooks of the documents, and the query images are located in the resulting documents using the pointers in the textual images. The querying process does not require decompression of the images. The new content-based retrieval framework is also applicable to many other document archives using different scripts.en_US
dc.description.provenanceMade available in DSpace on 2016-02-08T10:27:26Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2004en_US
dc.identifier.doi10.1109/TIP.2003.821114en_US
dc.identifier.issn1057-7149en_US
dc.identifier.issn1941-0042en_US
dc.identifier.urihttp://hdl.handle.net/11693/24313en_US
dc.language.isoEnglishen_US
dc.publisherIEEEen_US
dc.relation.isversionofhttp://dx.doi.org/10.1109/TIP.2003.821114en_US
dc.source.titleIEEE Transactions on Image Processingen_US
dc.subjectAngular and Distance Spanen_US
dc.subjectBinary Wavelet Decompositionen_US
dc.subjectContent-Based Retrievalen_US
dc.subjectHistorical Document Compressionen_US
dc.subjectPartial Symbol-Wise Matchingen_US
dc.subjectDatabase Systemsen_US
dc.subjectFeature Extractionen_US
dc.subjectImage Analysisen_US
dc.subjectImage Compressionen_US
dc.subjectImaging Techniquesen_US
dc.subjectMultimedia Systemsen_US
dc.subjectWavelet Transformsen_US
dc.subjectHistorical document compressionen_US
dc.subjectPartial Symbol wise matchingen_US
dc.subjectTextual Image Compressionen_US
dc.subjectContent Based Retrievalen_US
dc.subjectAlgorithmen_US
dc.subjectArcheologyen_US
dc.subjectArten_US
dc.subjectArticleen_US
dc.subjectAutomated Pattern Recognitionen_US
dc.subjectComparative Studyen_US
dc.subjectComputer Assisted Diagnosisen_US
dc.subjectComputer Graphicsen_US
dc.subjectComputer Interfaceen_US
dc.subjectComputer Programen_US
dc.subjectCultural Anthropologyen_US
dc.subjectDatabaseen_US
dc.subjectDocumentationen_US
dc.subjectEvaluationen_US
dc.subjectFactual Databaseen_US
dc.subjectHypermediaen_US
dc.subjectImage Enhancementen_US
dc.subjectInformation Centeren_US
dc.subjectInformation Disseminationen_US
dc.subjectInformation Processingen_US
dc.subjectInterneten_US
dc.subjectMethodologyen_US
dc.subjectNatural Language Processingen_US
dc.subjectReproducibilityen_US
dc.subjectSensitivity and Specificityen_US
dc.subjectSignal Processingen_US
dc.subjectValidation Studyen_US
dc.subjectAbstracting and Indexingen_US
dc.subjectAlgorithmsen_US
dc.subjectArchaeologyen_US
dc.subjectArchivesen_US
dc.subjectAutomatic Data Processingen_US
dc.subjectCultureen_US
dc.subjectData Compressionen_US
dc.subjectDatabase Management Systemsen_US
dc.subjectDatabases, Factualen_US
dc.subjectImage Interpretation, Computer-Assisteden_US
dc.subjectPattern Recognition, Automateden_US
dc.subjectReproducibility of Resultsen_US
dc.subjectSignal Processing, Computer-Assisteden_US
dc.subjectSoftwareen_US
dc.subjectUser-Computer Interfaceen_US
dc.titleContent-based retrieval of historical Ottoman documents stored as textual imagesen_US
dc.typeArticleen_US
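
The codebook-and-pointer scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names compress and query, the tuple-based bitmap type, and the use of exact bitmap equality are all assumptions made for illustration. The paper matches symbols using wavelet and spatial-domain features, which is replaced here by a plain equality test.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]   # a small binary symbol image (hypothetical type)
Placed = Tuple[Bitmap, int, int]       # a symbol plus its (x, y) position on the page
Pointer = Tuple[int, int, int]         # (codebook index, x, y)


def compress(page: List[Placed]) -> Tuple[List[Bitmap], List[Pointer]]:
    """Build a codebook of distinct symbols and replace each symbol by a pointer."""
    codebook: List[Bitmap] = []
    index_of: Dict[Bitmap, int] = {}
    pointers: List[Pointer] = []
    for bitmap, x, y in page:
        if bitmap not in index_of:          # first occurrence: add to the codebook
            index_of[bitmap] = len(codebook)
            codebook.append(bitmap)
        pointers.append((index_of[bitmap], x, y))
    return codebook, pointers


def query(codebook: List[Bitmap], pointers: List[Pointer],
          query_symbols: List[Bitmap]) -> List[Tuple[int, int]]:
    """Return the (x, y) start of every run of pointers matching the query symbols.

    Only the codebook and the pointer list are consulted, so the page image
    is never reconstructed. Exact equality stands in for feature-based matching.
    """
    index_of = {bitmap: i for i, bitmap in enumerate(codebook)}
    if not query_symbols or any(s not in index_of for s in query_symbols):
        return []
    wanted = [index_of[s] for s in query_symbols]
    ids = [p[0] for p in pointers]
    hits: List[Tuple[int, int]] = []
    for start in range(len(ids) - len(wanted) + 1):
        if ids[start:start + len(wanted)] == wanted:
            hits.append((pointers[start][1], pointers[start][2]))
    return hits
```

In this sketch a query region would first be passed through the same symbol extraction as the pages, and the resulting symbols handed to query; a real system would tolerate small shape differences rather than require identical bitmaps.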

Files

Original bundle

Name:
Content-based retrieval of historical Ottoman documents stored as textual images.pdf
Size:
844.89 KB
Format:
Adobe Portable Document Format
Description:
Full printable version