Historical document analysis based on word matching

buir.advisor: Duygulu, Pınar
dc.contributor.author: Arifoğlu, Damla
dc.date.accessioned: 2016-01-08T18:21:35Z
dc.date.available: 2016-01-08T18:21:35Z
dc.date.issued: 2011
dc.description: Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2011. [en_US]
dc.description: Thesis (Master's) -- Bilkent University, 2011. [en_US]
dc.description: Includes bibliographical references (leaves 67-76). [en_US]
dc.description.abstract: Historical documents constitute a heritage that should be preserved, and providing an automatic retrieval and indexing scheme for these archives would benefit researchers from several disciplines and countries. Unfortunately, applying ordinary Optical Character Recognition (OCR) techniques to these documents is nearly impossible, since the documents are degraded and deformed. Recently, word matching methods have been proposed to access these documents. In this thesis, two historical document analysis problems, word segmentation in historical documents and Islamic pattern matching in kufic images, are tackled using word matching. In the first task, an approach based on cross-document word matching is proposed to segment historical documents into words. A version of a document in which word segmentation is easy is used as the source data set, and another version in a different writing style, which is more difficult to segment into words, is used as the target data set. The source data set is segmented into words by a simple method, and the extracted words are used as queries to be spotted in the target data set. Experiments on an Ottoman data set show that cross-document word matching is a promising method for segmenting historical documents into words. In the second task, lines are first extracted and sub-patterns are automatically detected in the images. The sub-patterns are then matched based on a line representation in two ways: by their chain code representation and by their shape contexts. Promising results are obtained for finding the instances of a query pattern and for fully automatic detection of repeating patterns in a square kufic image collection. [en_US]
dc.description.provenance: Made available in DSpace on 2016-01-08T18:21:35Z (GMT). No. of bitstreams: 1 0006348.pdf: 34744303 bytes, checksum: 407b23cf4bea086b24c8fb5112dbca5e (MD5) [en]
dc.description.statementofresponsibility: Arifoğlu, Damla [en_US]
dc.format.extent: xv, 76 leaves [en_US]
dc.identifier.itemid: B128919
dc.identifier.uri: http://hdl.handle.net/11693/15625
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Historical Manuscripts [en_US]
dc.subject: Ottoman Documents [en_US]
dc.subject: Word Image Matching [en_US]
dc.subject: Word Spotting [en_US]
dc.subject: Word Segmentation [en_US]
dc.subject: Islamic Pattern Matching [en_US]
dc.subject.lcc: QA76.9.D33 A75 2011 [en_US]
dc.subject.lcsh: Data compression (Computer science) [en_US]
dc.subject.lcsh: Information retrieval. [en_US]
dc.subject.lcsh: Archives--Data processing. [en_US]
dc.subject.lcsh: Information storage and retrieval systems. [en_US]
dc.title: Historical document analysis based on word matching [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: 0006348.pdf
Size: 33.13 MB
Format: Adobe Portable Document Format