A Line-based representation for matching words

buir.advisor: Duygulu, Pınar
dc.contributor.author: Can, Ethem Fatih
dc.date.accessioned: 2016-01-08T18:10:48Z
dc.date.available: 2016-01-08T18:10:48Z
dc.date.issued: 2009
dc.description: Ankara : The Department of Computer Engineering and the Institute of Engineering and Science of Bilkent University, 2009. [en_US]
dc.description: Thesis (Master's) -- Bilkent University, 2009. [en_US]
dc.description: Includes bibliographical references (leaves 46-49). [en_US]
dc.description.abstract: As the number of documents available in digital form grows, efficient access to them becomes crucial. Manual indexing of documents is costly, however, and can be carried out only in limited amounts, so automatic analysis of documents is essential. Although plenty of effort has been spent on optical character recognition (OCR), most existing OCR systems fail to recognize characters in historical documents because of the poor quality of old documents, the high level of noise, and the variety of scripts. More importantly, OCR systems are usually language dependent and are not available for all languages. Word spotting techniques have recently been proposed for accessing historical documents, based on the idea that humans read whole words at a time. In these studies, words rather than characters are treated as the basic units. Due to the poor quality of historical documents, the representation and matching of words remain challenging problems for word spotting. In this study we address these challenges and propose a simple but effective method for representing word images by a set of line descriptors. We then propose two different matching criteria that make use of the line-based representation. We apply our methods to the word spotting and redif extraction tasks. The proposed line-based representation does not require any specific pre-processing steps and is applicable to different languages and scripts. In the word spotting task, our results yield higher scores than existing word spotting studies in terms of retrieval and recognition performance. In the redif extraction task, we obtain promising results that motivate further and more advanced studies on Ottoman literary texts. [en_US]
dc.description.provenance: Made available in DSpace on 2016-01-08T18:10:48Z (GMT). No. of bitstreams: 1 0003869.pdf: 1350913 bytes, checksum: 8a7e696c23525fee927ee8f63d89fa31 (MD5) [en]
dc.description.statementofresponsibility: Can, Ethem Fatih [en_US]
dc.format.extent: xi, 49 leaves [en_US]
dc.identifier.itemid: BILKUTUPB120014
dc.identifier.uri: http://hdl.handle.net/11693/14910
dc.language.iso: English [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Historical Manuscripts [en_US]
dc.subject: Ottoman Texts [en_US]
dc.subject: Word Image Matching [en_US]
dc.subject: Word Retrieval [en_US]
dc.subject: Word Spotting [en_US]
dc.subject.lcc: QA76.9.D33 C36 2009 [en_US]
dc.subject.lcsh: Data compression (Computer Science) [en_US]
dc.subject.lcsh: Information storage and retrieval systems. [en_US]
dc.title: A Line-based representation for matching words [en_US]
dc.type: Thesis [en_US]
thesis.degree.discipline: Computer Engineering
thesis.degree.grantor: Bilkent University
thesis.degree.level: Master's
thesis.degree.name: MS (Master of Science)

Files

Original bundle

Name: 0003869.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format