A Line-based representation for matching words

Can, Ethem Fatih

buir.advisor	Duygulu, Pınar
dc.contributor.author	Can, Ethem Fatih
dc.date.accessioned	2016-01-08T18:10:48Z
dc.date.available	2016-01-08T18:10:48Z
dc.date.issued	2009
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 46-49).	en_US
dc.description.abstract	With the increasing number of documents available in digital form, efficient access to these documents has become crucial. Manual indexing, however, is costly and can be carried out only in limited amounts; automatic document analysis is therefore essential. Although considerable effort has been devoted to optical character recognition (OCR), most existing OCR systems fail to recognize characters in historical documents because of the poor quality of old documents, high levels of noise, and the variety of scripts. More importantly, OCR systems are usually language dependent and are not available for all languages. Word spotting techniques have recently been proposed for accessing historical documents, based on the idea that humans read whole words at a time. In these studies, words rather than characters are treated as the basic units. Because historical documents are of poor quality, the representation and matching of words remain challenging problems for word spotting. In this study, we address these challenges and propose a simple but effective method for representing word images by a set of line descriptors. We then propose two different matching criteria that make use of this line-based representation. We apply our methods to the word spotting and redif extraction tasks. The proposed line-based representation does not require any specific pre-processing steps and is applicable to different languages and scripts. In the word spotting task, our method achieves higher retrieval and recognition performance than existing word spotting studies. In the redif extraction task, we obtain promising results that motivate further, more advanced studies on Ottoman literary texts.	en_US
dc.description.statementofresponsibility	Can, Ethem Fatih	en_US
dc.format.extent	xi, 49 leaves	en_US
dc.identifier.itemid	BILKUTUPB120014
dc.identifier.uri	http://hdl.handle.net/11693/14910
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Historical Manuscripts	en_US
dc.subject	Ottoman Texts	en_US
dc.subject	Word Image Matching	en_US
dc.subject	Word Retrieval	en_US
dc.subject	Word Spotting	en_US
dc.subject.lcc	QA76.9.D33 C36 2009	en_US
dc.subject.lcsh	Data compression (Computer Science)	en_US
dc.subject.lcsh	Information storage and retrieval systems.	en_US
dc.title	A Line-based representation for matching words	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)

Files

Original bundle

Name: 0003869.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format

Collections

Graduate School of Engineering and Science