Historical document analysis based on word matching

Arifoğlu, Damla

buir.advisor	Duygulu, Pınar
dc.contributor.author	Arifoğlu, Damla
dc.date.accessioned	2016-01-08T18:21:35Z
dc.date.available	2016-01-08T18:21:35Z
dc.date.issued	2011
dc.description	Cataloged from PDF version of article.	en_US
dc.description	Includes bibliographical references (leaves 67-76).	en_US
dc.description.abstract	Historical documents constitute a heritage that should be preserved, and an automatic retrieval and indexing scheme for these archives would benefit researchers from several disciplines and countries. Unfortunately, applying ordinary Optical Character Recognition (OCR) techniques to these documents is nearly impossible, since they are degraded and deformed. Word matching methods have recently been proposed to access such documents. In this thesis, two historical document analysis problems are tackled based on word matching: word segmentation in historical documents and Islamic pattern matching in Kufic images. In the first task, a cross-document word matching approach is proposed to segment historical documents into words. A version of a document that is easy to segment into words is used as the source data set, and another version in a different writing style, which is more difficult to segment, is used as the target data set. The source data set is segmented into words by a simple method, and the extracted words are used as queries to be spotted in the target data set. Experiments on an Ottoman data set show that cross-document word matching is a promising method for segmenting historical documents into words. In the second task, lines are first extracted and sub-patterns are automatically detected in the images. The sub-patterns are then matched based on a line representation in two ways: by their chain code representation and by their shape contexts. Promising results are obtained both for finding the instances of a query pattern and for fully automatic detection of repeating patterns in a square Kufic image collection.	en_US
dc.description.statementofresponsibility	Arifoğlu, Damla	en_US
dc.format.extent	xv, 76 leaves	en_US
dc.identifier.itemid	B128919
dc.identifier.uri	http://hdl.handle.net/11693/15625
dc.language.iso	English	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Historical Manuscripts	en_US
dc.subject	Ottoman Documents	en_US
dc.subject	Word Image Matching	en_US
dc.subject	Word Spotting	en_US
dc.subject	Word Segmentation	en_US
dc.subject	Islamic Pattern Matching	en_US
dc.subject.lcc	QA76.9.D33 A75 2011	en_US
dc.subject.lcsh	Data compression (Computer science)	en_US
dc.subject.lcsh	Information retrieval.	en_US
dc.subject.lcsh	Archives--Data processing.	en_US
dc.subject.lcsh	Information storage and retrieval systems.	en_US
dc.title	Historical document analysis based on word matching	en_US
dc.type	Thesis	en_US
thesis.degree.discipline	Computer Engineering
thesis.degree.grantor	Bilkent University
thesis.degree.level	Master's
thesis.degree.name	MS (Master of Science)

Files

Original bundle

Name: 0006348.pdf
Size: 33.13 MB
Format: Adobe Portable Document Format

Collections

Graduate School of Engineering and Science