Subject: "Word retrieval"

Open Access
A line-based representation for matching words in historical manuscripts
(Elsevier BV, 2011) Can, E. F.; Duygulu, P.
In this study, we propose a new method for retrieving and recognizing words in historical documents. We represent each word image as a set of line segments and then define a word-matching criterion based on matching these line segments. We evaluate the method on a benchmark dataset of manuscripts by George Washington and on Ottoman manuscripts. © 2011 Elsevier B.V. All rights reserved.
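The abstract does not state the matching criterion itself, so the sketch below is only a hypothetical illustration of the general idea of comparing two words through their line segments. It assumes the segments have already been extracted and normalized to the word's bounding box, and it uses a symmetric mean nearest-neighbour distance with a hand-picked segment distance (midpoint offset, length difference, weighted orientation difference). The names `segment_distance`, `word_distance` and the weight `w_angle` are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a segment is (x1, y1, x2, y2) in coordinates
# normalized to the word image's bounding box.
Segment = Tuple[float, float, float, float]


def segment_features(s: Segment) -> Tuple[float, float, float, float]:
    """Return midpoint x, midpoint y, length and undirected orientation."""
    x1, y1, x2, y2 = s
    mx, my = (x1 + x2) / 2, (y1 + y2) / 2
    length = math.hypot(x2 - x1, y2 - y1)
    # Fold the angle into [0, pi) so the drawing direction does not matter.
    angle = math.atan2(y2 - y1, x2 - x1) % math.pi
    return mx, my, length, angle


def segment_distance(a: Segment, b: Segment, w_angle: float = 0.5) -> float:
    """Hypothetical segment dissimilarity: position + length + orientation."""
    ax, ay, al, aa = segment_features(a)
    bx, by, bl, ba = segment_features(b)
    d_angle = abs(aa - ba)
    d_angle = min(d_angle, math.pi - d_angle)  # smallest angle between lines
    return math.hypot(ax - bx, ay - by) + abs(al - bl) + w_angle * d_angle


def word_distance(w1: List[Segment], w2: List[Segment]) -> float:
    """Symmetric mean nearest-neighbour distance between two segment sets."""
    if not w1 or not w2:
        return math.inf

    def directed(src: List[Segment], dst: List[Segment]) -> float:
        # Each source segment is matched to its closest target segment.
        return sum(min(segment_distance(s, t) for t in dst) for s in src) / len(src)

    return (directed(w1, w2) + directed(w2, w1)) / 2
```

For retrieval, a query word would be compared against every candidate word with `word_distance` and the candidates ranked by increasing distance; a nearest-neighbour criterion like this tolerates small differences in the number of detected segments.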
Open Access
Word retrieval in Ottoman documents (Osmanlıca belgelerde kelime erişimi)
(IEEE, 2011-04) Arifoğlu, Damla; Duygulu, Pınar
In this paper, two image matching methods are adapted to retrieve words in Ottoman documents, with the aim of supporting the analysis of Ottoman archives. The first method is based on the Dynamic Time Warping (DTW) word matching method proposed in [7], while the second is based on the Shape Context descriptor [10]. First, all sub-words in a given Ottoman document are extracted. In the first method, a four-part feature vector (upper and lower word profiles, number of background-to-ink transitions, and vertical projection) is computed for each sub-word, and the distances between these feature vectors are computed with the DTW algorithm. In the second method, the Shape Context descriptor is used to compute the distances between sub-word images. The methods are tested on an Ottoman dataset consisting of 10 pages of Fuzuli's Leyla and Mecnun divan. © 2011 IEEE.
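As a rough illustration of the first method, the sketch below computes one four-part feature vector per pixel column of a binarized sub-word image (upper profile, lower profile, background-to-ink transitions, vertical projection) and compares two such sequences with classic DTW. It is a minimal sketch under stated assumptions, not the authors' implementation: the image is assumed to be a list of rows of 0/1 pixels with 1 = ink, columns without ink get a placeholder profile value of 0.5 (a hypothetical simplification), the local cost is squared Euclidean distance, and the final score is divided by `n + m` as a simple length normalization. The function names are illustrative.

```python
import math
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels, 1 = ink (assumed binarized input)


def column_features(img: Image) -> List[List[float]]:
    """One [upper, lower, transitions, projection] vector per pixel column."""
    h, w = len(img), len(img[0])
    feats = []
    for x in range(w):
        col = [img[y][x] for y in range(h)]
        ink_rows = [y for y, v in enumerate(col) if v]
        if ink_rows:
            upper = ink_rows[0] / h   # first ink pixel from the top
            lower = ink_rows[-1] / h  # last ink pixel from the top
        else:
            upper = lower = 0.5       # hypothetical placeholder for empty columns
        # Count background-to-ink transitions, treating the top border as background.
        transitions, prev = 0, 0
        for v in col:
            if prev == 0 and v == 1:
                transitions += 1
            prev = v
        projection = sum(col) / h     # fraction of ink pixels in the column
        feats.append([upper, lower, float(transitions), projection])
    return feats


def dtw_distance(a: List[List[float]], b: List[List[float]]) -> float:
    """Classic DTW over feature sequences with squared Euclidean local cost."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1]))
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Normalize so sub-words of different widths are comparable.
    return D[n][m] / (n + m)
```

Because DTW may map several columns of one image to a single column of the other, a horizontally stretched copy of a sub-word can still match it with zero cost, which is the main reason warping-based matching suits handwriting with variable letter widths.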