Browsing by Author "Duygulu, P."
Now showing 1 - 17 of 17
Item Open Access
Automated discrimination of psychotropic drugs in mice via computer vision-based analysis (Elsevier BV, 2009). Yucel, Z.; Sara, Y.; Duygulu, P.; Onur, R.; Esen, E.; Ozguler, A. B.
We developed an inexpensive computer vision-based method, built around an algorithm that differentiates drug-induced behavioral alterations. The mice were observed in an open-field arena and their activity was recorded for 100 min. For each animal, the first 50 min of observation were regarded as the drug-free period. Each animal was exposed to only one drug, injected (i.p.) with either amphetamine or cocaine as the stimulant drugs, or morphine or diazepam as the inhibitory agents. The software divided the arena into virtual grids and, by analyzing the video data, calculated the number of visits (sojourn counts) to each grid and the instantaneous speeds within it. The spatial distributions of sojourn counts and instantaneous speeds were used to construct feature vectors, which were fed to classifier algorithms for the final step of matching animals to drugs. The software identified drug-treated animals with 96% accuracy. The algorithm achieved 92% accuracy in sorting the data according to increased or decreased activity, and then determined which drug was delivered. The method differentiated the type of psychostimulant or inhibitory drug with success ratios of 70% and 80%, respectively. This method provides a new way to automatically evaluate and classify drug-induced behaviors in mice. Crown Copyright © 2009.

Item Open Access
Automatic categorization of Ottoman poems (De Gruyter Akademie Forschung, 2014). Can, E. F.; Can, F.; Duygulu, P.; Kalpakli, M.
Authorship attribution and identifying the time period of literary works are fundamental problems in the quantitative analysis of languages.
We investigate two fundamentally different machine learning text categorization methods, Support Vector Machines (SVM) and Naïve Bayes (NB), together with several style markers, in the categorization of Ottoman poems according to their poets and time periods. We use the collected works (divans) of ten different Ottoman poets: two poets from each of the five hundred-year periods ranging from the 15th to the 19th century. Our experimental evaluation and statistical assessments show that it is possible to obtain highly accurate and reliable classifications and to distinguish the methods and style markers in terms of their effectiveness.

Item Open Access
Automatic tag expansion using visual similarity for photo sharing websites (Springer New York LLC, 2010). Sevil, S. G.; Kucuktunc, O.; Duygulu, P.; Can, F.
In this paper we present an automatic photo tag expansion method designed for photo sharing websites. The purpose of the method is to suggest tags that are relevant to the visual content of a given photo at upload time. Both textual and visual cues are used in the process of tag expansion. When a photo is to be uploaded, the system asks the user for a few initial tags. The initial tags are used to retrieve relevant photos together with their tags; these photos are assumed to be potentially related in content to the uploaded target photo. The tag sets of the relevant photos form the candidate tag list, and visual similarities between the target photo and the relevant photos are used to weight these candidate tags. The tags with the highest weights are suggested to the user. The method is applied on Flickr (http://www.flickr.com). Results show that including visual information in the process of photo tagging increases accuracy with respect to text-based methods.
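The similarity-weighted tag suggestion described in the tag-expansion item above can be sketched in a few lines. The corpus, feature vectors, and tag names below are invented for illustration, and a plain cosine similarity stands in for whatever visual similarity measure the system actually uses:

```python
from collections import defaultdict

# Toy corpus: photo id -> (feature vector, tag set). All names are invented.
CORPUS = {
    "p1": ([0.90, 0.10], {"beach", "sea", "sand"}),
    "p2": ([0.80, 0.20], {"beach", "sunset"}),
    "p3": ([0.10, 0.90], {"beach", "party", "night"}),
}

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den if den else 0.0

def suggest_tags(target_features, initial_tags, corpus, k=3):
    # Photos sharing an initial tag contribute their other tags,
    # weighted by visual similarity to the target photo.
    weights = defaultdict(float)
    for feats, tags in corpus.values():
        if not initial_tags & tags:
            continue
        sim = cosine(target_features, feats)
        for tag in tags - initial_tags:
            weights[tag] += sim
    return sorted(weights, key=weights.get, reverse=True)[:k]

print(suggest_tags([0.85, 0.15], {"beach"}, CORPUS))
```

A target visually close to the first two photos pulls in their tags ("sea", "sand", "sunset") ahead of the tags of the visually dissimilar third photo.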
© 2009 Springer Science+Business Media, LLC.

Item Open Access
Cross-document word matching for segmentation and retrieval of Ottoman divans (Springer UK, 2016). Duygulu, P.; Arifoglu, D.; Kalpakli, M.
Motivated by the need to automatically index and analyze the huge number of documents in Ottoman divan poetry, and to discover new knowledge that preserves and keeps alive this heritage, we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the words. Using the idea that divans have multiple copies (versions) produced by different writers in different writing styles, and that word segmentation in some of those versions may be easier to achieve than in others, we segment the difficult versions (which are difficult, if not impossible, with traditional techniques) using information carried over from a simpler version. One version of a document is used as the source dataset and another version of the same document as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset to detect word boundaries. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words, propose a matching scheme based on possible combinations of sub-word sequences, and improve the performance of simple features by considering words in context. The method is applied on two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising in finding word boundaries and in retrieving words across documents.
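Matching sub-word sequences across document versions, as in the cross-document item above, amounts to aligning two sequences of descriptors. The sketch below uses generic dynamic time warping as a stand-in (the paper's combination-based matching scheme differs), with one-dimensional toy "descriptors" and absolute difference as the distance:

```python
def dtw(seq_a, seq_b, dist):
    # Classic dynamic-time-warping cost between two descriptor sequences.
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy usage: the second sequence repeats a "sub-word" but still aligns at zero cost.
print(dtw([1, 2, 3], [1, 2, 2, 3], lambda a, b: abs(a - b)))
```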
© 2014, Springer-Verlag London.

Item Open Access
A graph based approach for naming faces in news photos (IEEE Computer Society, 2006). Ozkan, D.; Duygulu, P.
We propose a method to associate names and faces for querying people in large news photo collections. On the assumption that a person's face is likely to appear when his or her name is mentioned in the caption, all the faces associated with the query name are first selected. Among these faces, many may correspond to the queried person in different conditions, poses and times, but others may belong to other people mentioned in the caption, or may be non-face images caused by errors in the face detector. In most cases, however, the number of faces of the queried person will be large, and these faces will be more similar to each other than to the rest. We therefore propose a graph based method to find the most similar subset among the set of possible faces associated with the query name; this subset is likely to correspond to the faces of the queried person. When the similarity of faces is represented in a graph structure, the set of most similar faces is the densest component in the graph. We represent the similarity of faces using SIFT descriptors. The matching interest points on two faces are decided after applying two constraints, a geometric constraint and a unique-match constraint, and the average distance of the matching points is used to construct the similarity graph. The most similar set of faces is then found with a greedy densest-component algorithm. The experiments are performed on thousands of news photographs taken in real-life conditions and therefore exhibiting a large variety of poses, illuminations and expressions.
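The greedy densest-component search in the face-naming item above is commonly approximated by greedy peeling (in the spirit of Charikar's algorithm): repeatedly drop the vertex with the least incident weight and keep the densest intermediate subset. A minimal sketch, with invented edge weights standing in for the SIFT-based face similarities:

```python
def densest_component(weights):
    # Greedy peeling: repeatedly remove the vertex with the smallest total
    # incident weight; keep the vertex subset with the best density
    # (total edge weight divided by number of vertices).
    nodes = set()
    for edge in weights:
        nodes |= edge

    def incident(v, subset):
        return sum(w for e, w in weights.items() if v in e and e <= subset)

    best, best_density = set(nodes), 0.0
    current = set(nodes)
    while current:
        total = sum(w for e, w in weights.items() if e <= current)
        density = total / len(current)
        if density >= best_density:
            best, best_density = set(current), density
        current.remove(min(current, key=lambda v: incident(v, current)))
    return best

# Toy similarity graph: a tight triangle of faces plus one weakly attached face.
sims = {
    frozenset({"face1", "face2"}): 1.0,
    frozenset({"face2", "face3"}): 1.0,
    frozenset({"face1", "face3"}): 1.0,
    frozenset({"face1", "face4"}): 0.1,
}
print(densest_component(sims))
```

On this toy graph, the weakly connected "face4" is peeled off first and the tight triangle survives as the densest component.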
© 2006 IEEE.

Item Open Access
Hareket geçmişi görüntüsü yöntemi ile Türkçe işaret dilini tanıma uygulaması [A Turkish Sign Language recognition application using the motion history image method] (IEEE, 2016-05). Yalçınkaya, Özge; Atvar, A.; Duygulu, P.
Sign language plays a very important role in enabling hearing- and speech-impaired individuals to communicate soundly with other members of society. Unfortunately, sign language is known only by the more aware members of society, and their number is notably small. The aim of this work is to improve, through the system we have developed, the communication between hearing- or speech-impaired individuals and others. To this end, sign-language motion information captured by a camera is recognized, and the meaning of a motion is found by comparing it with previously trained sign-language motion data. The "Motion History Image" method is used to extract motion information from the camera images, and the "Nearest Neighbor" algorithm is used for classification. The resulting system predicts a text label for a sign-language motion using the training set. The overall classification accuracy was computed as 95%.

Item Open Access
Histogram of oriented rectangles: a new pose descriptor for human action recognition (Elsevier BV, 2009-09-02). İkizler, N.; Duygulu, P.
Most approaches to human action recognition form complex models that require extensive parameter estimation and computation time. In this study, we show that human actions can be represented simply by pose, without a complex representation of dynamics. Based on this idea, we propose a novel pose descriptor, the Histogram of Oriented Rectangles (HOR), for representing and recognizing human actions in videos. We represent each human pose in an action sequence by oriented rectangular patches extracted over the human silhouette.
We then form spatial oriented histograms to represent the distribution of these rectangular patches. We use several matching strategies to carry the information captured by the HOR descriptor from the spatial domain to the temporal domain: (i) nearest neighbor classification, which recognizes actions by matching the descriptors of each frame; (ii) global histogramming, which extends the Motion Energy Image idea of Bobick and Davis to rectangular patches; (iii) a classifier-based approach using Support Vector Machines; and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the HOR descriptor. For cases where the pose descriptor alone is not strong enough, such as differentiating the actions "jogging" and "running", we also incorporate a simple velocity descriptor as a prior to the pose-based classification step. We test our system in different configurations on two commonly used action datasets, the Weizmann dataset and the KTH dataset. Our method outperforms other methods on the Weizmann dataset with a perfect accuracy of 100%, and is comparable to other methods on the KTH dataset with a success rate close to 90%. These results show that, with a simple and compact representation, we can achieve robust recognition of human actions compared to complex representations. © 2009 Elsevier B.V. All rights reserved.

Item Open Access
Interesting faces: a graph-based approach for finding people in news (Elsevier, 2010-05). Ozkan, D.; Duygulu, P.
In this study, we propose a method for finding people in large news photograph and video collections. Our method exploits the multi-modal nature of these data sets to recognize people and does not require any supervisory input. It first uses the name of the person to populate an initial set of candidate faces.
From this set, which is likely to include the faces of other people, it selects the group of most similar faces, corresponding to the queried person in a variety of conditions. Our main contribution is to transform the problem of recognizing the faces of the queried person in a set of candidate faces into the problem of finding the most highly connected sub-graph (the densest component) in a graph representing the similarities of faces. We also propose a novel technique for measuring the similarity of faces by matching interest points extracted from them. The proposed method further allows the classification of new faces without rebuilding the graph. The experiments are performed on two data sets: thousands of news photographs from Yahoo! News and over 200 news videos from TRECVid 2004. The results show that the proposed method provides significant improvements over text-based methods. © 2009 Elsevier Ltd. All rights reserved.

Item Open Access
A line based pose representation for human action recognition (2013). Baysal, S.; Duygulu, P.
In this paper, we utilize a line based pose representation to recognize human actions in videos. We represent the pose in each frame by a collection of line-pairs, so that limb and joint movements are better described and the geometrical relationships among the lines forming the human figure are captured. We contribute to the literature by proposing a new method that matches the line-pairs of two poses to compute the similarity between them. Moreover, to encapsulate the global motion information of a pose sequence, we introduce line-flow histograms, extracted by matching line segments in consecutive frames. Experimental results on the Weizmann and KTH datasets emphasize the power of our pose representation, and show the effectiveness of using pose ordering and line-flow histograms together in grasping the nature of an action and distinguishing one action from another. © 2013 Elsevier B.V.
All rights reserved.

Item Open Access
A line-based representation for matching words in historical manuscripts (Elsevier BV, 2011). Can, E. F.; Duygulu, P.
In this study, we propose a new method for retrieving and recognizing words in historical documents. We represent word images with a set of line segments, and provide a criterion for word matching based on matching the lines. We carry out experiments on a benchmark dataset consisting of manuscripts by George Washington, as well as on Ottoman manuscripts. © 2011 Elsevier B.V. All rights reserved.

Item Open Access
Matching Islamic patterns in Kufic images (Springer-Verlag London Ltd, 2015). Arifoglu, D.; Sahin, E.; Adiguzel, H.; Duygulu, P.; Kalpakli, M.
In this study, we address the problem of matching patterns in Kufic calligraphy images. Used as a decorative element, Kufic images are designed in a way that makes them difficult for non-experts to read, so available handwriting recognition methods are not easily applicable to Kufic patterns. We propose two new methods for Kufic pattern matching. The first approximates the contours of connected components by lines and then uses a chain code representation; sequence matching techniques with a gap penalty handle the variations between different instances of sub-patterns. In the second method, the skeletons of connected components are represented as a graph in which junction and end points are the nodes, and graph isomorphism techniques are relaxed for partial graph matching. The methods are evaluated on a collection of 270 square Kufic images with 8,941 sub-patterns.
Experimental results indicate that, besides retrieval and indexing of known patterns, our method also allows the discovery of new patterns.

Item Open Access
Multimedia translation for linking visual data to semantics in videos (Springer, 2011-01). Duygulu, P.; Baştan, M.
The semantic gap problem, the disconnection between low-level multimedia data and high-level semantics, is an important obstacle to building real-world multimedia systems. Recently developed methods that use large volumes of loosely labeled data for automatic image annotation are promising approaches toward solving this problem. In this paper, we are interested in how some of these methods can be applied to semantic gap problems in application domains beyond image annotation. Specifically, we introduce new problems that appear in videos, such as linking keyframes with speech transcript text and linking faces with names. In a common framework, we formulate these problems as finding the missing correspondences between visual and semantic data, and apply the multimedia translation method. We evaluate the performance of the multimedia translation method on these problems and compare it against other auto-annotation and classifier-based methods. The experiments, carried out on over 300 hours of news videos from the TRECVid 2004 and TRECVid 2006 corpora, show that the multimedia translation method performs comparably to the other auto-annotation methods and outperforms the classifier-based methods. © 2009 Springer-Verlag.

Item Open Access
A new pose-based representation for recognizing actions from multiple cameras (Academic Press, 2011-02). Pehlivan, S.; Duygulu, P.
We address the problem of recognizing actions from arbitrary views for a multi-camera system.
We argue that poses are important for understanding human actions, and that the strength of the pose representation affects the overall performance of the action recognition system. Based on this idea, we present a new view-independent representation for human poses. Assuming that the data is initially provided in volumetric form, the volume of the human body is first divided into a sequence of horizontal layers, and the intersections of the body segments with each layer are coded with enclosing circles. Three circular features computed in each layer, (i) the number of circles, (ii) the area of the outer circle, and (iii) the area of the inner circle, are then used to generate a pose descriptor. The pose descriptors of all frames in an action sequence are combined to generate the corresponding motion descriptors, and action recognition is performed with a simple nearest neighbor classifier. Experiments on the benchmark IXMAS multi-view dataset demonstrate that the performance of our method is comparable to the other methods in the literature. © 2010 Elsevier Inc. All rights reserved.

Item Open Access
Sentioscope: a soccer player tracking system using model field particles (Institute of Electrical and Electronics Engineers, 2016). Baysal, S.; Duygulu, P.
Tracking multiple players is crucial for analyzing soccer videos in real time, yet rapid illumination changes and occlusions among players who look similar from a distance make tracking in soccer very difficult. Particle-filter-based approaches have been utilized for their ability to track under occlusion and rapid motion. Unlike the common practice of choosing particles on the targets, we introduce the notion of shared particles densely sampled at fixed positions on the model field, and globally evaluate the targets' likelihood of being on the model field particles using a combined appearance and motion model.
This allows us to encapsulate the interactions among the targets in the state-space model and to track players through challenging occlusions. The proposed tracking algorithm is embedded in a real-life soccer player tracking system called Sentioscope. We describe the complete steps of the system and evaluate our approach on large-scale video data gathered from professional soccer league matches. The experimental results show that the proposed algorithm outperforms previous methods at multiple-object tracking under similar appearances and unpredictable motion patterns, such as in team sports. © 1991-2012 IEEE.

Item Open Access
Smart computing for large scale visual data sensing and processing (Elsevier, 2016). Zhang, L.; Duygulu, P.; Zuo, W.; Shan, S.; Hauptmann, A.

Item Open Access
Translating images to words for recognizing objects in large image and video collections (Springer, 2006). Duygulu, P.; Baştan, M.; Forsyth, D.
We present a new approach to the object recognition problem, motivated by the recent availability of large annotated image and video collections. This approach treats object recognition as the translation of visual elements to words, similar to the translation of text from one language to another. The visual elements, represented in feature space, are categorized into a finite set of blobs. The correspondences between the blobs and the words are learned using a method adapted from Statistical Machine Translation. Once learned, these correspondences can be used to predict words corresponding to particular image regions (region naming), to predict words associated with entire images (auto-annotation), or to associate speech transcript text with the correct video frames (video alignment).
We present our results on the Corel data set, which consists of annotated images, and on the TRECVID 2004 data set, which consists of video frames associated with speech transcript text and manual annotations.

Item Open Access
What's news, what's not? Associating news videos with words (Springer, 2004). Duygulu, P.; Hauptmann, A.
Text retrieval from broadcast news video is unsatisfactory because a transcript word frequently does not directly "describe" the shot during which it was spoken. Extending the retrieved region to a window around the matching keyword provides better recall but low precision. We improve on text retrieval with the following approach: first, we segment the visual stream into coherent story-like units using a set of visual news story delimiters. After filtering out clearly irrelevant classes of shots, we are still left with ambiguity about how the words in the transcript relate to the visual content in the remaining shots of the story. Using a limited set of visual features at different semantic levels, ranging from color histograms to faces, cars, and outdoor scenes, an association matrix captures the correlation of these visual features with specific transcript words. This matrix is then refined using an EM approach. Preliminary results show that this approach has the potential to significantly improve retrieval performance from text queries. © Springer-Verlag 2004.
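The translation-style items above both learn correspondences between visual tokens and words via EM-refined association tables. As a rough illustration, here is a minimal IBM-Model-1-style EM sketch with invented blob and word names; the actual systems are considerably richer than this stand-in:

```python
from collections import defaultdict

def train_translation_table(pairs, iterations=10):
    # IBM-Model-1-style EM: learn p(word | blob) from (blobs, words) pairs
    # by alternating expected-count collection and re-normalization.
    blobs = {b for bs, _ in pairs for b in bs}
    words = {w for _, ws in pairs for w in ws}
    t = {(w, b): 1.0 / len(words) for w in words for b in blobs}  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for bs, ws in pairs:
            for w in ws:
                norm = sum(t[(w, b)] for b in bs)
                for b in bs:
                    c = t[(w, b)] / norm       # expected alignment count
                    count[(w, b)] += c
                    total[b] += c
        t = {(w, b): (count[(w, b)] / total[b]) if total[b] else 0.0
             for w in words for b in blobs}
    return t

# Two toy "images": blob tokens on one side, caption words on the other.
data = [
    ({"blob_sky", "blob_grass"}, {"sky", "grass"}),
    ({"blob_sky"}, {"sky"}),
]
table = train_translation_table(data)
```

Because "blob_sky" co-occurs with "sky" more often than with "grass", EM drives the table toward associating each blob with its own word.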