Multimedia translation for linking visual data to semantics in videos
Machine Vision and Applications
Please cite this item using this persistent URL: http://hdl.handle.net/11693/22054
The semantic gap, the disconnect between low-level multimedia data and high-level semantics, is an important obstacle to building real-world multimedia systems. Recently developed methods that use large volumes of loosely labeled data for automatic image annotation are promising approaches to this problem. In this paper, we examine how some of these methods can be applied to semantic gap problems in application domains beyond image annotation. Specifically, we introduce new problems that appear in videos, such as linking keyframes with speech transcript text and linking faces with names. We formulate these problems in a common framework as the problem of finding missing correspondences between visual and semantic data, and apply the multimedia translation method to them. We evaluate the multimedia translation method on these problems and compare it against other auto-annotation methods and against classifier-based methods. The experiments, carried out on over 300 h of news videos from the TRECVid 2004 and TRECVid 2006 corpora, show that the multimedia translation method performs comparably to the other auto-annotation methods and outperforms the classifier-based methods. © 2009 Springer-Verlag.
- Research Paper 
Showing items related by title, author, creator and subject.
Aksoy, C.; Bugdayci, A.; Gur, T.; Uysal I.; Can F. (2009) Semantic Role Labeling (SRL) aims to identify the constituents of a sentence, together with their roles with respect to the sentence predicates. In this paper, we introduce and assess the idea of using SRL on generic ...
Semantic similarity between Turkish and European languages using word embeddings [Türkçe ile Avrupa Dilleri Arasindaki Anlamsal Benzerliǧin Kelime Temsilleri ile Gösterimi] Şenel L.K.; Yucesoy V.; Koc A.; Cukur T. (Institute of Electrical and Electronics Engineers Inc., 2017) Representing the words in a language's vocabulary as real vectors in a high-dimensional space is called word embedding. Word embeddings have proven successful in modelling semantic relations between ...
Çavuş Ö.; Aksoy, S. (2008) We describe an annotation and retrieval framework that uses a semantic image representation by contextual modeling of images using occurrence probabilities of concepts and objects. First, images are segmented into regions ...