Multimedia translation for linking visual data to semantics in videos

dc.citation.epage: 115 (en_US)
dc.citation.issueNumber: 1 (en_US)
dc.citation.spage: 99 (en_US)
dc.citation.volumeNumber: 22 (en_US)
dc.contributor.author: Duygulu, P. (en_US)
dc.contributor.author: Baştan, M. (en_US)
dc.date.accessioned: 2016-02-08T09:54:52Z
dc.date.available: 2016-02-08T09:54:52Z
dc.date.issued: 2011-01 (en_US)
dc.department: Department of Computer Engineering (en_US)
dc.description.abstract: The semantic gap problem, the disconnect between low-level multimedia data and high-level semantics, is an important obstacle to building real-world multimedia systems. Recently developed methods that use large volumes of loosely labeled data for automatic image annotation are promising approaches toward solving this problem. In this paper, we are interested in how some of these methods can be applied to semantic gap problems that arise in application domains beyond image annotation. Specifically, we introduce new problems that appear in videos, such as linking keyframes with speech transcript text and linking faces with names. In a common framework, we formulate these problems as finding missing correspondences between visual and semantic data and apply the multimedia translation method. We evaluate the performance of the multimedia translation method on these problems and compare it against other auto-annotation and classifier-based methods. The experiments, carried out on over 300 hours of news videos from the TRECVid 2004 and TRECVid 2006 corpora, show that the multimedia translation method performs comparably to other auto-annotation methods and outperforms classifier-based methods. © 2009 Springer-Verlag. (en_US)
dc.description.provenance: Made available in DSpace on 2016-02-08T09:54:52Z (GMT). No. of bitstreams: 1. bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5). Previous issue date: 2011. (en)
dc.identifier.doi: 10.1007/s00138-009-0217-8 (en_US)
dc.identifier.issn: 0932-8092
dc.identifier.uri: http://hdl.handle.net/11693/22054
dc.language.iso: English (en_US)
dc.publisher: Springer (en_US)
dc.relation.isversionof: http://dx.doi.org/10.1007/s00138-009-0217-8 (en_US)
dc.source.title: Machine Vision and Applications: An International Journal (en_US)
dc.subject: Machine translation (en_US)
dc.subject: Automatic speech recognition (en_US)
dc.subject: Visual data (en_US)
dc.subject: Image annotation (en_US)
dc.subject: Visual content (en_US)
dc.title: Multimedia translation for linking visual data to semantics in videos (en_US)
dc.type: Article (en_US)
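
The abstract frames annotation as translating between quantized visual tokens and loosely associated words (for example, speech-transcript terms paired with a keyframe). Below is a minimal sketch of that idea, assuming an IBM Model 1 style EM alignment over discrete visual tokens ("blobs") paired with transcript words; the toy data, names, and simplifications are illustrative assumptions, not the paper's actual implementation.

```python
from collections import defaultdict


def train_translation_model(pairs, iterations=10):
    """EM training of an IBM Model 1 style translation table P(word | blob).

    pairs: list of (blobs, words) tuples, where blobs are discrete visual
    tokens (e.g., region cluster labels for a keyframe) and words come from
    loosely associated text such as a speech transcript.
    """
    blob_vocab = {b for blobs, _ in pairs for b in blobs}
    word_vocab = {w for _, words in pairs for w in words}

    # Uniform initialization of P(word | blob).
    t = {b: {w: 1.0 / len(word_vocab) for w in word_vocab} for b in blob_vocab}

    for _ in range(iterations):
        counts = defaultdict(lambda: defaultdict(float))  # expected counts[b][w]
        totals = defaultdict(float)                       # expected totals[b]

        # E-step: spread each word's count over the blobs it co-occurs with.
        for blobs, words in pairs:
            for w in words:
                norm = sum(t[b][w] for b in blobs)
                if norm == 0:
                    continue
                for b in blobs:
                    c = t[b][w] / norm
                    counts[b][w] += c
                    totals[b] += c

        # M-step: renormalize; words never aligned with a blob drop to 0.
        for b in blob_vocab:
            for w in word_vocab:
                if totals[b] > 0:
                    t[b][w] = counts[b][w] / totals[b]

    return t


def annotate(blobs, t, top_k=3):
    """Rank candidate words for a set of visual tokens by summed probability."""
    scores = defaultdict(float)
    for b in blobs:
        for w, p in t.get(b, {}).items():
            scores[w] += p
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy keyframe/transcript pairs; purely illustrative.
    data = [
        (["blob_sky", "blob_plane"], ["plane", "sky"]),
        (["blob_sky", "blob_sea"], ["sea", "sky"]),
        (["blob_plane", "blob_runway"], ["plane", "runway"]),
    ]
    table = train_translation_model(data, iterations=20)
    print(annotate(["blob_plane"], table))  # "plane" should rank at or near the top
```

The same kind of learned translation table, built from real features and much larger vocabularies, is what the abstract refers to when it describes linking keyframes with transcript text and faces with names.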

Files

Original bundle

Name: Multimedia translation for linking visual data to semantics in videos.pdf
Size: 1.45 MB
Format: Adobe Portable Document Format
Description: Full printable version