Multimedia translation for linking visual data to semantics in videos

Date

2011-01

Authors

Duygulu, P.
Baştan, M.

Source Title

Machine Vision and Applications: An International Journal

Print ISSN

0932-8092

Publisher

Springer

Volume

22

Issue

1

Pages

99-115

Language

English

Abstract

The semantic gap problem, the disconnect between low-level multimedia data and high-level semantics, is an important obstacle to building real-world multimedia systems. Recently developed methods that use large volumes of loosely labeled data for automatic image annotation are promising approaches toward solving this problem. In this paper, we are interested in how some of these methods can be applied to semantic gap problems that arise in application domains beyond image annotation. Specifically, we introduce new problems that appear in videos, such as linking keyframes with speech transcript text and linking faces with names. We formulate these problems in a common framework as finding the missing correspondences between visual and semantic data and apply the multimedia translation method. We evaluate the multimedia translation method on these problems and compare it against other auto-annotation and classifier-based methods. Experiments carried out on over 300 hours of news videos from the TRECVid 2004 and TRECVid 2006 corpora show that the multimedia translation method performs comparably to the other auto-annotation methods and outperforms the classifier-based methods. © 2009 Springer-Verlag.
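The multimedia translation idea treats visual tokens (e.g., quantized keyframe regions or face tracks) as one "language" and transcript words or names as another, and learns which words each visual token tends to translate to. Below is a minimal sketch of such a correspondence learner, assuming an IBM Model 1-style EM estimation of p(word | visual token); the function name, token labels, and toy data are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: EM estimation of translation probabilities between
# visual tokens and words, in the spirit of translation-based linking.
# Illustrative only; not the authors' code.

from collections import defaultdict

def train_translation_probs(pairs, n_iters=10):
    """Estimate p(word | visual_token) from co-occurring pairs.

    pairs: list of (visual_tokens, words) tuples, e.g. one per keyframe,
           where visual_tokens are quantized region/face labels and words
           come from the aligned speech transcript segment.
    """
    # Uniform initialization over the word vocabulary.
    vocab = {w for _, words in pairs for w in words}
    t = defaultdict(lambda: 1.0 / len(vocab))  # t[(word, vtoken)]

    for _ in range(n_iters):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)   # normalizer per visual token

        # E-step: distribute each word's mass over the visual tokens of
        # the same keyframe in proportion to the current t values.
        for vtokens, words in pairs:
            for w in words:
                z = sum(t[(w, v)] for v in vtokens)
                for v in vtokens:
                    c = t[(w, v)] / z
                    count[(w, v)] += c
                    total[v] += c

        # M-step: renormalize to get updated translation probabilities.
        for (w, v), c in count.items():
            t[(w, v)] = c / total[v]

    return t


# Toy usage: two keyframes with visual tokens and aligned transcript words.
pairs = [
    (["face_3", "sky_1"], ["anchor", "weather"]),
    (["face_3", "logo_7"], ["anchor", "news"]),
]
t = train_translation_probs(pairs)
# p("anchor" | "face_3") rises above the uniform baseline because the two
# co-occur in both keyframes.
print(round(t[("anchor", "face_3")], 3))
```

The same estimation, with face tracks as the visual tokens and names extracted from the transcript as the word side, would cover the face-to-name linking variant described in the abstract.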
