Translating images to words for recognizing objects in large image and video collections

Duygulu, P.; Baştan, M.; Forsyth, D.

Files

Translating_Images_to_Words_for_Recognizing.pdf (831.16 KB)

Date

2006

Authors

Duygulu, P.

Baştan, M.

Forsyth, D.

BUIR Usage Stats

3 views

58 downloads

Abstract

We present a new approach to the object recognition problem, motivated by the recent availability of large annotated image and video collections. This approach treats object recognition as the translation of visual elements to words, similar to the translation of text from one language to another. The visual elements, represented in feature space, are categorized into a finite set of blobs. The correspondences between the blobs and the words are learned using a method adapted from Statistical Machine Translation. Once learned, these correspondences can be used to predict words corresponding to particular image regions (region naming), to predict words associated with entire images (autoannotation), or to associate the speech transcript text with the correct video frames (video alignment). We present our results on the Corel data set, which consists of annotated images, and on the TRECVID 2004 data set, which consists of video frames associated with speech transcript text and manual annotations.
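The correspondence learning described in the abstract can be illustrated with a minimal sketch. The abstract does not say which translation model is used, so the code below assumes an IBM Model 1 style expectation-maximization procedure, a standard choice in statistical machine translation; it should not be read as the authors' implementation. Blobs are assumed to be already quantized into integer IDs, and the function names (train_translation_table, name_region, annotate) and the toy data are hypothetical.

```python
from collections import defaultdict


def train_translation_table(pairs, iterations=20):
    """Estimate p(word | blob) with IBM Model 1 style EM (an assumption).

    pairs: list of (blobs, words) tuples, one per annotated image.
    """
    blob_vocab = {b for blobs, _ in pairs for b in blobs}
    word_vocab = {w for _, words in pairs for w in words}
    # Start from a uniform table p(word | blob).
    t = {b: {w: 1.0 / len(word_vocab) for w in word_vocab} for b in blob_vocab}

    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: share each word among the blobs of the same image.
        for blobs, words in pairs:
            for w in words:
                norm = sum(t[b][w] for b in blobs)
                for b in blobs:
                    frac = t[b][w] / norm
                    count[b][w] += frac
                    total[b] += frac
        # M-step: renormalize the expected counts per blob.
        for b in blob_vocab:
            for w in word_vocab:
                t[b][w] = count[b][w] / total[b]
    return t


def name_region(t, blob):
    """Region naming: the most probable word for a single blob."""
    return max(t[blob], key=t[blob].get)


def annotate(t, blobs, top_k=2):
    """Autoannotation: average the word distributions of all blobs."""
    scores = defaultdict(float)
    for b in blobs:
        for w, p in t[b].items():
            scores[w] += p / len(blobs)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy collection: blob IDs paired with annotation words.
toy_pairs = [
    ([0, 1], ["sky", "grass"]),
    ([0, 2], ["sky", "water"]),
    ([1, 2], ["grass", "water"]),
]
table = train_translation_table(toy_pairs)
print(name_region(table, 0))
print(annotate(table, [0, 1]))
```

In this toy collection no single image says which blob goes with which word, but blob 0 co-occurs with "sky" more often than with the other words, so EM concentrates its probability mass there. The same table then serves both region naming and autoannotation.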

Source Title

Lecture Notes in Computer Science

Publisher

Springer

Keywords

Machine translation, Automatic speech recognition, News video, Statistical machine translation, Correspondence problem

Permalink

http://hdl.handle.net/11693/49255

Published Version (Please cite this version)

https://doi.org/10.1007/11957959_14

Collections

Scholarly Publications - Computer Engineering

Language

English

Type

Article
