Translating images to words for recognizing objects in large image and video collections
dc.citation.epage | 276 | en_US |
dc.citation.spage | 258 | en_US |
dc.citation.volumeNumber | 4170 | en_US |
dc.contributor.author | Duygulu, P. | en_US |
dc.contributor.author | Baştan, M. | en_US |
dc.contributor.author | Forsyth, D. | en_US |
dc.date.accessioned | 2019-02-11T12:46:06Z | |
dc.date.available | 2019-02-11T12:46:06Z | |
dc.date.issued | 2006 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description.abstract | We present a new approach to the object recognition problem, motivated by the recent availability of large annotated image and video collections. This approach considers object recognition as the translation of visual elements to words, similar to the translation of text from one language to another. The visual elements represented in feature space are categorized into a finite set of blobs. The correspondences between the blobs and the words are learned using a method adapted from Statistical Machine Translation. Once learned, these correspondences can be used to predict words corresponding to particular image regions (region naming), to predict words associated with entire images (autoannotation), or to associate the speech transcript text with the correct video frames (video alignment). We present our results on the Corel data set, which consists of annotated images, and on the TRECVID 2004 data set, which consists of video frames associated with speech transcript text and manual annotations. | en_US |
dc.description.provenance | Submitted by Betül Özen (ozen@bilkent.edu.tr) on 2019-02-11T12:46:06Z No. of bitstreams: 1 Translating_Images_to_Words_for_Recognizing.pdf: 851111 bytes, checksum: b4d5e1c86ad3cf438588ccfa2a783476 (MD5) | en |
dc.description.provenance | Made available in DSpace on 2019-02-11T12:46:06Z (GMT). No. of bitstreams: 1 Translating_Images_to_Words_for_Recognizing.pdf: 851111 bytes, checksum: b4d5e1c86ad3cf438588ccfa2a783476 (MD5) Previous issue date: 2006 | en |
dc.identifier.doi | 10.1007/11957959_14 | en_US |
dc.identifier.issn | 0302-9743 | en_US |
dc.identifier.uri | http://hdl.handle.net/11693/49255 | en_US |
dc.language.iso | English | en_US |
dc.publisher | Springer | en_US |
dc.relation.isversionof | https://doi.org/10.1007/11957959_14 | en_US |
dc.source.title | Lecture Notes in Computer Science | en_US |
dc.subject | Machine translation | en_US |
dc.subject | Automatic speech recognition | en_US |
dc.subject | News video | en_US |
dc.subject | Statistical machine translation | en_US |
dc.subject | Correspondence problem | en_US |
dc.title | Translating images to words for recognizing objects in large image and video collections | en_US |
dc.type | Article | en_US |
Files
Original bundle
- Name: Translating_Images_to_Words_for_Recognizing.pdf
- Size: 831.16 KB
- Format: Adobe Portable Document Format
- Description: Full printable version
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission