Combining textual and visual information for semantic labeling of images and videos
dc.citation.epage | 225 | en_US |
dc.citation.spage | 205 | en_US |
dc.contributor.author | Duygulu, Pınar | en_US |
dc.contributor.author | Baştan, Muhammet | en_US |
dc.contributor.author | Özkan, Derya | en_US |
dc.contributor.editor | Cord, M. | |
dc.contributor.editor | Cunningham, P. | |
dc.date.accessioned | 2019-04-22T10:15:18Z | |
dc.date.available | 2019-04-22T10:15:18Z | |
dc.date.issued | 2008 | en_US |
dc.department | Department of Computer Engineering | en_US |
dc.description | Chapter 9 | en_US |
dc.description.abstract | Semantic labeling of large volumes of image and video archives is difficult, if not impossible, with traditional methods, due to the huge amount of human effort required for manual labeling in a supervised setting. Recently, semi-supervised techniques that make use of annotated image and video collections have been proposed as an alternative to reduce this effort. In this direction, a variety of techniques, mostly adapted from the information retrieval literature, are applied to learn the unknown one-to-one associations between visual structures and semantic descriptions. Once these links are learned, the range of applications is wide, including improved retrieval and automatic annotation of images and videos, labeling of image regions as a way of large-scale object recognition, and association of names with faces as a way of large-scale face recognition. In this chapter, after reviewing and discussing a variety of related studies, we present two methods in detail: the so-called “translation approach,” which translates visual structures into semantic descriptors using ideas from statistical machine translation, and a second approach, which finds the densest component of a graph, corresponding to the largest group of similar visual structures associated with a semantic description. | en_US |
dc.description.provenance | Submitted by Onur Emek (onur.emek@bilkent.edu.tr) on 2019-04-22T10:15:17Z No. of bitstreams: 1 Combining textual and visual information for semantic labeling of images and videos.pdf: 13135490 bytes, checksum: 868f7594930e9b3507cac48d63d7bf6f (MD5) | en |
dc.description.provenance | Made available in DSpace on 2019-04-22T10:15:18Z (GMT). No. of bitstreams: 1 Combining textual and visual information for semantic labeling of images and videos.pdf: 13135490 bytes, checksum: 868f7594930e9b3507cac48d63d7bf6f (MD5) Previous issue date: 2008 | en |
dc.identifier.doi | 10.1007/978-3-540-75171-7_9 | en_US |
dc.identifier.doi | 10.1007/978-3-540-75171-7 | en_US |
dc.identifier.isbn | 9783540751700 | |
dc.identifier.issn | 1611-2482 | |
dc.identifier.uri | http://hdl.handle.net/11693/50869 | |
dc.language.iso | English | en_US |
dc.publisher | Springer, Berlin, Heidelberg | en_US |
dc.relation.ispartof | Machine learning techniques for multimedia | en_US |
dc.relation.ispartofseries | Cognitive Technologies; | |
dc.relation.isversionof | https://doi.org/10.1007/978-3-540-75171-7_9 | en_US |
dc.relation.isversionof | https://doi.org/10.1007/978-3-540-75171-7 | en_US |
dc.subject | Machine translation | en_US |
dc.subject | Automatic speech recognition | en_US |
dc.subject | Mean average precision | en_US |
dc.subject | Image annotation | en_US |
dc.subject | News video | en_US |
dc.title | Combining textual and visual information for semantic labeling of images and videos | en_US |
dc.type | Book Chapter | en_US |
Files
Original bundle
1 - 1 of 1
- Name: Combining textual and visual information for semantic labeling of images and videos.pdf
- Size: 12.53 MB
- Format: Adobe Portable Document Format
License bundle
1 - 1 of 1
- Name: license.txt
- Size: 1.71 KB
- Description: Item-specific license agreed upon to submission