What's news, what's not? Associating news videos with words

Date
2004
Authors
Duygulu, P.
Hauptmann, A.
Source Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Print ISSN
0302-9743
Publisher
Springer
Volume
3115
Pages
132 - 140
Language
English
Type
Article
Abstract

Text retrieval from broadcast news video is unsatisfactory because a transcript word frequently does not directly 'describe' the shot during which it was spoken. Extending the retrieved region to a window around the matching keyword improves recall but yields low precision. We improve on text retrieval using the following approach. First, we segment the visual stream into coherent story-like units, using a set of visual news story delimiters. After filtering out clearly irrelevant classes of shots, we are still left with ambiguity about how words in the transcript relate to the visual content in the remaining shots of the story. Using a limited set of visual features at different semantic levels, ranging from color histograms to faces, cars, and outdoors, an association matrix captures the correlation of these visual features with specific transcript words. This matrix is then refined using an EM approach. Preliminary results show that this approach has the potential to significantly improve retrieval performance from text queries. © Springer-Verlag 2004.
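
The abstract does not spell out how the association matrix is built or refined, so the following is only a minimal sketch of one plausible reading: a feature-by-word co-occurrence table, normalized per feature, then re-estimated with EM in the style of translation-table estimation. The function names, the per-feature normalization, the iteration count, and the toy data are all assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


def cooccurrence(units):
    """Count how often each visual feature shares a unit with each word."""
    counts = defaultdict(lambda: defaultdict(float))
    for features, words in units:
        for f in features:
            for w in words:
                counts[f][w] += 1.0
    return counts


def normalize_rows(counts):
    """Turn each feature's counts into a distribution over words."""
    table = {}
    for f, row in counts.items():
        total = sum(row.values())
        if total > 0:
            table[f] = {w: c / total for w, c in row.items()}
    return table


def refine_with_em(units, iterations=10):
    """Refine the association table (assumed form: P(word | feature))."""
    table = normalize_rows(cooccurrence(units))
    for _ in range(iterations):
        expected = defaultdict(lambda: defaultdict(float))
        for features, words in units:
            for w in words:
                # E-step: share each word among the features in its unit,
                # in proportion to the current association strengths.
                z = sum(table.get(f, {}).get(w, 0.0) for f in features)
                if z == 0:
                    continue
                for f in features:
                    expected[f][w] += table.get(f, {}).get(w, 0.0) / z
        # M-step: renormalize the expected counts per feature.
        table = normalize_rows(expected)
    return table


# Hypothetical toy data: each unit pairs detected features with transcript words.
units = [
    (["face"], ["anchor"]),
    (["face", "outdoors"], ["anchor", "storm"]),
    (["outdoors"], ["storm"]),
]
table = refine_with_em(units)
```

On this toy input the ambiguous middle unit is gradually resolved by the two unambiguous ones, so "face" ends up associated mainly with "anchor" and "outdoors" with "storm". A query word could then be matched against shots by looking up which features it is most strongly associated with.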

Keywords
Semantics, Association matrix, Broadcast news video, Color histogram, Retrieval performance, Semantic levels, Text retrieval, Visual content, Visual feature, Information retrieval