Automatic multimedia cross-modal correlation discovery

Date
2004-08
Source Title
KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher
ACM
Pages
653 - 658
Language
English
Type
Conference Paper
Abstract

Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects, such as video clips with colors, motion, audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (a 50% relative improvement).

Keywords
Automatic image captioning, Cross-modal correlation, Graph-based model, Approximation theory, Correlation methods, Database systems, Graph theory, Image analysis, Mathematical models, Motion estimation, Probability, Problem solving, Video motion, Multimedia systems