Browsing by Subject "Multimedia systems"
Now showing 1 - 12 of 12
Item Open Access
Automatic multimedia cross-modal correlation discovery (ACM, 2004-08)
Pan, J.-Y.; Yang, H.-J.; Faloutsos, C.; Duygulu, Pınar
Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects such as video clips, with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, and no user-determined constants; it can be applied to any multimedia collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain-specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (a 50% relative improvement).
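The abstract describes MMG only at a high level. As an illustration of the general idea of scoring cross-modal correlations by walking a graph that links multimedia objects to their attributes and caption words, here is a minimal random-walk-with-restart sketch; the graph construction, node layout and parameters (restart probability, iteration count) are illustrative assumptions, not the authors' exact method.

```python
# Illustrative sketch (not the exact MMG algorithm): score candidate caption
# words for a query image via a random walk with restart on a graph linking
# images to caption words and to visually similar images.
import numpy as np

def random_walk_with_restart(adjacency, restart_node, restart_prob=0.65, iters=100):
    """Return steady-state visit probabilities for a walk restarting at one node."""
    col_sums = adjacency.sum(axis=0)
    col_sums[col_sums == 0] = 1.0                    # guard against isolated nodes
    transition = adjacency / col_sums                # column-stochastic transition matrix
    scores = np.zeros(adjacency.shape[0])
    scores[restart_node] = 1.0
    restart = scores.copy()
    for _ in range(iters):
        scores = (1 - restart_prob) * transition @ scores + restart_prob * restart
    return scores

# Toy graph: node 0 is a captioned image, node 1 an uncaptioned query image,
# nodes 2-3 are caption words ("tiger", "grass"); edges link the captioned
# image to its words and the two visually similar images to each other.
A = np.zeros((4, 4))
for i, j in [(0, 2), (0, 3), (0, 1)]:
    A[i, j] = A[j, i] = 1.0
scores = random_walk_with_restart(A, restart_node=1)
print("word scores for the query image:", scores[2:])   # rank words to auto-caption image 1
```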
Item Open Access
COST292 experimental framework for TRECVID 2006 (National Institute of Standards and Technology, 2006)
Ćalić, J.; Krämer, P.; Naci, U.; Vrochidis, S.; Aksoy, S.; Zhang, Q.; Benois-Pineau, J.; Saracoglu, A.; Doulaverakis, C.; Jarina, R.; Campbell, N.; Mezaris, V.; Kompatsiaris, I.; Spyrou, E.; Koumoulos, G.; Avrithis, Y.; Dalkilic, A.; Alatan, A.; Hanjalic, A.; Izquierdo, E.
In this paper we give an overview of the four TRECVID tasks submitted by COST292, a European network of institutions working in the area of semantic multimodal analysis and retrieval of digital video media. First, we present a shot boundary (SB) detection approach whose results are merged using a confidence measure. The two SB detectors used here, one from the Technical University of Delft and one from LaBRI, University of Bordeaux 1, are described, followed by a description of the merging algorithm. The high-level feature extraction task comprises three separate systems. The first system, developed by the National Technical University of Athens (NTUA), utilises a set of MPEG-7 low-level descriptors and Latent Semantic Analysis to detect the features. The second system, developed by Bilkent University, uses a Bayesian classifier trained with a "bag of subregions" for each keyframe. The third system, by the Middle East Technical University (METU), exploits textual information in the video using character recognition methodology. The system submitted to the search task is an interactive retrieval application developed by Queen Mary, University of London, the University of Zilina and ITI from Thessaloniki. It combines basic retrieval functionalities in various modalities (i.e. visual, audio, textual) with a user interface that supports submitting queries with any combination of the available retrieval tools and accumulating relevant retrieval results over all queries submitted by a single user during a specified time interval. Finally, the rushes task submission comprises a video summarisation and browsing system specifically designed to present rushes material intuitively and efficiently in a video production environment. This system is the result of joint work by the University of Bristol, the Technical University of Delft and LaBRI, University of Bordeaux 1.

Item Open Access
Database research at Bilkent University (ACM, 2005)
Ulusoy, Özgür
The research activities of the Database Research Group of Bilkent University are discussed. The research is mainly focused on the topics of multimedia databases, Web databases, and mobile computing. The Ottoman Archive Content-Based Retrieval system, a Web-based program that provides electronic access to digitally stored Ottoman document images, is presented. The issues involved in adding a native score management system to object-relational databases, to be used in querying Web metadata, are also discussed.

Item Open Access
Finding people frequently appearing in news (Springer, 2006-07)
Özkan, Derya; Duygulu, Pınar
We propose a graph-based method to improve the performance of person queries in large news video collections. The method benefits from the multi-modal structure of videos and integrates text and face information. Using the idea that a person appears more frequently when his/her name is mentioned, we first use the speech transcript text to limit the search space for a query name. Then, we construct a similarity graph whose nodes correspond to all of the faces in the search space and whose edges correspond to the similarity of the faces. With the assumption that the images of the query name will be more similar to each other than to other images, the problem is then transformed into finding the densest component of the graph, which corresponds to the images of the query name. The same graph algorithm is applied to detect and remove the faces of the anchorpeople in an unsupervised way. The experiments are conducted on 229 news videos provided by NIST for TRECVID 2004. The results show that the proposed method outperforms text-only methods and provides cues for face recognition at large scale. © Springer-Verlag Berlin Heidelberg 2006.
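The densest-component step above lends itself to a compact illustration. The sketch below uses a standard greedy peeling heuristic for the densest-subgraph problem (repeatedly removing the minimum-degree node and keeping the best intermediate subgraph); the abstract does not spell out the exact graph algorithm, so treat this as an assumed stand-in rather than the authors' implementation.

```python
# Illustrative greedy peeling sketch for finding a dense component in a
# face-similarity graph (an assumed stand-in for the method in the abstract).
def densest_component(edges, nodes):
    """Repeatedly drop the minimum-degree node; return the intermediate
    node set with the highest average degree (edges per node)."""
    nodes = set(nodes)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    best_set, best_density = set(nodes), 0.0
    while nodes:
        n_edges = sum(len(adj[v] & nodes) for v in nodes) / 2
        density = n_edges / len(nodes)
        if density >= best_density:
            best_set, best_density = set(nodes), density
        nodes.remove(min(nodes, key=lambda v: len(adj[v] & nodes)))
    return best_set

# Toy example: faces 0-2 belong to the queried person and are mutually
# similar; faces 3-4 belong to other people and are only weakly connected.
faces = [0, 1, 2, 3, 4]
similar_pairs = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(densest_component(similar_pairs, faces))   # expected: {0, 1, 2}
```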
Item Open Access
Image sequence analysis for emerging interactive multimedia services - the European COST 211 framework (Institute of Electrical and Electronics Engineers, 1998-11)
Alatan, A. A.; Onural, L.; Wollborn, M.; Mech, R.; Tuncel, E.; Sikora, T.
Flexibility and efficiency of coding, content extraction, and content-based search are key research topics in the field of interactive multimedia. Ongoing ISO MPEG-4 and MPEG-7 activities are targeting standardization to facilitate such services. European COST Telecommunications activities provide a framework for research collaboration. The COST 211 bis and COST 211 ter activities have been instrumental in the definition and development of the ITU-T H.261 and H.263 standards for video-conferencing over ISDN and videophony over regular phone lines, respectively. The group has also contributed significantly to the ISO MPEG-4 activities. At present a significant effort of the COST 211 ter group is dedicated to image and video sequence analysis and segmentation, an important technological aspect for the success of the emerging object-based MPEG-4 and MPEG-7 multimedia applications. The current work of COST 211 is centered around the test model, called the Analysis Model (AM). The essential feature of the AM is its ability to fuse information from different sources to achieve high-quality object segmentation. The current information sources are the intermediate results from frame-based (still) color segmentation, motion vector based segmentation, and change-detection-based segmentation. Motion vectors, which form the basis for the motion vector based intermediate segmentation, are estimated from consecutive frames. A recursive shortest spanning tree (RSST) algorithm is used to obtain the intermediate color and motion vector based segmentation results. A rule-based region processor fuses the intermediate results; a postprocessor further refines the final segmentation output. The results of the current AM are satisfactory; it is expected that there will be further improvements of the AM within the COST 211 project.

Item Open Access
Intracavity optical trapping with ytterbium doped fiber (SPIE, 2013)
Laser, R.; Sayed, R.; Kalantarifard, Fatemeh; Elahi, P.; İlday, F. Ömer; Volpe, Giovanni; Marago, O. M.
We propose a novel approach for trapping micron-sized particles and living cells based on optical feedback. This approach can be implemented at low numerical aperture (NA = 0.5, 20X) and long working distance. In this configuration, an optical tweezers is constructed inside a ring-cavity fiber laser, and the optical feedback in the ring cavity is controlled by the light scattered from a trapped particle. In particular, once a particle is trapped, the laser operation, optical feedback and intracavity power are affected by the particle motion. We demonstrate that with this configuration it is possible to stably hold micron-sized particles and single living cells in the focal spot of the laser beam. The calibration of the optical forces is achieved by tracking the Brownian motion of a trapped particle or cell and analysing its position distribution. © 2013 SPIE.

Item Open Access
A performance evaluation framework of a rate-controlled MPEG video transmission over UMTS networks (IEEE, 2007-07)
Akar, Nail; Barbera, M.; Budzisz, L.; Ferrùs, R.; Kankaya, Emre; Schembra, G.
UMTS is designed to offer high-bandwidth radio access with QoS assurances for multimedia communications. In particular, real-time video communication services are expected to become a successful experience over UMTS networks. In this context, a video transmission service can be designed on the basis that UMTS provides either a constant bit rate data channel or a dynamic variable bit rate data channel adapted to load conditions. In the latter approach, which is more efficient for both the user and the service provider, multimedia sources have to be designed to adapt their output rate to the instantaneous allowed channel rate in a timely manner. The target of this paper is to define an analytical model of adaptive real-time video sources in a UMTS network where system resources are dynamically shared among active users. © 2007 IEEE.

Item Open Access
Recognizing objects and scenes in news videos (Springer, 2006-07)
Baştan, Muhammet; Duygulu, Pınar
We propose a new approach to recognize objects and scenes in news videos, motivated by the availability of large video collections. This approach considers the recognition problem as the translation of visual elements to words. The correspondences between visual elements and words are learned using methods adapted from statistical machine translation and are used to predict words for particular image regions (region naming), for entire images (auto-annotation), or to associate the automatically generated speech transcript text with the correct video frames (video alignment). Experimental results are presented on the TRECVID 2004 data set, which consists of about 150 hours of news videos associated with manual annotations and speech transcript text. The results show that retrieval performance can be improved by associating visual and textual elements. Also, an extensive analysis of features is provided and a method to combine features is proposed. © Springer-Verlag Berlin Heidelberg 2006.

Item Open Access
A simple and effective mechanism for stored video streaming with TCP transport and server-side adaptive frame discard (Elsevier, 2005)
Gürses, E.; Akar, G. B.; Akar, N.
Transmission control protocol (TCP), with its well-established congestion control mechanism, is the prevailing transport layer protocol for non-real-time data in current Internet Protocol (IP) networks. It would be desirable to transmit any type of multimedia data using TCP in order to take advantage of the extensive operational experience behind TCP in the Internet. However, some features of TCP, including retransmissions and variations in throughput and delay, although not catastrophic for non-real-time data, may result in inefficiencies for video streaming applications. In this paper, we propose an architecture which consists of an input buffer at the server side, coupled with the congestion control mechanism of TCP at the transport layer, for efficiently streaming stored video in the best-effort Internet. The proposed buffer management scheme selectively discards low-priority frames from its head end, which would otherwise jeopardize the successful playout of high-priority frames. Moreover, the proposed discarding policy is adaptive to changes in the bandwidth available to the video stream. © 2004 Elsevier B.V. All rights reserved.
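The head-end discard idea described above can be illustrated with a short sketch. The abstract gives the policy only at a high level, so the frame priorities, goodput estimate and deadline test below are simplified assumptions rather than the paper's exact scheme.

```python
# Simplified sketch of a server-side send buffer that discards low-priority
# frames from its head end when the current TCP goodput estimate suggests
# they cannot be delivered before their playout deadlines.
from collections import deque

class Frame:
    def __init__(self, size_bytes, deadline_s, priority):
        self.size = size_bytes       # encoded frame size in bytes
        self.deadline = deadline_s   # playout deadline, seconds from now
        self.priority = priority     # e.g. 2 = I-frame, 1 = P-frame, 0 = B-frame

class AdaptiveSendBuffer:
    def __init__(self):
        self.frames = deque()

    def enqueue(self, frame):
        self.frames.append(frame)

    def next_frame(self, goodput_bps):
        """Pop the next frame to hand to TCP, dropping doomed low-priority
        frames at the head so later high-priority frames can still make it."""
        while self.frames:
            head = self.frames[0]
            send_time = 8 * head.size / max(goodput_bps, 1.0)
            if head.priority == 0 and send_time > head.deadline:
                self.frames.popleft()      # B-frame cannot meet its deadline: discard
                continue
            return self.frames.popleft()   # otherwise transmit this frame
        return None

# Usage: with a 2 Mb/s goodput estimate, the late B-frame is discarded and
# the I-frame behind it is sent instead.
buf = AdaptiveSendBuffer()
buf.enqueue(Frame(size_bytes=40000, deadline_s=0.02, priority=0))
buf.enqueue(Frame(size_bytes=90000, deadline_s=0.50, priority=2))
print(buf.next_frame(goodput_bps=2_000_000).priority)   # prints 2
```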
Item Open Access
Towards auto-documentary: Tracking the evolution of news stories (ACM, 2004)
Duygulu, Pınar; Pan, J.-Y.; Forsyth, D. A.
News videos constitute an important source of information for tracking and documenting important events. In these videos, news stories are often accompanied by short video shots that tend to be repeated during the course of the event. Automatic detection of such repetitions is essential for creating auto-documentaries and for alleviating the limitations of traditional textual topic detection methods. In this paper, we propose novel methods for detecting and tracking the evolution of news over time. The proposed method exploits both visual cues and textual information to summarize evolving news stories. Experiments are carried out on the TRECVID data set consisting of 120 hours of news videos from two different channels.

Item Open Access
Utilization of the recursive shortest spanning tree algorithm for video-object segmentation by 2-D affine motion modeling (IEEE, 2000)
Tuncel, E.; Onural, L.
A novel video-object segmentation algorithm is proposed, which takes the previously estimated 2-D dense motion vector field as input and uses the generalized recursive shortest spanning tree method to approximate each component of the motion vector field as a piecewise planar function. The algorithm is successful in capturing 3-D planar objects in the scene correctly, with acceptable accuracy at the boundaries. The proposed algorithm is fast and requires no initial guess about the segmentation mask. Moreover, it is a hierarchical scheme which gives segmentation results from the finest to the coarsest level. The only external parameter needed by the algorithm is the number of segmented regions, which essentially controls the coarseness level at which the algorithm stops. The proposed algorithm improves the "analysis model" developed in the European COST 211 framework.
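For reference, approximating each motion-vector component as a piecewise planar function corresponds to the standard six-parameter 2-D affine motion model per region; the abstract does not give the exact parameterisation used, so the form below is the conventional one rather than a quotation from the paper.

```latex
% Conventional six-parameter 2-D affine (piecewise planar) motion model:
% within a region R, each motion-vector component is planar in (x, y).
\begin{aligned}
v_x(x,y) &= a_1 + a_2\,x + a_3\,y,\\
v_y(x,y) &= a_4 + a_5\,x + a_6\,y, \qquad (x,y) \in R.
\end{aligned}
```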