Browsing by Subject "Video retrieval"
Now showing 1 - 8 of 8
Item Open Access
Bilkent University at TRECVID 2005 (National Institute of Standards and Technology, 2005-11)
Aksoy, Selim; Avcı, Akın; Balçık, Erman; Çavuş, Özge; Duygulu, Pınar; Karaman, Zeynep; Kavak, Pınar; Kaynak, Cihan; Küçükayvaz, Emre; Öcalan, Çağdaş; Yıldız, Pınar
We describe our second participation in the TRECVID video retrieval evaluation, which includes one high-level feature extraction run, three manual search runs, and one interactive search run. All of these runs used a system trained on the common development collection. Only visual and textual information was used: the visual information consisted of color, texture, and edge-based low-level features, and the textual information consisted of the speech transcript provided with the collection. With the experience gained from this second participation, we are building a system for the automatic classification and indexing of video archives.

Item Open Access
Bilkent University at TRECVID 2006 (National Institute of Standards and Technology, 2006-11)
Aksoy, Selim; Duygulu, Pınar; Akçay, Hüseyin Gökhan; Ataer, Esra; Baştan, Muhammet; Can, Tolga; Çavuş, Özge; Doğrusöz, Emel; Gökalp, Demir; Akaydın, Ateş; Akoğlu, Leman; Angın, Pelin; Cinbiş, R. Gökberk; Gür, Tunay; Ünlü, Mehmet
We describe our third participation in the TRECVID video retrieval evaluation, which includes one high-level feature extraction run, two manual search runs, and one interactive search run. All of these runs used a system trained on the common development collection. Only visual and textual information was used: the visual information consisted of color, texture, and edge-based low-level features, and the textual information consisted of the speech transcript provided with the collection.

Item Open Access
Bilkent University at TRECVID 2007 (National Institute of Standards and Technology, 2007)
Aksoy, Selim; Duygulu, Pınar; Aksoy, C.; Aydin, E.; Gunaydin, D.; Hadimli, K.; Koç, L.; Olgun, Y.; Orhan, C.; Yakin, G.
We describe our fourth participation in the TRECVID video retrieval evaluation, which includes two high-level feature extraction runs and one manual search run. All of these runs used a system trained on the common development collection. Only visual information, consisting of color, texture, and edge-based low-level features, was used.
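The three TRECVID abstracts above all rely on the same family of color, texture, and edge-based low-level visual features. As a rough illustration of what such keyframe descriptors can look like, the Python sketch below computes an HSV color histogram and an edge-orientation histogram with OpenCV and NumPy. The function names, bin counts, and normalization are illustrative assumptions, not the exact features used in the actual runs.

```python
import cv2
import numpy as np

def color_histogram(image_bgr, bins=8):
    """HSV color histogram, L1-normalized. Bin count is an assumption."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins] * 3,
                        [0, 180, 0, 256, 0, 256]).flatten()
    return hist / (hist.sum() + 1e-9)

def edge_orientation_histogram(image_bgr, bins=16):
    """Histogram of gradient orientations, weighted by edge magnitude."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)  # angle in radians, [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)

def keyframe_descriptor(image_bgr):
    """Concatenate color and edge features into one low-level descriptor."""
    return np.concatenate([color_histogram(image_bgr),
                           edge_orientation_histogram(image_bgr)])
```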
Item Open Access
HandVR: a hand-gesture-based interface to a video retrieval system (Springer U K, 2015)
Genç, S.; Baştan, M.; Güdükbay, Uğur; Atalay, V.; Ulusoy, Özgür
Using one's hands in human–computer interaction increases both the effectiveness of computer usage and the speed of interaction. One way of accomplishing this goal is to utilize computer vision techniques to develop hand-gesture-based interfaces. A video database system is one application where a hand-gesture-based interface is useful, because it provides a way to specify certain queries more easily. We present a hand-gesture-based interface for a video database system to specify motion and spatiotemporal object queries. We use a regular, low-cost camera to monitor the movements and configurations of the user's hands and translate them into video queries. We conducted a user study to compare our gesture-based interface with a mouse-based interface on various types of video queries. The users evaluated the two interfaces in terms of different usability parameters, including ease of learning, ease of use, ease of remembering (memory), naturalness, comfortable use, satisfaction, and enjoyment. The user study showed that querying video databases is a promising application area for hand-gesture-based interfaces, especially for queries involving motion and spatiotemporal relations.

Item Open Access
An MPEG-7 compatible video retrieval system with integrated support for complex multimodal queries (IEEE Computer Society, 2019)
Baştan, Muhammet; Çam, Hayati; Güdükbay, Uğur; Ulusoy, Özgür
We present BilVideo-7, an MPEG-7-compatible video indexing and retrieval system that supports complex multimodal queries in a unified framework. An MPEG-7 profile is developed to represent the videos by decomposing them into Shots, Keyframes, Still Regions, and Moving Regions. The MPEG-7-compatible XML representations of videos according to this profile are obtained by the MPEG-7-compatible video feature extraction and annotation tool of BilVideo-7 and stored in a native XML database. Users can formulate text-based semantic, color, texture, shape, location, motion, and spatio-temporal queries on an intuitive, easy-to-use Visual Query Interface, whose Composite Query Interface can be used to specify very complex queries containing any type and number of video segments with their descriptors. The multi-threaded Query Processing Server parses incoming queries into subqueries and executes each subquery in a separate thread. It then fuses the subquery results in a bottom-up manner to obtain the final query result. The system is unique in that it provides very powerful querying capabilities with a wide range of descriptors and multimodal query processing in an MPEG-7-compatible, interoperable environment. We present sample queries to demonstrate the capabilities of the system.

Item Open Access
A relevance feedback technique for multimodal retrieval of news videos (IEEE, 2005-11)
Aksoy, Selim; Çavuş, Özge
Content-based retrieval in news video databases has become an important task with the availability of large quantities of data in both public and proprietary archives. We describe a relevance feedback technique that captures the significance of different features at different spatial locations in an image. Spatial content is modeled by partitioning images into non-overlapping grid cells. The contributions of different features at different locations are modeled using weights defined for each feature in each grid cell. These weights are iteratively updated based on the user's feedback, given as positive and negative labeling of retrieval results. Given this labeling, the weight updating scheme uses the ratio of the standard deviation of the distances between relevant and irrelevant images to the standard deviation of the distances between relevant images. The proposed technique is evaluated quantitatively and qualitatively using shots related to several sports from the news video collection of the TRECVID video retrieval evaluation, where the weights could capture the relative contributions of different features and spatial locations. © 2005 IEEE.
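The weight-updating rule in the relevance feedback abstract above lends itself to a compact sketch. Assuming per-grid-cell, per-feature distances have already been computed between the images the user labelled, the following Python fragment computes a weight for each (cell, feature) pair as the ratio described: the standard deviation of relevant-to-irrelevant distances over the standard deviation of relevant-to-relevant distances. The array layout, variable names, and final normalization are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def update_weights(dist_rel_rel, dist_rel_irr, eps=1e-9):
    """
    dist_rel_rel: array (n_cells, n_features, n_rr_pairs) of distances
                  between pairs of relevant images, per cell and feature.
    dist_rel_irr: array (n_cells, n_features, n_ri_pairs) of distances
                  between relevant and irrelevant images.
    Returns weights of shape (n_cells, n_features). One plausible reading:
    a feature in a cell gets a high weight when relevant images cluster
    tightly (small std) while relevant-to-irrelevant distances vary widely.
    """
    std_rr = dist_rel_rel.std(axis=2)       # spread among relevant images
    std_ri = dist_rel_irr.std(axis=2)       # spread, relevant vs irrelevant
    weights = std_ri / (std_rr + eps)       # the ratio from the abstract
    return weights / (weights.sum() + eps)  # normalize (an assumption)
```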
Item Open Access
Searching for complex human activities with no visual examples (2008)
Ikizler, N.; Forsyth, D. A.
We describe a method of representing human activities that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body that can be composed across space and across the body to produce complex queries. The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D, and comparing them to models trained using motion capture data. Our models of short-time-scale limb behaviour are built using a labelled motion capture data set. We show results for a large range of queries applied to a collection of complex motions and activities. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by some important changes of clothing. © 2008 Springer Science+Business Media, LLC.

Item Open Access
Two-person interaction recognition via spatial multiple instance embedding (Academic Press Inc., 2015)
Sener, F.; Ikizler-Cinbis, N.
In this work, we look into the problem of recognizing two-person interactions in videos. Our method integrates multiple visual features in a weakly supervised manner by utilizing an embedding-based multiple instance learning framework. First, several visual features that capture the shape and motion of the interacting people are extracted from each detected person region in a video. Then, two-person visual descriptors are formed. Since the relative spatial locations of the interacting people are likely to complement the visual descriptors, we propose to use spatial multiple instance embedding, which implicitly incorporates the distances between people into the multiple instance learning process. Experimental results on two benchmark datasets validate that using two-person visual descriptors together with spatial multiple instance learning offers an effective way of inferring the type of interaction. © 2015 Elsevier Inc.
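As a loose illustration of the kind of two-person descriptor the last abstract describes, the Python sketch below concatenates per-person shape/motion features and appends the normalized distance between the two detected person regions, so that a multiple-instance learner can take relative spatial location into account. Every name, the scale normalization, and the bag construction are hypothetical placeholders, not the authors' implementation.

```python
import numpy as np

def two_person_instance(feat_a, feat_b, box_a, box_b):
    """
    feat_a, feat_b: per-person shape/motion feature vectors (1-D arrays).
    box_a, box_b:   person bounding boxes as (x, y, w, h).
    Returns one MIL instance: the concatenated pair descriptor plus the
    inter-person distance, which lets the embedding account for the
    relative spatial locations of the interacting people.
    """
    def center(box):
        x, y, w, h = box
        return np.array([x + w / 2.0, y + h / 2.0])

    # Normalize the distance by mean person height so the spatial cue is
    # roughly scale-invariant (an assumption made for illustration).
    dist = np.linalg.norm(center(box_a) - center(box_b))
    scale = (box_a[3] + box_b[3]) / 2.0
    return np.concatenate([feat_a, feat_b, [dist / (scale + 1e-9)]])

# A video becomes a "bag" of such instances, e.g. one per frame:
# bag = np.stack([two_person_instance(fa, fb, ba, bb)
#                 for fa, fb, ba, bb in detections_per_frame])
```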