Browsing by Subject "Motion estimation"

Open Access
2-D triangular mesh-based mosaicking for object tracking in the presence of occlusion
(SPIE, 1997) Toklu, C.; Tekalp, A. M.; Erdem, A. Tanju
In this paper, we describe a method for temporal tracking of video objects in video clips. We employ a 2D triangular mesh to represent each video object, which allows us to describe the motion of the object by the displacements of the node points of the mesh, and to describe any intensity variations by the contrast and brightness parameters estimated for each node point. Using the temporal history of the node point locations, we continue tracking the nodes of the 2D mesh even when they become invisible because of self-occlusion or occlusion by another object. Uncovered parts of the object in the subsequent frames of the sequence are detected by means of an active contour which contains a novel shape preserving energy term. The proposed shape preserving energy term is found to be successful in tracking the boundary of an object in video sequences with complex backgrounds. By adding new nodes or updating the 2D triangular mesh, we incrementally append the uncovered parts of the object detected during the tracking process to one of the objects to generate a static mosaic of the object. Also, by texture mapping the covered pixels into the current frame of the video clip, we can generate a dynamic mosaic of the object. The proposed mosaicking technique is more general than those reported in the literature because it allows for local motion and out-of-plane rotations of the object that result in self-occlusions. Experimental results demonstrate the successful tracking of objects with deformable boundaries in the presence of occlusion.
Open Access
3-D motion estimation and wireframe adaptation including photometric effects for model-based coding of facial image sequences
(IEEE, 1994-06) Bozdağı, G.; Tekalp, A. M.; Onural, L.
We propose a novel formulation where 3-D global and local motion estimation and the adaptation of a generic wireframe model to a particular speaker are considered simultaneously within an optical flow based framework including the photometric effects of the motion. We use a flexible wireframe model whose local structure is characterized by the normal vectors of the patches which are related to the coordinates of the nodes. Geometrical constraints that describe the propagation of the movement of the nodes are introduced, which are then efficiently utilized to reduce the number of independent structure parameters. A stochastic relaxation algorithm has been used to determine optimum global motion estimates and the parameters describing the structure of the wireframe model. Results with both simulated and real facial image sequences are provided.
Open Access
Automatic multimedia cross-modal correlation discovery
(ACM, 2004-08) Pan, J.-Y.; Yang, H.-J.; Faloutsos, C.; Duygulu, Pınar
Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects like video clips, with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our "MMG" method requires no tuning, no clustering, no user-determined constants; it can be applied to any multi-media collection, as long as we have a similarity function for each medium; and it scales linearly with the database size. We report auto-captioning experiments on the "standard" Corel image database of 680 MB, where it outperforms domain specific, fine-tuned methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
Open Access
Correlation tracking based on wavelet domain information
(SPIE, 2004) İpek, H. L.; Yılmaz, İ.; Yardımcı, Y. C.; Çetin, A. Enis
Tracking moving objects in video can be carried out by correlating a template containing object pixels with pixels of the current frame. This approach may produce erroneous results under noise. We determine a set of significant pixels on the object by analyzing the wavelet transform of the template and correlate only these pixels with the current frame to determine the next position of the object. These significant pixels are easily trackable features of the image and increase the performance of the tracker.
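The idea in the abstract above, correlating only a subset of "significant" template pixels with the current frame, can be sketched in one dimension. This is a minimal illustration and not the authors' algorithm: the one-level Haar detail coefficients stand in for the wavelet analysis, and the names `significant_pixels`, `correlation`, `track`, and the `keep` parameter are hypothetical.

```python
import math

def haar_detail(signal):
    """One-level Haar detail coefficients (assumes an even-length signal)."""
    return [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]

def significant_pixels(template, keep):
    """Pixels belonging to the `keep` pixel pairs with the largest detail magnitude."""
    detail = haar_detail(template)
    pairs = sorted(range(len(detail)), key=lambda k: -abs(detail[k]))[:keep]
    return sorted(p for k in pairs for p in (2 * k, 2 * k + 1))

def correlation(template, frame, offset, pixels):
    """Normalized correlation computed over the selected pixels only."""
    t = [template[i] for i in pixels]
    f = [frame[offset + i] for i in pixels]
    denom = math.sqrt(sum(v * v for v in t) * sum(v * v for v in f))
    return sum(a * b for a, b in zip(t, f)) / denom if denom else 0.0

def track(template, frame, keep=2):
    """Return the offset in `frame` where the significant pixels correlate best."""
    pixels = significant_pixels(template, keep)
    offsets = range(len(frame) - len(template) + 1)
    return max(offsets, key=lambda o: correlation(template, frame, o, pixels))

template = [10, 10, 10, 200, 200, 10]
frame = [10, 10, 10, 10, 10, 10, 200, 200, 10, 10]
print(track(template, frame))
```

Restricting the correlation to high-detail pixels means that flat regions of the template, which carry little localization information, do not contribute to the score.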
Open Access
Cylindrical model based head pose estimation for drivers
(IEEE, 2009) Yücel, Zeynep; Valenti, R.; Sebe, N.
The application of action recognition algorithms to driving safety systems is still an open area of research. In terms of driving safety, identification of head movements provides more significant information than other actions of the driver. Therefore, in this study, we developed a cylindrical model based head pose estimator to track drivers' head movements. The experiments indicate that the proposed scheme estimates head pose with significant accuracy.
Open Access
Dynamic texture detection, segmentation and analysis
(ACM, 2007-07) Töreyin, Behçet Uğur; Dedeoğlu, Yiğithan; Çetin, A. Enis; Fazekas, S.; Chetverikov, D.; Amiaz, T.; Kiryati, N.
Dynamic textures are common in natural scenes. Examples of dynamic textures in video include fire, smoke, clouds, trees in the wind, sky, sea and ocean waves etc. In this showcase, (i) we develop real-time dynamic texture detection methods in video and (ii) present solutions to video object classification based on motion information. Copyright 2007 ACM.
Open Access
A fast algorithm for subpixel accuracy image stabilization for digital film and video
(SPIE, 1998) Eroğlu, Çiğdem; Erdem, A. T.
This paper introduces a novel method for subpixel accuracy stabilization of unsteady digital films and video sequences. The proposed method offers a near-closed-form solution to the estimation of the global subpixel displacement between two frames that causes their misregistration. The criterion function used is the mean-squared error over the displaced frames, in which image intensities at subpixel locations are evaluated using bilinear interpolation. The proposed algorithm is both faster and more accurate than the search-based solutions found in the literature. Experimental results also demonstrate the superiority of the proposed method to the spatio-temporal differentiation and surface fitting algorithms. Furthermore, the proposed algorithm is designed so that it is insensitive to frame-to-frame intensity variations. It is also possible to estimate any affine motion between two frames by applying the proposed algorithm on three non-collinear points in the unsteady frame.
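The closing remark of the abstract above, that an affine motion can be obtained from three non-collinear points, follows from the fact that a 2-D affine map has six unknowns and each point correspondence supplies two equations. The sketch below assumes the displaced position of each of the three points has already been estimated by some means; it only shows the final linear solve, and the function names are hypothetical rather than taken from the paper.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(m, rhs):
    """Solve the 3x3 system m * x = rhs with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("the three points are collinear")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(det3(mc) / d)
    return solution

def affine_from_three_points(src, dst):
    """Return ((a, b, c), (d, e, f)) with x' = a*x + b*y + c and y' = d*x + e*y + f."""
    m = [[x, y, 1.0] for x, y in src]
    row_x = tuple(solve3(m, [p[0] for p in dst]))
    row_y = tuple(solve3(m, [p[1] for p in dst]))
    return row_x, row_y

# Three points that all moved by a subpixel translation of (0.25, -0.5).
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dst = [(0.25, -0.5), (1.25, -0.5), (0.25, 0.5)]
print(affine_from_three_points(src, dst))
```

The determinant check makes the non-collinearity requirement explicit: when the three points lie on a line, the system is singular and no unique affine map exists.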
Open Access
Flame detection in video using hidden Markov models
(IEEE, 2005) Töreyin, B. Uğur; Dedeoğlu, Yiğithan; Çetin, A. Enis
This paper proposes a novel method to detect flames in video by processing the data generated by an ordinary camera monitoring a scene. In addition to ordinary motion and color clues, the flame flicker process is detected by using a hidden Markov model. Markov models representing the flame and flame colored ordinary moving objects are used to distinguish the flame flicker process from the motion of flame colored moving objects. Spatial color variations in flame are also evaluated by the same Markov models. These clues are combined to reach a final decision. False alarms due to ordinary motion of flame colored moving objects are greatly reduced when compared to the existing video based fire detection systems.
Open Access
Gibbs model based 3D motion and structure estimation for object-based video coding applications
(Springer, 1997) Onural, Levent; Alatan, A. A.; Li, H. H.; Sun, S.; Derin, H.
Motion analysis is essential for any video coding scheme. A moving object in a 3D environment can be analyzed better by a 3D motion model instead of 2D models, and better modeling might lead to improved coding efficiency. Gibbs formulated joint segmentation and estimation of 2D motion not only improves the performance of each stage, but also generates robust point correspondences which are necessary for rigid 3D motion estimation algorithms. Estimated rigid 3D motion parameters of a segmented object are used to find the 3D structure of that object by minimizing another Gibbs energy. Such an approach achieves error immunity compared to linear algorithms. A more general (non-rigid) motion model can also be proposed using a Gibbs formulation which permits local elastic interactions, in contrast to ultimately tight rigidity between object points. Experimental results are promising for both rigid and non-rigid 3D motion models and put these models forward as strong candidates to be used in object-based coding algorithms.
Open Access
Gibbs random field model based 3-D motion estimation from video sequences
(IEEE, 1994) Alatan, A. A.; Onural, L.
In contrast to the previous global 3D motion concept, a Gibbs random field based method, which models local interactions between motion parameters defined at each point on the object, is proposed. An energy function which gives the joint probability distribution of motion vectors is constructed. The energy function is minimized in order to find the most likely motion vector set. Some convergence problems, due to the ill-posedness of the problem, are overcome by using the concept of hierarchical rigidity. In hierarchical rigidity, the objects are assumed to be almost rigid at the coarsest level, and this rigidity is weakened at each level until the finest level is reached. The propagation of motion information between levels is encouraged. At the finest level, each point has a motion vector associated with it, and the interactions between these vectors are described by the energy function. The minimization of the energy function is achieved by using hierarchical rigidity, without getting trapped in a local minimum. The results are promising.
Open Access
Turkish sign language recognition application using the motion history image method (Hareket geçmişi görüntüsü yöntemi ile Türkçe işaret dilini tanıma uygulaması)
(IEEE, 2016-05) Yalçınkaya, Özge; Atvar, A.; Duygulu, P.
Sign language plays a very important role in enabling hearing- and speech-impaired individuals to communicate effectively with others in society. Unfortunately, sign language is known only by conscientious people, and their number is notably small. The aim of our work is to improve the communication of individuals with hearing or speech impairments with other individuals by means of the system we have developed. To this end, sign language motion information captured from a camera is recognized, and the meaning of the motion is found by comparing it with previously trained sign language motion information. The "Motion History Image" method is used to extract motion information from the camera images. The "Nearest Neighbor" algorithm is used for the classification step. As a result, the developed system predicts a text for a sign language gesture using the training set. The overall classification accuracy is computed as 95%.
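The two building blocks named in the abstract above, a motion history image and nearest neighbor classification, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: motion is detected by simple frame differencing, the history decays by one per frame, and the parameters `tau` and `thresh` as well as the function names are hypothetical.

```python
def update_mhi(mhi, prev_frame, curr_frame, tau=10, thresh=20):
    """Set moving pixels to tau and let all other pixels decay by 1 towards 0."""
    updated = []
    for m_row, p_row, c_row in zip(mhi, prev_frame, curr_frame):
        updated.append([
            tau if abs(c - p) > thresh else max(0, m - 1)
            for m, p, c in zip(m_row, p_row, c_row)
        ])
    return updated

def nearest_neighbor(query, training):
    """Return the label of the training vector closest to `query` (squared Euclidean distance)."""
    best = min(training, key=lambda item: sum((a - b) ** 2 for a, b in zip(query, item[0])))
    return best[1]

# A bright spot moving from left to right across a 1x3 image.
frames = [[[0, 0, 0]], [[100, 0, 0]], [[100, 100, 0]]]
mhi = [[0, 0, 0]]
for prev_frame, curr_frame in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev_frame, curr_frame)

# Hypothetical training set of flattened motion history images with labels.
training = [([9, 10, 0], "swipe right"), ([0, 10, 9], "swipe left")]
print(nearest_neighbor(mhi[0], training))
```

Because more recent motion has a larger value, a single motion history image encodes both where and in which order movement occurred, which is what lets a simple distance-based classifier separate gestures.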
Open Access
An improvement to MBASIC algorithm for 3-D motion and depth estimation
(IEEE, 1994) Bozdağı, G.; Tekalp, A. M.; Onural, L.
In model-based coding of facial images, the accuracy of motion and depth parameter estimates strongly affects the coding efficiency. MBASIC is a simple and effective iterative algorithm (recently proposed by Aizawa et al.) for 3-D motion and depth estimation when the initial depth estimates are relatively accurate. In this correspondence, we analyze its performance in the presence of errors in the initial depth estimates and propose a modification to the MBASIC algorithm that significantly improves its robustness to random errors with only a small increase in the computational load.
Open Access
Knives are picked before slices are cut: Recognition through activity sequence analysis
(ACM, 2013-10) İşcen, Ahmet; Duygulu, Pınar
In this paper, we introduce a model to classify cooking activities using their visual and temporal coherence information. We fuse multiple feature descriptors for fine-grained activity recognition as we would need every single detail to catch even subtle differences between classes with low inter-class variance. Considering the observation that daily activities such as cooking are likely to be performed in sequential patterns of activities, we also model temporal coherence of activities. By combining both aspects, we show that we can improve the overall accuracy of cooking recognition tasks. © Copyright 2013 ACM.
Open Access
A line based pose representation for human action recognition
(2013) Baysal, S.; Duygulu, P.
In this paper, we utilize a line based pose representation to recognize human actions in videos. We represent the pose in each frame by employing a collection of line-pairs, so that limb and joint movements are better described and the geometrical relationships among the lines forming the human figure are captured. We contribute to the literature by proposing a new method that matches line-pairs of two poses to compute the similarity between them. Moreover, to encapsulate the global motion information of a pose sequence, we introduce line-flow histograms, which are extracted by matching line segments in consecutive frames. Experimental results on Weizmann and KTH datasets emphasize the power of our pose representation, and show the effectiveness of using pose ordering and line-flow histograms together in grasping the nature of an action and distinguishing one from the others. © 2013 Elsevier B.V. All rights reserved.
Open Access
Moving region detection in wavelet compressed video
(IEEE, 2004) Töreyin, B. Uğur; Çetin, A. Enis; Aksay, Anıl; Akhan, M. B.
In many vision based surveillance systems, the video is stored in wavelet compressed form. In this study, an algorithm for moving object and region detection in video that is compressed using a wavelet transform (WT) is developed. The algorithm estimates the WT of the background scene from the WTs of the past image frames of the video. The WT of the current image is compared with the WT of the background, and the moving objects are determined from the difference. The algorithm does not perform an inverse WT to obtain the actual pixels of the current image or the estimated background. This leads to a method and a system that are computationally efficient compared to the existing motion estimation methods.
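The pipeline in the abstract above (estimate the background in the wavelet domain, then compare it with the current frame's coefficients without inverting the transform) can be sketched in one dimension. The abstract does not say how the background is estimated, so the recursive average used here is an assumption, as are the one-level Haar transform, the parameters `alpha` and `thresh`, and all function names.

```python
def haar_1d(signal):
    """One-level Haar transform of an even-length signal: approximation then detail coefficients."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]
    return approx + detail

def update_background(background, coeffs, alpha=0.9):
    """Recursive average of past coefficients (an assumed background estimator)."""
    return [alpha * b + (1 - alpha) * c for b, c in zip(background, coeffs)]

def moving_coefficients(background, coeffs, thresh=10.0):
    """Indices of coefficients that differ from the background by more than `thresh`."""
    return [i for i, (b, c) in enumerate(zip(background, coeffs)) if abs(c - b) > thresh]

# In a real system the coefficients would come from the compressed stream;
# haar_1d is used here only to produce example data.
background = haar_1d([10, 10, 10, 10])
current = haar_1d([10, 10, 10, 90])
print(moving_coefficients(background, current))
background = update_background(background, current)
```

All comparisons happen on coefficients, so no inverse transform is needed; the index of a changed coefficient identifies the region of the frame where motion occurred.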
Open Access
Object rigidity and reflectivity identification based on motion analysis
(IEEE, 2010) Zang, D.; Schrater, P. R.; Doerschner, Katja
Rigidity and reflectivity are important properties of objects; identifying these properties is a fundamental problem for many computer vision applications such as motion estimation and tracking. In this paper, we extend our previous work to propose a motion analysis based approach for detecting an object's rigidity and reflectivity. This approach consists of two steps. The first step aims to identify object rigidity based on motion estimation and optic flow matching. The second step is to classify specular rigid and diffuse rigid objects using structure from motion and Procrustes analysis. We show how rigid bodies can be detected without knowing any prior motion information by using a mutual information based matching method. In addition, we use a statistical method to set thresholds for rigidity classification. The presented results demonstrate that our approach can efficiently classify the rigidity and reflectivity of an object. © 2010 IEEE.
Open Access
Piecewise-planar 3D reconstruction in rate-distortion sense
(IEEE, 2007-05) İmre, E.; Güdükbay, Uğur; Alatan, A. A.
In this paper, a novel rate-distortion optimization inspired 3D piecewise-planar reconstruction algorithm is proposed. The algorithm refines a coarse 3D triangular mesh, by inserting vertices in a way to minimize the intensity difference between an image and its prediction. The preliminary experiments on synthetic and real data indicate the validity of the proposed approach.
Open Access
Rate-distortion based piecewise planar 3D scene geometry representation
(IEEE, 2006) Imre, E.; Alatan, A. A.; Güdükbay, Uğur
This paper proposes a novel 3D piecewise planar reconstruction algorithm, to build a 3D scene representation that minimizes the intensity error between a particular frame and its prediction. 3D scene geometry is exploited to remove the visual redundancy between frame pairs for any predictive coding scheme. This approach associates the rate increase with the quality of representation, and is shown to be rate-distortion efficient by the experiments. © 2007 IEEE.
Open Access
Rate-distortion optimized layered stereoscopic video streaming with raptor codes
(IEEE, 2007) Tan, A. Serdar; Aksay, A.; Bilen, C.; Bozdağı-Akar, G.; Arıkan, Erdal
A near optimal streaming system for stereoscopic video is proposed. Initially, the stereoscopic video is separated into three layers, and the approximate analytical model of the Rate-Distortion (RD) curve of each layer is calculated from a sufficient number of rate and distortion samples. The analytical modeling includes the interdependency of the defined layers. Then, the analytical models are used to derive the optimal source encoding rates for a given channel bandwidth. The distortion in the quality of the stereoscopic video that is caused by losing a NAL unit from the defined layers is estimated to minimize the average distortion of a single NAL unit loss. The minimization is performed over protection rates allocated to each layer. Raptor codes are utilized as the error protection scheme due to their novelty and suitability in video transmission. The layers are protected unequally using Raptor codes according to the parity ratios allocated to the layers. A comparison of the defined scheme with two other protection allocation schemes is provided via simulations to observe the quality of the stereoscopic video.
Open Access
Robust transmission of multi-view video streams using flexible macroblock ordering and systematic LT codes
(IEEE, 2007) Argyropoulos, S.; Tan, A. Serdar; Thomos, N.; Arıkan, Erdal; Strintzis, M. G.
The transmission of fully compatible H.264/AVC multi-view video coded streams over packet erasure networks is examined. Macroblock classification into unequally important slice groups is considered using the Flexible Macroblock Ordering (FMO) tool of H.264/AVC. Systematic LT codes are used for error protection due to their low complexity and advanced performance. The optimal slice grouping and channel rate allocation are jointly determined by an iterative optimization algorithm based on dynamic programming. The experimental evaluation clearly demonstrates the validity of the proposed method.