Browsing by Subject "Speech communication"

Open Access
Flame detection method in video using covariance descriptors
(IEEE, 2011) Habiboǧlu, Y.H.; Günay, Osman; Çetin, A. Enis
A video fire detection system that uses a spatio-temporal covariance matrix of video data is proposed. The system divides the video into spatio-temporal blocks and computes covariance features from these blocks to detect fire. Feature vectors that exploit both the spatial and the temporal characteristics of flame-colored regions are classified using an SVM classifier, which is trained and tested on video data containing flames and flame-colored objects. Experimental results are presented. © 2011 IEEE.
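As an illustration of the covariance features mentioned in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes each pixel in a spatio-temporal block has already been turned into a small property vector (the abstract does not list the properties, so the two-element vectors here are placeholders), and it flattens the upper triangle of the block's sample covariance matrix into a feature vector that a classifier such as an SVM could consume. The names `covariance_matrix` and `block_descriptor` are hypothetical.

```python
from typing import List, Sequence


def covariance_matrix(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Unbiased sample covariance of the property vectors in one block."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    # Accumulate products of deviations for the upper triangle only.
    for s in samples:
        for i in range(d):
            dev_i = s[i] - means[i]
            for j in range(i, d):
                cov[i][j] += dev_i * (s[j] - means[j])
    # Normalize and mirror, since a covariance matrix is symmetric.
    for i in range(d):
        for j in range(i, d):
            cov[i][j] /= n - 1
            cov[j][i] = cov[i][j]
    return cov


def block_descriptor(samples: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the upper triangle of the covariance matrix into a vector."""
    cov = covariance_matrix(samples)
    d = len(cov)
    return [cov[i][j] for i in range(d) for j in range(i, d)]


# Placeholder property vectors for three pixels of one block (assumed values).
block = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
print(block_descriptor(block))
```

Because the matrix is symmetric, a block with d properties per pixel yields d(d+1)/2 distinct values, which keeps the descriptor compact regardless of block size.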
Open Access
Perceived auditory environment in historic spaces of anatolian culture : a case study on Hacı Bayram mosque
(International Institute of Acoustics and Vibrations, 2016) Acun V.; Yilmazer, Semiha; Taherzadeh, P.
This article reports the initial findings of a study concerned with the perceived auditory environment within a historical mosque and its surroundings. Hacı Bayram Mosque and the surrounding area of Hamamönü were selected as the research site because it is the historical center of Ankara. Although there are studies concerned with the acoustical characteristics of mosques, there is not enough research focusing on users' expectations and interpretation of the perceived auditory environment within a mosque. This study adopts the user-focused approach of Grounded Theory to capture individuals' auditory sensation and interpretation of the perceived auditory environment within a historical mosque and its surroundings. In-depth interviews are held with the congregation of the mosque and with individuals sitting in the surrounding area. Based on their subjective responses, a theoretical framework is generated to gain insight into the factors that affect individuals' understanding and expectations of mosques. Acoustical characteristics of the mosque are analyzed by computer simulation and in-situ measurements of sound pressure levels. The objective room-acoustic indicators are reverberation time (RT) and speech transmission index (STI). The conceptual framework generated through Grounded Theory shows how the perceived auditory environment may influence individuals' response to the physical environment of the mosque by showing the associations between soundscape elements, spatial function, and sense of place.
Open Access
Real time noise-cancellation using ICA, PSO and PE
(IEEE, 2012) Bor, R. İrem; Ider, Y. Ziya; Arıkan, Orhan; Ertan, Erdem
To provide noiseless transmission of speech in wireless communication systems, a noise cancellation algorithm that can be implemented in real time is developed. The speech and noise sources are not known; only their mixtures are observed. The system is modeled with an instantaneous mixture model. A combination of independent component analysis (ICA) and particle swarm optimization (PSO) algorithms is used to separate speech and noise. However, ICA has an ambiguity: it is not possible to know which of the separated signals is speech and which is noise. As a result, the transmitted signal can be noise instead of speech. To overcome this ambiguity, a pitch extraction (PE) algorithm is developed and combined with ICA-PSO. The ICA-PSO-PE algorithm is implemented in MATLAB. The contributions of this work are modifying the objective functions of the ICA algorithm to make them more robust, combining ICA with PSO to make it fast and robust, and overcoming the ambiguity problem using the PE algorithm. © 2012 IEEE.
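The instantaneous mixture model and the output-ordering ambiguity described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' ICA-PSO-PE algorithm: the unmixing matrix is simply the inverse of a known 2x2 mixing matrix (ICA-PSO would have to estimate it blindly), the "speech" source is a synthetic periodic tone, and the autocorrelation-based `periodicity` score is a simple stand-in for pitch extraction. All function names and numeric values are hypothetical.

```python
import math
import random


def mix(a, s1, s2):
    """Instantaneous mixture: each output sample is a weighted sum of the inputs."""
    x1 = [a[0][0] * u + a[0][1] * v for u, v in zip(s1, s2)]
    x2 = [a[1][0] * u + a[1][1] * v for u, v in zip(s1, s2)]
    return x1, x2


def invert_2x2(a):
    """Inverse of a 2x2 matrix (stands in for the unmixing matrix ICA would estimate)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is singular")
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def periodicity(x, min_lag, max_lag):
    """Largest normalized autocorrelation over a range of candidate pitch lags."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    energy = sum(v * v for v in c)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(c[n] * c[n + lag] for n in range(len(c) - lag))
        best = max(best, r / energy)
    return best


def pick_speech(signals, min_lag=20, max_lag=160):
    """Index of the signal that looks most periodic, i.e. most speech-like here."""
    scores = [periodicity(s, min_lag, max_lag) for s in signals]
    return scores.index(max(scores))


fs = 8000  # assumed sampling rate in Hz
n = 800    # 0.1 s of signal
# Synthetic voiced-like source: 100 Hz fundamental plus one harmonic.
voiced = [math.sin(2 * math.pi * 100 * t / fs) + 0.5 * math.sin(2 * math.pi * 200 * t / fs)
          for t in range(n)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

A = [[0.8, 0.3], [0.4, 0.9]]                   # assumed mixing matrix
x1, x2 = mix(A, voiced, noise)                 # observed mixtures
y1, y2 = mix(invert_2x2(A), x1, x2)            # separated signals

# A blind method may return the outputs in either order; the periodicity
# score identifies the speech-like one in both cases.
print(pick_speech([y1, y2]), pick_speech([y2, y1]))
```

The lag range of 20 to 160 samples corresponds to candidate pitch frequencies between 400 Hz and 50 Hz at the assumed 8 kHz sampling rate.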
Open Access
Recovery of sparse perturbations in Least Squares problems
(IEEE, 2011) Pilanci, M.; Arıkan, Orhan
We show that the exact recovery of sparse perturbations on the coefficient matrix in overdetermined Least Squares problems is possible for a large class of perturbation structures. The well-established theory of Compressed Sensing enables us to prove that if the perturbation structure is sufficiently incoherent, then exact or stable recovery can be achieved using linear programming. We derive sufficiency conditions for both exact and stable recovery using known results of ℓ0/ℓ1 equivalence. However, the problem turns out to be more complicated than the usual setting used in various sparse reconstruction problems. We propose and solve an optimization criterion and its convex relaxation to recover the perturbation and the solution to the Least Squares problem simultaneously. We then demonstrate with numerical examples that the proposed method is able to recover the perturbation and the unknown exactly with high probability. The performance of the proposed technique is compared in blind identification of sparse multipath channels. © 2011 IEEE.
Open Access
Source and filter estimation for Throat-Microphone speech enhancement
(Institute of Electrical and Electronics Engineers Inc., 2016) Turan, M. A. T.; Erzin, E.
In this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Due to their limited bandwidth, TM-recorded speech suffers from reduced intelligibility and naturalness. We investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings using parallel recordings of acoustic microphone (AM) and TM for enhancement of the spectral envelope and excitation signals of the TM speech. The proposed mappings address the phone-dependent variability of tissue conduction with TM recordings. While the spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM from TM recordings, the excitation mapping is constructed based on the spectral energy difference (SED) of AM and TM excitation signals. The excitation enhancement is modeled as an estimation of the SED features from the TM signal. The proposed enhancement system is evaluated using both objective and subjective tests. Objective evaluations are performed with the log-spectral distortion (LSD), the wideband perceptual evaluation of speech quality (PESQ), and mean-squared error (MSE) metrics. Subjective evaluations are performed with an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings yield improvements over phone-independent mappings. Furthermore, enhancement of the TM excitation through statistical mappings of the SED features introduces significant objective and subjective performance improvements to the enhancement of TM recordings. ©2015 IEEE.
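For readers unfamiliar with the log-spectral distortion metric named in this abstract, here is a minimal sketch of one common definition: the root-mean-square difference, in dB, between a reference and an estimated power spectrum within a frame, averaged over frames. The abstract does not say which LSD variant the authors used, so the framing, the dB scaling, and the function names below are assumptions.

```python
import math
from typing import Sequence


def frame_lsd(ref_power: Sequence[float], est_power: Sequence[float]) -> float:
    """RMS difference in dB between two power spectra of a single frame."""
    if len(ref_power) != len(est_power) or not ref_power:
        raise ValueError("spectra must be non-empty and of equal length")
    sq_sum = 0.0
    for p_ref, p_est in zip(ref_power, est_power):
        diff_db = 10.0 * math.log10(p_ref) - 10.0 * math.log10(p_est)
        sq_sum += diff_db * diff_db
    return math.sqrt(sq_sum / len(ref_power))


def mean_lsd(ref_frames, est_frames) -> float:
    """Average the per-frame LSD over all frames of an utterance."""
    values = [frame_lsd(r, e) for r, e in zip(ref_frames, est_frames)]
    return sum(values) / len(values)


# Two-bin toy spectra: one bin differs by 10 dB, the other matches exactly.
print(frame_lsd([1.0, 10.0], [10.0, 10.0]))
```

Lower values indicate that the enhanced spectrum is closer to the reference; identical spectra give a distortion of zero. Power values must be strictly positive, so practical implementations usually add a small floor before taking the logarithm.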