Browsing by Subject "Speech processing systems."
Now showing 1 - 7 of 7
Item Open Access
Coding of speech and image signals using Gabor decomposition (1994)
Gündüzhan, Emre
A new low bit rate speech coding method which uses Gabor time-frequency decomposition and the matching pursuit algorithm is developed. A new algorithm based on the projections onto convex sets method is used to smooth the discontinuities between speech frames. A two-dimensional extension of the Gabor time-frequency decomposition is also developed for image coding. Simulation examples are presented.

Item Open Access
Computationally efficient voice dialing system (1998)
Solmaz, Mustafa Hakan
Subband-based feature parameters are becoming widely used for speech recognition. In this thesis, a subband-based, small-vocabulary, speaker-dependent, isolated-word recognition system is proposed. The most distinctive property of the proposed system is its low computational cost, which enables it to run in real time on a simple microcontroller. The system is used as the core of a voice dialer designed to work together with Karel switchboxes. In the training stage, an energy-based endpoint (starting-point) detection method is first applied to detect speech. Feature extraction is then applied to a fixed-length, PCM-quantized (A-law) speech segment long enough to cover a single word. In the recognition stage, template matching is used to find the most likely vocabulary element. A recognition rate of 93% is obtained in the simulations.

Item Open Access
Low bit rate speech coding methods and a new interframe differential coding scheme for line spectrum pairs (1992)
Erzin, Engin
Low bit rate speech coding techniques and a new coding scheme for vocal tract parameters are presented. Linear prediction based voice coding techniques (linear predictive coding and code excited linear predictive coding) are examined and implemented. A new interframe differential coding scheme for line spectrum pairs is developed.
The new scheme reduces the spectral distortion of the linear predictive filter while maintaining a high compression ratio.

Item Open Access
New methods for robust speech recognition (1995)
Erzin, Engin
New methods of feature extraction, end-point detection and speech enhancement are developed for a robust speech recognition system. The methods of feature extraction and end-point detection are based on wavelet analysis or subband analysis of the speech signal. Two new sets of speech feature parameters, SUBLSFs and SUBCEPs, are introduced. Both parameter sets are based on subband analysis. The SUBLSF feature parameters are obtained via linear predictive analysis on subbands. These speech feature parameters can produce better results than the full-band parameters when the noise is colored. The SUBCEP parameters are based on wavelet analysis or, equivalently, the multirate subband analysis of the speech signal. The SUBCEP parameters also provide robust recognition performance by appropriately deemphasizing the frequency bands corrupted by noise. It is experimentally observed that the subband analysis based feature parameters are more robust than the commonly used full-band analysis based parameters in the presence of car noise. The α-stable random processes can be used to model the impulsive nature of the public network telecommunication noise. Adaptive filtering methods are developed for α-stable random processes. Adaptive noise cancellation techniques are used to reduce the mismatch between training and testing conditions of the recognition system over telephone lines. Another important problem in isolated speech recognition is to determine the boundaries of the speech utterances or words. Precise boundary detection of utterances improves the performance of speech recognition systems.
A new distance measure based on the subband energy levels is introduced for endpoint detection.

Item Open Access
Speaker-dependent speech coding (2006)
Kart, Mete Kemal
With the rapid expansion of the Internet, it has become feasible to make low-cost and secure telephone calls over the Internet. New digital speech compression standards have been developed. Digital speech codecs can be used both in regular telephone networks and in Internet-based systems. For a secure call, speech data are first compressed by a digital speech codec, and the compressed packets are then encrypted using methods such as the Triple Data Encryption Standard (3DES) [1] or the Advanced Encryption Standard (AES) [2]. Compression and encryption require high processor performance, and may even require high-frequency processors and DSPs to encrypt the binary speech data. Current speech coders are speaker independent, i.e., they do not perform any speaker-specific operations. They do not even distinguish between male and female speakers. An interesting way to solve this problem is to send speech after encoding it with a system that is based on a specific user. This approach, which can be called speaker-dependent speech coding, provides computationally efficient and relatively secure VoIP calls with high quality and without any separate encryption step, compared with standard speech codecs at the same bit rate.
Despite its disadvantages, namely the need to acquire the speech characteristics of each user and the need for extra data space, it has advantages: communication is secure because the speech characteristics of the speaker are unknown to other users, and the synthesized speech has higher quality than LPC-compressed speech at the same bit rate.

Item Open Access
Speech spectrum non-stationarity detection based on line spectrum frequencies and related applications (1998)
Ertan, Ali Erdem
In this thesis, two new speech variation measures for speech spectrum non-stationarity detection are proposed. These measures are based on the Line Spectrum Frequencies (LSF) and the spectral values at the LSF locations. They are formulated to be subjectively meaningful, mathematically tractable, and of low computational complexity. In order to demonstrate the usefulness of the non-stationarity detector, two applications are presented. The first application is an implicit speech segmentation system which detects non-stationary regions in the speech signal and obtains the boundaries of the speech segments. The other application is a Variable Bit-Rate Mixed Excitation Linear Predictive (VBR-MELP) vocoder utilizing a novel voice activity detector to detect silent regions in the speech. This voice activity detector is designed to be robust to non-stationary background noise and provides efficient coding of silent sections and unvoiced utterances to decrease the bit rate. Simulation results are also presented.

Item Open Access
Turkish text to speech system (2002)
Eker, Barış
Scientists have been interested in producing human speech artificially for more than two centuries. Since the invention of computers, they have been used to synthesize speech. With the help of this technology, Text To Speech (TTS) systems that take a text as input and produce speech as output have been created.
Some languages, such as English and French, have received most of the attention, while others, such as Turkish, have not been taken into consideration. This thesis presents a TTS system for Turkish that uses the diphone concatenation method. It takes a text as input and produces the corresponding speech in Turkish. The system produces output in a single male voice only. Since Turkish is a phonetic language, this system can also be used for other phonetic languages with some minor modifications. If this system is integrated with a pronunciation unit, it can also be used for languages that are not phonetic.
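
For readers unfamiliar with diphone concatenation, the idea in the last abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: it assumes each letter of a Turkish word maps to one phoneme (a simplification of "phonetic language"), and the names `to_diphones`, `concatenate`, the `_` silence symbol, and the `inventory` dictionary of recorded samples are all hypothetical.

```python
def to_diphones(word, silence="_"):
    """Split a word into diphone labels.

    Assumption: each letter is treated as exactly one phoneme, and the
    word is padded with a silence symbol at both ends.
    """
    phonemes = [silence] + list(word) + [silence]
    # A diphone spans two neighbouring phonemes.
    return [a + b for a, b in zip(phonemes, phonemes[1:])]


def concatenate(diphones, inventory):
    """Join the stored sample lists of each diphone into one sample list.

    `inventory` is a hypothetical mapping from diphone label to recorded
    samples; a real system would also smooth the joins between units.
    """
    samples = []
    for label in diphones:
        samples.extend(inventory[label])
    return samples


if __name__ == "__main__":
    print(to_diphones("merhaba"))
```

A word of n letters yields n + 1 diphone labels under these assumptions, and each label would be looked up in a recorded inventory before concatenation.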