Source and filter estimation for Throat-Microphone speech enhancement

dc.citation.epage: 275 [en_US]
dc.citation.issueNumber: 2 [en_US]
dc.citation.spage: 265 [en_US]
dc.citation.volumeNumber: 24 [en_US]
dc.contributor.author: Turan, M. A. T. [en_US]
dc.contributor.author: Erzin, E. [en_US]
dc.date.accessioned: 2018-04-12T10:42:45Z
dc.date.available: 2018-04-12T10:42:45Z
dc.date.issued: 2016 [en_US]
dc.department: Department of Electrical and Electronics Engineering [en_US]
dc.description.abstract: In this paper, we propose a new statistical enhancement system for throat microphone recordings through source and filter separation. Throat microphones (TM) are skin-attached piezoelectric sensors that can capture speech sound signals in the form of tissue vibrations. Because of their limited bandwidth, TM-recorded speech suffers from reduced intelligibility and naturalness. We investigate learning phone-dependent Gaussian mixture model (GMM)-based statistical mappings from parallel acoustic microphone (AM) and TM recordings to enhance the spectral envelope and excitation signals of TM speech. The proposed mappings address the phone-dependent variability of tissue conduction in TM recordings. The spectral envelope mapping estimates the line spectral frequency (LSF) representation of AM speech from TM recordings, while the excitation mapping is built on the spectral energy difference (SED) between the AM and TM excitation signals. Excitation enhancement is thus modeled as estimating the SED features from the TM signal. The proposed enhancement system is evaluated with both objective and subjective tests. Objective evaluations use the log-spectral distortion (LSD), wideband perceptual evaluation of speech quality (PESQ), and mean-squared error (MSE) metrics; subjective evaluations use an A/B comparison test. Experimental results indicate that the proposed phone-dependent mappings improve on phone-independent mappings. Furthermore, enhancing the TM excitation through statistical mapping of the SED features yields significant objective and subjective performance improvements in TM speech enhancement. ©2015 IEEE. [en_US]
dc.description.provenance: Made available in DSpace on 2018-04-12T10:42:45Z (GMT). No. of bitstreams: 1. bilkent-research-paper.pdf: 179475 bytes, checksum: ea0bedeb05ac9ccfb983c327e155f0c2 (MD5). Previous issue date: 2016 [en]
dc.identifier.doi: 10.1109/TASLP.2015.2499040 [en_US]
dc.identifier.issn: 2329-9290
dc.identifier.uri: http://hdl.handle.net/11693/36510
dc.language.iso: English [en_US]
dc.publisher: Institute of Electrical and Electronics Engineers Inc. [en_US]
dc.relation.isversionof: http://dx.doi.org/10.1109/TASLP.2015.2499040 [en_US]
dc.source.title: IEEE/ACM Transactions on Audio, Speech, and Language Processing [en_US]
dc.subject: Gaussian mixture model [en_US]
dc.subject: Speech enhancement [en_US]
dc.subject: Statistical mapping [en_US]
dc.subject: Throat microphone [en_US]
dc.subject: Bandpass filters [en_US]
dc.subject: Gaussian distribution [en_US]
dc.subject: Mapping [en_US]
dc.subject: Mean square error [en_US]
dc.subject: Microphones [en_US]
dc.subject: Photomapping [en_US]
dc.subject: Quality control [en_US]
dc.subject: Source separation [en_US]
dc.subject: Speech [en_US]
dc.subject: Speech communication [en_US]
dc.subject: Speech intelligibility [en_US]
dc.subject: Telephone sets [en_US]
dc.subject: Tissue [en_US]
dc.subject: Excitation enhancement [en_US]
dc.subject: Gaussian Mixture Model [en_US]
dc.subject: Line spectral frequencies [en_US]
dc.subject: Log spectral distortions [en_US]
dc.subject: Perceptual evaluation of speech qualities [en_US]
dc.subject: Subjective evaluations [en_US]
dc.subject: Subjective performance [en_US]
dc.subject: Throat microphones [en_US]
dc.subject: Audio recordings [en_US]
dc.title: Source and filter estimation for Throat-Microphone speech enhancement [en_US]
dc.type: Article [en_US]
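
The spectral envelope mapping in the abstract above operates on line spectral frequency (LSF) features. As an illustration of that representation only (not the authors' feature-extraction code), the sketch below derives LPC coefficients from autocorrelation values with the Levinson-Durbin recursion, then finds the LSFs as the unit-circle zeros of P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z). It assumes a stable A(z) and uses a grid search plus bisection instead of a polynomial root finder; the names `levinson_durbin` and `lpc_to_lsf` and the grid size are hypothetical choices.

```python
import cmath
import math
from typing import List, Sequence


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """LPC polynomial [1, a1, ..., ap] of A(z) = 1 + sum_k a_k z^-k,
    computed from autocorrelation values r[0..order]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                   # prediction error update
    return a


def _rotated_response(a: Sequence[float], w: float) -> complex:
    """e^{jw(p+1)/2} A(e^{jw}): its real part vanishes at the zeros of P(z)
    on the unit circle, its imaginary part at the zeros of Q(z)."""
    p = len(a) - 1
    resp = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
    return cmath.exp(0.5j * w * (p + 1)) * resp


def lpc_to_lsf(a: Sequence[float], grid: int = 4096, iters: int = 60) -> List[float]:
    """Sorted LSFs in radians, strictly inside (0, pi); trivial roots at 0 and pi are skipped."""
    ws = [math.pi * (i + 0.5) / grid for i in range(grid)]
    lsf = []
    for part in ("real", "imag"):            # "real" -> P(z) zeros, "imag" -> Q(z) zeros
        def f(w: float) -> float:
            return getattr(_rotated_response(a, w), part)
        vals = [f(w) for w in ws]
        for i in range(grid - 1):
            if (vals[i] > 0) != (vals[i + 1] > 0):   # sign change brackets a zero
                lo, hi, flo = ws[i], ws[i + 1], vals[i]
                for _ in range(iters):               # refine by bisection
                    mid = 0.5 * (lo + hi)
                    fmid = f(mid)
                    if (fmid > 0) == (flo > 0):
                        lo, flo = mid, fmid
                    else:
                        hi = mid
                lsf.append(0.5 * (lo + hi))
    return sorted(lsf)


if __name__ == "__main__":
    a = levinson_durbin([1.0, 0.5, 0.25], order=2)  # autocorrelation of an AR(1) process
    print(a)
    print(lpc_to_lsf(a))  # approx [acos(0.75), acos(-0.25)]
```

A flat spectrum (all a_k = 0) gives uniformly spaced LSFs k*pi/(p+1), which is a quick sanity check for any LSF routine.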
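
The abstract above describes phone-dependent GMM-based statistical mappings learned from parallel AM and TM recordings but does not spell out the estimator. A common choice for such mappings is the minimum mean-square error (MMSE) regression of a joint-density GMM, y_hat = sum_m P(m | x) * (mu_y^m + Sigma_yx^m (Sigma_xx^m)^-1 (x - mu_x^m)). The sketch below assumes that estimator (this record does not confirm it as the authors' exact method), reduces the TM feature x and AM feature y to scalars for brevity, and uses hypothetical hand-set parameters in place of EM training. The names `JointComponent`, `mmse_map`, `phone_dependent_map`, and the phone labels are illustrative; phone dependence is modeled by keeping one GMM per phone label.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JointComponent:
    """One component of a joint GMM over (x, y): x is a TM feature, y an AM feature.

    Scalar features are a simplification; a real system would use vectors (e.g. LSFs).
    """
    weight: float  # mixture weight P(m)
    mu_x: float    # mean of the TM (source) feature
    mu_y: float    # mean of the AM (target) feature
    var_xx: float  # variance of x under this component
    cov_yx: float  # cross-covariance between y and x


def _log_gauss(x: float, mu: float, var: float) -> float:
    return -0.5 * ((x - mu) ** 2 / var + math.log(2.0 * math.pi * var))


def mmse_map(x: float, gmm: List[JointComponent]) -> float:
    """MMSE estimate E[y | x] under a joint-density GMM."""
    # Component posteriors P(m | x) from the x-marginals, normalized via log-sum-exp.
    logs = [math.log(c.weight) + _log_gauss(x, c.mu_x, c.var_xx) for c in gmm]
    top = max(logs)
    unnorm = [math.exp(lg - top) for lg in logs]
    total = sum(unnorm)
    y_hat = 0.0
    for c, u in zip(gmm, unnorm):
        # Conditional mean E[y | x, m] of a bivariate Gaussian.
        cond_mean = c.mu_y + (c.cov_yx / c.var_xx) * (x - c.mu_x)
        y_hat += (u / total) * cond_mean
    return y_hat


def phone_dependent_map(x: float, phone: str,
                        models: Dict[str, List[JointComponent]]) -> float:
    """Select the GMM for this phone class, then apply the MMSE mapping."""
    return mmse_map(x, models[phone])


# Hypothetical parameters for two phone classes (normally learned with EM).
MODELS = {
    "aa": [JointComponent(0.5, -1.0, -2.0, 1.0, 0.0),
           JointComponent(0.5, 1.0, 2.0, 1.0, 0.0)],
    "s": [JointComponent(1.0, 0.0, 1.0, 1.0, 0.5)],
}

if __name__ == "__main__":
    print(phone_dependent_map(0.0, "aa", MODELS))  # equal posteriors, so 0.0
    print(phone_dependent_map(2.0, "s", MODELS))   # 1.0 + 0.5 * 2.0 = 2.0
```

With a single component the mapping collapses to a linear regression; with several, the posteriors blend component-wise regressions, which is what lets a per-phone GMM capture nonlinear TM-to-AM relationships.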
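
The objective evaluation in the abstract above reports log-spectral distortion (LSD), but this record does not give the exact formula. A frequently used definition, assumed here, is the per-frame RMS difference of log power spectra in dB, averaged over T frames of K bins: LSD = (1/T) sum_t sqrt((1/K) sum_k (10 log10 P_t(k) - 10 log10 P_hat_t(k))^2). The function name `log_spectral_distortion` and the `floor` parameter (a guard against log of zero) are hypothetical; the inputs are per-frame power spectra that some front end has already computed.

```python
import math
from typing import Sequence


def log_spectral_distortion(ref: Sequence[Sequence[float]],
                            est: Sequence[Sequence[float]],
                            floor: float = 1e-12) -> float:
    """Mean over frames of the RMS log-power difference, in dB."""
    if len(ref) != len(est) or not ref:
        raise ValueError("need the same, non-zero number of frames")
    total = 0.0
    for p_ref, p_est in zip(ref, est):
        if len(p_ref) != len(p_est) or not p_ref:
            raise ValueError("frames must have the same, non-zero number of bins")
        # Squared dB difference per bin; the floor keeps log10 finite.
        sq = [(10.0 * math.log10(max(a, floor)) - 10.0 * math.log10(max(b, floor))) ** 2
              for a, b in zip(p_ref, p_est)]
        total += math.sqrt(sum(sq) / len(sq))
    return total / len(ref)


if __name__ == "__main__":
    ref = [[1.0, 1.0, 1.0], [2.0, 4.0, 8.0]]
    print(log_spectral_distortion(ref, ref))                      # 0.0
    print(log_spectral_distortion([[1.0, 1.0]], [[10.0, 10.0]]))  # 10.0
```

Some papers use 20 log10 of magnitude spectra instead, which is equivalent when power is the squared magnitude; LSD values are only comparable across studies when the definition and frequency range match.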

Files

Original bundle

Name: Source and Filter Estimation for Throat-Microphone Speech Enhancement.pdf
Size: 1.98 MB
Format: Adobe Portable Document Format
Description: Full printable version