Prosody-based automatic segmentation of speech into sentences and topics

dc.citation.epage: 154
dc.citation.issueNumber: 1-2
dc.citation.spage: 127
dc.citation.volumeNumber: 32
dc.contributor.author: Shriberg, E.
dc.contributor.author: Stolcke, A.
dc.contributor.author: Hakkani-Tür, D.
dc.contributor.author: Tür, G.
dc.date.accessioned: 2016-02-08T10:37:29Z
dc.date.available: 2016-02-08T10:37:29Z
dc.date.issued: 2000
dc.department: Department of Computer Engineering
dc.description.abstract: A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data, and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task and corpus dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration, and word-based cues dominate for natural conversation.
dc.description.provenance: Made available in DSpace on 2016-02-08T10:37:29Z (GMT). No. of bitstreams: 1 bilkent-research-paper.pdf: 70227 bytes, checksum: 26e812c6f5156f83f0e77b261a471b5a (MD5) Previous issue date: 2000
dc.identifier.doi: 10.1016/S0167-6393(00)00028-5
dc.identifier.issn: 1872-7182
dc.identifier.issn: 0167-6393
dc.identifier.uri: http://hdl.handle.net/11693/25001
dc.language.iso: English
dc.publisher: Elsevier
dc.relation.isversionof: http://dx.doi.org/10.1016/S0167-6393(00)00028-5
dc.source.title: Speech Communication
dc.subject: Computational Linguistics
dc.subject: Computer Simulation
dc.subject: Information Retrieval
dc.subject: Markov Processes
dc.subject: Speech Analysis
dc.subject: Trees (Mathematics)
dc.subject: Word Processing
dc.subject: Hidden Markov Models (HMM)
dc.subject: Prosody-Based Speech Segmentation
dc.subject: Speech Audio Data
dc.subject: Word-Based Statistical Language Models
dc.subject: Speech Processing
dc.title: Prosody-based automatic segmentation of speech into sentences and topics
dc.type: Article

Files

Original bundle

Name: Prosody-based automatic segmentation of speech into sentences and topics.pdf
Size: 423.35 KB
Format: Adobe Portable Document Format
Description: Full printable version