Enhancement of Esophageal Speech Using Statistical Voice Conversion

Doi, Hironori; Nakamura, Keigo; Toda, Tomoki; Saruwatari, Hiroshi; Shikano, Kiyohiro


Files in This Item:	WA-P1-3.pdf (447.71 kB, PDF)

Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/39810

Title:	Enhancement of Esophageal Speech Using Statistical Voice Conversion
Authors:	Doi, Hironori
	Nakamura, Keigo
	Toda, Tomoki
	Saruwatari, Hiroshi
	Shikano, Kiyohiro
Issue Date:	4-Oct-2009
Publisher:	Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference, International Organizing Committee
Journal Title:	Proceedings : APSIPA ASC 2009 : Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference
Start Page:	805
End Page:	808
Abstract:	This paper presents a novel method of enhancing esophageal speech based on statistical voice conversion. Esophageal speech is one of the speaking methods for total laryngectomees. Although it allows laryngectomees to speak by generating a sound source and articulating it to produce audible speech sounds using their esophagus and vocal organs, the generated voices sound unnatural. To improve the naturalness of esophageal speech, we propose a voice conversion method from esophageal speech into normal speech (ES-to-Speech). A spectral parameter and excitation parameters, such as F0 and aperiodic components, of normal speech are separately estimated from the spectral parameter of the esophageal speech in the sense of maximum likelihood using different Gaussian mixture models. We conduct objective and subjective evaluations of the proposed method. The experimental results demonstrate that the proposed method yields significant improvements in naturalness of esophageal speech while maintaining its intelligibility.
Description:	APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Speech Processing (7 October 2009).
Conference Name:	APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference
Conference Place:	Sapporo
Type:	proceedings
URI:	http://hdl.handle.net/2115/39810
Appears in Collections:	Sustainability Weeks 2009 > 2009 APSIPA Annual Summit and Conference
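
The abstract describes estimating normal-speech parameters from the esophageal-speech spectral parameter with Gaussian mixture models. As a rough illustration of that kind of mapping, the sketch below converts one source feature vector with a joint-density GMM. It is not the authors' implementation: it uses the simpler frame-wise conditional mean E[y | x] in place of the maximum likelihood estimation the abstract mentions, and the function name `convert_frame`, its arguments, and the toy parameters are all hypothetical.

```python
import numpy as np

def convert_frame(x, weights, means, covs, dx):
    """Map a source vector x to E[y | x] under a joint GMM over z = [x, y].

    weights: mixture weights, one per component (assumed to sum to 1)
    means:   list of joint mean vectors [mu_x, mu_y]
    covs:    list of full joint covariance matrices
    dx:      dimensionality of the source part x
    """
    x = np.asarray(x, dtype=float)
    log_post = np.empty(len(weights))
    cond_means = []
    for m, (w, mu, S) in enumerate(zip(weights, means, covs)):
        mu_x, mu_y = mu[:dx], mu[dx:]
        S_xx, S_yx = S[:dx, :dx], S[dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)          # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        # Log of w * N(x; mu_x, S_xx), dropping the constant shared by all components.
        log_post[m] = np.log(w) - 0.5 * (logdet + diff @ sol)
        # Per-component conditional mean of y given x.
        cond_means.append(mu_y + S_yx @ sol)
    # Normalize in a numerically stable way to get component posteriors P(m | x).
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Posterior-weighted sum of the conditional means.
    return post @ np.array(cond_means)

# Toy one-component example: y is correlated with x (covariance 0.5).
weights = [1.0]
means = [np.array([0.0, 1.0])]
covs = [np.array([[1.0, 0.5], [0.5, 1.0]])]
y_hat = convert_frame([2.0], weights, means, covs, dx=1)
```

In practice, the GMM parameters would be trained on time-aligned pairs of source and target features, and, following the abstract, separate models would be used for the spectral parameter and for each excitation parameter such as F0 and aperiodic components.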
