Incorporating Duration and Intonation Models in Filipino Speech Synthesis

Lazaro, Lito Rodel S.; Policarpio, Leslie L.; Guevara, Rowena Cristina L.



Files in This Item:	MA-L2-3.pdf (PDF, 106.62 kB)

Please use this identifier to cite or link to this item: http://hdl.handle.net/2115/39641

Title:	Incorporating Duration and Intonation Models in Filipino Speech Synthesis
Authors:	Lazaro, Lito Rodel S.
	Policarpio, Leslie L.
	Guevara, Rowena Cristina L.
Keywords:	Filipino
	Speech Synthesis
	Prosody
	HNM
Issue Date:	4-Oct-2009
Publisher:	Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference, International Organizing Committee
Journal Title:	Proceedings : APSIPA ASC 2009 : Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference
Start Page:	45
End Page:	49
Abstract:	In this paper we describe the development of an intonation model and a duration model to generate prosody for the Filipino language. Z-scores of normalized durations are used for the duration model, and the Tilt parameters are used for the intonation model. The Filipino Speech Corpus (FSC) is the source of statistical data for modeling duration and intonation. A Classification and Regression Tree (CART) generator is used to build the duration and intonation models. The Harmonic plus Noise Model (HNM) is developed for the FSC. Diphones are concatenated to produce the synthetic speech, and HNM is used to modify the prosody. The synthesized speech is evaluated using the Mean Opinion Score (MOS). Results show that the duration model and the intonation model need improvement. HNM synthesis performs slightly better than TD-PSOLA (time-domain pitch-synchronous overlap-add).
Description:	APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Oral session: Speech and Music Processing (5 October 2009).
Conference Name:	APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference
Conference Name:	2009年アジア太平洋信号情報処理連合学会アニュアルサミット・国際会議
Conference Place:	Sapporo
Type:	proceedings
URI:	http://hdl.handle.net/2115/39641
Appears in Collections:	北海道大学サステナビリティ・ウィーク2009 (Sustainability Weeks 2009) > 2009年アジア太平洋信号情報処理連合学会アニュアルサミット・国際会議 (2009 APSIPA Annual Summit and Conference)
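
The abstract states that the duration model works on z-scores of normalized durations. The record gives no further detail, so the following is only a minimal sketch of the general z-score idea, assuming per-phone means and standard deviations. The function names, the phone labels, and the duration values are hypothetical and are not taken from the paper or from the Filipino Speech Corpus.

```python
from statistics import mean, pstdev

def phone_stats(samples):
    """Per-phone (mean, standard deviation) of observed durations in seconds."""
    return {phone: (mean(d), pstdev(d)) for phone, d in samples.items()}

def to_zscore(phone, duration, stats):
    """Express a duration as a z-score relative to its phone's statistics."""
    mu, sigma = stats[phone]
    return (duration - mu) / sigma  # assumes sigma > 0 for every phone

def from_zscore(phone, z, stats):
    """Map a z-score (e.g. one predicted by a regression tree) back to seconds."""
    mu, sigma = stats[phone]
    return mu + z * sigma

# Made-up durations for two hypothetical phone labels (illustration only).
samples = {"a": [0.08, 0.10, 0.12], "k": [0.04, 0.05, 0.06]}
stats = phone_stats(samples)
z = to_zscore("a", 0.12, stats)
restored = from_zscore("a", z, stats)
```

Under these assumptions, a model trained to predict z-scores stays independent of each phone's intrinsic length, and the predicted value is converted back to a duration only at synthesis time.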
