Development of a WFST based Speech Recognition System for a Resource Deficient Language Using Machine Translation
Title: | Development of a WFST based Speech Recognition System for a Resource Deficient Language Using Machine Translation |
Authors: | Jensson, Arnar Thor | Oonishi, Tasuku | Iwano, Koji | Furui, Sadaoki |
Issue Date: | 4-Oct-2009 |
Publisher: | Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference, International Organizing Committee |
Journal Title: | Proceedings : APSIPA ASC 2009 : Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference |
Start Page: | 50 |
End Page: | 56 |
Abstract: | Text corpus size is an important issue when building a language model (LM), particularly when insufficient training and evaluation data are available. In this paper we continue our work on creating a speech recognition system with an LM that is trained on a small amount of text in the target language. To improve performance, we use a large amount of foreign text and a dictionary mapping between the languages. A dictionary is used because we assume that the target language is resource deficient and that statistical machine translation (MT) is therefore not available. In this paper we take a step forward from our previously published method by coupling the speech recognition part and the translation part rather than pre-translating the foreign text. The coupling is achieved with a weighted finite state transducer (WFST) network, which also makes it easy to switch the output language, i.e. the output text can be in either the resource-deficient language or the resource-rich language. When the system is trained on 1500 Icelandic sentences and the English text corpus consists of 63003 sentences, our method outperforms the resource-deficient Icelandic speech recognition baseline of 82.6% keyword accuracy (KA), both for the English output (2.6% absolute KA improvement) and for the Icelandic output (1.6% absolute KA improvement). |
Description: | APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Oral session: Speech and Music Processing (5 October 2009). |
Conference Name: | APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference | 2009年アジア太平洋信号情報処理連合学会アニュアルサミット・国際会議 |
Conference Place: | Sapporo |
Type: | proceedings |
URI: | http://hdl.handle.net/2115/39642 |
Appears in Collections: | 北海道大学サステナビリティ・ウィーク2009 (Sustainability Weeks 2009) > 2009年アジア太平洋信号情報処理連合学会アニュアルサミット・国際会議 (2009 APSIPA Annual Summit and Conference) |