A Study on Efficient Robust Speech Recognition with Stochastic Dynamic Time Warping

Files in This Item:
Xihao_Sun.pdf (1.29 MB, PDF)
Please use this identifier to cite or link to this item: https://doi.org/10.14943/doctoral.k11523

Title: A Study on Efficient Robust Speech Recognition with Stochastic Dynamic Time Warping
Other Titles: 確率的DTWを用いた高効率ロバスト音声認識に関する研究
Authors: 孫, 喜浩
Authors (alt): Sun, Xihao
Issue Date: 25-Sep-2014
Publisher: Hokkaido University
Abstract: In recent years, great progress has been made in automatic speech recognition (ASR) systems. The hidden Markov model (HMM) and dynamic time warping (DTW) are the two main algorithms that have been widely applied to ASR systems. Although the HMM technique achieves higher recognition accuracy in both clean and noisy environments, it requires a large set of words and is more complex to implement. Thus, more and more researchers have focused on DTW-based ASR systems. DTW is based on template matching: it accomplishes time alignment of reference and test speech features by dynamic programming. Conventional DTW is fast and less complex, but its recognition accuracy is limited, so it has mostly been used for speech recognition in clean environments. Recently, a DTW algorithm with multireferences (mDTW) has also been developed to improve the recognition accuracy in comparison with the HMM algorithm under noisy conditions. However, the mDTW algorithm increases the calculation cost and requires more memory resources, which reduces the practicability of the system. It is possible to reconstruct the multireferences so that less memory is required and the calculation cost is reduced. Therefore, this study proposes a reconstruction method that adds a training part to the DTW-based ASR system. The proposed reconstruction of references is aimed at making the DTW algorithm more effective. According to the DTW algorithm, the optimal warping path implies a minimum error between any two given sequences. The proposed algorithm uses this property to build a new reference that replaces the original two. The process is done in three stages. First, for each reference word, the speech utterances are divided into two subsets. Second, for each pair of subsets, the optimal path is computed and the new reference replaces the pair of subsets.
Finally, the new references are input to the DTW-based ASR system to obtain the recognition accuracy. The feasibility of the proposed technique was examined using computer simulations, and the results demonstrated its effectiveness. The simulation results show that our approach yields 96.94% accuracy compared with 97.54% for mDTW in 20 dB white noise, and 84.4% compared with 86.44% for mDTW in 10 dB white noise. In babble noise, our approach yields 94.12% accuracy compared with 94.14% for mDTW at 20 dB, and 80.82% compared with 81.64% for mDTW at 10 dB. Compared with mDTW, the proposed technique reduces the calculation cost by 41.6%.
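The three stages described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the new reference is formed from the optimal path, so averaging the aligned frames is an assumption made here for illustration. The function names (`frame_distance`, `dtw`, `merge_references`, `recognize`), the Euclidean frame distance, and the toy one-dimensional features are likewise hypothetical.

```python
from math import inf, sqrt


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed local cost)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(ref, test):
    """Return (total cost, optimal warping path) for two frame sequences."""
    n, m = len(ref), len(test)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Dynamic programming over the accumulated-cost grid.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last cell to recover the minimum-cost path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def merge_references(ref_a, ref_b):
    """Assumed merge rule: average each ref_a frame with the ref_b frames aligned to it."""
    _, path = dtw(ref_a, ref_b)
    aligned = [[] for _ in ref_a]
    for i, j in path:
        aligned[i].append(ref_b[j])
    merged = []
    for frame_a, partners in zip(ref_a, aligned):
        mean_b = [sum(col) / len(partners) for col in zip(*partners)]
        merged.append([(x + y) / 2 for x, y in zip(frame_a, mean_b)])
    return merged


def recognize(test, references):
    """Return the label whose stored reference has the smallest DTW cost to test."""
    return min(references, key=lambda label: dtw(references[label], test)[0])


# Toy one-dimensional "features" for two utterances of the same word.
ref_a = [[0.0], [1.0], [2.0], [3.0]]
ref_b = [[0.0], [0.0], [1.0], [2.0], [4.0]]
merged = merge_references(ref_a, ref_b)  # one reference replaces the pair

references = {"word_a": merged, "word_b": [[5.0], [5.0], [5.0]]}
label = recognize([[0.0], [1.0], [2.0], [3.0]], references)
print(merged, label)
```

Replacing each pair with a single merged reference halves the number of templates that have to be stored and compared at recognition time, which is the kind of saving the abstract attributes to the reconstruction step.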
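Conferring University: Hokkaido University (北海道大学)
Degree Report Number: 甲第11523号
Degree Level: Doctoral (博士)
Degree Discipline: Information Science (情報科学)
Examination Committee Members: (Chief examiner) Professor 宮永 喜一, Specially Appointed Professor 野島 俊雄, Specially Appointed Professor 小川 恭孝, Professor 齊藤 晋聖, Associate Professor 筒井 弘
Degree Affiliation: Graduate School of Information Science and Technology, Division of Media and Network Technologies (情報科学研究科 メディアネットワーク専攻)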
Type: theses (doctoral)
URI: http://hdl.handle.net/2115/57251
Appears in Collections: 学位論文 (Theses) > 博士 (情報科学) (Doctor of Information Science)
課程博士 (Doctorate by way of Advanced Course) > 情報科学院 (Graduate School of Information Science and Technology)
