A Study on Robust Speech Recognition with Time Varying Speech Features

Mufungulwa, George



Files in This Item:	George_Mufungulwa.pdf (PDF, 931.28 kB)

Please use this identifier to cite or link to this item: https://doi.org/10.14943/doctoral.k12944

Related Items in HUSCAP:

A Study on Robust Speech Recognition with Time Varying Speech Features [an abstract of dissertation and a summary of dissertation review]

Title:	A Study on Robust Speech Recognition with Time Varying Speech Features
Other Titles:	A Study on Noise-Robust Speech Recognition Using Time Varying Speech Features (音声の時変特徴量を用いた雑音ロバスト音声認識に関する研究)
Authors:	Mufungulwa, George
Issue Date:	25-Dec-2017
Publisher:	Hokkaido University
Abstract:	Speech feature extraction algorithms have become popular. Speech features can be used for various applications: biometric recognition, speech recognition, speaker identification, and so on. In these applications, a good speech feature can be obtained using Mel frequency cepstrum coefficients (MFCC), linear predictive coding (LPC), time varying LPC (TVLPC), and perceptual linear predictive (PLP) analysis, among others. This thesis focuses on the use of TVLPC among feature extraction algorithms to improve the robustness of automatic speech recognition (ASR) systems against various multiplicative and additive noises. Time varying speech features (TVSF) are implemented in ASR with the aim of improving the recognition accuracy on a number of small reference speech databases. The significance of the study rests on the fact that both additive and multiplicative noises cause great performance degradation of ASR systems, thereby limiting speech recognition accuracy in real environments. For this reason, feature correction, feature compensation and normalization approaches are considered in order to improve the robustness of a speech recognition system.

The performance degradation is partly due to statistical mismatch between the acoustic model trained on clean speech features and the noisy speech features seen at test time. To reduce this feature-model mismatch, correction, compensation and normalization techniques are employed during both training and testing of speech features.

To improve system performance, normalization in the modulation spectrum domain is used to remove non-speech components over a certain frequency range, using running spectrum analysis (RSA) as a band pass filter. Compared with the other noise reduction techniques used in this study, the RSA filter has the advantage of adaptable parameters: the first and second pass band frequencies can easily be adjusted. In addition, speech feature enhancement using dynamic range adjustment (DRA) is utilized. The enhancement is aimed at correcting the difference between clean and noisy speech features by normalizing the amplitude of speech features. For channel normalization, cepstrum mean subtraction (CMS) is used in this study.

Two alternative time varying speech feature (TVSF) methods are proposed and compared with conventional Mel frequency cepstral coefficient (MFCC) features for noisy speech recognition.

The first experimental study shows that fast Fourier transform (FFT) based MFCCs together with directly converted time varying linear prediction (TVLPC) based MFCCs, which this study defines as time varying speech features (TVSF), achieve recognition accuracy competitive with that of FFT based MFCCs alone.

In the second experimental study, the robustness of speech recognition is further improved by applying mel filtering and logarithmic transformations to short time windowed time varying coefficients before converting them to cepstrum coefficients, in place of direct-converted TVLPC speech features. Results show that RSA produces better performance than DRA and CMS/DRA on both similar pronunciation phrases and phrases uttered by elderly persons. The experimental study shows that the use of time varying speech features (TVSF) can produce improved speech recognition accuracy even if there is a mismatch between the training and testing data sets.
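Conferring University:	Hokkaido University
Degree Report Number:	甲第12944号 (Kō No. 12944)
Degree Level:	Doctoral
Degree Discipline:	Information Science
Examination Committee Members:	Prof. Yoshikazu Miyanaga (宮永喜一, chief examiner), Prof. Takeo Ohgane (大鐘武雄), Prof. Kunimasa Saitoh (齊藤晋聖), Assoc. Prof. Hiroshi Tsutsui (筒井弘)
Degree Affiliation:	Graduate School of Information Science and Technology (Division of Media and Network Technologies)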
Type:	theses (doctoral)
URI:	http://hdl.handle.net/2115/68114
Appears in Collections:	Doctorate by way of Advanced Course > Graduate School of Information Science and Technology, Theses > Doctoral (Information Science)
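
The three normalization steps named in the abstract (CMS, DRA, and RSA used as a modulation-domain band pass filter) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation: the function names, the peak-based DRA scaling, the FFT-mask form of the band pass filter, the 100 Hz frame rate and the 1 to 16 Hz pass band are all assumptions chosen for illustration. Features are assumed to be a matrix with one row per frame and one column per cepstral coefficient.

```python
import numpy as np

def cepstral_mean_subtraction(feats):
    """CMS: subtract each coefficient's mean over time (rows are frames)."""
    return feats - feats.mean(axis=0, keepdims=True)

def dynamic_range_adjustment(feats, eps=1e-12):
    """DRA (assumed form): scale each coefficient trajectory by its
    maximum absolute value so that it lies in [-1, 1]."""
    peak = np.abs(feats).max(axis=0, keepdims=True)
    return feats / np.maximum(peak, eps)

def modulation_bandpass(feats, frame_rate_hz, low_hz, high_hz):
    """RSA-style band pass (assumed form): keep only modulation
    frequencies in [low_hz, high_hz] of each coefficient trajectory."""
    n_frames = feats.shape[0]
    spectrum = np.fft.rfft(feats, axis=0)
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate_hz)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return np.fft.irfft(spectrum * mask[:, None], n=n_frames, axis=0)

# Synthetic stand-in for 200 frames of 13 cepstral coefficients with a
# constant offset, mimicking a channel (multiplicative) distortion.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 13)) + 5.0

cms = cepstral_mean_subtraction(feats)
dra = dynamic_range_adjustment(cms)
# Hypothetical settings: 10 ms frame shift, 1-16 Hz pass band.
rsa = modulation_bandpass(feats, frame_rate_hz=100.0, low_hz=1.0, high_hz=16.0)
```

In this sketch the two pass band edges are plain function arguments, which mirrors the abstract's remark that the RSA pass band frequencies are easy to adjust. Because the lower edge is above 0 Hz, the band pass step also removes the constant offset that CMS targets.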
