Winter 2021 Thesis Defense Schedule



PLEASE JOIN US AS THE FOLLOWING CANDIDATES PRESENT THEIR CULMINATING WORK.

Winter 2021

For winter quarter 2021, no Final Examinations or Defenses will be held in person, in accordance with public health guidelines. For a link to attend a candidate's online defense, please contact our office at stemgrad@uw.edu.

Friday, March 12

Ruohao “Eddie” Li

Chair: Dr. Kaibao Nie
Candidate: Master of Science in Electrical Engineering

11:00 A.M.; Online
Improving Keywords Spotting in Noise with Augmented Dataset from Vocoded Speech and Speech Denoising

As more electronic devices include an on-device keyword spotting (KWS) system, producing and deploying trained models for keyword detection is becoming more demanding. Dataset preparation is one of the most challenging and tedious tasks in keyword spotting: obtaining raw or segmented speech recordings requires a significant amount of time. In this thesis, we first proposed a data augmentation strategy that uses a speech vocoder to artificially generate vocoded speech with different numbers of channels. Depending on the use case, this strategy can at least double the size of the dataset. Owing to the new features introduced by varying the number of vocoder channels, a convolutional neural network (CNN) KWS system trained on the dataset augmented with vocoded speech showed promising improvement when evaluated under a +10 dB signal-to-noise ratio (SNR) noise condition. The same results were confirmed in a hardware implementation, showing that using vocoded speech for data augmentation has the potential to improve KWS on microcontrollers.

We further proposed a neural-network-based speech denoising system that uses the Weighted Overlap-Add (WOLA) algorithm for feature extraction to make processing more efficient. The denoising system performs regression from noisy speech to clean speech, converting noisy speech (the input) into clean speech (the output), so that the KWS system receives relatively clean speech as its input. Furthermore, by changing the training target to vocoded speech, the same denoising system can convert noisy speech into vocoded speech. Combining speech denoising with vocoded-speech data augmentation achieved relatively high accuracy when evaluated under the +10 dB SNR noise condition.
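The abstract does not say which vocoder was used. As a rough illustration of what vocoded-speech augmentation can look like, the Python/NumPy sketch below implements a generic noise-excited channel vocoder: it splits speech into frequency bands, keeps only each band's envelope, and uses that envelope to modulate band-limited noise. It also includes a helper that mixes noise in at a target SNR such as +10 dB. The function names (`noise_vocode`, `mix_at_snr`), the log-spaced band edges from 100 Hz to 7 kHz, the 50 Hz envelope smoothing, and the FFT brick-wall filters are all assumptions chosen for illustration, not details taken from the thesis.

```python
import numpy as np

# Hypothetical sketch of a noise-excited channel vocoder; the filter design
# and parameters are assumptions, not the method used in the thesis.


def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall band-pass: zero every FFT bin outside [lo, hi) Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def envelope(x, fs, cutoff_hz=50.0):
    """Half-wave rectify, then smooth with a moving average about 1/cutoff_hz s long."""
    win = max(1, int(fs / cutoff_hz))
    kernel = np.ones(win) / win
    return np.convolve(np.maximum(x, 0.0), kernel, mode="same")


def noise_vocode(speech, fs, n_channels, f_lo=100.0, f_hi=7000.0, seed=0):
    """Keep each band's envelope but replace its fine structure with noise."""
    speech = np.asarray(speech, dtype=float)
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges (assumed)
    noise = rng.standard_normal(len(speech))
    out = np.zeros(len(speech))
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(speech, fs, lo, hi), fs)
        carrier = bandpass_fft(noise, fs, lo, hi)
        out += env * carrier
    # Rescale to the input's RMS level so vocoded copies have similar loudness.
    rms_in = np.sqrt(np.mean(speech ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out


def mix_at_snr(clean, noise, snr_db):
    """Add noise scaled so that 10*log10(P_clean / P_noise) equals snr_db.

    Assumes `noise` is at least as long as `clean`; extra samples are dropped.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # Synthetic stand-in for a 1 s keyword clip: an amplitude-modulated 440 Hz tone.
    clip = np.sin(2 * np.pi * 440 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
    augmented = [noise_vocode(clip, fs, n) for n in (4, 8, 16)]  # hypothetical channel counts
    noisy = mix_at_snr(augmented[1], np.random.default_rng(1).standard_normal(fs), 10.0)
    print(len(augmented), noisy.shape)
```

In this sketch, producing vocoded copies at several channel counts (here a hypothetical 4, 8, and 16) adds one new training clip per count for every original clip, which is one way an augmentation step could at least double a dataset.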


Questions: Please email eegrad@uw.edu
