Robustness enhancement of automatic speech recognition systems by signal processing techniques

The goal of the thesis is to develop speech enhancement and robust feature extraction methods for robust automatic speech recognition.

An analysis of modern methods for improving the robustness of automatic speech recognition systems is presented. It shows that applying signal enhancement as a preprocessing step, or using robust feature-processing methods, removes the need to adapt the parameters of the automatic speech recognition system to the distorted signal. This avoids the complications of changing the structure and parameters of existing automatic speech recognition systems.

The thesis therefore devotes considerable attention to two directions: correction (enhancement) of speech signals by preprocessing, and robust parametric representation of the signal within the automatic speech recognition system.

A modification of the existing logMMSE method is proposed. In it, the noise spectrum estimator is replaced so that the method can enhance speech distorted by reverberation.
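For orientation, the sketch below shows the standard log-spectral amplitude (logMMSE) gain with a decision-directed a priori SNR estimate, in plain Python. It is not the modification proposed in the thesis. It only illustrates where an interference spectrum estimate enters the method, which is the point at which a different estimator could be substituted. The names `exp1`, `logmmse_gain`, `enhance_frame` and `interference_psd`, and the values `alpha=0.98` and `xi_min=1e-3`, are assumptions chosen for illustration. How the thesis obtains its spectrum estimate is not described here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x, terms=60):
    """Exponential integral E1(x) for x > 0.

    Uses the power series for x <= 1 and a continued fraction otherwise.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if x <= 1.0:
        total, power_over_fact = 0.0, 1.0
        for k in range(1, terms + 1):
            power_over_fact *= -x / k       # (-x)^k / k!
            total -= power_over_fact / k    # accumulates -(-x)^k / (k * k!)
        return -EULER_GAMMA - math.log(x) + total
    t = 0.0
    for k in range(terms, 0, -1):           # evaluate the fraction bottom-up
        t = k / (1.0 + k / (x + t))
    return math.exp(-x) / (x + t)


def logmmse_gain(xi, gamma):
    """Log-spectral amplitude gain for a priori SNR xi and a posteriori SNR gamma."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(max(v, 1e-10)))


def enhance_frame(noisy_mag, interference_psd, prev_clean_mag,
                  alpha=0.98, xi_min=1e-3):
    """Enhance one frame of spectral magnitudes, bin by bin.

    interference_psd is the plug-in point (assumption): conventionally a noise
    power estimate, but any per-bin interference power estimate can be passed.
    """
    enhanced = []
    for y, lam, a_prev in zip(noisy_mag, interference_psd, prev_clean_mag):
        lam = max(lam, 1e-12)               # guard against division by zero
        gamma = y * y / lam                 # a posteriori SNR
        # Decision-directed a priori SNR estimate
        xi = alpha * a_prev * a_prev / lam + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        enhanced.append(logmmse_gain(max(xi, xi_min), gamma) * y)
    return enhanced
```

A call such as `enhance_frame([2.0], [1.0], [1.0])` returns the attenuated magnitude for a single bin. In a full system the function would run on every short-time Fourier transform frame, with `prev_clean_mag` taken from the previous frame's output.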

A neural-network-based voice activity detector for automatic speech recognition systems is proposed. It enables the use of robust power-normalized cepstral coefficient features in non-stationary noise.

It is proposed to include the pitch trajectory as a classification feature. For this purpose, the use of a pitch tracking algorithm for noisy speech is proposed.
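As a rough illustration of what a pitch trajectory feature looks like, the sketch below estimates one pitch value per frame with a plain normalized autocorrelation. This is a generic stand-in, not the noise-robust pitch tracker used in the thesis, which is not specified here. The function names, the 60 to 400 Hz search range and the voicing threshold of 0.3 are assumptions chosen for illustration.

```python
import math


def frame_pitch(frame, sample_rate, f_min=60.0, f_max=400.0,
                voicing_threshold=0.3):
    """Return the estimated pitch of one frame in Hz, or 0.0 if it looks unvoiced."""
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]           # remove the DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(x) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag, normalized by the frame energy
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_score >= voicing_threshold else 0.0


def pitch_trajectory(signal, sample_rate, frame_len=400, hop=200):
    """Return the per-frame pitch values that form the trajectory feature."""
    return [frame_pitch(signal[start:start + frame_len], sample_rate)
            for start in range(0, len(signal) - frame_len + 1, hop)]


# Usage: a synthetic 200 Hz tone sampled at 8 kHz
tone = [math.sin(2.0 * math.pi * 200.0 * n / 8000.0) for n in range(1600)]
trajectory = pitch_trajectory(tone, 8000)
```

The resulting list holds one value per frame, with 0.0 marking frames treated as unvoiced, and could be appended to the other per-frame features passed to a classifier.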

An adaptive parameter correction algorithm for the neural-network-based voice activity detector is proposed to accelerate the learning process. Systematic evaluations show that the proposed neural-network-based voice activity detector is robust to different noise conditions. The proposed approach also outperforms other state-of-the-art voice activity detection algorithms.


Keywords: late reverberation suppression, neural network, pitch tracking, robust speech recognition, speech enhancement, voice activity detection.