IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 10, NO. 8, NOVEMBER 2002
Noise-normalized SPLICE denoising using the iterative stochastic algorithm for tracking nonstationary noise in an utterance of the Aurora2 data with an SNR of 10 dB. From top to bottom, the panels show noisy speech, clean speech, and denoised speech, all in the same spectrogram format.
algorithm has also been extended successfully from the maximum likelihood (ML) version to a new maximum a posteriori (MAP) version, which takes advantage of prior information about the noise and includes a more accurate environment model that captures more detailed properties of acoustic distortion.
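To make the ML-versus-MAP distinction concrete, the following is a minimal sketch of a textbook case, not the paper's noise estimator: estimating a scalar noise mean from a few frames under a Gaussian likelihood with known variance, with a Gaussian prior on the mean in the MAP case. The function names and the scalar, known-variance Gaussian assumptions are hypothetical simplifications. With few observations, the MAP estimate is pulled toward the prior mean; as the prior variance grows, it approaches the ML sample average.

```python
import numpy as np


def ml_noise_mean(frames):
    """Hypothetical ML estimate of a scalar Gaussian noise mean: the sample average."""
    return float(np.mean(frames))


def map_noise_mean(frames, prior_mean, prior_var, obs_var):
    """Hypothetical MAP estimate of a scalar Gaussian noise mean.

    Assumes a Gaussian prior N(prior_mean, prior_var) on the mean and a
    Gaussian likelihood with known variance obs_var. The posterior is then
    Gaussian, so its mode equals its mean:
        (prior_mean / prior_var + sum(x) / obs_var) / (1 / prior_var + n / obs_var)
    """
    frames = np.asarray(frames, dtype=float)
    n = frames.size
    precision = 1.0 / prior_var + n / obs_var
    return float((prior_mean / prior_var + frames.sum() / obs_var) / precision)


if __name__ == "__main__":
    frames = [2.0, 4.0]
    print(ml_noise_mean(frames))                  # 3.0
    print(map_noise_mean(frames, 0.0, 1.0, 1.0))  # 2.0, pulled toward the prior mean 0
```

With only two frames and a prior as confident as the likelihood, the MAP estimate lands between the prior mean and the sample average, which is the kind of regularization a noise prior provides when little noise-only data is available.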
F. Aurora2 Evaluation Results
Noise-robust connected digit recognition results obtained with the best version of SPLICE on the full Aurora2 evaluation test data are shown in Fig. 6. Details of the Aurora2 task have been described in . Aurora2 is based on the TIDigits database, digitally corrupted by adding different types of realistic, nonstationary noise over a wide range of SNRs (Sets A, B, and C) and, for Set C only, by additionally passing the signals through a linear filter. Sets A and B each consist of 1101 digit sequences for each of four noise conditions and for each of the 0, 5, 10, 15, and 20 dB SNRs. Set C is organized the same way except that it has only two noise conditions. All results in Fig. 6 were obtained by applying cepstral mean normalization (CMN) to all data after noise-normalized dynamic SPLICE cepstral enhancement. The use of CMN substantially improved the recognition rate for Set C. For simplicity, we assumed no channel distortion when implementing the iterative stochastic approximation algorithm for noise estimation. This assumption is not appropriate for Set C, which contains an unknown but fixed channel distortion; the deficiency has been at least partially offset by the use of CMN. All recognition experiments reported here used the standard Aurora recognition system instead of our internal recognition system.
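Why CMN can compensate for Set C's fixed channel can be illustrated with a minimal sketch, assuming utterance-level CMN on a (frames × coefficients) array of cepstra; the paper does not state which CMN variant was used, and the function name below is hypothetical. A fixed linear filter multiplies the short-time spectrum, so (approximately, for filters short compared with the analysis window) it adds the same constant vector to every cepstral frame, and subtracting the per-utterance mean removes that vector exactly. CMN does not, however, model how the channel interacts with additive noise.

```python
import numpy as np


def cepstral_mean_normalize(cepstra):
    """Utterance-level CMN: subtract each coefficient's mean over all frames.

    cepstra: array-like of shape (num_frames, num_coeffs).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(200, 13))  # stand-in cepstra: 200 frames, 13 coefficients
    channel = rng.normal(size=13)       # fixed channel modeled as a constant cepstral offset
    observed = clean + channel          # the same offset is added to every frame
    # After CMN the offset is gone: both versions normalize to the same features.
    print(np.allclose(cepstral_mean_normalize(observed),
                      cepstral_mean_normalize(clean)))  # True
```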
As shown in Fig. 6, the relative word error rate reduction achieved is 27.9% for the multicondition training mode and 67.4% for the clean-only training mode, compared with the results obtained using the standard Mel cepstra without speech enhancement. In the multicondition training mode, the denoising algorithm is applied to the training data, and the resulting denoised Mel-cepstral features are used to train the HMMs. In the clean-only training mode, the HMMs are trained on clean-speech Mel cepstra, and the denoising algorithm is applied only to the test set. The results in Fig. 6 represent the best performance in the clean-speech training category of the September 2001 Aurora2 evaluation. The experimental results also demonstrated that the newly introduced iterations are crucial for improving on the earlier stochastic approximation technique. They further showed that the performance of the noise estimation algorithm is sensitive to its embedded forgetting factor, to a degree that depends on how nonstationary the noise is. More recently, the noise-normalized SPLICE algorithm has also been applied successfully to the Aurora3 task.
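The sensitivity to the forgetting factor reflects a general trade-off that can be sketched with a simpler estimator than the paper's: the exponentially weighted recursive average below is a generic stand-in, not the iterative stochastic approximation algorithm, and the function name and scalar noise level are hypothetical. A forgetting factor close to 1 averages over a long history, which smooths the estimate when the noise is stationary but lags behind abrupt changes; a smaller factor tracks changes quickly at the cost of a noisier estimate.

```python
import numpy as np


def track_noise(observations, forgetting):
    """Exponentially weighted recursive estimate of a scalar noise level.

    forgetting: factor in [0, 1); larger values weight the past more heavily.
    Returns the estimate after each frame.
    """
    estimate = float(observations[0])
    track = [estimate]
    for y in observations[1:]:
        # Blend the previous estimate with the new frame.
        estimate = forgetting * estimate + (1.0 - forgetting) * float(y)
        track.append(estimate)
    return np.array(track)


if __name__ == "__main__":
    # Noise level jumps from 0 to 1 at frame 50 (a nonstationary step).
    obs = np.concatenate([np.zeros(50), np.ones(50)])
    fast = track_noise(obs, forgetting=0.5)
    slow = track_noise(obs, forgetting=0.99)
    # Frame 59 is 10 updates after the step, so the estimate is 1 - forgetting**10.
    print(round(fast[59], 4), round(slow[59], 4))
```

Ten frames after the step, the estimate with factor 0.5 has essentially converged (1 - 0.5^10 ≈ 0.999), while the one with factor 0.99 has covered less than 10% of the step (1 - 0.99^10 ≈ 0.096).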
Many research groups around the world are working on the same problem that interests us, noise-robust speech recognition for mobile and other devices; see , , , , , , for selected approaches taken by some of these groups. The general