State of the Art in Continuous Speech Recognition

John Makhoul and Richard Schwartz

SUMMARY

In the past decade, tremendous advances in the state of the art of automatic speech recognition by machine have taken place. A reduction in the word error rate by more than a factor of 5 and an increase in recognition speeds by several orders of magnitude (brought about by a combination of faster recognition search algorithms and more powerful computers), have combined to make high-accuracy, speaker-independent, continuous speech recognition for large vocabularies possible in real time, on off-the-shelf workstations, without the aid of special hardware. These advances promise to make speech recognition technology readily available to the general public. This paper focuses on the speech recognition advances made through better speech modeling techniques, chiefly through more accurate mathematical modeling of speech sounds.

More and more, speech recognition technology is making its way from the laboratory to real-world applications. Recently, a qualitative change in the state of the art has emerged that promises to bring speech recognition capabilities within the reach of anyone with access to a workstation. High-accuracy, real-time, speaker-independent, continuous speech recognition for medium-sized vocabularies (a few thousand words) is now possible in software on off-the-shelf workstations. Users will be able to tailor recognition capabilities to their own applications. Such software-based, real-time solutions usher in a whole new era in the development and utility of speech recognition technology.

As is often the case in technology, a paradigm shift occurs when several developments converge to make a new capability possible. In the case of continuous speech recognition, the following advances have converged to make the new technology possible:

- higher-accuracy continuous speech recognition, based on better speech modeling techniques;
- better recognition search strategies that reduce the time needed for high-accuracy recognition; and
- increased power of audio-capable, off-the-shelf workstations.

The paradigm shift is taking place in the way we view and use speech recognition. Rather than being mostly a laboratory endeavor, speech recognition is fast becoming a technology that is pervasive and will have a profound influence on the way humans communicate with machines and with each other.

This paper focuses on speech modeling advances in continuous speech recognition, with an exposition of hidden Markov models (HMMs), the mathematical backbone behind these advances. While knowledge of properties of the speech signal and of speech perception have always played a role, recent improvements have relied largely on solid mathematical and probabilistic modeling methods, especially the use of HMMs for modeling speech sounds. These methods are capable of modeling time and spectral variability simultaneously, and the model parameters can be estimated automatically from given training speech data. The traditional processes of segmentation and labeling of speech sounds are now merged into a single probabilistic process that can optimize recognition accuracy.

This paper describes the speech recognition process and provides typical recognition accuracy figures obtained in laboratory tests as a function of vocabulary, speaker dependence, grammar complexity, and the amount of speech used in training the system. As a result of modeling advances, recognition error rates have dropped several fold. Important to these improvements have been the availability of common speech corpora for training and testing purposes and the adoption of standard testing procedures.

This paper also reviews more recent research directions, including the use of segmental models and artificial neural networks in improving the performance of HMM systems. The capabilities of neural networks to model highly nonlinear functions can be used to develop new features from the speech signal, and their ability to model posterior probabilities can be used to improve recognition accuracy. We will argue that future advances in speech recognition must continue to rely on finding better ways to incorporate our speech knowledge into advanced mathematical models, with an emphasis on methods that are robust to speaker variability, noise, and other acoustic distortions.

THE SPEECH RECOGNITION PROBLEM

Automatic speech recognition can be viewed as a mapping from a continuous-time signal, the speech signal, to a sequence of discrete entities, for example, phonemes (or speech sounds), words, and sentences. The major obstacle to high-accuracy recognition is the large variability in the speech signal characteristics. This variability has three main components: linguistic variability, speaker variability, and channel variability.
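
As an illustration of how an HMM can merge segmentation and labeling of speech sounds into a single probabilistic process, the following is a minimal sketch of Viterbi decoding for a discrete-observation HMM. It is not the authors' system: the state labels `a` and `b`, the quantized symbols `x` and `y`, and all probabilities are made up for the example, and practical recognizers typically score continuous spectral feature vectors rather than a two-symbol alphabet.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete-observation HMM."""
    # best[t][s]: log-probability of the best state path that ends in s at frame t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most likely final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


def to_log(table):
    """Convert a nested dict of (nonzero) probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: two speech sounds and two quantized spectral symbols.
states = ["a", "b"]
log_start = to_log({"a": 0.9, "b": 0.1})
log_trans = to_log({"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.1, "b": 0.9}})
log_emit = to_log({"a": {"x": 0.8, "y": 0.2}, "b": {"x": 0.1, "y": 0.9}})

frames = ["x", "x", "y", "y"]
log_prob, path = viterbi(frames, states, log_start, log_trans, log_emit)
print(path, math.exp(log_prob))
```

On this toy model the decoded path is `a, a, b, b`: the label sequence and the boundary between the two sounds (after the second frame) come out of the same maximization, with no separate segmentation step.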