Communications of the IIMA


Speech recognition (also known as automatic speech recognition) converts spoken words to text. The term is broad: it covers systems that can recognize almost any speaker, such as a call centre system designed to recognize many voices. Speech recognition is commonplace in telephony and is becoming widespread in computer gaming and simulation. People with disabilities also benefit from speech recognition programs. It is increasingly evident that interaction between humans and speech recognition engines is on the rise. In some circumstances, the caller is directed through a series of options; this is called a Directed Dialog interaction. In other situations, the caller is not limited to pre-defined options but is instead given the opportunity to state their intent. This scenario is known as an Open Dialog interaction: the caller indicates their intent orally, and the speech platform is expected to interpret that intent correctly. Such interpretations are prone to variation in recognition and classification. Even if the application software correctly classifies the caller’s intent, it may not adequately capture the actual utterance. This paper proposes statistical techniques for measuring the performance of three speech recognition engines in a directed-dialog scenario.