The inputs are the caller's utterance and a list of phrases we can reasonably expect them to say:
  • five thirty
  • five thirty pm
  • half past five
  • seventeen thirty
The recogniser compares the digitised utterance with the list of phrases and determines which phrase was most likely said.
The meaning, in this case the time 17:30, is extracted and used by dialoguemachine to determine which actions to carry out and the next step in the flow of the dialogue.
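
The flow above can be sketched in a few lines of Python. This is an illustration only, not how dialoguemachine or any particular recogniser is implemented: it works on a text transcript rather than digitised audio, it uses the standard library's `difflib` as a stand-in for acoustic scoring, and the names `TIME_GRAMMAR` and `recognise` are hypothetical.

```python
import difflib

# Hypothetical grammar: each phrase we expect, mapped to its meaning
# (here a 24-hour time). Not a real recogniser's grammar format.
TIME_GRAMMAR = {
    "five thirty": "17:30",
    "five thirty pm": "17:30",
    "half past five": "17:30",
    "seventeen thirty": "17:30",
}


def recognise(utterance, grammar, cutoff=0.6):
    """Return (phrase, meaning) for the closest grammar phrase, or None."""
    # Normalise case and whitespace before comparing.
    cleaned = " ".join(utterance.lower().split())
    # Text similarity stands in for the acoustic comparison a real
    # recogniser performs on the digitised utterance.
    matches = difflib.get_close_matches(cleaned, list(grammar), n=1, cutoff=cutoff)
    if not matches:
        return None  # nothing in the grammar is close enough: a "no match"
    phrase = matches[0]
    return phrase, grammar[phrase]
```

Calling `recognise("Half past five", TIME_GRAMMAR)` returns the matched phrase together with its meaning, which is the value a dialogue would act on. An utterance that resembles nothing in the grammar returns `None`, which a dialogue would normally handle by re-prompting the caller.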

how does speech recognition work?
As described above, speech recognition uses a list of phrases we would reasonably expect a user to say in answer to a particular question. This list is known as a grammar, and part of our expertise is crafting grammars for maximum coverage and accuracy.
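
To make the idea of a grammar more concrete, here is a small Python sketch that expands a compact rule into the phrases it covers. It rests on assumptions: recognisers often take grammars in a standard format such as W3C SRGS, whereas the slot layout and the `expand` helper below are hypothetical and are not dialoguemachine's own format.

```python
from itertools import product

# Hypothetical rule: a sequence of slots, each a list of alternatives.
# An empty string marks a word that may be omitted.
TIME_RULE = [
    ["five", "six"],        # hour
    ["thirty", "fifteen"],  # minutes
    ["", "pm"],             # optional suffix
]


def expand(rule):
    """Return every phrase the rule covers, one per combination of slots."""
    phrases = []
    for choice in product(*rule):
        # Drop the empty alternatives so optional words leave no gap.
        phrases.append(" ".join(word for word in choice if word))
    return phrases


phrases = expand(TIME_RULE)
print(len(phrases))  # 2 * 2 * 2 = 8 phrases from three short slots
```

Coverage grows multiplicatively with each slot, which is how a short grammar can cover many phrasings. The same growth also gives the recogniser more candidates to tell apart, so coverage has to be balanced against accuracy.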

Where applicable, we are also happy to produce DTMF (touch-tone) and non-grammar-based applications. We will always aim for the best caller experience.
In general, the more we can constrain the range of expected answers, the more accurate recognition will be. Accuracy will be very good when we can ask questions like:
  • What time do you want to arrive?
  • Are you a vegetarian?
We know roughly what a caller will say in answer to these. Recognisers have a harder time with questions to which a caller could say almost anything, such as:
  • What do you want to do today?
  • How are you feeling?
Part of our art is crafting prompts for maximum accuracy whilst retaining flexibility and a good user experience.

dialoguemachine can use a number of third-party recognisers; the choice depends on factors such as:
  • licence cost
  • recognition accuracy
  • language support
  • scalability
  • adherence to standards
Our current recogniser of choice is Voxeo Prophecy, which we find gives a good cost/performance ratio.