inputs
The inputs are the caller's utterance and a list of phrases we can reasonably expect them to say.
five thirty
five thirty pm
half past five
seventeen thirty
...
recogniser
The recogniser compares the digitised utterance with the list of phrases and determines which it thinks was actually said.
meaning
The meaning, in this case the time 17.30, is extracted and used by dialoguemachine to determine the actions to carry out and the next step in the flow of the dialogue.
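The three steps above (inputs, recogniser, meaning) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a real recogniser compares digitised audio, whereas this sketch compares text using the standard library's `difflib` as a stand-in. The `GRAMMAR` table and the `recognise` function are hypothetical names, not part of dialoguemachine.

```python
from difflib import get_close_matches

# Hypothetical grammar: each expected phrase maps to the meaning it carries.
# The phrases are the ones listed above; the mapping itself is an assumption.
GRAMMAR = {
    "five thirty": "17:30",
    "five thirty pm": "17:30",
    "half past five": "17:30",
    "seventeen thirty": "17:30",
}

def recognise(utterance_text, grammar=GRAMMAR, cutoff=0.6):
    """Return (phrase, meaning) for the closest grammar phrase, or None.

    Text similarity stands in for acoustic matching here; a real
    recogniser works on the audio of the utterance, not on text.
    """
    cleaned = utterance_text.lower().strip()
    matches = get_close_matches(cleaned, list(grammar), n=1, cutoff=cutoff)
    if not matches:
        return None  # nothing in the grammar is close enough
    phrase = matches[0]
    return phrase, grammar[phrase]

if __name__ == "__main__":
    print(recognise("half past five"))
    print(recognise("zzzz qqqq"))
```

Whichever phrase the caller uses, the dialogue only needs the extracted meaning, so several phrasings can lead to the same next step.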


how does speech recognition work?
As described above, speech recognition uses a list of phrases we would reasonably expect a user to say in answer to a particular question. This list is known as a grammar, and part of our expertise is crafting grammars for maximum coverage and accuracy.

Where applicable, we are happy to produce DTMF (touch-tone) and non-grammar-based applications as well. We will always aim for the best caller experience.
accuracy
In general, the more we can constrain the range of expected answers, the more accurate recognition will be. Accuracy will be very good when we can ask questions like:
  • What time do you want to arrive?
  • Are you a vegetarian?
We pretty much know what a caller will say in answer to these. Recognisers have a harder time with open questions, to which a caller could say almost anything, such as:
  • What do you want to do today?
  • How are you feeling?
Part of our art is crafting prompts for maximum accuracy whilst retaining flexibility and a good user experience.
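The effect of constraining the expected answers can be illustrated with a rough sketch. It assumes text similarity (via Python's `difflib`) as a stand-in for acoustic confusability, which a real recogniser would measure on audio. The phrase lists, the noisy input and the `candidates` function are all hypothetical.

```python
from difflib import get_close_matches

# A tightly constrained grammar and a broader one (both hypothetical).
SMALL = ["half past five", "six o'clock", "noon"]
LARGE = SMALL + ["half past four", "half past nine", "half past one"]

def candidates(heard, phrases, cutoff=0.75):
    """Return every phrase that is plausibly what was heard, best first.

    The more phrases that clear the cutoff, the more ways the
    recogniser has of picking the wrong one.
    """
    return get_close_matches(heard, phrases, n=len(phrases), cutoff=cutoff)

if __name__ == "__main__":
    noisy = "half past fie"  # a slightly garbled "half past five"
    print(candidates(noisy, SMALL))
    print(candidates(noisy, LARGE))
```

With the small grammar only one phrase remains plausible for the garbled input, while the larger grammar leaves several competing phrases. The same tension applies to prompt design: a question that narrows the likely answers leaves the recogniser fewer ways to go wrong.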

Please check out the accuracy of some common questions for yourself by calling the recognition playground.
 
recognisers
dialoguemachine can use a number of third-party recognisers. The choice depends on factors such as:
  • license cost
  • recognition accuracy
  • language support
  • scalability
  • adherence to standards
Our current recogniser of choice is the Voxeo Prophecy recogniser, which we find gives a good cost/performance ratio.