
Speech Synthesis and Recognition


Welcome to the future. High-pitched robotic voices are ancient history. Today’s new breed of speech recognition and synthesis technologies is being widely adopted on the market. Integrated with your information system, these technologies become remarkably powerful customer service tools.

 

Function

Speech synthesis

Speech synthesis, today referred to as Text-to-Speech (TTS), converts normal language text into speech. The text can come from a database, so the spoken message can be tailored to the person hearing it. Speech synthesis is now widely used in dynamic voice portals to communicate a wide range of information, including account balances, flight times and delivery times.

 

 

Speech recognition

Speech recognition, now referred to as Automatic Speech Recognition (ASR), is a technology that allows a computer to identify the words that a person speaks into a microphone or telephone. Coupled with speech synthesis, voice command, voice identification and comprehension, ASR forms the ideal man-machine interface, comfortably handling 10 times more information than a keyboard entry system.

 

 

Speech synthesis (TTS) does not necessarily require speech recognition (ASR), but the reverse is not true: in a man/machine conversation with speech recognition, complementary synthesis resources are indispensable.
 
Vocal 2.0 is today’s byword for the outstanding performance provided by this technology.

 


Key advantages 

  • Navigation by voice, no need for a keyboard
  • It’s fast: connections are much faster
  • Innovative customer service interactions
  • Information is easily accessed by natural language
  • Calls are routed to the best qualified agents
  • Highly efficient: much more information is gathered
  • Modern & flexible: intelligent man-machine language for a more natural navigation experience

 

How is a speech recognition project carried out?


There are 4 main phases in a speech recognition project:

 

1. The existing voice system (IVR, call center) is audited, with analysis for potential integration with the we

Voice ergonomics, preparation of scenario, detailed specification of service, scenario for transition to DTMF.
This phase represents 15% of the project development time.

2. Advice on voice ergonomics

The various dialogs are described, integrating natural language into the man/IVR dialog.
In the past, a common mistake was simply to replace a key press with a keyword, for example « say BALANCE or press button 23 (assuming you find it!) to access your account balance ».
This phase represents 35% of the project development time.

3. Development and Prototype

Here, we create the VXML pages and grammars, scenarios and confirmation audio messages.
This phase represents 15% of the project development time.

4. Analysis and Tuning

Real caller conversations with the IVR are listened to in order to improve phonetic conversion by the speech recognition engine. During this phase, recognition is tuned to reach a success rate of up to 80%.
This phase represents 20% of the project development time.


The core of a natural language voice application is the grammar. The grammar is used by the speech recognition engine to account for the different ways of saying or pronouncing a word.
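As a rough illustration of what a grammar does, the sketch below maps several ways of saying something onto one meaning. The phrase lists, the GRAMMAR table and the interpret function are hypothetical and only show the idea; real engines use their own grammar formats and work on audio, not on text.

```python
import re
from typing import Optional

# Hypothetical grammar: each meaning lists the phrasings a caller might use.
GRAMMAR = {
    "BALANCE": ["balance", "account balance", "how much money do i have"],
    "CHANGE_TICKET": ["change my ticket", "modify my ticket"],
    "YES": ["yes", "yeah", "that's right"],
    "NO": ["no", "nope"],
}


def normalize(utterance: str) -> str:
    """Lowercase and replace punctuation with spaces so matching ignores formatting."""
    cleaned = re.sub(r"[^a-z' ]+", " ", utterance.lower())
    return " ".join(cleaned.split())


def interpret(utterance: str) -> Optional[str]:
    """Return the meaning of the longest grammar phrase found in the utterance."""
    text = f" {normalize(utterance)} "
    best = None  # (phrase length, meaning)
    for meaning, phrases in GRAMMAR.items():
        for phrase in phrases:
            # Surrounding spaces force whole-word matches ("no" must not match "nope").
            if f" {phrase} " in text and (best is None or len(phrase) > best[0]):
                best = (len(phrase), meaning)
    return best[1] if best else None
```

Preferring the longest matching phrase lets a sentence that opens with "No" but goes on to ask for a ticket change be read as a ticket change rather than a plain refusal.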



Typical Dialog

 

In a modern Contact Center, a dialog may sound something like this:

The customer calls a travel agency.
  • [IVR]: « Hello, welcome to SibiloTour » --> pre-pickup with fixed audio prompt
  • [IVR]: « How can I help you? » [TTS] « Eric, are you calling to know if the plane ticket you ordered yesterday was sent? »

  • [Client]: « Yes » --> [ASR processing]

  • [TTS] « It was sent in the post yesterday to the following address: 12 rue Victor Hugo, in Paris, 15th arrondissement »
  • [TTS] « You will be spending two nights in Paris. You haven’t reserved a hotel. Would you like Mathilde, or another available agent, to make a reservation for you? »

  • [Client]: « No, I would like to modify my ticket please » --> [ASR processing]

  • [TTS] « You wish to change your ticket, is that right? »

  • [Client]: « Yes » --> [ASR processing]

  • [TTS] « Please hold, I’ll connect you with Mathilde in less than 1 minute. While you’re waiting, would you like to listen to some music, or the news? »

  • [Client]: « Jazz, please » --> [ASR processing]
 

 

Technology used: MRCP

Sibilo Voice, App-line’s IVR, seamlessly interfaces with today’s leading voice synthesis and recognition engines.

 

How it works

 

Text To Speech (TTS): The voice server sends a text (a list of words or a phrase) to the speech synthesis server, which in turn sends an audio stream to the customer’s telephone. With today’s increasingly sophisticated speech synthesis engines, it is possible to change voice gender (male or female), intonation, or talking speed. The voice can also be mixed with music.
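One common way to carry such voice settings alongside the text is SSML markup. The page does not say which format Sibilo Voice sends, so the following is only a minimal sketch under that assumption; build_ssml and its parameters are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML 1.0 recommendation.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, gender: str = "female", rate: str = "medium",
               lang: str = "en-US") -> str:
    """Wrap text in SSML that requests a voice gender and a talking speed."""
    speak = ET.Element(
        "speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang}
    )
    voice = ET.SubElement(speak, "voice", {"gender": gender})
    prosody = ET.SubElement(voice, "prosody", {"rate": rate})
    prosody.text = text  # special characters are escaped on serialization
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your ticket was sent yesterday.", gender="male", rate="slow"))
```

Whether a given engine honors the requested gender or rate depends on the voices installed on the synthesis server.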

 

 

 

 

Speech recognition (ASR): The principle is reversed. The voice server sends the caller’s audio to the speech recognition engine, along with a grammar of words to be recognized. In return, the ASR engine tells the IVR whether or not the person pronounced words from the grammar. It also quantifies the confidence with which a word (for example “Boat”) was recognized.
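The way an IVR might act on that answer can be sketched as follows. The RecognitionResult shape, the next_action function and both threshold values are assumptions made for the example, not Sibilo Voice settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    """Hypothetical shape of what the ASR engine reports back to the IVR."""
    word: Optional[str]   # grammar word that was recognized, or None if no match
    confidence: float     # 0.0 (no confidence) to 1.0 (certain)


def next_action(result: RecognitionResult,
                accept_at: float = 0.8,
                confirm_at: float = 0.5) -> str:
    """Decide what the dialog does next; the thresholds are illustrative only."""
    if result.word is None or result.confidence < confirm_at:
        return "reprompt"   # nothing usable was heard: ask again
    if result.confidence < accept_at:
        return "confirm"    # plausible match: ask "..., is that right?"
    return "accept"         # confident match: carry on with the scenario


if __name__ == "__main__":
    print(next_action(RecognitionResult("boat", 0.92)))
    print(next_action(RecognitionResult("boat", 0.60)))
    print(next_action(RecognitionResult(None, 0.0)))
```

The middle band is what produces confirmation prompts of the kind "You wish to change your ticket, is that right?".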

 

 

 

 

 


Not too many years ago, the TTS and ASR servers used with an IVR had to be combined into a single product. This made speech scenario development much more complex and time consuming. Today, the IVR and the TTS and ASR servers run on separate machines and communicate using the MRCP protocol. The end result: development time is now divided by more than 10.
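To give an idea of what travels between the IVR and a synthesis server, here is a minimal sketch that formats a SPEAK request. It assumes MRCP version 2 as described in RFC 6787 (the page does not say which MRCP version Sibilo Voice uses), it leaves out the session setup that precedes any request, and the function name, channel identifier and request number are made up for the example.

```python
def build_mrcp_speak(request_id: int, channel_id: str, ssml: str) -> bytes:
    """Format an MRCPv2-style SPEAK request carrying an SSML body."""
    body = ssml.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}\r\n"
        "Content-Type: application/ssml+xml\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    # The start line states the length of the whole message, start line
    # included, so the value is recomputed until it stops changing.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} SPEAK {request_id}\r\n".encode("ascii")
        total = len(start_line) + len(headers) + len(body)
        if total == length:
            return start_line + headers + body
        length = total


if __name__ == "__main__":
    message = build_mrcp_speak(
        543257, "32AECB23433802@speechsynth", "<speak>Hello</speak>"
    )
    print(message.decode("utf-8"))
```

Because the request is plain text sent over the network, the IVR does not need to know which vendor's engine answers on the other side.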

Using the MRCP protocol, Sibilo Voice operates with most of the speech synthesis and recognition engines available today. The 4 industry leaders are active in Europe and their engines run with App-line’s IVR. Certain brands offer both a speech synthesis and a speech recognition engine, while others provide only one of the two.

 

Compliant with Sibilo Voice (TTS and/or ASR engines):

  • Acapela
  • Loquendo
  • Nuance
  • Orange
  • Telisma
