Sunday, February 6, 2011

Text To Speech using Java API - Voice


In my previous Text To Speech post, I talked about the Java Speech API, FreeTTS, and sample code for converting text to speech.
In this post, I am going to show an alternative way to achieve the same result using FreeTTS.


A number of objects work together to perform speech synthesis. One of them is the Voice.
The Voice is the central processing point for FreeTTS. It takes a FreeTTSSpeakable as input, translates the text associated with the FreeTTSSpeakable into speech, and generates the audio output corresponding to that speech.

The VoiceManager is the central repository of voices available to FreeTTS. You can use this sample code to list the available voices.

package com.sarf.tts;

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class VoiceDetector {
    public static void main(String[] args) {
        VoiceManager voiceManager = VoiceManager.getInstance();
        // Get all available voices
        Voice[] voices = voiceManager.getVoices();
        for (int i = 0; i < voices.length; i++) {
            System.out.println(voices[i].getName());
        }
    }
}

File Reader

This Java class demonstrates how to read a text file aloud with FreeTTS.
package com.sarf.tts;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class FileSpeaker {
    public static void main(String[] args) throws Exception {
        // Name of the FreeTTS voice to use
        String voiceName = "kevin";

        VoiceManager voiceManager = VoiceManager.getInstance();
        Voice voice = voiceManager.getVoice(voiceName);
        if (voice == null) {
            // No voice with that name is registered, so there is nothing to speak with
            System.err.println("Voice not found: " + voiceName);
            return;
        }

        voice.setPitch(4.00f);
        voice.setPitchShift(0.005f);
        voice.setPitchRange(0.01f);
        // "business", "casual", "robotic", "breathy"
        voice.setStyle("business");

        // Allocate the resources for the voice
        voice.allocate();

        // Create an input stream from the file and speak its contents
        try (InputStream in = new FileInputStream(new File("D:/temp/sample.txt"))) {
            voice.speak(in);
        } finally {
            // Release the resources held by the voice
            voice.deallocate();
        }
    }
}
A wise old owl sat on an oak; The more he saw the less he spoke; The less he spoke the more he heard; Why aren't we like that wise old bird?
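FileSpeaker hands the whole file to voice.speak(InputStream) in one call. If you would rather speak the text one sentence at a time (for example, to be able to stop between sentences), one option is to split the text first. The sketch below is only an illustration under that assumption: SentenceSplitter and splitSentences are hypothetical names, not part of FreeTTS, and the code uses the JDK's java.text.BreakIterator so that it runs without FreeTTS installed.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of FreeTTS.
final class SentenceSplitter {

    /** Splits text into trimmed, non-empty sentences using the JDK's BreakIterator. */
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String text = "Hello there. This file is read aloud. Goodbye!";
        for (String sentence : splitSentences(text)) {
            // Prints one sentence per line; no speech is produced here.
            System.out.println(sentence);
        }
    }
}
```

With FreeTTS on the classpath, each returned sentence could then be passed to voice.speak(String) on an allocated Voice instead of being printed.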

Speech To Text using Java API


In this post, we are going to learn about speech-to-text conversion using the Java Speech API.
We will cover the following topics:

  • A brief introduction to speech recognition
  • Setting up the speech recognition engine
  • Verifying the speech recognition engine installation
  • Sample code
  • Conclusion

Speech Recognition is the process of converting spoken input to digital output, such as text.
Speech recognition systems provide computers with the ability to listen to user speech and determine what is said.

The Speech Recognition process can be divided into these four steps:
  1. Speech is converted to digital signals.
  2. Actual speech sounds are extracted from the sounds (based on energy of the sounds).
  3. The extracted sounds are put together into 'speech frames.'
  4. The speech frames are compared with words from the grammar file to determine the spoken word.
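
Steps 2 and 3 above can be illustrated with a small sketch. It is not how TalkingJava or any other engine works internally; it is a simplified take on the idea of cutting a digitized signal into fixed-size frames and keeping only the frames whose energy is above a threshold. The class name EnergyFrames, the frame size, and the threshold are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of energy-based frame selection.
final class EnergyFrames {

    /** Mean squared amplitude of samples[from, to). */
    static double energy(short[] samples, int from, int to) {
        double sum = 0;
        for (int i = from; i < to; i++) {
            sum += (double) samples[i] * samples[i];
        }
        return sum / (to - from);
    }

    /** Returns the indices of fixed-size frames whose energy exceeds the threshold. */
    static List<Integer> speechFrames(short[] samples, int frameSize, double threshold) {
        List<Integer> frames = new ArrayList<>();
        for (int start = 0; start + frameSize <= samples.length; start += frameSize) {
            if (energy(samples, start, start + frameSize) > threshold) {
                frames.add(start / frameSize);
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        // Synthetic signal: silence, a loud burst, then silence again.
        short[] samples = new short[12];
        for (int i = 4; i < 8; i++) {
            samples[i] = (short) (i % 2 == 0 ? 1000 : -1000);
        }
        // Only the middle frame (index 1) contains the burst.
        System.out.println(speechFrames(samples, 4, 100.0)); // prints [1]
    }
}
```

A real recognizer goes much further (feature extraction, acoustic models, matching against the grammar), but the sketch shows why quiet stretches of input never reach the comparison in step 4.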
We are going to use a third-party Java speech recognition engine, the TalkingJava SDK, which is a full implementation of Sun's Java Speech API and provides both Text-To-Speech and Speech-Recognition engines.

The TalkingJava SDK website has been shut down. I have shared the old SDK, which you can download from here: Click Here


Five easy steps to set up the TalkingJava SDK
  1. Download the TalkingJava SDK installer file: Download
  2. Click on TalkingJavaSDK-1xx.jar and follow the instructions to install the SDK.
  3. Unpack the TalkingJavaSDK-1xx.jar file, then unpack packet.jar (available inside the TalkingJavaSDK-1xx directory).
  4. Copy the cgjsapi.jar and cgjsapi1xx.dll files from the packet directory to your Java\jdk1.x.x.x\jre\lib\ext\ directory.
  5. Add the path of your packet directory to your CLASSPATH environment variable.
You can also refer to the installation guide here: cloudgarden installation
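
If the setup steps went wrong, the first symptom is usually a ClassNotFoundException. A quick way to check the classpath part on its own is to ask the JVM whether it can find the relevant classes. The sketch below is a hypothetical helper (ClasspathCheck is not part of the SDK); the two class names it looks up are the ones used in the code in this post. It only tests that the classes are visible, not that the native DLL loads or that an engine is available.

```java
// Hypothetical helper: checks whether a class is visible on the current classpath.
final class ClasspathCheck {

    /** Returns true if the named class can be found, without initializing it. */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the sample code in this post.
        String[] names = {
            "javax.speech.Central",
            "com.cloudgarden.speech.CGEngineCentral"
        };
        for (String name : names) {
            System.out.println(name + " on classpath: " + isClassAvailable(name));
        }
    }
}
```

If either line prints false, recheck the jar location and the CLASSPATH entry before moving on.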


We will verify that the speech recognition engine has been installed successfully using a simple Java program.
TestRecognizerConfig.java
package com.sarf.talkingjava;

import java.util.Locale;
import javax.speech.Central;
import javax.speech.EngineList;
import javax.speech.recognition.RecognizerModeDesc;

public class TestRecognizerConfig {
    public static void main(String[] args) {
        try {
            // Register the CloudGarden engine with the Java Speech API
            Central.registerEngineCentral("com.cloudgarden.speech.CGEngineCentral");
            // Describe the required engine mode (US English)
            RecognizerModeDesc desc = new RecognizerModeDesc(Locale.US, Boolean.TRUE);
            EngineList el = Central.availableRecognizers(desc);
            if (el.size() < 1) {
                System.out.println("Recognition Engine is not available");
                System.exit(1);
            } else {
                System.out.println("Recognition Engine is available");
                // Exit with status 0 to signal success
                System.exit(0);
            }
        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }
}
Run this class and check the output. If the output is "Recognition Engine is available", the recognition engine has been installed successfully and you are good to go.


SpeechToTextConverter.java
package com.sarf.talkingjava;

import javax.speech.Central;
import javax.speech.recognition.*;
import java.io.FileReader;
import java.util.Locale;

public class SpeechToTextConverter extends ResultAdapter {
    static Recognizer recognizer;

    // Called when the recognizer accepts a result that matches the grammar
    public void resultAccepted(ResultEvent resultEvent) {
        Result result = (Result) resultEvent.getSource();
        ResultToken[] resultToken = result.getBestTokens();
        for (int nIndex = 0; nIndex < resultToken.length; nIndex++) {
            System.out.print(resultToken[nIndex].getSpokenText() + " ");
        }
        try {
            // Deallocate the recognizer
            recognizer.forceFinalize(true);
            recognizer.deallocate();
        } catch (Exception exception) {
            exception.printStackTrace();
        }
        System.exit(0);
    }

    public static void main(String[] args) {
        try {
            Central.registerEngineCentral("com.cloudgarden.speech.CGEngineCentral");
            RecognizerModeDesc desc = new RecognizerModeDesc(Locale.US, Boolean.TRUE);
            // Create a recognizer that supports US English.
            recognizer = Central.createRecognizer(desc);
            // Start up the recognizer
            recognizer.allocate();
            // Load the grammar from a file, and enable it
            FileReader fileReader = new FileReader("D:\\my_grammar.grammar");
            RuleGrammar grammar = recognizer.loadJSGF(fileReader);
            grammar.setEnabled(true);
            // Add the listener to get results
            recognizer.addResultListener(new SpeechToTextConverter());
            // Commit the grammar
            recognizer.commitChanges();
            recognizer.waitEngineState(Recognizer.LISTENING);
            // Request focus and start listening
            recognizer.requestFocus();
            recognizer.resume();
            recognizer.waitEngineState(Recognizer.FOCUS_ON);
            recognizer.forceFinalize(true);
            recognizer.waitEngineState(Recognizer.DEALLOCATED);
        } catch (Exception e) {
            e.printStackTrace();
            // Exit with a non-zero status to signal the failure
            System.exit(1);
        }
    }
}
A Little Grammar
The JSpeech Grammar Format (JSGF) is a platform-independent, vendor-independent textual representation of grammars for use in speech recognition. Grammars are used by speech recognizers to determine what the recognizer should listen for, and so describe the utterances a user may say. JSGF adopts the style and conventions of the Java programming language in addition to the use of traditional grammar notations. To learn more, see the JSGF specification.
Here is the content of the my_grammar.grammar file.

#JSGF V1.0;

grammar com.sarf.talkingjava.example;

public <startExample> = (please | My name is sarf |
What is your Name | Open Firefox | Open notepad | Open grammar |
Pleased to meet you) *;
public <endExample> = [thanks | thank you | thank you very much];


Speech recognition technology has a few limitations. It does not transcribe free-form speech: recognition is constrained by the grammar, so words or phrases outside the grammar may be transcribed as something different from what was said.
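
To make "constrained by the grammar" concrete, here is a minimal plain-Java sketch. It is not the recognizer's algorithm and does not parse JSGF; it hard-codes a few of the alternatives from the <startExample> rule and checks whether a piece of text is a sequence of those phrases, which is what the * operator in the rule allows. The class name GrammarSketch and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of a grammar that only accepts known phrases.
final class GrammarSketch {

    // A few of the alternatives from the <startExample> rule, lower-cased.
    static final List<String> PHRASES = Arrays.asList(
        "please", "what is your name", "open firefox", "open notepad");

    /** True if the utterance consists of zero or more known phrases back to back. */
    static boolean accepts(String utterance) {
        String text = utterance.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        return matchFrom(text);
    }

    // Tries every phrase as a prefix and recurses on what is left.
    private static boolean matchFrom(String rest) {
        if (rest.isEmpty()) {
            return true;
        }
        for (String phrase : PHRASES) {
            if (rest.equals(phrase)) {
                return true;
            }
            if (rest.startsWith(phrase + " ")
                    && matchFrom(rest.substring(phrase.length() + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts("Open Firefox"));        // prints true
        System.out.println(accepts("please open notepad")); // prints true
        System.out.println(accepts("open the window"));     // prints false
    }
}
```

A real engine works on audio rather than text, so an out-of-grammar utterance is typically either rejected or mapped to the closest phrase the grammar allows, which is why the transcription can differ from what was actually said.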