Java Speech API 2.0 Specification Finally Released
Friday May 08th 2009, 10:27 pm
Filed under: java, language technology, research, technology
Posted by: Andrew Lampert

About 5 years ago, during my Master's studies, I wrote some simple speech applications using Java Speech API (JSAPI) 1.0-compliant speech engines. At the time, the JSR for JSAPI 2.0 was already well underway. Well, it's taken more than 8 years since the JSR was formed, but the final release of the JSAPI 2.0 specification is *finally* available, released on 7th May 2009.

Of note, JSAPI 2.0 is now primarily aimed at the Java ME platform (specifically CLDC 1.0 and MIDP 1.0), in the hope that the new spec will encourage speech-enabled Java applications on mobile devices. For this reason, gone are all floating-point references and dependencies on AWT (yay!). Recognition engines may provide full support for application-defined grammars, or more limited support through specialised built-in grammars. Likewise, synthesis engines may support full text-to-speech or just simple sequencing of text and audio. According to the spec's documentation, implementations can require 0.5 to 1.5 MBytes of ROM for models and algorithms and approximately 128 KBytes of RAM, depending on vocabulary and grammar size. Of course, JSAPI 2.0-compliant engines can still run on Java SE, where they can make good use of the more substantial memory and processing resources available.

Reinforcing comments made by expert group member Paul Lamere about the difficulties of satisfying all parties and developing a comprehensive speech API, Nokia made the following observation in approving the final specification:

“We think that the API is well designed and has very comprehensive functions. However, it is therefore highly complex and requires fairly advanced speech recognition and synthesis features. It also assumes a high level of speech recognition understanding from the application developer. It might not be feasible in many Java ME devices in the near term, but can provide good features in those high end platforms where applicable.”

Unrelated to Java ME compatibility, also gone are the Java Speech Grammar Format (JSGF) and the Java Speech Markup Language (JSML), which were defined as companion specifications to JSAPI 1.0. Sensibly, given the standardisation that has thankfully occurred in the intervening years, these have been replaced by the W3C Speech Recognition Grammar Specification (SRGS) and the W3C Speech Synthesis Markup Language (SSML) respectively. Having spent some time reviewing the plethora of speech synthesis markup languages, I'm very relieved to see this standardisation.
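For anyone who hasn't yet met the W3C formats, here is a minimal sketch of what they look like. It doesn't touch JSAPI 2.0 at all (no engine or implementation is assumed); it simply holds a small SRGS grammar (XML form; SRGS also defines an ABNF form, not shown) and an SSML prompt as Java strings, with a roughly equivalent JSGF rule in a comment for comparison, and checks that both parse as namespaced XML using the JDK's built-in DOM parser. The class name `SpeechMarkupDemo`, the `command` rule and the sample phrases are purely illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical demo class: shows SRGS/SSML documents, not JSAPI calls.
public class SpeechMarkupDemo {

    static final String SRGS_NS = "http://www.w3.org/2001/06/grammar";
    static final String SSML_NS = "http://www.w3.org/2001/10/synthesis";

    // Illustrative command grammar in SRGS 1.0 XML form. A roughly
    // equivalent JSGF grammar from the JSAPI 1.0 era would be:
    //   #JSGF V1.0;
    //   grammar command;
    //   public <command> = call (home | work);
    static final String SRGS =
        "<?xml version=\"1.0\"?>"
        + "<grammar version=\"1.0\" xmlns=\"" + SRGS_NS + "\""
        + " xml:lang=\"en-US\" mode=\"voice\" root=\"command\">"
        + "<rule id=\"command\" scope=\"public\">"
        + "call <one-of><item>home</item><item>work</item></one-of>"
        + "</rule>"
        + "</grammar>";

    // Illustrative prompt in SSML 1.0, the W3C format that replaces JSML.
    static final String SSML =
        "<?xml version=\"1.0\"?>"
        + "<speak version=\"1.0\" xmlns=\"" + SSML_NS + "\" xml:lang=\"en-US\">"
        + "Welcome back.<break time=\"500ms\"/>"
        + "<prosody rate=\"slow\">Your message has been saved.</prosody>"
        + "</speak>";

    // Parses a string as namespace-aware XML; throws if it is not well formed.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Element grammar = parse(SRGS).getDocumentElement();
        Element speak = parse(SSML).getDocumentElement();
        // Count the alternatives offered by the <one-of> in the grammar.
        int items = grammar.getElementsByTagNameNS(SRGS_NS, "item").getLength();
        System.out.println(grammar.getLocalName() + " in " + grammar.getNamespaceURI()
            + ", root rule '" + grammar.getAttribute("root") + "', " + items + " alternatives");
        System.out.println(speak.getLocalName() + " in " + speak.getNamespaceURI());
    }
}
```

Because both formats are plain XML, grammars and prompts like these can be generated or validated with ordinary XML tooling before they ever reach a speech engine; how a given JSAPI 2.0 engine actually loads them is up to that engine's API and isn't shown here.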

All in all, while it has taken a long time to come to fruition, I'm very pleased to see the JSAPI 2.0 standard finalised. Of course, given that JSAPI is only a specification (not an implementation), it remains to be seen how quickly the various speech recognition and speech synthesis systems will adopt the new and modified APIs.
