Java API for Voice Based Solutions

Introduction

Voice based solutions are needed in many situations: playing back a previously recorded statement to meet a legal requirement, recording and playing back telephone conversations, speech recognition, software for visually impaired users, training customer support representatives, and so on. Voice based solutions can be implemented using the J2SE Java Sound API or the Java Media Framework (JMF). The Java Sound API, available from J2SE 1.3 onwards, provides low-level support for audio operations such as playback and capture (recording), mixing, MIDI sequencing, and MIDI synthesis in an extensible, flexible framework. JMF is a much richer API that handles all kinds of media through a single set of interfaces. This document covers the Java Sound API and how to use it.

Java Sound Overview

By default, the Java Sound API provides playback and capture support for the PCM encoded WAVE, AU, AIFF and AIFC audio file formats. Playback and capture of non-standard audio formats such as mp3, Ogg Vorbis, Speex and GSM 06.10 can also be implemented with the Java Sound API, for example through plug-ins such as those of the open-source Tritonus project. Support for vendor specific formats is provided transparently by an extension framework exposed as the Java Sound Service Provider Interfaces (SPI). The SPI allows different encoders and decoders for vendor formats, and transcoders between formats, to be plugged in. An implementation of the Java Sound SPI is registered as an extension to the standard Java SDK by making it available on the CLASSPATH of the Java virtual machine. Application code that uses the Java Sound API therefore stays independent of vendor specific audio implementations.
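As a rough illustration of this independence, the sketch below asks AudioSystem which file types the installed providers can write. The class name SupportedTypes and the helper canWrite are made up for this example, and the list that is printed depends on the JRE and on any SPI plug-ins found on the CLASSPATH.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;

class SupportedTypes {
    /** Hypothetical helper: true if some installed provider can write this file type. */
    static boolean canWrite(AudioFileFormat.Type type) {
        return AudioSystem.isFileTypeSupported(type);
    }

    public static void main(String[] args) {
        // The application only talks to AudioSystem; the providers behind it are discovered at run time.
        for (AudioFileFormat.Type type : AudioSystem.getAudioFileTypes()) {
            System.out.println(type + " (." + type.getExtension() + ")");
        }
    }
}
```

Adding a plug-in archive for another format should extend this list without any change to the application code.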

Playback and Capture using Java Sound

In order to play or capture audio using the Java Sound API, at least three things are needed:

Formatted audio-data - Formatted audio data refers to sound in any of a number of standard formats.
A Mixer - In the Java Sound API, devices are represented by Mixer objects. A device is often a software interface to a physical input/output device.
A Line - A line is an element of the digital audio "pipeline"—that is, a path for moving audio into or out of the system.
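The relationship between mixers and lines can be explored with a minimal sketch like the one below, which lists each installed mixer and counts the line types it supports. ListMixers and describeMixers are hypothetical names, and on a machine without audio hardware the list may simply be empty.

```java
import java.util.ArrayList;
import java.util.List;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Mixer;

class ListMixers {
    /** Hypothetical helper: builds one summary line per installed mixer. */
    static List<String> describeMixers() {
        List<String> lines = new ArrayList<>();
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(info);
            int sources = mixer.getSourceLineInfo().length; // line types that feed audio into the mixer
            int targets = mixer.getTargetLineInfo().length; // line types that deliver audio out of the mixer
            lines.add(info.getName() + ": " + sources + " source line type(s), "
                    + targets + " target line type(s)");
        }
        return lines;
    }

    public static void main(String[] args) {
        describeMixers().forEach(System.out::println);
    }
}
```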

An audio format (the AudioFormat class) encapsulates the encoding technique, number of channels, sample rate, bits per sample, frame rate, frame size (in bytes), byte order and optional properties.
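A small sketch of how these attributes fit together, assuming a telephone-quality format (8 kHz, 16-bit, mono) chosen purely for illustration. FormatDemo and telephoneQuality are hypothetical names.

```java
import javax.sound.sampled.AudioFormat;

class FormatDemo {
    /** Hypothetical helper: 8 kHz, 16-bit, mono, signed PCM, little-endian. */
    static AudioFormat telephoneQuality() {
        float sampleRate = 8000f;
        int bitsPerSample = 16;
        int channels = 1;
        // For PCM, one frame holds one sample per channel.
        int frameSize = channels * bitsPerSample / 8;
        // For PCM, the frame rate equals the sample rate.
        float frameRate = sampleRate;
        boolean bigEndian = false;
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED, sampleRate,
                bitsPerSample, channels, frameSize, frameRate, bigEndian);
    }

    public static void main(String[] args) {
        AudioFormat format = telephoneQuality();
        System.out.println(format);
        System.out.println("bytes per second: " + (long) (format.getFrameSize() * format.getFrameRate()));
    }
}
```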

[Figure: a possible configuration of lines for audio output]

[Figure: a possible configuration of lines for audio input]

[Figure: the hierarchy of the audio line interfaces] Line is the root interface; Port, Mixer and DataLine extend Line; SourceDataLine, TargetDataLine and Clip extend DataLine.

Steps involved for recording and playback

Steps for recording PCM encoded standard file formats using the Java Sound API

Get a target-dataline to read audio-data from a microphone port.
If the line exists and is not open, open it with the user's permission (forcibly opening a sound-input port is treated as eavesdropping).
Start the target-dataline.
Read from the target-dataline and write to an audio output stream.
Stop and close the target-dataline.
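The recording steps above can be sketched roughly as follows. This is a minimal sketch and not a complete recorder: the class name RecordToWav, the 16 kHz mono capture format, the default file name and the fixed recording time are all assumptions made for this example, and it needs a working microphone to run.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;

class RecordToWav {
    /** Assumed capture format: 16 kHz, 16-bit, mono, signed, little-endian. */
    static AudioFormat captureFormat() {
        return new AudioFormat(16000f, 16, 1, true, false);
    }

    static void record(File target, long millis)
            throws LineUnavailableException, IOException, InterruptedException {
        AudioFormat format = captureFormat();
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        if (!AudioSystem.isLineSupported(info)) {
            throw new LineUnavailableException("No capture line for " + format);
        }
        // Get the target data line, open it and start it.
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(format);
        line.start();

        // Stop and close the line from a second thread after the requested time;
        // this ends the stream that AudioSystem.write is reading from.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            line.stop();
            line.close();
        });
        stopper.start();

        // Read from the line and write to the output file until the line is stopped.
        try (AudioInputStream in = new AudioInputStream(line)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, target);
        }
        stopper.join();
    }

    public static void main(String[] args) throws Exception {
        File out = new File(args.length > 0 ? args[0] : "recording.wav");
        record(out, 5000);
        System.out.println("Wrote " + out.getAbsolutePath());
    }
}
```

In an interactive application, the stopper thread would be replaced by the handler of a stop button.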

Steps for playback of PCM encoded standard file formats using the Java Sound API

Read the sound file as an audio input stream.
Get a source-dataline to write audio-data to a speaker port.
If the line exists and is not open, open it.
Start the source-dataline.
Write to the source-dataline.
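The playback steps above might look like this in code. This is a sketch under assumptions: PlayWav and speakerLineInfo are hypothetical names, the file path comes from the command line, and the final drain, stop and close calls are an addition to the listed steps so that buffered audio is heard before the line is released.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;

class PlayWav {
    /** Hypothetical helper: describes the playback line needed for a given format. */
    static DataLine.Info speakerLineInfo(AudioFormat format) {
        return new DataLine.Info(SourceDataLine.class, format);
    }

    static void play(File file)
            throws UnsupportedAudioFileException, IOException, LineUnavailableException {
        // Read the sound file as an audio input stream.
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {
            AudioFormat format = in.getFormat();
            // Get a source data line, open it and start it.
            SourceDataLine line = (SourceDataLine) AudioSystem.getLine(speakerLineInfo(format));
            line.open(format);
            line.start();
            try {
                // Write to the line; AudioInputStream.read returns whole frames only.
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                    line.write(buffer, 0, n);
                }
                line.drain(); // wait until buffered audio has been played
            } finally {
                line.stop();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PlayWav <file.wav>");
            return;
        }
        play(new File(args[0]));
    }
}
```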

In order to play back or record non-standard formats through extensions to the Java Sound API, an additional intermediate step is needed to convert between the vendor encoding and PCM encoding.
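The intermediate decoding step can be sketched with AudioSystem's format conversion. Because vendor formats such as mp3 need a plug-in to be installed, this example uses μ-law as a stand-in for the vendor encoding, since the standard JRE ships a converter for it. DecodeToPcm and toPcm are hypothetical names, and the input bytes are placeholder data rather than real speech.

```java
import java.io.ByteArrayInputStream;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

class DecodeToPcm {
    /** Hypothetical helper: wraps an encoded stream so that it delivers signed PCM. */
    static AudioInputStream toPcm(AudioInputStream encoded) {
        AudioFormat.Encoding pcm = AudioFormat.Encoding.PCM_SIGNED;
        if (pcm.equals(encoded.getFormat().getEncoding())) {
            return encoded; // already PCM, nothing to decode
        }
        if (!AudioSystem.isConversionSupported(pcm, encoded.getFormat())) {
            throw new IllegalArgumentException("No installed decoder for " + encoded.getFormat());
        }
        return AudioSystem.getAudioInputStream(pcm, encoded);
    }

    public static void main(String[] args) {
        // 800 placeholder bytes of mu-law data: 8 kHz, 8 bits per sample, mono, 1 byte per frame.
        byte[] ulaw = new byte[800];
        AudioFormat ulawFormat = new AudioFormat(AudioFormat.Encoding.ULAW,
                8000f, 8, 1, 1, 8000f, false);
        AudioInputStream encoded = new AudioInputStream(
                new ByteArrayInputStream(ulaw), ulawFormat, ulaw.length);

        AudioInputStream decoded = toPcm(encoded);
        System.out.println(ulawFormat + " -> " + decoded.getFormat());
    }
}
```

The decoded stream can then be written to a source-dataline in the same way as a stream read from a PCM file.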

From Java 1.5 onwards, additional metadata can be attached to an audio format as a set of key-value (String-Object) pairs. This is optional, and Java Sound service providers may not honor it.
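A minimal sketch of attaching and reading such properties, using the AudioFormat constructor that accepts a property map. FormatProperties and withMetadata are hypothetical names; "bitrate" and "vbr" are among the keys suggested in the AudioFormat documentation, and the values here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import javax.sound.sampled.AudioFormat;

class FormatProperties {
    /** Hypothetical helper: CD-quality PCM format carrying two metadata properties. */
    static AudioFormat withMetadata() {
        Map<String, Object> props = new HashMap<>();
        props.put("bitrate", 1411200);    // 44100 frames/s * 4 bytes/frame * 8 bits
        props.put("vbr", Boolean.FALSE);  // constant bit rate
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                44100f, 16, 2, 4, 44100f, false, props);
    }

    public static void main(String[] args) {
        AudioFormat format = withMetadata();
        // properties() returns an unmodifiable view; getProperty returns null for unknown keys.
        System.out.println(format.properties());
        System.out.println("bitrate = " + format.getProperty("bitrate"));
        System.out.println("author  = " + format.getProperty("author"));
    }
}
```

Code that reads properties should expect null, since a provider is free to leave them out.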

Permissions required

In order to read from or write to local files, applets have to be granted permissions in either of the two ways below.

Option A: Install the permission by adding grant declarations to the ~JAVAHOME/lib/security/java.policy file.
Option B: Install the permission by digitally signing the applet and asking the user to accept it. (The user is expected to accept the security dialog that pops up when the applet runs.)

Option A is not possible when the applet caters to unknown users browsing the internet.

Option B is made possible by buying an RSA code-signing certificate from a security solution vendor such as Thawte or VeriSign.

Service provider implementations for non-standard formats also have to be registered with the JRE by copying the SPI archives into ~JAVAHOME/lib/ext.

Integration with Browser

The recording API can be integrated with a web browser using any client-side computing facility, such as Java applets or MS ActiveX. Client-side computing is needed to interact with the sound-input port (microphone port) on the local machine. For security reasons, applets that record from the microphone must be digitally signed and accepted by the user.

The user interacts through a web browser such as Internet Explorer with a Java Runtime Environment supporting Java 1.3 or higher. The user requests a recording page from the server at a specific URL. The server returns a web page with an embedded recording applet. The user starts recording by clicking the “record” button. The applet then listens to the sound input (microphone) until the user ends the recording by clicking the “stop” button.

[Figure: sequence diagram illustrating the recording process at a very high level. NOTE: the server archiving the sound stream into a file or database is not depicted.]

Where do we use Voice Based Solutions

Voice Based Solutions can be used in applications such as

Recording the user’s voice and playing it back when the user requests it.

Recording a person’s legal statement and playing it back when there is a legal requirement.

Recording and playing back the telephonic conversations.

Speech Recognition systems

Software that aids visually impaired people.

Voice based Knowledge imparting software

Glossary

· MIDI - Musical Instrument Digital Interface (MIDI) is an industry-standard electronic communications protocol that enables electronic musical instruments, computers and other equipment to communicate, control and synchronize with each other in real time.

· WAVE - Waveform audio format (WAVE) is a Microsoft and IBM audio file format standard for storing audio on PCs.

· AU - The AU file format is a simple audio file format that consists of a header of six 32-bit words followed by the data (high-order byte first). This format was introduced by Sun Microsystems.

· AIFF - Audio Interchange File Format (AIFF) is an audio file format standard used for storing sound data on personal computers. This format was developed by Apple Computer and is most commonly used on Apple Macintosh computer systems.

· AIFC - AIFF-Compressed (AIFC) is a variant of AIFF that allows the audio data to be stored in compressed form.

· MP3 - MPEG Audio Layer 3 is a lossy compression format, designed to greatly reduce the amount of data required to represent audio.

· Ogg - An open, patent-free container format, most commonly used with the Vorbis audio codec.

· VORBIS - Ogg Vorbis is a completely open, patent-free, professional audio encoding and streaming technology with all the benefits of Open Source.

· Speex - Patent-free audio compression format designed for speech.

· GSM 06.10 - An encoding designed for telephony use in Europe. GSM is a practical format for telephone-quality voice and offers a good compromise between file size and quality, which makes it well suited to voice recordings. WAV files can also be encoded with the GSM codec.

Conclusion

The Java Sound API is robust and gives fine-grained control over audio. Another advantage is the ability to manipulate individual data streams. In earlier versions, one needed access to the entire sound clip before a sound could be played. Now one can buffer and read the sound using any sort of producer/consumer scheme, which opens the way to network and streaming audio.
