Voice-based solutions are needed when there is a legal requirement to play back
an already recorded statement, for recording and playing back telephone
conversations, for speech recognition systems, for software that assists
visually impaired people, for training customer support representatives, and so
on. Voice-based solutions can be implemented using the J2SE Java Sound API or the
Java Media Framework (JMF). The Java Sound API, available in J2SE 1.3 and higher,
provides low-level support for audio operations such as playback and capture
(recording), mixing, MIDI sequencing and MIDI synthesis in an extensible,
flexible framework. JMF, by contrast, is a much richer API that handles all kinds
of media through a single set of interfaces. This document explains the Java
Sound API and its implementation.
Java Sound Overview
The Java Sound API provides playback and capture support for PCM-encoded WAVE,
AU, AIFF and AIFC audio files by default. Playback and capture of non-standard
audio formats such as MP3, Ogg Vorbis, Speex and GSM 06.10 can also be
implemented with the Java Sound API, for example through the plug-ins of the
open-source Tritonus project. Support for vendor-specific formats is provided
transparently by an extension framework exposed as the Java Sound Service
Provider Interfaces (SPI). The SPI allows plugging in encoders and decoders for
vendor formats, and transcoders between formats. An implementation of the Java
Sound SPI is registered as an extension to the standard Java runtime by making it
available on the CLASSPATH of the Java virtual machine. Application code that
uses the Java Sound API is therefore independent of vendor-specific audio
implementations.
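One way to see which file formats the installed providers (built in or added through the SPI) can write is to ask AudioSystem directly. The following is a minimal sketch; the class SupportedFileTypes and its method are hypothetical names chosen for illustration.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical example class: asks AudioSystem which file types can be written.
class SupportedFileTypes {
    // Names of the writable file types, whether built in or added through an SPI jar.
    static Set<String> writableTypeNames() {
        Set<String> names = new TreeSet<>();
        for (AudioFileFormat.Type type : AudioSystem.getAudioFileTypes()) {
            names.add(type.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Writable file types: " + writableTypeNames());
        System.out.println("AU supported: "
                + AudioSystem.isFileTypeSupported(AudioFileFormat.Type.AU));
    }
}
```

With only the JDK's own providers this typically lists types such as WAVE, AU and AIFF; a writer plugged in through the SPI would add its own type to the set without any change to this code.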
Playback and Capture using Java Sound
In order to play or capture audio using
the Java Sound API, at least three things are needed:
- Formatted audio-data - Formatted audio data refers to sound in any of a number of standard formats.
- A Mixer - In the Java Sound API, devices are represented by Mixer objects. A device is often a software interface to a physical input/output device.
- A Line - A line is an element of the digital audio "pipeline"—that is, a path for moving audio into or out of the system.
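Mixers and the lines they expose can be listed through AudioSystem. A minimal sketch, using a hypothetical class name ListMixers; the output depends entirely on the sound hardware and drivers of the machine it runs on, and may be empty on a machine without audio devices.

```java
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: describes every mixer and the lines it offers.
class ListMixers {
    // One entry per mixer, followed by its source lines (audio fed into the mixer,
    // i.e. playback) and its target lines (audio read out of the mixer, i.e. capture).
    static List<String> describeMixers() {
        List<String> out = new ArrayList<>();
        for (Mixer.Info info : AudioSystem.getMixerInfo()) {
            Mixer mixer = AudioSystem.getMixer(info);
            out.add("Mixer: " + info.getName() + " (" + info.getDescription() + ")");
            for (Line.Info lineInfo : mixer.getSourceLineInfo()) {
                out.add("  source line: " + lineInfo);
            }
            for (Line.Info lineInfo : mixer.getTargetLineInfo()) {
                out.add("  target line: " + lineInfo);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        describeMixers().forEach(System.out::println);
    }
}
```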
An AudioFormat encapsulates the encoding technique, number of channels, sample
rate, bits per sample, frame rate, frame size (in bytes), byte order and optional
properties.
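As an example, the sketch below builds a format suitable for speech (8 kHz, 16-bit, mono, signed, little-endian) with the AudioFormat convenience constructor, which derives the encoding, frame size and frame rate from the other fields. The class name FormatExample and the chosen parameter values are assumptions for illustration.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical example class: builds and inspects a PCM format for speech.
class FormatExample {
    // 8 kHz, 16-bit, mono, signed, little-endian PCM (assumed values for speech).
    static AudioFormat speechFormat() {
        float sampleRate = 8000f;
        int sampleSizeInBits = 16;
        int channels = 1;
        boolean signed = true;
        boolean bigEndian = false;
        // This constructor fills in PCM encoding, frame size and frame rate itself.
        return new AudioFormat(sampleRate, sampleSizeInBits, channels, signed, bigEndian);
    }

    // Bytes needed for one second of audio: frame size times frame rate.
    static long bytesPerSecond(AudioFormat format) {
        return (long) (format.getFrameSize() * format.getFrameRate());
    }

    public static void main(String[] args) {
        AudioFormat f = speechFormat();
        System.out.println("format     = " + f);
        System.out.println("encoding   = " + f.getEncoding());
        System.out.println("frame size = " + f.getFrameSize() + " bytes");
        System.out.println("frame rate = " + f.getFrameRate() + " frames/s");
        System.out.println("bytes/s    = " + bytesPerSecond(f)); // 2 * 8000 = 16000
    }
}
```

For PCM, the frame size is the bytes per sample times the number of channels and the frame rate equals the sample rate, so this format needs 2 × 8000 = 16000 bytes per second of audio.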
Figure: A possible configuration of lines for audio output.
Figure: A possible configuration of lines for audio input.
Figure: The hierarchy of the audio line interfaces.
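The line interface hierarchy can also be inspected with reflection. A small sketch; LineHierarchy and parentOf are hypothetical names.

```java
import javax.sound.sampled.Clip;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.Line;
import javax.sound.sampled.Mixer;
import javax.sound.sampled.Port;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.TargetDataLine;

// Hypothetical example class: prints each line interface with its parent interface.
class LineHierarchy {
    // Simple name of the first (here: only) interface the given interface extends.
    static String parentOf(Class<?> type) {
        Class<?>[] parents = type.getInterfaces();
        return parents.length == 0 ? "-" : parents[0].getSimpleName();
    }

    public static void main(String[] args) {
        Class<?>[] types = {
            Line.class, Port.class, Mixer.class, DataLine.class,
            SourceDataLine.class, TargetDataLine.class, Clip.class
        };
        for (Class<?> type : types) {
            System.out.println(type.getSimpleName() + " extends " + parentOf(type));
        }
    }
}
```

Each interface extends exactly one parent, so the output traces the tree: Port, Mixer and DataLine extend Line, while SourceDataLine, TargetDataLine and Clip extend DataLine. On Java 7 and later, Line itself extends AutoCloseable, which is why lines can be used in try-with-resources.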
Steps involved in recording and playback
Steps for recording PCM-encoded standard file formats using the Java Sound API
- Get a target-dataline to read audio data from a microphone port.
- If the line exists and is not open, open it with the user's permission (forcibly opening a sound-input port is treated as eavesdropping).
- Start the target-dataline.
- Read from target-dataline and write to an audio output stream.
- Stop and close target-dataline.
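The recording steps can be sketched as follows, assuming an 8 kHz, 16-bit mono PCM format and a WAVE output file named recording.wav (both assumptions). Instead of an explicit read loop, the sketch wraps the target-dataline in an AudioInputStream and lets AudioSystem.write() do the reading and writing. The class Recorder and its methods are hypothetical, and the five-second timer stands in for a user pressing a stop button.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import java.io.File;
import java.io.IOException;

// Hypothetical helper class illustrating the recording steps.
class Recorder {
    // Assumed capture format: 8 kHz, 16-bit, mono, signed, little-endian PCM.
    static final AudioFormat FORMAT = new AudioFormat(8000f, 16, 1, true, false);

    // Steps 1-3: get a target-dataline (microphone input), open it and start it.
    static TargetDataLine openMicrophone(AudioFormat format) throws LineUnavailableException {
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        if (!AudioSystem.isLineSupported(info)) {
            throw new LineUnavailableException("No capture line supports " + format);
        }
        TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
        line.open(format);
        line.start();
        return line;
    }

    // Step 4: read audio from the stream and write it to a WAVE file.
    // Blocks until the stream ends and returns the number of bytes written.
    static int writeWave(AudioInputStream in, File out) throws IOException {
        return AudioSystem.write(in, AudioFileFormat.Type.WAVE, out);
    }

    public static void main(String[] args) throws Exception {
        TargetDataLine line = openMicrophone(FORMAT);
        // Step 5 runs on a timer thread here: stopping and closing the line ends the
        // stream that AudioSystem.write() is reading, which finishes the file.
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(5000);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            line.stop();
            line.close();
        });
        stopper.start();
        int bytes = writeWave(new AudioInputStream(line), new File("recording.wav"));
        System.out.println("Wrote " + bytes + " bytes to recording.wav");
    }
}
```

Because writeWave() depends only on an AudioInputStream, it can be exercised without a microphone by passing a stream built over a byte array.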
Steps for playing back PCM-encoded standard file formats using the Java Sound API
- Read the sound file as an audio input stream.
- Get a source-dataline to write audio data to a speaker port.
- If the line exists and is not open, open it.
- Start the source-dataline.
- Write to the source-dataline.
- Drain, stop and close the source-dataline.
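The playback steps map onto the API roughly as in the sketch below. The class Player, its methods and the small ByteSink interface are hypothetical; the copy loop takes ByteSink rather than a SourceDataLine so that it can be exercised without a sound card, and SourceDataLine.write() happens to match its shape.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
import java.io.File;
import java.io.IOException;

// Hypothetical helper class illustrating the playback steps.
class Player {
    // Anything that accepts audio bytes; SourceDataLine.write(byte[], int, int) fits.
    interface ByteSink {
        int write(byte[] buffer, int offset, int length);
    }

    // Step 5: copy the stream to the sink in whole frames; returns the bytes copied.
    static long pump(AudioInputStream in, ByteSink sink) throws IOException {
        int frameSize = Math.max(1, in.getFormat().getFrameSize());
        byte[] buffer = new byte[frameSize * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            sink.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    static void play(File file) throws Exception {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(file)) {            // step 1
            AudioFormat format = in.getFormat();
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
            try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {  // step 2
                line.open(format);                                                     // step 3
                line.start();                                                          // step 4
                pump(in, line::write);                                                 // step 5
                line.drain(); // block until everything written has been played
                line.stop();  // closing happens when the try block ends
            }
        }
    }

    public static void main(String[] args) throws Exception {
        play(new File(args[0])); // path of a PCM WAVE, AU or AIFF file
    }
}
```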
To play back or record using non-standard extensions to the Java Sound API, an
additional intermediate step is necessary: decoding the vendor encoding to PCM
for playback, or encoding PCM into the vendor format for recording.
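The decoding step can be sketched with AudioSystem.getAudioInputStream(AudioFormat, AudioInputStream), which wraps a stream in whichever installed converter supports the requested target format. This assumes an SPI decoder for the encoded format (for example MP3) is on the class path; with only the JDK installed, the same call works for built-in conversions such as μ-law to PCM. The class DecodeToPcm and its methods are hypothetical.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

// Hypothetical helper: turns an encoded stream into 16-bit signed PCM.
class DecodeToPcm {
    // Target: 16-bit signed little-endian PCM with the source's rate and channel count.
    static AudioFormat pcmTarget(AudioFormat source) {
        int channels = source.getChannels();
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                source.getSampleRate(), 16, channels,
                channels * 2, source.getSampleRate(), false);
    }

    // Wraps the encoded stream in a decoding stream supplied by whichever
    // installed format conversion provider (built-in or SPI) supports it.
    static AudioInputStream decode(AudioInputStream encoded) {
        AudioFormat source = encoded.getFormat();
        AudioFormat target = pcmTarget(source);
        if (!AudioSystem.isConversionSupported(target, source)) {
            throw new IllegalArgumentException("No decoder installed for " + source);
        }
        return AudioSystem.getAudioInputStream(target, encoded);
    }

    public static void main(String[] args) throws Exception {
        // args[0]: an encoded file whose reader and decoder come from an installed SPI.
        try (AudioInputStream encoded = AudioSystem.getAudioInputStream(new File(args[0]));
             AudioInputStream pcm = decode(encoded)) {
            System.out.println("Decoded format: " + pcm.getFormat());
            // pcm can now be written to a SourceDataLine like any standard PCM stream.
        }
    }
}
```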
From Java 1.5 onwards, audio formats and audio file formats can carry additional
metadata as a set of key-value (String-Object) pairs. This is optional, and Java
Sound service providers may not honor it.
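The property maps are exposed through AudioFileFormat.getProperty() and properties(), and through AudioFormat.getProperty(). A minimal sketch; the class MetadataExample, its method and the title/author values are hypothetical. Whether such properties are actually stored in or read from a file is entirely up to the service provider.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper showing the Java 1.5+ property maps.
class MetadataExample {
    // Describes a WAVE file and attaches "title" and "author", two of the
    // property keys suggested in the AudioFileFormat documentation.
    static AudioFileFormat describedWave(AudioFormat format, int frameLength,
                                         String title, String author) {
        Map<String, Object> properties = new HashMap<>();
        properties.put("title", title);
        properties.put("author", author);
        return new AudioFileFormat(AudioFileFormat.Type.WAVE, format, frameLength, properties);
    }

    public static void main(String[] args) throws Exception {
        AudioFormat speech = new AudioFormat(8000f, 16, 1, true, false);
        AudioFileFormat described = describedWave(speech, 8000, "Statement", "Recorded by clerk");
        System.out.println("title = " + described.getProperty("title"));

        // Reading metadata back from a file: a provider that does not
        // support properties simply returns null here.
        if (args.length > 0) {
            AudioFileFormat fromFile = AudioSystem.getAudioFileFormat(new File(args[0]));
            System.out.println("title in file = " + fromFile.getProperty("title"));
        }
    }
}
```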
Permissions required
To read or write local files, applets have to be granted permissions in one of
two ways:
- Option A: Grant the permissions by adding grant entries to the ~JAVAHOME/lib/security/java.policy file.
- Option B: Sign the applet digitally. (When the applet runs, the user is asked to accept the signature in a security dialog.)
Option A is not feasible when the applet serves unknown users browsing the
internet.
Option B requires buying an RSA code-signing certificate from a certificate
authority such as Thawte or VeriSign.
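Whichever option is used, what gets granted are ordinary permission objects. As a sketch, the permissions a recording applet would need can be expressed with java.security.Permissions; the class RecorderPermissions and the recordings directory are hypothetical. In a java.policy file the same permissions appear as permission entries inside a grant block.

```java
import javax.sound.sampled.AudioPermission;
import java.io.File;
import java.io.FilePermission;
import java.security.Permissions;

// Hypothetical helper: the permissions a recording applet would need.
class RecorderPermissions {
    // Capturing audio, playing it back, and reading/writing files below one
    // directory (the directory itself is an assumption for illustration).
    static Permissions required(File recordingDir) {
        Permissions permissions = new Permissions();
        permissions.add(new AudioPermission("record"));
        permissions.add(new AudioPermission("play"));
        // "-" means: every file in this directory and all of its subdirectories.
        permissions.add(new FilePermission(
                recordingDir.getPath() + File.separator + "-", "read,write"));
        return permissions;
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"), "recordings");
        Permissions granted = required(dir);
        System.out.println("record:     " + granted.implies(new AudioPermission("record")));
        System.out.println("write file: " + granted.implies(
                new FilePermission(new File(dir, "statement.wav").getPath(), "write")));
    }
}
```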
Also, non-standard format service provider implementations have to be
registered with the JRE by copying the SPI archives into ~JAVAHOME/lib/ext.
Integration with Browser
The recording API can be integrated with a web browser using client-side
technologies such as Java applets or Microsoft ActiveX. Client-side code is
needed to interact with the sound-input (microphone) port on the local machine.
For security reasons, applets that record from the microphone must be digitally
signed and accepted by the user.
The user interacts through a web browser, such as Internet Explorer, with a Java
Runtime Environment supporting Java 1.3 or higher. The user requests a recording
page from the server at a specific URL. The server returns a web page with an
embedded recording applet. The user starts recording by clicking the “record”
button; the applet then listens to the sound input (microphone) until the user
ends the recording by clicking the “stop” button.
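The record/stop interaction can be sketched as a capture loop running on a background thread that exits once the stop handler clears a flag. This is a minimal sketch, not the applet itself: the class CaptureSession is hypothetical, the three-second sleep stands in for the user speaking, and the captured bytes simply accumulate in memory instead of being sent to the server.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.TargetDataLine;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical class: copies captured audio until stop() is called or the input ends.
class CaptureSession implements Runnable {
    private final InputStream source;   // e.g. new AudioInputStream(targetDataLine)
    private final OutputStream sink;    // e.g. a buffer that is later uploaded to the server
    private volatile boolean recording = true;
    private volatile long bytesCaptured;

    CaptureSession(InputStream source, OutputStream sink) {
        this.source = source;
        this.sink = sink;
    }

    // Called from the "stop" button handler; the loop exits after the current read.
    void stop() {
        recording = false;
    }

    long bytesCaptured() {
        return bytesCaptured;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096]; // a multiple of the 2- or 4-byte PCM frame sizes
        try {
            int n;
            while (recording && (n = source.read(buffer)) != -1) {
                sink.write(buffer, 0, n);
                bytesCaptured += n; // only this thread writes the counter
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        AudioFormat format = new AudioFormat(8000f, 16, 1, true, false);
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        CaptureSession session = new CaptureSession(new AudioInputStream(line), captured);
        Thread worker = new Thread(session);
        worker.start();        // what the "record" button would do
        Thread.sleep(3000);    // stand-in for the user speaking
        session.stop();        // what the "stop" button would do
        worker.join();
        line.stop();
        line.close();
        System.out.println("Captured " + captured.size() + " bytes");
    }
}
```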
Figure: Sequence diagram of a very high-level recording process. (The server
archiving the sound stream into a file or database is not depicted.)
Where do we use Voice Based Solutions
Voice-based solutions can be used in applications such as:
- Recording the user’s voice and playing it back when the user requests it.
- Recording a person’s legal statement and playing it back when there is a legal requirement.
- Recording and playing back telephone conversations.
- Speech recognition systems.
- Software that aids visually impaired people.
- Voice-based knowledge-imparting (training) software.
Glossary
· MIDI – Musical Instrument Digital Interface (MIDI) is an industry-standard electronic communications protocol that enables electronic musical instruments, computers and other equipment to communicate, control and synchronize with each other in real time.
· WAVE – Waveform audio format (WAVE) is a Microsoft and IBM audio file format standard for storing audio on PCs.
· AU – The AU file format is a simple audio file format that consists of a header of six 32-bit words followed by the data (high-order byte first). This format was introduced by Sun Microsystems.
· AIFF – Audio Interchange File Format (AIFF) is an audio file format standard used for storing sound data on personal computers. This format was developed by Apple Computer and is most commonly used on Apple Macintosh computer systems.
· AIFC – AIFF-Compressed (AIFC) is an extension of AIFF that supports compressed audio data.
· MP3 – MPEG Audio Layer 3 is a lossy compression format designed to greatly reduce the amount of data required to represent audio.
· Ogg – An open, patent-free container format, most commonly used to carry audio encoded with the Vorbis codec.
· Vorbis – Ogg Vorbis is a completely open, patent-free, professional audio encoding and streaming technology with all the benefits of open source.
· Speex – Patent-free audio compression format designed for speech.
· GSM 06.10 – An encoding designed for telephony use in Europe. GSM is a very practical format for telephone-quality voice: it makes a good compromise between file size and quality, which makes it a highly recommended format for voice. WAV files can also be encoded with the GSM codec.
The Java Sound API is robust and gives fine-grained control over audio. Another
advantage is the ability to manipulate the individual data streams. Before the
Java Sound API, an entire sound clip had to be loaded before it could be played;
now the sound can be buffered and read using any sort of producer/consumer
scheme, which opens the way to network and streaming audio.
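A producer/consumer scheme of the kind described above can be sketched with a bounded BlockingQueue: a producer thread reads chunks from an audio stream while the calling thread writes them to a sink such as a network stream. The class AudioPipe, the transfer method and the chunk/queue sizes are hypothetical; when the source is an AudioInputStream, the chunk size should be a multiple of its frame size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper: streams audio bytes through a bounded producer/consumer queue.
class AudioPipe {
    private static final byte[] END = new byte[0]; // marker telling the consumer to finish

    // Producer thread: reads chunks from the source into the queue.
    // Consumer (calling thread): takes chunks and writes them to the sink in order.
    // Returns the number of bytes delivered to the sink.
    static long transfer(InputStream source, OutputStream sink, int chunkSize, int capacity)
            throws IOException, InterruptedException {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(capacity);
        Thread producer = new Thread(() -> {
            byte[] buffer = new byte[chunkSize];
            try {
                int n;
                while ((n = source.read(buffer)) != -1) {
                    queue.put(Arrays.copyOf(buffer, n)); // blocks while the queue is full
                }
            } catch (IOException | InterruptedException e) {
                // Sketch only: stop producing; a real implementation should report the error.
            } finally {
                try {
                    queue.put(END);
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.setDaemon(true); // do not keep the JVM alive if the consumer fails
        producer.start();

        long total = 0;
        for (byte[] chunk = queue.take(); chunk != END; chunk = queue.take()) {
            sink.write(chunk);
            total += chunk.length;
        }
        producer.join();
        return total;
    }
}
```

The bounded queue is what lets playback or upload start before the whole clip has been read, while capping how much audio is held in memory at once.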