
Automatic Speech Recognition


The Future of Human/Computer Interaction

ASR is an outstanding tool for interfacing with a computer, but human verbal communication is so complicated that you may be better off with direct face-to-face communication. This is one area where the human brain is still the best tool.

Is calling an automated call center frustrating? You bet.

Human beings have an innate capacity to detect pitch and intonation. Scientists are developing algorithms both for recognizing speech and for synthesizing it. Factoring in the speaker's intention is harder than just figuring out the words. The National Science Foundation, the Defense Department, NASA, the Defense Advanced Research Projects Agency and other agencies are analyzing cues like pitch, pauses, emphasis and volume in natural rather than staged situations.

A patent was recently granted to the Mitel Knowledge Corporation in Kanata, Ontario, for a system that searches for rapid speech, stuttering or profanity as indications of frustration.

Sentence boundaries, which seem obvious on a printed page but are harder to spot in speech, are typically marked in writing by periods. Today's machine transcriptions are a lot like classic run-on sentences: hundreds of words with never a full stop in sight. Scientists have mapped out some ways in which changes in pitch can signal sentence boundaries. One researcher, for example, has devised models of angry sentences, which tend to have a higher overall pitch and to end in a downward pitch, as well as being slower and having a striking emphasis on certain words. Researchers are also analyzing the collective "ums," "ands" and false starts that are frequent when people talk. Still, you can only train a computer to learn your way of speaking so far.
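The frustration cues named in the patent description (rapid speech, stuttering, profanity) can be approximated on a transcript. The following is a minimal sketch, not the patented method: the function name `frustration_cues`, the speech-rate threshold, the tiny profanity list, and the use of immediate word repetition as a stand-in for stuttering are all assumptions made for illustration.

```python
import re

# Hypothetical threshold and word list, chosen only for illustration.
FAST_WORDS_PER_SECOND = 3.5
PROFANITY = {"damn", "hell"}


def frustration_cues(transcript: str, duration_seconds: float) -> dict:
    """Return simple frustration indicators for one caller utterance."""
    words = re.findall(r"[a-z']+", transcript.lower())
    rate = len(words) / duration_seconds if duration_seconds > 0 else 0.0
    # Immediate word repetitions ("I I want") stand in for stuttering.
    repeats = sum(1 for a, b in zip(words, words[1:]) if a == b)
    swears = sum(1 for w in words if w in PROFANITY)
    rapid = rate > FAST_WORDS_PER_SECOND
    return {
        "rapid_speech": rapid,
        "repeated_words": repeats,
        "profanity": swears,
        # Assumed rule: any one strong cue flags the utterance.
        "frustrated": rapid or repeats >= 2 or swears > 0,
    }


cues = frustration_cues("I I I want a a damn operator now", 2.0)
print(cues)
```

A text-only check like this ignores pitch, volume and emphasis, which is exactly the information the research described above is trying to capture from the audio itself.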

Teaching Machines to Hear Your Prose and Your Pain

1. Speech is a poor medium for most office work. We are conditioned to write, read, and edit by keyboard. Written text is less ambiguous than speech.
2. While there are very good practices in speech interface design today and many useful services that can be built with them, there are still significant challenges and room for improvement. Those limits have to do with the shared understanding between a human and a machine sharing a speech interface.
3. Human beings hear better than machines and transcribe punctuation better. We can tell when someone is joking.

Potential Uses:

1. There is already a large speech industry in medicine, and it is widely seen as one of the key technologies moving forward (it has probably already eclipsed "office ASR" and is a significant part of the speech recognition industry overall).
2. Biometrics-based recognition over the internet to facilitate access to content and protect it.
3. Computer learning and training applications, such as foreign language training (for example, EduSpeak, a software development kit from SRI Speech).
4. Electronic stenographers: computer programs that automatically convert intercepted conversations and broadcasts into transcripts, turning eavesdropped speech such as "Um. yeah. I'll I'll sweep it now just just to uh check." into text. Combined with biometric recognition, this makes a powerful intelligence-gathering device.
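To make the stenographer example concrete, here is a rough sketch of how the "ums" and false starts in that utterance might be stripped from a raw transcript. The function name `clean_transcript` and the filler list are assumptions for illustration; real systems would rely on models trained on speech rather than a fixed word list.

```python
import re

# Hypothetical filler list, for illustration only.
FILLERS = {"um", "uh", "er", "ah"}


def clean_transcript(raw: str) -> str:
    """Drop filler words and collapse immediate word repetitions."""
    tokens = re.findall(r"[A-Za-z']+", raw)
    kept = []
    for token in tokens:
        lower = token.lower()
        if lower in FILLERS:
            continue  # skip "um", "uh" and similar
        if kept and kept[-1].lower() == lower:
            continue  # skip a false start such as "I'll I'll"
        kept.append(token)
    return " ".join(kept)


cleaned = clean_transcript(
    "Um. yeah. I'll I'll sweep it now just just to uh check."
)
print(cleaned)  # yeah I'll sweep it now just to check
```

Note that the sketch throws away the punctuation entirely: deciding where the sentences actually end is the harder boundary-detection problem, and a word filter alone does not solve it.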