Work started in the early 1950s at Bell Laboratories to develop machines that could recognize elements of human speech [1]. Since then there has been great interest in developing systems that can process natural language. Today we find these systems in use primarily in telephone help systems, where a computer and a voice recognition and synthesis engine are used to answer questions. The ability of these systems is still very limited, and in most cases they are used to recognize just a few words and numbers. The words must be spoken clearly, and such systems are often confused by different accents.
Several companies have produced speech recognition systems as commercial products. The author's personal experience with offerings from IBM and Microsoft is that even after considerable training, these tools are poor at best, and the manuscript resulting from a dictation session requires so much editing that the overall effort is more than would be required to type it in directly. For those of us who cannot touch-type and who are prone to spelling errors and character reversal mistakes, a good speech-to-text interface would be a great help. For people with physical disabilities, a good voice interface to a computer could make dramatic changes in quality of life.
The primary driving force for computers that can understand natural language is probably the desire to reduce costs in the call centers of large businesses. If computers can understand spoken language accurately, and this technology can be combined with artificial intelligence, then we have the potential for really useful support systems that could be far more effective than a poorly trained human reading from a script.
Combining the recognition of human speech with AI systems is being pursued in several research projects, notably Project Halo, which is funded by Paul Allen's Vulcan Ventures [2]. Project Halo's goal is to produce a "Digital Aristotle": a teaching tool capable of answering scientific questions. Halo has produced some good results with text input, demonstrating the AI part of the program. In subsequent phases the intent is to include natural language processing and to have scientific personnel, rather than knowledge base engineers, develop the knowledge base.
[1] "Automatic Speech Recognition – A Brief History of the Technology Development," http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/354_LALI-ASRHistory-final-10-8.pdf
[2] http://www.projecthalo.com/.
5 comments:
And out of all this, we developed voice response phone trees that can only understand you when you're swearing. Nice.
Yes, that's about as far as we seem to have gotten so far. People are really good at pattern matching, learning new languages, etc., yet some people can't understand strong accents in their own language. Not much hope for the computers :)
The basic interface to the computer is too darn old. Relying on typing has limited the real potential of a computer.
As you point out, the next step could be an oral interface, but DARPA is trying to move beyond that to accommodate physically handicapped users through a thought-reading process. DARPA may be way out there, but if it happens then there will be a true revolution in interfaces.
Another interesting computer interface is the tongue - see http://www.washingtonpost.com/wp-dyn/content/article/2008/08/25/AR2008082500582.html
I worked on this back in the early 90's using true speech recognition software from Dragon called Naturally Speaking. Surprisingly, it worked relatively well considering it was one of the first releases, but as you so eloquently pointed out, it was cumbersome and the learning curve was a bit much. Today it is relatively simple, and some areas (legal and medical) have converted to it and it is reducing workload. So I agree the technology still has a way to go, but eventually all computing will be via voice. The only holdouts will be the people who resist change, and they have always been around.