December 2, 2007
So what’s the state of speech recognition and dictation software?
Linda asked me about the state of desktop dictation technology. In particular, she asked me whether there was much difference between the latest version and earlier, cheaper ones. My knowledge of the area is out of date, so I thought I’d throw both the specific question and the broader subject of speech recognition out there for general discussion.
Here’s much of what I know or believe about speech recognition:
- Most major independent commercial speech recognition efforts have wound up being merged into Nuance Communications. That goes for both desktop and server-side stuff. None was doing particularly well before its respective merger.
- A folk dance buddy (Jonathan Young, once of Dragon Systems) taught me the essential principle of developing speech recognition systems, which probably applies more broadly to other language-understanding technologies as well: “How do you make a good speech recognition product? You start with a bad one and keep incrementally improving it.”
- Linda tells me that a lot of novelists use dictation software, to reduce repetitive strain from typing. However, this often leads to repetitive use strains on their throats. I don’t know whether it makes a difference if one uses better microphones, talks more softly, and/or has access to software that is less demanding of carefully enunciated gaps between each word.
- Perhaps due to accuracy concerns, and perhaps also due to concern about noise pollution in the workplace, ordinary computer control via voice is rare. Most applications focus on specialized-circumstance dictation (hands-free, disabled users, users who are being harmed by typing, etc.) or telephone interaction.
- Rich semantic technology isn’t yet used in speech recognition to nearly the extent it is in text search/mining/analytics. The grammar in speech recognition systems is primitive at best. And while there may be some hand-built semantic networks with small numbers of nodes, à la Sybase AnswersAnywhere, nobody’s ever hooked up (say) a WordNet equivalent or a good entity-extraction engine as part of a mainstream commercial speech recognition product. (Please correct me if I’m wrong about this part!)
- There are real challenges in voice recognition via remote microphones in small enclosed places (e.g., automobiles), especially when noisy. But wearing headsets while driving is generally frowned on by the traffic police. EDIT: It seems that those challenges are being overcome.
- Overall, I can’t think of anything wrong in this Wikipedia article on Dragon NaturallySpeaking. That said, the article is a bit sloppy, so I’d encourage people to see if they can edit it a bit and spruce it up.
Any thoughts? In particular, what version of Dragon NaturallySpeaking or a competitive product should Linda use, and why?
Categories: Language recognition, Natural language processing (NLP), Nuance, Speech recognition, Sybase
Comments
12 Responses to “So what’s the state of speech recognition and dictation software?”
Hi Curt,
Speech recognition in general is gaining ground as a ubiquitous technology almost daily.
And Windows Vista offers dictation and Command and Control capabilities that were previously unheard of!
Here’s an article that may help you get a better picture:
http://wirelessspeech.blogspot.com/2007/12/speech-recognition-top-10-flop-says.html
Bill Burke
http://wirelessspeech.blogspot.com
If you have Vista you don’t need to get Dragon. Just go to the accessibility menu and turn on the speech recognition that is included in the OS. It is very good, and is both free and immediate, just a few clicks and a training session away…
Thanks!
steveh
Thanks, Steve.
I’ve chickened out and haven’t run Vista so far, despite Microsoft’s blandishments.
CAM
The MS product is also downloadable for XP / Word 2003, or you may already have it.
It’s only for US English, Chinese, Japanese.
Unfortunately, I don’t have Vista.
I used the Word program a few years ago, and found it pretty annoying. Despite long hours of “training,” the errors when I dictated were legion. I know writers who use Dragon, and love it, but the version they’re using is several years old. Does anyone know if Dragon has a recent update?
Thanks.
–Linda
Per Wikipedia, Dragon NaturallySpeaking 9 came out in mid-2006, and doesn’t require training. Does anybody know whether there are other significant enhancements in Version 9? And is the no-training claim really true?
CAM
Hi. I’ve used the open source Java research software Sphinx-4, which performs automatic speech recognition. I get about a 5% – 10% error rate on my large vocabulary evaluations. It does not have a facility for training. And it’s not really an end-user product, but it can be incorporated easily into Java applications.
See: http://cmusphinx.sourceforge.net/html/cmusphinx.php
-Steve
Stephen L. Reed
Artificial Intelligence Researcher
http://texai.org/blog
http://texai.org
3008 Oak Crest Ave.
Austin, Texas, USA 78704
512.791.7860
During the past few days, I’ve been increasing my usage of SR in Vista, and the results are encouraging. A boom microphone is essential (a Bluetooth-connected earset won’t work) and a reasonably quiet environment is needed (loud noises from outside such as bird songs (!) don’t help).
Anyway, because the Vista SR is part of the OS, it seems to have knowledge of all the special words, names, etc., in one’s documents, contacts, and so forth. This radically reduces training. For ordinary conversation recognition, it does very well, and it sure beats typing. If it mis-identifies a word, the correct word is almost always found on the pop-up menu of alternates.
Not perfect, but impressive. Training is quite short, perhaps 15 minutes. And it also gives good control over the desktop, once again, not perfectly, but it’s a whole lot better than typing.
I think Sphinx is the only open source speech recognition program that works with Asterisk. I’ve never had any luck integrating Sphinx 3 or 4 with Asterisk. However, I tried Sphinx2 with Asterisk and it worked well for me on Asterisk 1.2.x. I followed these steps:
http://www.syednetworks.com/asterisk-integration-with-sphinx-voice-recognition-system
If version 3 or 4 works for anyone, please share. Thanks.
[…] My December, 2007 survey of speech recognition technology […]
[…] it goes even further. For example, I was told by a guy who is now a senior researcher at Attivio: “How do you make a good speech recognition […]
I have been using Dragon Dictation since its inception; in fact, I am using Dragon Dictation 11.5 to dictate this note. I have never been happy with this or any similar program.
We are, at best, but a single step along a very long road toward real speech recognition.
I am a writer, or at least I lie to myself when I say that, and without Dragon Dictation I would be completely lost.
The problem is that Dragon makes at least one error in every line of text. If I am dictating a paragraph in a novel and Dragon makes a mistake or indeed several mistakes in that paragraph and I return to make corrections, I frequently am at a loss as to the exact wording of the paragraph. That’s my fault, not the fault of Dragon Dictation.
Nuance naturally advocates for its software, suggesting that it is 99% accurate, when indeed nothing could be further from the truth. I would suggest, from my personal experience, that the software is at best 70% accurate.
Windows 7 has speech recognition built in. If you have any hair and you plan to keep it, then do not make any effort to use Microsoft’s speech recognition. I have no hair because I tore out what little I had in handfuls trying to get Microsoft’s speech recognition to work in any meaningful way.
I look forward to the day when we will have real speech recognition but alas, I suspect I will be little but a memory on a tombstone long before that happens.
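The accuracy figures quoted in this thread (a 5% – 10% error rate for Sphinx-4, "99%" claimed versus roughly "70%" experienced for Dragon) are presumably word-level measures, though none of the commenters says how they were computed. For anyone who wants to check a dictation product against their own text, here is a minimal sketch of the usual word error rate calculation: the word-level edit distance between what was said and what was recognized, divided by the number of words said. It assumes plain whitespace tokenization with no case or punctuation normalization, and the function name and sample sentences are illustrative only, not taken from any product mentioned above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace and compared exactly,
    so "Fox" and "fox" count as different words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word was dropped
            insertion = curr[j - 1] + 1   # extra word was recognized
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    spoken = "the quick brown fox jumps over the lazy dog"
    recognized = "the quick brown box jumps over lazy dog"
    # One substituted word and one dropped word out of nine spoken.
    print(f"{word_error_rate(spoken, recognized):.1%}")  # prints 22.2%
```

Note that "accuracy" in marketing material is often quoted as one minus this number, and that the result depends heavily on the test text, the microphone, and the speaker, which may account for some of the gap between vendor claims and the experiences reported here.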