Visionary is the industry leader in adapting voice recognition to the deposition environment. A demonstration in May 2011 highlighted the potential benefits and cost savings of replacing a court reporter at a deposition with a lower-paid technician and Visionary’s voice recognition technology. At the demonstration, we were told that Visionary’s voice recognition software provides an almost-immediate rough draft, with a digital recording serving as the final transcription source. For another view, check out Ted Brooks’ blog.
My view is that the software’s capabilities may be an upgrade in areas of the country that have already adopted digital reporting as a standard. Since the voice recognition product also produces a multichannel digital recording from which the transcript will be produced, the starting point of each technology (VR and DR) is the same: a multichannel digital recording.
As the deposition proceeds, Visionary’s voice recognition software begins translating and displaying the rough text. The attorneys and witness are mic’d, and the name of the speaker assigned to each mic is displayed rather than Q’s and A’s. The technician assigns a speaker to each of the four mic channels as part of the pre-deposition setup. Any speaker beyond the current capacity of four channels will need to be given access to one of the mics, and the operator will have to change that mic’s name assignment. Plans are to expand mic capacity to eight channels and beyond.
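To make the channel setup concrete, here is a minimal sketch of how a mic-to-speaker assignment like the one described above might be modeled. It is not Visionary’s software or API; the class name `MicPanel`, its methods, and the speaker names are all hypothetical, and the only detail taken from the demonstration is the four-channel limit.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_CHANNELS = 4  # current capacity mentioned at the demonstration


@dataclass
class MicPanel:
    """Hypothetical model of the technician's mic-to-speaker assignments."""
    assignments: Dict[int, str] = field(default_factory=dict)

    def assign(self, channel: int, speaker: str) -> None:
        # Assign (or reassign) a speaker name to one of the mic channels.
        if not 1 <= channel <= MAX_CHANNELS:
            raise ValueError(f"channel must be 1..{MAX_CHANNELS}")
        self.assignments[channel] = speaker

    def label(self, channel: int, text: str) -> str:
        # Prefix rough text with the speaker's name instead of Q or A.
        speaker = self.assignments.get(channel, f"CHANNEL {channel}")
        return f"{speaker}: {text}"


panel = MicPanel()
panel.assign(1, "MR. SMITH")    # questioning attorney (made-up name)
panel.assign(2, "THE WITNESS")
print(panel.label(1, "please state your name"))

# A fifth speaker has to borrow an existing mic, and the operator
# must remember to rename that channel first.
panel.assign(2, "MS. JONES")
print(panel.label(2, "objection"))
```

The sketch shows where the risk sits: once a mic changes hands, every line on that channel is attributed to whatever name the operator last entered.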
The voice recognition software needs to “learn” how each speaker pronounces words, so creating a speaker profile is critically important to the translation. Because of this high hurdle, my opinion is that early translations will be useless. The argument in the software’s favor is that court reporters return to the same clients over and over, so repeated use of voice recognition with the same client will yield a better translation as that speaker’s dictionary continues to improve.
While that statement has merit, the reality is the client, the questioning attorney, is only one of the parties present in the room. There will be no speaker profile for the witness or opposing counsel. Assuming a quality translation for the client/questioning attorney and poor translation rates for opposing counsel and the witness, questions of usefulness and impartiality will inevitably arise.
In addition to building a speaker profile, the suggestion is to use a case profile to narrow the range of possible words in the translation. There are built-in dictionaries for areas such as asbestos, med-mal, and construction. The operator would also build case profiles.
In my opinion, this product is not ready. There are too many shortcomings at this time for voice recognition to be a threat to professional reporters. Among them:
- Attorneys resist change. If qualified reporters are available in a community, the cost savings will not be significant enough to force a change.
- The concept of a low-paid technician being conscientious enough to do all that’s necessary to make a quality record ignores reality. If you pay peanuts, you get monkeys.
- There is a lot of equipment: wires, mics, computers, and a complex program. It is interesting to note that we were unable to view a live demo at the rollout because of an equipment failure. Attorneys are used to a court reporter coming in, setting up, and getting started. The stenotype machine is one of the most reliable machines on earth. The proposal is that a computer, in itself prone to failure, will be coupled with a complex program, and that together they will run reliably in the hands of a low-paid technician. I think that also ignores reality.
- Court reporters are recognized as neutral participants in a deposition. The voice recognition technician is not. Not only is he not neutral, but the financial justification for switching to voice recognition is that the operator will be a low-paid independent contractor. I believe it is illogical to assume a low-paid operator will conduct himself professionally, build the required speaker dictionary and case profile, maintain confidentiality, do appropriate backups, and otherwise fulfill the obligations of a professional court reporter.
- Translation-rate figures for voice recognition in a deposition setting do not exist. Having observed the demonstration, I believe the translation rate will be far below acceptable.
- At the end of the day, a transcriptionist will still need to transcribe the entire proceedings from the digital recording. The only proposed benefit of the voice recognition program is the display of the translation.
- The voice recognition software produces no punctuation, Q’s and A’s, or capitalization. Imagine every word above as one sentence. Then imagine giving that to an attorney and telling him it’s a better product!
- Translation latency grows over time. The longer continuous speech goes on, the greater the translation latency. I’ve been in depositions that have gone hours without a minute’s break. Even assuming the system worked perfectly, the translation would be significantly behind the proceedings, rendering the value of an immediate rough transcript moot.
- Finally, it is interesting to note that after I travelled across the country to view the voice recognition software rollout, the live demo never occurred because the system failed in the hands of the experts in the field.
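The latency point in the list above can be illustrated with simple arithmetic. This is only a sketch under an assumption of my own: that the software needs a fixed amount of processing time per second of speech (a "real-time factor"). The factor of 1.1 used below is invented for illustration and is not a measured figure for Visionary’s product.

```python
def lag_seconds(speech_seconds: float, real_time_factor: float) -> float:
    """Estimated delay between a word being spoken and being displayed.

    real_time_factor is the (assumed) processing time needed per second
    of audio. Above 1.0 the software falls behind, and the backlog
    grows for as long as the speech continues without a break.
    """
    return max(0.0, speech_seconds * (real_time_factor - 1.0))


ASSUMED_FACTOR = 1.1  # hypothetical: 1.1 s of processing per 1 s of speech

for minutes in (10, 60, 180):
    lag_min = lag_seconds(minutes * 60, ASSUMED_FACTOR) / 60
    print(f"{minutes} min of continuous speech: about {lag_min:.0f} min behind")
```

Under that assumption, even a system that is only slightly slower than real time ends up many minutes behind over a multi-hour session with no breaks, which is the scenario I describe above.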
This is only my view. I invite your comments and questions.