A team of researchers led by Dr. Edward Chang at the University of California, San Francisco, has developed a ‘brain decoder’ – a kind of mind-reading device that translates neural activity into recognizable speech about 75% of the time.
The study, titled ‘Speech synthesis from neural decoding of spoken sentences’ and published Wednesday (April 24) in the journal Nature, involved five volunteers with epilepsy – four women and one man – awaiting neurosurgery for their condition.
The patients had temporary electrodes implanted on their brain surface as a pre-surgery procedure to help identify and map the areas of the brain responsible for their affliction.
For the study, additional sensors were attached to the lips, tongue, and teeth to monitor their movements as the volunteers read aloud hundreds of sentences, mostly passages from children’s classics like Sleeping Beauty, Alice in Wonderland, and The Frog Prince.
Electrical activity in their brains related to vocal tract movement during the reading exercise was decoded and fed to a specially programmed computer system to produce intelligible sentences.
In humans, the vocal tract comprises the oral cavity, which includes the lips, inner cheeks, tongue, upper and lower gums, floor and roof of the mouth, and the small area behind the wisdom teeth, in addition to the nasal cavity, larynx, and the pharynx – all of which work in near-perfect harmony to produce intelligible sentences when we talk.
Dr. Chang’s team was able to map the neural signals responsible for the movement of each vocal tract component to the participants’ speech.
The decoded neural activity was then converted into synthesized speech with the help of a neural network linked to a voice synthesizer.
“Recurrent neural networks first decoded directly recorded cortical activity into representations of articulatory movement, and then transformed these representations into speech acoustics,” wrote the authors of the study.
Put simply, it was a two-step process: first translating neural activity into vocal tract movements, then transforming those movements into speech.
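To make the two-step idea concrete, here is a minimal toy sketch of such a pipeline. This is not the authors’ model – the study used recurrent neural networks, whereas simple linear maps stand in for each stage here, and all dimensions and data are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 100 time steps, 256 cortical recording channels,
# 33 articulatory (vocal tract movement) features, 32 acoustic features.
T, n_channels, n_kinematic, n_acoustic = 100, 256, 33, 32

# Stand-in for recorded cortical activity (the real input was ECoG signals).
neural = rng.standard_normal((T, n_channels))

# Stage 1: decode neural activity into estimated articulatory movements.
W1 = rng.standard_normal((n_channels, n_kinematic)) * 0.1
kinematics = np.tanh(neural @ W1)

# Stage 2: transform articulatory movements into acoustic features,
# which a voice synthesizer would then render as audible speech.
W2 = rng.standard_normal((n_kinematic, n_acoustic)) * 0.1
acoustics = kinematics @ W2

print(acoustics.shape)  # one acoustic feature vector per time step: (100, 32)
```

The key design point the sketch preserves is the intermediate articulatory representation: rather than mapping brain signals straight to sound, the decoder first recovers how the vocal tract is moving, and only then converts that movement into audio.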
Although the reproduced speech sounds pretty much, well, synthetic, it is remarkably intelligible.
And considering that this is just the beginning, we can expect speech quality to improve as the technology is further researched and refined.
This brief video clip will let you know exactly what we’re talking about here.
What’s amazing is that the breakthrough decoder not only transformed sentences that were read aloud, but also translated silently mimed sentences into audible speech.
To determine how recognizable the decoded speech was, hundreds of volunteers were asked to listen to 101 synthesized sentences and transcribe what they heard.
The results – varied as they turned out to be – were nevertheless encouraging enough to warrant further research, as the technology has the potential to improve the quality of life of hundreds of thousands of people suffering from speech impairment due to conditions such as paralysis, ALS (amyotrophic lateral sclerosis), throat cancer, and Parkinson’s disease.
“Of the 101 synthesized trials, at least one listener was able to provide a perfect transcription for 82 sentences with a 25-word pool and 60 sentences with a 50-word pool,” wrote the authors, adding that the findings “may be an important next step in realizing speech restoration for patients with paralysis.”
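The quoted figures translate into rough success rates, which a quick calculation makes plain:

```python
# Sentences perfectly transcribed by at least one listener, out of 101 trials.
total = 101
print(round(82 / total * 100))  # 25-word pool: 81 (%)
print(round(60 / total * 100))  # 50-word pool: 59 (%)
```

In other words, with the smaller 25-word vocabulary, roughly four out of five sentences were perfectly transcribed by at least one listener.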
Conventional speech-synthesizing technology in use today involves interpreting how speech sounds are represented in the brain – a tedious, time-consuming process that, at best, produces about eight words per minute, far slower than the 100-150 words per minute of natural speech.
The new technology has the potential to overcome these limitations and make near-normal conversation a reality, hopefully, in the not too distant future.
Dr. Chang’s team followed a different route, targeting the areas of the brain that send movement signals to the various vocal tract components discussed earlier, which must move in close coordination to produce speech.
“For the first time … we can generate entire spoken sentences based on an individual’s brain activity,” said Chang.
“This is an exhilarating proof of principle that, with technology that is already within reach, we should be able to build a device that is clinically viable in patients with speech loss,” he added.
Kate Watkins, a cognitive neuroscience professor at the University of Oxford, was quoted by The Guardian as saying that the research was a “huge advance” that could prove to be “really important for providing people who have no means of producing language with a device that could deliver that for them.”
“The brain is the most efficient machine that has evolved over millennia, and speech is one of the hallmarks of behavior of humans that sets us apart from even all the non-human primates,” Gopala Anumanchipalli, one of the co-authors of the study, was quoted by National Geographic as saying.
“And we take it for granted—we don’t even realize how complex this motor behavior is,” Anumanchipalli said.
In an accompanying News and Views article in the journal Nature, Yahia H. Ali and Chethan Pandarinath of Emory University, Atlanta, US, expressed hope that continued research will go a long way in helping people with speech issues “regain the ability to freely speak their minds and reconnect with the world around them.”
While there’s still a lot of work left to be done before the technology can be perfected, it’s good to know that we’re headed in the right direction.