At medical facilities around the country, care is delayed, complicated and even jeopardized because doctors and patients don't speak the same language. The situation is particularly dire in diverse megacities like Los Angeles and New York.
Now, USC computer scientists, communication specialists and health professionals hope to create a cheap, robust and effective speech-to-speech (S2S) translation system for clinics, emergency rooms and even ambulances.
The initial SpeechLinks system will translate between English and Spanish. Professor Shrikanth Narayanan, who directs the Signal Analysis and Interpretation Laboratory at the USC Viterbi School of Engineering, hopes to test and deliver a working prototype within the four-year window of a recently awarded $2.2 million NSF grant for "An Integrated Approach to Creating Context Enriched Speech Translation Systems."
Narayanan, who holds appointments in the USC departments of electrical engineering, computer science, linguistics and psychology, will collaborate on the project with fellow engineering faculty member Panayiotis Georgiou, Professor Margaret McLaughlin of the Annenberg School for Communication, and researchers and clinicians from the Keck School of Medicine at USC.
The project will also include investigators from two corporations, BBN and AT&T, who will not only collaborate on the research but also serve as mentors to the students working on the project.
The detailed prospectus for the effort begins by explaining the need: "While large medical facilities and hospitals in urban centers such as Los Angeles tend to have dedicated professional language interpreters on their staff (a plan which still suffers from cost and scalability issues), multitudes of smaller clinics have to rely on other ad hoc measures including family members, volunteers or commercial telephone translation services. Unfortunately, human resources for in-person or phone-based interpretation are typically not easily available, tend to be financially prohibitive or raise privacy issues (such as using family members or children as translators)."
Filling these needs, Narayanan says, will require a system that can perceive and interpret not just words but a wide range of human communication, an improvement on current, limited "pipeline" translation technology. "We want to let people communicate," he says. "We need to go beyond literal translation"—heavily based on translating written texts rather than spoken language—"to rich expressions in speech and nonverbal cues. We want to enhance human communication capabilities."
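The "pipeline" approach Narayanan refers to chains three independent stages: speech recognition, text translation, and speech synthesis. Because each stage passes only bare text forward, paralinguistic cues such as tone and emphasis are discarded along the way. The sketch below illustrates that conventional cascade with stub components standing in for real ASR, MT and TTS engines; all function names and data here are illustrative, not part of any actual SpeechLinks code.

```python
# Illustrative sketch of a conventional "pipeline" speech-to-speech cascade.
# Each stage hands only plain text to the next, so prosody and nonverbal
# cues are lost -- the limitation context-enriched systems aim to overcome.

def recognize_speech(audio_label: str) -> str:
    """Stub ASR: maps an audio clip (represented by a label) to a transcript."""
    transcripts = {"clip_es_01": "me duele el pecho"}
    return transcripts[audio_label]

def translate_text(text: str, src: str, tgt: str) -> str:
    """Stub MT: literal text-to-text translation, with no tone or emphasis."""
    lexicon = {("es", "en"): {"me duele el pecho": "my chest hurts"}}
    return lexicon[(src, tgt)][text]

def synthesize_speech(text: str) -> str:
    """Stub TTS: returns a label standing in for a synthesized waveform."""
    return f"<audio: '{text}'>"

def pipeline_s2s(audio_label: str, src: str = "es", tgt: str = "en") -> str:
    transcript = recognize_speech(audio_label)           # speech -> text
    translation = translate_text(transcript, src, tgt)   # text -> text
    return synthesize_speech(translation)                # text -> speech

print(pipeline_s2s("clip_es_01"))  # <audio: 'my chest hurts'>
```

A context-enriched system, by contrast, would carry additional cues (prosody, emotion, discourse context) alongside the text through each stage rather than flattening everything to a transcript.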
The additional cues to be analyzed and incorporated into the translation mix include, according to the plan:
The system will also embed these analyzed cues into synthesized speech, including spoken output generated from responses or questions typed into the interface.