The Voice Response Translator: A Valuable Police Tool
by Mark P. Cohen
An Oakland, California, police officer pulls over a motorist for speeding. The officer approaches and asks for the driver’s license and registration. The confused, frightened motorist shakes his head and begins speaking rapidly in Spanish.
A West Palm Beach, Florida, officer receives an anonymous phone tip about a domestic disturbance. The officer arrives to find no assailant, only a badly battered woman and an infant. When he asks what happened, the woman responds in Haitian Creole.
Non-English speakers are nothing new to America. But as the number of foreign-born residents in the United States has steadily risen in the past decade, so has the number of people who are not fluent in English. Census data from 2000 showed that one in five U.S. residents speaks a foreign language at home. Only a little more than half of these people (55 percent) also reported speaking English “very well.”
This poses a dilemma for law enforcement. Increasingly, American policing requires interaction with speakers of not only Spanish but also Arabic, Hindi, Russian, Swahili, Tagalog, and Vietnamese. In an emergency, police simply cannot wait for an interpreter to assist by phone, much less to arrive on the scene. As the number of foreign-language speakers grows, law enforcement must find cost-effective ways to communicate with these residents.
In December 1993, an advisory council of the National Institute of Justice (NIJ) identified instant language translation as an immediate law enforcement technology priority. In this high-tech age, the council reasoned, there must be an economical technological means of helping officers communicate with non-English speakers.
Eventually, NIJ identified four devices with the potential to meet law enforcement needs and asked the Naval Air Systems Command (NAVAIR) Training Systems Division in Orlando, Florida, to test them: the Voice Response Translator (VRT) from Integrated Wave Technologies (IWT), Inc.; the Phraselator from Marine Acoustics, Inc.; Ectaco, Inc.’s Universal Translator; and a Hewlett-Packard iPAQ personal digital assistant. (The testing was conducted in 2002 on the most current model of each device available at that time; each device has since been modified.)
The four units proved remarkably similar. The largest differences were in 1) ruggedness, 2) quality of speakers and microphones, and 3) voice activation for hands-free operation. In the comparison, the VRT ranked as the top choice for law enforcement use.
Voice-Activated Language Tool for Law Enforcement
The VRT is a one-way translator that allows users to communicate instantly with non-English speakers. Each VRT unit is “trained” by an individual user to recognize that person’s short, voice-activated commands (called “trigger phrases”) in English. Each trigger phrase is linked to a digital audio file of a complete foreign-language sentence recorded by a fluent speaker of that language. In less than a second, the VRT plays that sentence in the selected language. The device can be equipped with either a headset or an adjustable gooseneck microphone and has a bullhorn jack. It can be kept in an officer’s shirt pocket or mounted on a citation book.
In the examples above, the Oakland officer who pulled over the Spanish-speaking motorist might say the trigger phrase, “Too fast.” The VRT would instantly repeat the phrase in English for verification and then issue the appropriate full sentence in Spanish. Or the West Palm Beach officer might say, “You in pain?” and the VRT would ask the question in Haitian Creole. The VRT is programmed specifically for common policing situations such as traffic stops, domestic problems, lost children, and medical emergencies.
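The lookup behavior described above (an English trigger phrase, an English echo for verification, then a pre-recorded foreign-language sentence) can be modeled as a simple table. The sketch below is an illustration only, not IWT’s software: the class name `PhraseBook`, its methods, and the audio file names are all hypothetical, and voice recognition is left out entirely so the example focuses on the phrase-to-recording mapping and the voice-selectable language.

```python
from dataclasses import dataclass


@dataclass
class PhraseBook:
    """Hypothetical model of a one-way, phrase-based translator.

    Each English trigger phrase maps, per language, to the name of an audio
    file holding a complete sentence pre-recorded by a fluent speaker. No
    translation happens on the fly; the device only looks up recordings.
    """

    recordings: dict[str, dict[str, str]]  # trigger -> {language: audio file}
    language: str = "Spanish"

    def switch_language(self, language: str) -> None:
        """Select the output language (done by voice command on the VRT)."""
        if not any(language in files for files in self.recordings.values()):
            raise ValueError(f"no recordings loaded for {language}")
        self.language = language

    def respond(self, trigger: str) -> tuple[str, str]:
        """Return the English echo (for verification) and the file to play."""
        files = self.recordings.get(trigger, {})
        if self.language not in files:
            raise KeyError(f"{trigger!r} is not programmed for {self.language}")
        return trigger, files[self.language]


# Hypothetical file names; the sentences themselves would be recorded by fluent speakers.
book = PhraseBook({
    "Too fast": {"Spanish": "es_too_fast.wav"},
    "You in pain?": {"Haitian Creole": "ht_pain.wav"},
})
print(book.respond("Too fast"))      # ('Too fast', 'es_too_fast.wav')
book.switch_language("Haitian Creole")
print(book.respond("You in pain?"))  # ('You in pain?', 'ht_pain.wav')
```

Because every output is a fixed recording, the quality of each sentence depends on the fluent speaker who recorded it rather than on machine translation, which is also why the device cannot translate a civilian’s reply.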
VRT prototypes have been used by a number of law enforcement agencies, including police departments in Oakland, West Palm Beach, and Nashville, Tennessee. Nashville Police Captain Ken Pence told National Public Radio that the VRT was a welcome innovation in his city, where police encounter some 20 languages on a daily basis.
Building the Prototype
The VRT employs sound analysis technology developed for military and covert operations by the former Soviet Union. When the USSR collapsed, IWT bought the rights to the Soviet research, which formed the basis for the original, desktop version of the VRT. This version, which was intended solely to demonstrate the technology to NIJ, could translate only 25 phrases.
NIJ asked IWT to come back with a device that was readily portable as well as “eyes free and hands free,” a policing necessity in emergencies. The result was the first generation of the VRT. It measured about 6 inches by 6 inches by 4 inches.
This version was tested by the Oakland Police Department. The field test showed that the unit had promise but that it was still too bulky to fit comfortably in an officer’s pocket and, more importantly, it did not consistently recognize officers’ voice commands.
So IWT developed a second-generation unit. That device was still too big and did not perform adequately in high-noise situations.
Navy Joins Testing of Device
In addition to testing prototypes at various law enforcement agencies, NIJ sent the VRT to the U.S. Navy, specifically to NAVAIR, for independent analysis. (See “VRT and the U.S. Military.”)
Navy civilian psychologists, voice technology engineers, and instructional systems designers conducted two studies of the VRT, one in the lab and one in the field. NAVAIR tested several generations of the VRT, including IWT’s third-generation device, the smallest yet. The laboratory test, conducted in a sound studio, found that the unit’s microphone did not perform as well as off-the-shelf models, so NAVAIR replaced the microphone for the field tests.
The NAVAIR study found that many officers needed less than one day to become comfortable with the VRT and that the unit performed properly in all programmed languages. According to NAVAIR’s Dee Sheppe, the field test found that the VRT “is easy for people to learn how to use. It offers a quick solution that can help an officer on the street when he doesn’t have a lot of resources. The small size is an advantage.”
The results of the field test were summarized in a December 2003 report. Twenty-seven VRT units were distributed to law enforcement officers for a 3-week period. The VRT was employed most frequently in traffic stops. Spanish was the most frequently used language.
Overall, half of the officers found the VRT useful and user-friendly and reported that it enabled them to handle many situations that otherwise would have required a translator. The other half of the officers surveyed, however, reported difficulty operating the VRT and chose not to use it.
NAVAIR attributed dissatisfaction with the device partly to the fact that some officers are simply slow to adapt to new technology. Thomas Franz, the NAVAIR psychologist who led the field test, was not surprised to find a split decision on the VRT. “If you talk to police officers, there’s a given percentage who won’t use pepper spray. And there are other cops who swear by it. So there’s choice there by the individual officer. I get frustrated because people say [the VRT] doesn’t work for everybody. It’s a tool: some people will like it and some people will not.”
Some officers in the test may have been disappointed because they expected the device to work as a two-way translator. The VRT does not translate what a civilian says back to the officer. It can, however, prompt a person to nod “yes” or “no,” show identification, or write down an answer.
VRT AND THE U.S. MILITARY
With its large research and development budget, the Defense Department has overseen the development of numerous technological innovations. In fact, law enforcement often adapts technologies from the military.
The VRT is an exception: NIJ shepherded its development and the military adapted it for use in Iraq and elsewhere, even though the Pentagon had been underwriting development of a similar technology by a different vendor.
In the Naval Air Systems Command’s (NAVAIR’s) comparison of the VRT, the Defense Department device, and another similar device, the VRT was NAVAIR’s choice. “The shortcomings of the VRT (lack of volume control, lack of an auto-off feature, and lack of a PC link) could be easily overcome with a specification for these features included in the production/manufacturing requirements,” the report concluded.
The NIJ-NAVAIR collaboration introduced the VRT to the Navy and Coast Guard. NAVAIR funded nautical versions of the instruction booklet and command cards, and NIJ authorized the Navy to use four translators. Programmed with more than 200 commands organized into nine events, the nautical version of the VRT was tested on three Navy ships.
The Coast Guard, which frequently encounters foreign speakers in boardings at sea, has purchased some 70 VRT units. The devices were deployed to the Persian Gulf during the Iraq war to warn foreign vessels away from oil rigs.
Likewise, IWT has developed a 34-page “Operating Instructions and Phrase List” for use of the VRT by the Marines. The Marine Corps has purchased 50 units and plans to buy more. Marines who used the VRT in Iraq have suggested the addition of numerous phrases, which have been translated into Iraqi Arabic by the Defense Language Institute and incorporated into the VRT commands.
In 2003, the U.S. Special Operations Command, which coordinates all the military special forces, witnessed a VRT demonstration, liked what it saw, and purchased 100 devices for use in Iraq.
Features of the Device
The VRT can be programmed to translate into any language. Once the device is programmed, an officer can switch among languages by voice command.
The VRT is speaker dependent, so it works only for the officer or officers who “trained” it. However, a single device can be trained to recognize the commands of up to eight officers.
Technology Has Limitations
The technology measures “peaks” (highs and lows) in an officer’s speech pattern and then listens for the exact phrases the officer recorded during training. So if an officer’s inflection or voice pattern is altered by a stressful encounter, the VRT might fail. And some officers find it difficult to say the same thing twice with the same inflection.
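The article does not describe IWT’s recognition algorithm, so the following is only a simplified stand-in for speaker-dependent template matching, written to show why a change in inflection can defeat it. It assumes each utterance has already been reduced to a short contour of normalized peak values; the function names, the 32-point resampling, the 0.15 threshold, and all contour numbers are hypothetical. A new utterance is accepted only if it lies close enough to a contour stored at training time.

```python
def resample(contour: list[float], n: int = 32) -> list[float]:
    """Linearly interpolate a contour onto n evenly spaced points (n >= 2)."""
    if len(contour) == 1:
        return [contour[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(contour) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def distance(a: list[float], b: list[float]) -> float:
    """Mean absolute difference between two contours after resampling."""
    ra, rb = resample(a), resample(b)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / len(ra)


def recognize(utterance, templates, threshold=0.15):
    """Return the closest trained trigger phrase, or None if none is close enough."""
    best_phrase, best_dist = None, float("inf")
    for phrase, template in templates.items():
        d = distance(utterance, template)
        if d < best_dist:
            best_phrase, best_dist = phrase, d
    return best_phrase if best_dist <= threshold else None


# Hypothetical contours recorded by one officer during training.
templates = {
    "Too fast": [0.2, 0.9, 0.4, 0.8, 0.1],
    "Step out": [0.7, 0.3, 0.3, 0.9, 0.2],
}
calm_repeat = [0.25, 0.85, 0.4, 0.75, 0.1]  # same phrase, similar inflection
stressed = [0.9, 0.2, 0.9, 0.2, 0.9]        # same words, "Robocop" delivery
print(recognize(calm_repeat, templates))  # Too fast
print(recognize(stressed, templates))     # None
```

Rejecting anything above the threshold keeps the device from playing the wrong sentence, at the cost of failing outright when the officer’s delivery drifts from the training recording.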
In one test, a normally soft-spoken motorcycle officer was asked to role-play a traffic stop and unknowingly assumed a more hard-edged “Robocop” voice. Not surprisingly, he was unable to get the device to work at all.
The VRT might also falter when used by an officer with a strong accent. During testing, a native Hebrew speaker was unable to operate the device using English commands, although the same officer had no problem recording trigger phrases in Hebrew.
Officers also reported that the microphone frequently failed to pick up their voices. The microphone has to be positioned precisely for the unit to work correctly.
Although the VRT generally performed well in noisy environments, it had trouble recognizing commands that began with what linguists term “voiceless” sounds, i.e., soft sounds produced without vibration of the vocal cords. (These include Ch, F, H, K, P, S, Sh, T, Th, and Wh.) Voiceless sounds were especially a problem for officers with a sore throat or a chest or head cold.
The fix is to alter the trigger phrases so they begin with voiced sounds, which make the vocal cords vibrate. Whereas “P” alone does not work at the start of a command, the blend “Pl” does. Similarly, rather than training their devices to respond to “Hello,” officers are instructed to use the trigger word “Greetings.”
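The rule above lends itself to a quick screening check when a phrase list is drafted. The sketch below flags trigger phrases whose first letters match the voiceless onsets listed in the article. It is a hypothetical helper, not part of the VRT, and it works from spelling rather than sound, so it will miss cases such as the K sound in “Car.” Treating any single voiceless consonant followed by “l” as an acceptable blend is an assumption that generalizes the article’s “Pl” example.

```python
# Voiceless onsets as listed in the article. Two-letter onsets come first so
# that "sh" or "th" is matched before plain "s" or "t".
VOICELESS_ONSETS = ("ch", "sh", "th", "wh", "f", "h", "k", "p", "s", "t")


def starts_voiceless(trigger: str) -> bool:
    """Spelling-based guess at whether a trigger phrase opens with a voiceless sound."""
    word = trigger.strip().lower()
    for onset in VOICELESS_ONSETS:
        if word.startswith(onset):
            # Assumption: a single voiceless consonant blended with "l"
            # (as in "Pl") is acceptable, generalizing the article's example.
            if len(onset) == 1 and word[1:2] == "l":
                return False
            return True
    return False


def flag_risky(triggers: list[str]) -> list[str]:
    """Return the trigger phrases that should be reworded before training."""
    return [t for t in triggers if starts_voiceless(t)]


print(flag_risky([
    "Hello",
    "Greetings",
    "Please write date of birth",
    "Permission weapons search",
    "Driving under influence",
]))
# ['Hello', 'Permission weapons search']
```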
Making a Better VRT
A number of officers who used the VRT in the field test reported forgetting the precise trigger phrases necessary to operate the unit. This was especially a problem for phrases used in less frequently encountered policing situations. To address this limitation, officers carry color-coded Command Cards that break trigger phrases into four categories:
Black for an event; Blue for paperwork, such as “Car Registration”; Green for conversations; and Red for emergencies. Within each category, the Command Cards list key phrases in the order an encounter typically unfolds. For example, traffic stop commands begin with “Turn off engine,” “Step out of the vehicle,” “May I have your driver’s license, please,” and so on.
The improvement officers requested most often was a volume switch. They also noted gaps in the commands for certain common situations and suggested adding “Driving under influence,” “Please write date of birth,” the Miranda rights, and “Permission weapons search.” Officers also asked for more phrases related to possible driving-under-the-influence encounters.
The NAVAIR report recommended creating an instructional video on how to use the device, noting that officers generally did not follow the written instructions on vocal volume levels or on how to hold the device. Another possible improvement would be software that lets users readily add or modify trigger phrases. Currently, the device comes preloaded with trigger phrases, and changing them requires special training.
Commercialization and Cost
IWT president Tim McCune puts his company’s investment in the VRT at about $3 million over the last 10 years; NIJ’s Office of Science and Technology contributed another $1 million. McCune believes the VRT is nearly ready to move from the prototype stage to commercialization. He anticipates that each VRT package will sell for $3,000. That includes the translator, language modules, megaphone, cables, chargers, training materials, and documentation. However, the price will probably have to fall to around $1,000 before it is widely procured by domestic law enforcement agencies.
There are numerous other potential markets for the VRT, including corrections officers, customs and immigration officials, people with disabilities such as cerebral palsy, and school personnel. These markets could expedite commercialization and drive down per-unit costs for law enforcement.
The Next Generation
A fourth-generation VRT is now in use by a police department in Kentucky. This latest version is 3 inches wide and 5 inches high. It consumes less battery power than its predecessors yet can store 125 languages and 125,000 trigger phrases, although IWT does not expect law enforcement to need more than 500 phrases.
The VRT has proven its utility to law enforcement, but NIJ is also aware of its limitations. It is primarily being used, at least initially, for everyday patrolling, including pullovers, driver’s license and registration checks, and other relatively low-stress engagements.
Thus far, the VRT appears to work in every police situation for which it was designed, from arrests to returning lost children to their homes. As the device becomes more readily available, the list of situations in which it can prove useful is likely to continue to grow.