NIJ Journal No. 252 • July 2005
The Voice Response Translator: A Valuable Police Tool
by Mark P. Cohen
An Oakland, California, police officer pulls over a
motorist for speeding. The officer approaches and asks for
the driver's license and registration. The confused,
frightened motorist shakes his head and begins speaking
rapidly in Spanish.
A West Palm Beach, Florida, officer receives an anonymous
phone tip about a domestic disturbance. The deputy arrives
to find no assailant but a badly battered woman and infant.
When he asks what happened, the woman responds in Haitian Creole.
Non-English speakers are nothing new to America. But as
the number of foreign-born residents in the United States
has steadily risen in the past decade, so has the number
of people who are not fluent in English. Census data from
2000 showed that one in five U.S. residents speaks a foreign
language at home. Only a little more than half of these
people (55 percent) also reported speaking English "very well."
This poses a dilemma for law enforcement. Increasingly,
American policing requires interaction with speakers of
not only Spanish, but Arabic, Hindi, Russian, Swahili, Tagalog,
and Vietnamese. It simply is not possible in an emergency
for police to wait for an interpreter to assist by phone,
much less arrive on the scene. As the number of foreign-language
speakers grows, law enforcement must find cost-effective
means to communicate with these residents.
In December 1993, an NIJ advisory council identified instant
language translation as an immediate law enforcement technology
priority. In this high-tech age, the council reasoned, there
must be an economical, technological means to assist officers
in communicating with non-English speakers.
Eventually, NIJ identified and tested four devices with
the potential to fulfill law enforcement needs. NIJ asked
the Naval Air Systems Command (NAVAIR) Training Systems
Division in Orlando, Florida, to perform testing on the
Voice Response Translator (VRT) from Integrated Wave Technologies
(IWT), Inc.; the Phraselator from Marine Acoustics, Inc.;
Ectaco, Inc.'s Universal Translator; and a Hewlett-Packard
iPAQ personal digital assistant. (The testing was
conducted in 2002 on the most current models of each device
available at that time; modifications have been made
to each of these devices since then.)
Comparisons of the four units found remarkable similarity,
with the largest differences being 1) ruggedness, 2) quality
of speakers and microphones, and 3) voice activation for
hands-free operation. In the comparison, the VRT scored
as the top choice for law enforcement use.
Voice-Activated Language Tool for Law Enforcement
The VRT is a one-way translator that allows users to instantly
communicate with non-English speakers. Each VRT unit is
trained by an individual to recognize that person's
short, voice-activated commands (called "trigger phrases")
in English. The English phrase is associated with a computerized
audio file of a complete, foreign-language sentence recorded
by a fluent speaker of that language. In less than a second,
the VRT repeats the command in the desired language. The
device can be equipped with either a headset or an adjustable
gooseneck microphone and has a bullhorn jack. It can be
kept in an officer's shirt pocket or mounted on a citation
In the examples above, the Oakland officer who pulled over
the Spanish-speaking motorist might say the trigger phrase,
"Too fast." The VRT would instantly repeat the
phrase in English for verification, and then issue the appropriate
full sentence in Spanish. Or the West Palm Beach officer
might say, "You in pain?" and the VRT would ask
the question in Haitian Creole. The VRT is programmed specifically
for such common policing matters as traffic stops, domestic
problems, lost children, and medical emergencies.
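The trigger-phrase mechanism described above can be pictured as a simple lookup. The following is only an illustrative sketch in Python, not IWT's implementation: the real device matches an officer's recorded voice pattern rather than text, and the table, function names, and audio file names below are all hypothetical.

```python
import string

# Hypothetical phrase table: (trigger phrase, language) -> recorded audio file.
# The file names are placeholders, not real VRT assets.
PHRASE_TABLE = {
    ("too fast", "spanish"): "es_too_fast.wav",
    ("you in pain", "haitian creole"): "ht_you_in_pain.wav",
}


def normalize(text):
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def respond(trigger, language):
    """Return the playback steps for a recognized trigger, or [] if unknown."""
    key = (normalize(trigger), normalize(language))
    audio_file = PHRASE_TABLE.get(key)
    if audio_file is None:
        return []  # unrecognized trigger: the sketch stays silent
    # First echo the English trigger for verification, then play the
    # full sentence recorded by a fluent speaker.
    return [("echo_english", key[0]), ("play_recording", audio_file)]
```

In this sketch, respond("Too fast.", "Spanish") yields the English echo followed by the Spanish recording, mirroring the verify-then-translate order the article describes.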
VRT prototypes have been used by a number of law enforcement
agencies, including police departments in Oakland, West
Palm Beach, and Nashville, Tennessee. Nashville Police Captain
Ken Pence told National Public Radio that the VRT was a
welcome innovation in his city, where police encounter some
20 languages on a daily basis.
Building the Prototype
The VRT employs sound analysis technology developed for
military and covert operations by the former Soviet Union.
When the USSR collapsed, IWT bought the rights to the Soviet
research, which formed the basis for the original, desktop
version of the VRT. This version, which was intended solely
to demonstrate the technology to NIJ, could translate only
asked IWT to come back with a device that was readily portable
as well as "eyes free" and "hands free," a policing
necessity in emergencies. The result was the first generation
of the VRT. It measured about 6 inches by 6 inches by 4
This version was tested by the Oakland Police Department.
The field test showed that the unit had promise but that
it was still too bulky to fit comfortably in an officer's
pocket and, more importantly, it did not consistently recognize
officers' voice commands.
So IWT developed its second-generation unit. That device
was still too big and did not perform adequately in high
Navy Joins Testing of Device
In addition to testing prototypes at various law enforcement
agencies, NIJ sent the VRT for independent analysis by the
U.S. Navy, specifically, NAVAIR. (See "VRT
and the U.S. Military.")
Navy civilian psychologists, voice technology engineers,
and instructional systems designers conducted two studies
of the VRT, one in the lab and the other in the field. NAVAIR
tested several generations of the VRT, including IWT's
third-generation device, which was the smallest yet. The
laboratory test, which was conducted in a sound studio,
determined that the unit's microphone did not perform
as well as off-the-shelf models. NAVAIR swapped the microphone
for the field tests.
The NAVAIR study found that many officers needed less than
one day to become comfortable with the VRT and that the
unit performed properly in all programmed languages. According
to NAVAIR's Dee Sheppe, the field test found that the
VRT "is easy for people to learn how to use. It offers
a quick solution that can help an officer on the street
when he doesn't have a lot of resources. The small
size is an advantage."
The results of the field test were summarized in a December
2003 report. Twenty-seven VRT units were distributed to
law enforcement officers for a 3-week period. The VRT was
employed most frequently in traffic stops. Spanish was the
most frequently used language.
Overall, half of the officers found the VRT to be useful
and user friendly and reported that the device enabled them
to handle many situations that otherwise would have required
a translator. However, the other half of the officers surveyed
reported difficulty in operating the VRT and opted not to
NAVAIR attributed dissatisfaction with the device partly
to the fact that some officers are simply slow to adapt
to new technology. Thomas Franz, the NAVAIR psychologist
who led the field test, was not surprised to find a split
decision on the VRT. "If you talk to police officers,
there's a given percentage who won't use pepper
spray. And there are other cops who swear by it. So there's
choice there by the individual officer. I get frustrated
because people say [the VRT] doesn't work for everybody.
It's a tool: some people will like it and some people won't."
Some of the officers in the test may have been disappointed
because they expected the device to work as a two-way translator.
The VRT does not translate what a civilian says back to
an officer. It can, however, prompt an individual to nod
his or her head "yes" or "no," show
identification, or direct him or her to write down an answer.
VRT AND THE U.S. MILITARY
With its large research and development budget, the Defense
Department has overseen the development of numerous technological
innovations. In fact, law enforcement often adapts technologies
from the military.
The VRT is an exception: NIJ shepherded its development
and the military adapted it for use in Iraq and elsewhere,
even though the Pentagon had been underwriting development
of a similar technology by a different vendor.
In the Naval Air Systems Command's (NAVAIR's) comparison
of the VRT, the Defense Department device, and another
similar device, the VRT was NAVAIR's choice. "The
shortcomings of the VRT (lack of volume control, lack
of an auto-off feature, and lack of a PC link) could be
easily overcome with a specification for these features
included in the production/manufacturing requirements,"
the report concluded.
The NIJ-NAVAIR collaboration introduced the VRT to the
Navy and Coast Guard. NAVAIR funded nautical versions
of the instruction booklet and command cards, and NIJ
authorized the use of four translators by the Navy. Programmed
with more than 200 commands organized into nine events,
the nautically trained VRT was tested on three Navy ships.
The Coast Guard, which frequently encounters foreign
speakers in boardings at sea, has purchased some 70 VRT
units. The devices were deployed to the Persian Gulf during
the Iraq war to warn foreign vessels away from oil rigs.
Likewise, IWT has developed a 34-page "Operating
Instructions and Phrase List" for use of the VRT
by the Marines. The Marine Corps has purchased 50 units
and plans to buy more. Marines who used the VRT in Iraq
have suggested the addition of numerous phrases, which
have been translated into Iraqi Arabic by the Defense
Language Institute and incorporated into the VRT commands.
In 2003, the U.S. Special Operations Command, which coordinates
all the military special forces, witnessed a VRT demonstration,
liked what it saw, and purchased 100 devices for use in
Features of the Device
The VRT can be programmed to translate into any language.
And once the device is programmed, an officer can switch among languages
by voice command.
The VRT is speaker dependent, so it works only for the
particular officer or officers who trained it.
However, a single device can be trained to recognize the
commands of eight different officers.
Technology Has Limitations
The technology measures peaks (highs and
lows) in an officer's speech pattern. The precise
phrases spoken into it initially are what it will look for
in the future. So if an officer's inflection or voice
pattern is altered by a stressful encounter, the VRT might
fail. And some officers find it difficult to say the same
thing twice with the same inflection.
An example of a problem that occurred in testing involved
a generally soft-spoken motorcycle cop. When asked to role-play
a traffic stop, the officer unknowingly assumed a more hard-edged
"Robocop" voice. Not surprisingly, this officer
was unable to get the device to work at all.
The VRT might also falter when used by an officer with
a distinctive ethnic accent.
During testing, a native Hebrew speaker was unable to operate
the device using English commands (but the same officer
had no problem recording trigger phrases in Hebrew).
Officers also reported that the microphone frequently failed
to pick up their voices. The microphone has to be positioned
precisely for the unit to work correctly.
Although the VRT generally performed well in noisy environments,
it had trouble recognizing commands that began with what
linguists term "voiceless speech sounds," i.e.,
soft sounds formed without use of the vocal cords. (These
include Ch, F, H, K, P, S, Sh, T, Th, and Wh.) Voiceless
speech sounds were especially a problem for officers with
a sore throat or chest or head cold.
The fix is to alter the trigger phrases so they begin with
hard sounds that cause the vocal cords to vibrate. Whereas
"P" alone does not work at the start of a command,
the blend of "Pl" does. Similarly, rather than
train their devices to translate "Hello," the
officers are instructed to change the trigger word to "Greetings."
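A phrase list could be screened for this problem before officers train their units. The sketch below is a rough illustration under stated assumptions: it judges the opening sound from spelling alone, which is only a crude proxy for pronunciation (for example, the "c" in "car" sounds like "K" but is not caught), and the function and constant names are hypothetical rather than part of any VRT software.

```python
# Spellings of the voiceless opening sounds listed in the article.
VOICELESS_STARTS = ("ch", "sh", "th", "wh", "f", "h", "k", "p", "s", "t")

# Blends the article reports as working despite a voiceless first letter.
WORKING_BLENDS = ("pl",)


def risky_start(phrase):
    """Return True if the phrase's spelling suggests a voiceless opening sound."""
    text = phrase.strip().lower()
    if text.startswith(WORKING_BLENDS):
        return False  # e.g. "Please ...": the "Pl" blend is reported to work
    return text.startswith(VOICELESS_STARTS)


def flag_phrases(phrases):
    """Return the subset of phrases that may need a different first word."""
    return [p for p in phrases if risky_start(p)]
```

Under these assumptions, "Hello" is flagged while "Greetings" is not, matching the substitution officers are told to make.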
Making a Better VRT
A number of officers who used the VRT in the field test
reported forgetting the precise trigger phrases necessary
to operate the unit. This was especially a problem for phrases
used in less frequently encountered policing situations.
To address this limitation, officers carry color-coded Command
Cards that break trigger phrases into four categories:
Black for an event; Blue for paperwork, such as "Car
Registration"; Green for conversations; and Red for
emergencies. The Command Cards list key phrases within a
category sequentially. For example, traffic stop commands
begin with "Turn off engine," "Step out of
the vehicle," "May I have your driver's license,
please," and so on.
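The Command Card layout amounts to a small data structure: a color for each category and an ordered list of phrases for each situation. The following Python sketch shows one way to model it. The names are hypothetical, the phrase list contains only the three commands quoted above, and the article does not say which color the traffic stop phrases fall under, so none is assigned here.

```python
# Card colors and the category each one marks, as described in the article.
CATEGORY_BY_COLOR = {
    "black": "event",
    "blue": "paperwork",
    "green": "conversation",
    "red": "emergency",
}

# Traffic stop commands in the order the card lists them (partial list).
TRAFFIC_STOP = [
    "Turn off engine",
    "Step out of the vehicle",
    "May I have your driver's license, please",
]


def next_prompt(sequence, last_spoken=None):
    """Return the phrase that follows last_spoken, or None at the end.

    With no last_spoken, return the first phrase. Raises ValueError if
    last_spoken is not on the card.
    """
    if last_spoken is None:
        return sequence[0] if sequence else None
    position = sequence.index(last_spoken)
    if position + 1 < len(sequence):
        return sequence[position + 1]
    return None
```

Keeping each situation's phrases in order is what lets a card, or a lookup like this one, remind an officer of the next command without requiring the whole list to be memorized.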
The most frequently mentioned improvement sought by the
officers was to include a volume switch. They also noted
shortcomings in commands for dealing with certain common
situations and suggested the following additions: "Driving
under influence," "Please write date of birth,"
the Miranda rights, and "Permission weapons
search." Officers also asked for additional phrases
related to possible driving under influence encounters.
The NAVAIR report recommended the creation of an instructional
video on how to use the device, noting that officers generally
did not use the written instructions regarding vocal volume
levels or how to hold the device. Another possible improvement
would be to incorporate software in the VRT that would enable
users to readily add or modify trigger phrases. Currently,
the device comes loaded with trigger phrases and changing
them requires special training.
Commercialization and Cost
IWT president Tim McCune puts his company's investment
in the VRT at about $3 million over the last 10 years; NIJ's
Office of Science and Technology contributed another $1
million. McCune believes the VRT is nearly ready to move
from the prototype stage to commercialization. He anticipates
that each VRT package will sell for $3,000. That includes
the translator, language modules, megaphone, cables, chargers,
training materials, and documentation. However, the price
will probably have to fall to around $1,000 before it is
widely procured by domestic law enforcement agencies.
There are numerous other potential markets for the VRT (corrections
officers, customs and immigration officials, persons disabled
with ailments like cerebral palsy, school personnel) that
could expedite commercialization and drive down per-unit
costs for law enforcement.
The Next Generation
A fourth-generation VRT is now in use by a police department
in Kentucky. This latest version is 3 inches wide and 5
inches high. Although it consumes less battery power than
its predecessors, it has the capacity to store 125 languages
and 125,000 trigger phrases (although IWT does not anticipate
law enforcement needs to exceed 500 phrases).
The VRT has proven its utility to law enforcement, but
NIJ is also aware of its limitations. It is primarily being
used, at least initially, for everyday patrolling, including
pullovers, driver's license and registration checks,
and other relatively low-stress engagements.
Thus far, the VRT appears to work in every police situation
for which it was designed, from arrests to returning lost
children to their homes. As the device becomes more readily
available, the list of situations in which it can prove
useful is likely to continue
- Language Use and English-Speaking Ability,
Census 2000 Brief, U.S. Census Bureau, October 2003.
Available online at