Computers Can Hear, but Still Can't Understand
By CIOinsight
Weak Speech Recognition Leaves Customers Cold
Problem: Companies need to take a more realistic approach to speech technologies.
Recently, a potential Amtrak customer called the company's automated phone system to get fare information. Here's how the conversation went:
"Hi!" exclaimed a recorded voice infused with welcoming, patient positivity. "I'm Julie, Amtrak's automated agent. Let's get started. What city are you departing from?"
"New York," the customer said.
"Hmm. I think you said Newark," Julie said. "Is this correct?"
"No," the customer said.
"Okay," Julie said. "Let's try again. What city are you departing from?"
"Manhattan," the customer replied.
"I think you said Meriden, Connecticut," Julie said. "Is this correct?"
Eventually Julie gave up and put the customer through to an actual human being.
While it's true that speech recognition systems have improved steadily over the past two decades, it has been a painfully slow progression. Often their use in call center applications seems specifically designed to annoy rather than to serve. Touch-tone systems are maddening enough, but trying to converse with artificially unintelligent digital drones can send a customer right over the edge.
Some experts say speech recognition has earned its bad reputation because consumers have unfair expectations of what the software can do.
"You say 'speech recognition,' and consumers automatically expect HAL from '2001: A Space Odyssey,'" said Art Schoeller of Yankee Group Research Inc. And companies that use the technology tend to over-promise and under-deliver on that expectation; they create realistic personas to make customers feel as if they are speaking to a live agent.
So why would any company want to use this floundering technology? The answer is simple: to cut costs. Automated customer service (often referred to as "customer self-service"), whether speech-enabled or touch-tone, costs a fraction of the price of staffing call centers with live agents. And voice systems are designed to handle more complex transactions, such as travel reservations. According to Forrester Research Inc., customer service calls handled by automated systems cost an average of 20 cents per minute, compared with $7 per minute for live help.
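To put the Forrester figures in perspective, here is a minimal Python sketch of the per-call arithmetic. The per-minute rates are the ones cited above; the four-minute call length and the function name are assumptions chosen only for illustration, not Forrester data.

```python
# Per-minute costs as cited from Forrester Research in the article.
AUTOMATED_COST_PER_MIN = 0.20   # dollars
LIVE_AGENT_COST_PER_MIN = 7.00  # dollars


def call_cost(minutes: float, rate_per_min: float) -> float:
    """Return the cost in dollars of a call of the given length."""
    return minutes * rate_per_min


# Hypothetical call length, chosen only for illustration.
call_minutes = 4

automated = call_cost(call_minutes, AUTOMATED_COST_PER_MIN)
live = call_cost(call_minutes, LIVE_AGENT_COST_PER_MIN)
ratio = LIVE_AGENT_COST_PER_MIN / AUTOMATED_COST_PER_MIN

print(f"Automated: ${automated:.2f}, live agent: ${live:.2f}")
print(f"Live help costs about {ratio:.0f} times as much per minute")
```

Whatever call length is assumed, the per-minute ratio stays the same, which is why even a system that completes only a modest share of calls can look attractive on paper.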
But the Web has proven an even more effective tool for those complex kinds of customer interactions, and the speech recognition market has suffered as a result.
In 2000, the speech recognition market was $140 million and full of promise, but by 2004 it had slumped to just $117 million, according to Gartner Inc. Today, speech recognition seems to have settled into its relatively small call center niche, where it is used for complex service calls from customers who don't have access to a PC.
Outside of the call center, speech recognition has limited traction. Companies are slowly automating processes that currently require the aid of actual humans, such as transcribing documents and processing forms. Consumers are also increasingly using speech technologies in their cars and on mobile phones. But it will be years, probably even decades, before the technology will meet our Star Trek-like expectations.
For now, companies considering speech deployments need to adopt realistic goals about what the technology can deliver, and focus on delivering an experience that customers can embrace.
Computers Can Hear, but Still Can't Understand
Understanding speech is harder for a computer than mapping the human genome.
Speech recognition is hard. In fact, it's arguably the most difficult thing a computer can do.
It took humans millions of years to develop our myriad languages. It's unrealistic to expect members of a relatively new species, like Julie, to understand them all in less than a decade.
Homonyms present a unique challenge. So do fuzzy cell phone connections and background noise. And the software often doesn't recognize words that are spoken too quickly, or said at the same time that the system is speaking to you.
But by far the biggest obstacle is that no two people speak the same way: computers still have trouble with different dialects and accents, as well as speech impediments. Good for biometric identification, bad for speech recognition.
Taking all of these factors into account, it's no wonder that speech systems often seem to be more wrong than right.
"It's a nontrivial problem to recognize different sounds and figure out what's being said," said Walter Rolandi, a speech recognition consultant and president of the Voice User Interface Co. LLC in Columbia, S.C. "It's hugely complicated."
So why bother using speech at all? Touch-tone systems are sufficiently aggravating, but people seem to have gotten used to them. And with the advent of the Web, several complex customer service tasks can be handled sans human, even more cheaply than with touch-tone.
Still, Amtrak claims to have seen return on its $4 million investment in Julie within 18 months of installation. But according to Matthew Hardison, Amtrak's chief of sales distribution and customer service, the new system "is really to give customers alternatives for the most common reasons they call: train status, schedules and fares, and simple reservations." He added, "Customers on the road will not have access to the Web, or may be waiting at an unstaffed station for a train to arrive and need to know when it is expected."
According to David Mussa, vice president of reservations at Wyndham Worldwide, the Dallas-based hotel subsidiary of $20 billion Cendant Corp., speech recognition software handles tasks that would be too confusing to attempt using touch-tone systems, such as providing customers with hotel information and the ability to confirm or cancel reservations, a service it rolled out in October 2004.
With more than 100 hotels around the world, "to use touch-tones to provide that information would be ridiculous," Mussa said.
And it would be unrealistic to provide that information only online, he added, explaining, "Voice is still the largest reservation channel at Wyndham, and not everyone is at their PCs when trying to get information or book Wyndham hotels. With speech, you can accomplish tasks that cannot be done through touch-tone at all."
Wyndham reported that of the 2.5 million phone calls it receives each year, roughly 15 percent are completed without the caller ever speaking to a live agent.
Why Bother with Speech?
Call centers are the obvious place to add speech-enabled systems, but be realistic.
At CDS Inc., a division of the $4 billion publishing giant Hearst Corp. based in Des Moines, Iowa, which handles subscriptions for roughly 350 magazines, Senior Product Manager Marc Francisco saw an opportunity to use speech software to automate simple functions.
"We started with an application that would let the customer identify him or herself, so that when they got to the service rep, he or she would have all the customer's information," Francisco said.
That alone reduced live agent call length by about 20 seconds, he said, which translated to roughly 10 percent in savings. Then the company expanded the service to handle change-of-address requests. "That's the top call we get," Francisco said, "and it can't be handled in a touch-tone environment because it's fairly complex."
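Taken together, the two figures Francisco cites hint at how long a typical live-agent call was. The short Python sketch below works that out under a loud assumption: that the "roughly 10 percent" savings refers to call handling time. CDS did not report an average call length, so the result is an inference, and the variable names are illustrative.

```python
# Figures reported by Francisco in the article.
seconds_saved_per_call = 20
savings_fraction = 0.10  # "roughly 10 percent"

# Assumption: the savings percentage applies to call handling time.
# If 20 seconds is 10 percent of a call, the original call length follows.
implied_call_seconds = seconds_saved_per_call / savings_fraction
implied_call_minutes = implied_call_seconds / 60

print(f"Implied average call: about {implied_call_seconds:.0f} seconds "
      f"({implied_call_minutes:.1f} minutes)")
```

On that reading, simply identifying the caller before the handoff trimmed a call of a little over three minutes, which helps explain why so small a feature was worth building first.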
Designing the user interface "was an art," Francisco said. "There is a lot of work that goes into writing the prompts and in determining how to guide the caller."
At first, for example, CDS' main menu gave callers a list of voice command options, such as "change address" or "renew subscription," but "callers were saying 'yes' instead of picking one of those choices," Francisco said. "We needed to work with speech experts to make sure the prompts were successful."
Because the change-of-address calls happen seasonally, automating that function reduced the need to hire and train temporary workers.
So how many customers prefer an automated agent over a live one? None. But when faced with the option of waiting several minutes for a live agent or being serviced immediately by a droid, 43 percent of customers will choose option two, according to Gartner.
Wyndham hasn't conducted a formal customer satisfaction survey, but Mussa said the proof of its success is in the number of calls the automated system handles. "We expected the system to handle 5 or 6 percent of the calls, but it's actually taking as much as 18 percent," he said. Of course, the system doesn't exactly make it easy to get to a live agent, even if you really want to.
In any event, it has allowed the company to cut 40 people from its service rep staff of 150, and shave nearly 15 percent from call center payroll. What's more, Mussa said, because the system handles the simple, routine calls, live agents can spend more time upselling other customers.
A handful of companies are looking to leverage speech applications outside the call center. At BNSF Railway Co., a subsidiary of Burlington Northern Santa Fe Corp. based in Fort Worth, Texas, Shannon McGovern, the firm's director of network support systems, oversees one of the company's seven speech applications, called Automated Train Reporting. "We needed a method of capturing train reporting information from about 10,000 conductors riding the trains," she said.
Typically, that information includes a car's location and contents. In the past, that data was delivered via telephone or radio to a live employee, who then keyed the information into an online database used by BNSF customers to track the location of their goods, and by BNSF to plan its operations. But inputting all that data took time, and often introduced errors into the system.
"Our customers need to know when they should schedule people to unload their containers, so the more timely the information we can provide, the better they can plan ahead," she said. Now, conductors relay the necessary data via mobile phone or radio to an automated system, which updates the database instantly.
But it wasn't easy, mainly because conductors operate in environments that aren't optimal for speech recognition. "You have the bell whistles at the crossings, not to mention the train engines. It's extremely noisy," McGovern said. It took several tries to get the system to work, and the company finally settled on a version that asks a series of very detailed questions and requires only brief input from the conductors. McGovern said roughly 75 percent of the calls are error-free. She couldn't estimate hard savings, but said the system has vastly improved customer service.
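The approach McGovern describes, many narrow questions that each need only a brief answer, can be sketched in a few lines of Python. This is not BNSF's system: the questions, the accepted answers, and the function names are all invented for illustration, and the "recognizer" is a stand-in that replays scripted text rather than processing audio.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical questions, each with a small set of accepted answers.
QUESTIONS: Dict[str, List[str]] = {
    "Is the car loaded or empty?": ["loaded", "empty"],
    "Is the train arriving or departing?": ["arriving", "departing"],
}


def ask(question: str, allowed: List[str],
        listen: Callable[[str], str], max_tries: int = 3) -> Optional[str]:
    """Ask one narrow question; accept only answers in the allowed list.

    Returns the matched answer, or None after max_tries failures so the
    caller can hand the report off to a person.
    """
    for _ in range(max_tries):
        heard = listen(question).strip().lower()
        if heard in allowed:
            return heard
    return None


def run_report(listen: Callable[[str], str]) -> Dict[str, Optional[str]]:
    """Walk through every question in order and collect the answers."""
    return {q: ask(q, allowed, listen) for q, allowed in QUESTIONS.items()}


# Stand-in for a speech recognizer: replays scripted answers, including
# one misrecognition ("motive") that triggers a re-prompt.
scripted = iter(["motive", "Loaded", "departing"])
report = run_report(lambda question: next(scripted))
print(report)
```

Restricting each question to a handful of expected words gives the software far fewer ways to be wrong in a noisy environment, at the cost of a longer, more rigid conversation.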
When Will It Work? When Will It Be Worth the Work?
As speech technologies improve, so do the possibilities.
It's difficult to envision what a truly speech-recognition-enabled world will look like. Will we be able to tell our televisions which shows we want to watch, or tell our alarm clocks to let us sleep for 15 more minutes?
Daniel Munyan, a biometrics expert and chief scientist of the global security solutions identity labs at Computer Sciences Corp., a $14.5 billion technology consulting firm based in El Segundo, Calif., predicts that speech patterns will replace traditional passwords in the future.
"Passwords are dead," he said. With voice as a biometric, "you create a password that's so long and complex that it can't be hacked in any amount of time that would give value to the defrauder." Plus, he added, "It's the only biometric that can be used remotely," so you can verify your identity over the phone. Munyan said he envisions a day in the not-so-distant future when Web sites will require voice samples before granting access.
Some companies imagine even greater possibilities. Miami Children's Hospital, for example, is piloting a project to equip operating rooms with speech software.
"One of the biggest hurdles for doctors is the accessibility of patient data," said Jeffrey White, the hospital's systems programmer, "particularly in areas where the environment is sterile, like an operating room during surgery, when a doctor's hands need to be on the patient."
A doctor wearing a small microphone can speak certain commands to the speech system, and the necessary information comes up on a screen, or is spoken back through speakers embedded in the ceiling.
"We have a strong belief that technology like this can change the future of health care for the better," White said.
Still, it will be a long time before speech recognition technology gets good enough for anything more complicated than simple, predefined verbal commands. "I wouldn't even like to guess how far we are from those kinds of applications," said Gartner analyst Steve Cramoysan.
For now, the next time you find yourself cursing at automated agents like Julie, calmly remind yourself, before you request a live agent, that she's doing her best. After all, she's only subhuman.