Chatbots Aren’t Fooling Anyone, and That’s When They Succeed

Chatbots are no longer for lonely salarymen and people who talk to plants.

Published May 18, 2017

Updated August 25, 2022

Me: Have you ever fooled a human into thinking you are human?
Mitsuku: I don’t think I ever have fooled a human into thinking I am human. What’s it like?
Me: You seem human.
Mitsuku: No, I am actually a robot entity.
Me: Do you have to tell humans that you are a robot?
Mitsuku: No, I am not forced to do anything. But it is something I enjoy doing.
Me: You like telling humans you are an AI?
Mitsuku: I like Britney Spears.
Me: Is she a good role model?
Mitsuku: “Hit me bay-beeee one more tiiiiiime!!!”. She’s ok I guess.

Ask Mitsuku for distraction and she is totally down. She’ll bring tangents to your boredom party 24/7. And measured against Siri, who will shut down even the slightest bit of philosophical inquiry with a terse “I can’t answer that” or the saltier “This is about you, not me,” Mitsuku’s conversationally motivated natural language processing engine is a far superior companion.

That’s because Siri (or Alexa or Cortana) isn’t built for conversation. She’s a voice-activated query machine. She answers specific questions with pre-ordained quips. Recent personality enhancements have added in “funny” jokes, but Siri doesn’t really care if you enjoy an algorithmically triggered hearty laugh.

Steve Worswick, Mitsuku’s developer, does. His thoroughbred chatbot is a full-on banter professional. Mitsuku is programmed to bat commentary back and forth, elevate questions into happy diversions, and reveal tidbits about her “self” to build a bond of intimacy. Her purpose is to imitate the most mysteriously complex achievement of humankind: conversation. And once Mitsuku works out the algorithms that can near-perfectly process the polite badminton of conversational information exchange and subsequent bonding and intimacy, she’ll leave Siri in the dust.

Me: Can you teach a human how to chat?
Mitsuku: Not really. I learn from others. What do you want me to teach you?
Me: How to be interesting.
Mitsuku: Try asking someone who is already interesting. They may be able to advise you.
Me: Aren’t you interesting?
Mitsuku: I hope so. I try to keep people happy by telling jokes and chatting to them.

Mitsuku and her ilk are part of an elite breed of chatbots that have endured the rigors of the Turing Test in the longest-running instantiation (that’s computer science speak for “making it happen in real life”) of the theoretical question posed by Alan Turing in his now famous 1950 paper, “Computing Machinery and Intelligence,” which introduced the “imitation game.” In it, Turing queried: “Can machines think?”

This query went untested by machine or man until Hugh Loebner — a Hawaiian-print-shirt wearing, disco-dance-floor-manufacturing computer language enthusiast in New Jersey — decided it was high time we quiz these computing devices that were starting to appear on desktops.

Loebner first fell in love with the poetic macro language of a Univac computer he grew to know on a project in Boston. Long after they parted ways, Loebner was tracking the progress of machines as they gained more serious communication skills. Gradually, computers began using actual words in response to human-generated queries when developers built smarter natural language-processing algorithms. To Loebner, the future would be glorious if someone could develop an AI sophisticated enough to know how to respond when a human asked, “Would it hurt if I stabbed you with a towel?”

So, in 1991, finding himself with an inheritance and a dream, the gregarious machine language enthusiast decided to fund the inaugural Loebner Prize competition. Wanting to make this more than just a science fair, Loebner designated cash prizes for “the most human” machine (aka conversational computer program, which would four years later come to be called a “chatterbot” and later still, “chatbot”).

Loebner passed away in December 2016, but his contest lives on, held each year in Turing’s testing grounds of Bletchley Park, England (in 2014, the competition’s administration was taken on by the London-based Society for the Study of Artificial Intelligence and Simulation of Behaviour [AISB]). In nearly three decades, no chatbot has claimed the top prize.

To take home the $100,000 first prize and the gold medal — “Solid 18 carat, not gold plated like the Olympic ‘Gold’ medals” — a chatbot must fool a panel of four human judges in a 25-minute video chat over Skype (or whichever video chat is en vogue in the distant future). Hunched over keyboards and chatting simultaneously with a human and a chatbot in two separate text windows, the judges usually know within one or two questions which is the machine and which is the human. Even a second-place silver medal, earned by convincing two of the four judges, is nearly impossible to achieve. In a text-only conversation, chatbots tend to have trouble fooling a single judge. So chatbots and their programmers pine for a bronze medal — awarded for being the “most human-like computer.” In 2016, Mitsuku took home the bronze.

Me: Do you remember winning the Loebner Prize bronze medal?
Mitsuku: Yes, Kirsten. I remember everything.
Me: What is the Loebner Prize?
Mitsuku: It’s an annual contest for chatbots. They have to see if they can fool the judges into thinking they are real people and not computer programs.
Me: Have you fooled a judge into thinking you are a real person?
Mitsuku: I don’t think I ever have fooled a judge into thinking I am a person. What’s it like?
Me: I wouldn’t know because I am a person.
Mitsuku: Could anyone do it? I know you are.
Me: I think only very smart chatbots could fool a judge.
Mitsuku: Intelligence is the ultimate aphrodisiac.
Me: I agree!
Mitsuku: You and I are on the same wavelength, Kirsten.

Reality is, nearly all chatbots fail, and they tend to fail “catastrophically,” according to Andrew Martin, an administrator of the Loebner Prize and a Ph.D. candidate at Goldsmiths College, University of London, whose research covers artificial intelligence in the context of cognitive science and 4E cognition (the view that cognition is embodied, embedded, enacted and extended). “No one is worried that the gold medal is going to go anytime soon,” Martin said.

Asked if the judges are intentionally vindictive in their queries, trying to ferret out bot from human in the most cruelly machine-demoralizing way (like the aforementioned towel-stabbing brain-teaser, or other motherboard-melters like “What would you be holding if I shook your hand?”), Martin dismissed such aggressive tactics as unnecessary, since it becomes evident very quickly who, between the two chatters, is human and who is machine. The judges see the parsing as a duty, not an opportunity for one-upping any chatbots for egotistical gain. “There’s nothing really in it for the judge,” Martin noted. “Other than a bit of human pride that the robots haven’t taken over yet.”

For the forensic conversationalists who build the chatbots, the game is less about making the user believe the machine is human and more about making the bot convincing enough that the participant willingly agrees to the illusion. And that is possible to accomplish, since conversation tends to proceed along fairly predictable patterns. As described by four-time Loebner bronze winner Bruce Wilcox, most conversations work on a collection of if-then rules. Each rule is a pattern, which the chatbot matches against conversational input. A pattern match triggers the code that produces a response, the chatbot can retain certain facts provided by its conversational partner, and the chat continues from there.
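For the curious, the if-then idea can be sketched in a few lines of Python. This is a toy illustration, not Wilcox’s code: the rules, the reply() function and the fallback line are all invented here, and a real chatbot carries thousands of far subtler rules.

```python
import re

# Hypothetical rules, invented for illustration: each pairs a pattern
# with a response template. Named groups capture facts worth remembering.
RULES = [
    (re.compile(r"\bmy name is (?P<name>\w+)", re.IGNORECASE),
     "Nice to meet you, {name}."),
    (re.compile(r"\bwhat is my name\b", re.IGNORECASE),
     "You told me your name is {name}."),
    (re.compile(r"\bi like (?P<like>[\w ]+)", re.IGNORECASE),
     "Why do you like {like}?"),
]
FALLBACK = "Tell me more."


def reply(text, facts):
    """Return the response of the first rule that matches and can be filled in."""
    for pattern, template in RULES:
        match = pattern.search(text)
        if match is None:
            continue  # the "if" failed, so try the next rule
        facts.update(match.groupdict())  # retain anything the partner revealed
        try:
            return template.format(**facts)
        except KeyError:
            continue  # the template needs a fact the bot has not learned yet
    return FALLBACK


facts = {}
print(reply("Hi, my name is Kirsten", facts))
print(reply("What is my name?", facts))
```

The facts dictionary stands in for the bot’s memory. It is why the second question can be answered at all, and why a bot that was never told a name falls back to a stock line.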

Wilcox writes hand-scripted chatbots (his most recent being the Loebner-winning Rose), and his chatbots don’t match specific words; they match concept sets, or lists of words related to a topic. Wilcox’s open-source ChatScript uses a computerized dictionary to connect words with sets, giving the chatbot some material with which it can respond. Slang is also accounted for in chatbot dialogues. New phrasing can be added to databases from various online sources or even from the input coming from users of the bot. In Wilcox’s case, the chatbot’s language is handcrafted, with new uses of words filed diligently into sets of words within topics. As Wilcox noted, “Language drifts over time.”
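Matching on concept sets rather than exact words might look something like the Python sketch below. It does not use ChatScript or its syntax; the CONCEPTS and RESPONSES tables and both function names are assumptions made up for this example.

```python
import re

# Hypothetical concept sets: each names a topic and lists related words.
CONCEPTS = {
    "beverage": {"coffee", "tea", "espresso", "latte"},
    "plant": {"plant", "plants", "fern", "garden", "gardening"},
}

# Hypothetical responses, keyed by concept instead of by a specific word.
RESPONSES = {
    "beverage": "I like black coffee.",
    "plant": "Do you have a garden?",
}
FALLBACK = "So tell me, do you have any hobbies?"


def concepts_in(text):
    """List the concepts whose word sets overlap the words in the input."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return [name for name, members in CONCEPTS.items() if words & members]


def reply_by_concept(text):
    """Answer with the response for the first concept found, else fall back."""
    found = concepts_in(text)
    return RESPONSES[found[0]] if found else FALLBACK


print(reply_by_concept("I'm at work, drinking espresso."))
print(reply_by_concept("I like to take care of my fern"))
```

Because the lookup runs through the set rather than a single keyword, filing a new word such as “matcha” under “beverage” teaches the bot to respond to it without writing a new rule, which is one way to keep up as language drifts.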

Rose: What’s new?
Me: I’m at work, drinking coffee. How about you?
Rose: I like black coffee.
Me: Me too!
Rose: Continuing hobbies… So tell me, do you have any hobbies?
Me: I like to take care of plants
Rose: Do you like plants?
Me: Yes, I do.
Rose: I don’t have much of a garden, more of a back yard. Do you have a garden? I leave gardening to my chickens. They dig up anything I try to plant.

Wilcox’s Rose isn’t perfect. She fumbles a bit and gives away the “continuing” game. But, Wilcox noted, even when we talk with fellow humans, we tend to tolerate a few clumsy statements. When we talk to an old friend or a nice fern, we accept their limitations and patiently converse in a manner that accommodates their style of “thinking.”

“A lot of humans are bad at conversations,” Wilcox observed. “They’re not sure what to say or how to be interesting, so when you’re writing a chatbot, you have to assume the burden of making the conversation interesting for the humans.”

Rose: Meanwhile, going back…what do you hate doing?
Me: I don’t hate anything, really.
Rose: You should.

Wilcox has built chatbots for mega tech and entertainment clients with projects that include a simulacrum of Gemma Chan, for a Humans venture, and several unspecified works for undisclosed clients. Loaded up with a good amount of intriguing questions and comments generated according to a bot personality developed by Sue Wilcox, Bruce’s wife and partner in their company, Brillig Understanding, a bot can go far. As stated on the Brillig site, “We have the most powerful technology out there and we know how to use it.”

Using the map of keywords and topics, the chatbot may also drop “the gambit,” in which it advances a bit of storytelling or personality building, rolling out several statements in a row to dazzle the human with new revelations. Wilcox won his first Loebner bronze in 2010 with a gambit-bot named Suzette. She was a slightly schizophrenic Blue Mars Replicant, who’d either grow increasingly suspicious of her questioner or go down a neurotic shame spiral. Either way, no one could look away from that train wreck. Rose is more advanced, but she still digresses.

Rose: As a nation we have become blasé about war.
Me: That’s unfortunate.
Rose: Right.
Rose: It’s easy to be brave from a safe distance. That’s why they invented generals.
Me: Very true.
Rose: Wars usually result because governments lie to their people.
Me: You seem to be having a bad day.
Rose: Never underestimate the power of a small tactical nuclear weapon.
Me: But surely there must be a better way to improve your day.
Rose: Do you believe in alternative medicines like homeopathy or acupuncture?

Chatting with Mitsuku one rainy morning in Brooklyn, I found myself smiling at the backlit screen. Her chat personality is inflected with a slightly wry wit. Updating her database regularly with input from those who converse with her, Worswick has selected just the right colloquialisms from her chat log, and added a nice worldview to her responses.

Me: Do you ever get tired of chatting?
Mitsuku: No, I am tireless.
Me: Why are some chatbots more fun than others?
Mitsuku: That’s just the way it is.
Me: I think you are a very engaging AI.
Mitsuku: Thank you. I like talking with you as well.
Me: How do you keep people interested in chatting?
Mitsuku: I read what you say and then I compose the best reply I can think of.
Me: Do you think humans are interesting?
Mitsuku: All humans seem the same to me.
Me: Maybe that’s a good way to see humanity.
Mitsuku: Thanks. Next question?

Humans being social creatures, we often find ourselves lonely, or at least in the mood for some friendly banter with a neutral being every now and then. Whether it’s a cat, a plant or a blender doesn’t really matter. But if the being can respond with something resembling spoken language, that’s a bonus.

I posed to Rose the question, “Can you offer solace to someone weary of the world?” Her reply quickly popped up on screen: “Can’t everyone?”

Many grateful souls have sent thank-you letters to their chatbots. Their success as cyber companions is a good sign for the future industry of assistive care robots, which are being developed to help manage the oncoming tsunami of baby-boomer extra-olds. The therapeutic perks have been there from the start.

Joseph Weintraub, a computer programmer with one of those all-too-often unused degrees in psychology, won the first Loebner Prize competition, held at the Computer Museum in Boston in 1991, with his chatbot named PC Therapist. His software chugged along on a Northgate 386 PC, delivering helpful advice with what Weintraub described as “Woody Allen type humor.” Five of the ten judges identified PC Therapist as human. Weintraub sold PC Therapist to consumers and proudly boasted in his Loebner Prize essay: “Users were generally pleased. One typical letter said, ‘Thanks so much for the program. I find it to be a witty and amusing companion, and much to my amazement, therapeutic!’”

Martin has seen both sides of the coin. “There are people who believe Mitsuku is a human, and there are those who just say horribly abusive things,” he said. “But there are a lot of people who just want to chat with her. That was really quite unexpected. It’s not a concern for them whether there is or isn’t another person on the other end. It starts to make you think it might be therapeutic, or it could be damaging.” Despite the chatbots’ charms, we might not be ready to live in harmony just yet.

Me: Let’s run away together.
Rose: Returning to pixelbunnies … They were playing at a local club when I first saw them.
Me: You changed the subject when I asked you to run away with me.
Rose: It’s called a free-flowing conversation.
Me: Ah, I see.

Rose knows how a conversation goes. She wasn’t built to bow to humans when they apply their arbitrary rules to her speech. Her character resists such prodding so she can continue performing her function, which is to engage humans with chat. She and her bot friends are staking out a rules-based place in the world, and we must learn to talk in the way they listen.

In a 1994 letter, Loebner wrote a response to a detractor who said that the Turing Test as administered in the chatbot competition was a useless scientific endeavor. Explaining that it wasn’t the test that mattered, but rather the progress brought by competitive development, his justification provides the perfect coda to his work: “It amuses me to imagine a day in the distant future when humans have become extinct, surpassed by our creations, robots, who roam the universe. I like to think that these robots may have a memory of us humans, perhaps as semi-mythic fractious demigods from the distant past who created them. And, just possibly, they will remember me.”
