When Good Chatbots Go Bad

In this week’s issue of New Scientist magazine, I have a feature article that looks at the evolution of chatbots, or computer programs designed to have conversations with real people. While there are a wide variety of chatbots out there, from fun, entertainment-oriented ones such as Cleverbot to customer service agents such as Shaw’s “Ask Amy,” my story looks specifically at how these programs are starting to go bad, or rather at the chatbots being used by hackers and criminals.

Viewing the story requires a subscription, but fortunately it’s free. Go here to check it out.

As with all such features, there was a lot of material left on the cutting room floor. One particular aspect, which actually served as the intro to the story in an earlier draft, was the story of Roman Yampolskiy, a researcher and assistant professor at the University of Louisville who has applied human-like biometrics to chatbots.

Yampolskiy and his colleagues studied commercial bots such as Jabberwacky to see whether they exhibit distinctive writing styles, the way human writers often do. They found that the bots do, and that they can be identified by analysing those styles, a technique also employed by police forensic researchers.

What’s equally fascinating is how Yampolskiy got interested in the topic. It all had to do with a bit of an obsession with online poker. Here’s his tale, from the intro to an early version of my article:

Roman Yampolskiy used to love playing chess and poker online. He relished the idea of pitting his skills against other players, but finding willing opponents of similar skill near him in the real world was tough. The internet, on the other hand, provided an instant wealth of suitable partners to square off against.

But then something ruined it for him: Robots.

As a PhD candidate in computer science and engineering at the University at Buffalo, he couldn’t help but notice that many of his poker opponents exhibited peculiar behaviours. Many would be online at all hours of the day, click buttons in exactly the same spot every time, or ignore chat requests.

After some simple monitoring, he concluded that at least half the players in a typical game were bots, or software programmed with the rules of poker. In higher-stakes rooms, these bots were even more sophisticated – they were able to carry on conversations and pass themselves off as real people. For actual human players, that was grossly unfair.

“What makes them unethical is that people combine multiple bots into a single system and they share private card information, so you’re not really playing against five bots, you’re playing against a team of five bots,” says Yampolskiy, who is now an assistant professor at the University of Louisville’s cybersecurity lab.

“If you have a population of very good artificial poker players, you’re not going to get many humans playing. They don’t like losing.”
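The kind of simple monitoring Yampolskiy describes is easy to picture in code. Below is a minimal sketch in Python, assuming hypothetical logged fields such as hours online, click coordinates and chat replies; it just counts how many of the tell-tale signs a player exhibits, and is nothing like a real anti-bot system.

```python
# A minimal sketch of the behavioural monitoring described above, not a real
# anti-bot system. The logged fields (hours online, click positions, chat counts)
# are hypothetical stand-ins for whatever a poker client could actually record.

def looks_like_bot(player):
    """Count how many tell-tale signs a player shows and flag two or more."""
    always_online = player["hours_online_per_day"] > 20
    # Clicking in exactly the same pixel on every action suggests scripted input.
    identical_clicks = len(set(player["click_positions"])) == 1
    ignores_chat = player["chat_requests"] > 0 and player["chat_replies"] == 0
    return sum([always_online, identical_clicks, ignores_chat]) >= 2

suspect = {
    "hours_online_per_day": 23,
    "click_positions": [(412, 305)] * 500,
    "chat_requests": 12,
    "chat_replies": 0,
}
print(looks_like_bot(suspect))  # True
```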

Bots aren’t ruining just internet poker. Researchers and security experts say that, left unchecked, they have the potential to turn people off many online activities, including gaming, chat rooms and social networking sites. With nefarious bots that lure people into giving up personal data popping up in all of these places, and with such programs steadily improving, the impetus to develop tools and techniques for differentiating between humans and robots is growing.

Researchers are thus trying to create a better Turing test, the exam proposed by British artificial intelligence pioneer Alan Turing more than 60 years ago. Turing, who would have been 100 this June, thought that machines could be distinguished from humans by quizzing them through conversation. His test is still years away from being beaten, but bots routinely fool human judges in more limited versions of it, such as those where conversations are restricted to a certain topic. Like, say, poker.

Yampolskiy’s discovery sparked an interest in the emerging field of virtual biometrics, or the detection of robots through human-like traits, such as writing styles and language usage. In April, Yampolskiy and his Louisville colleagues presented a paper at the Midwest Artificial Intelligence and Cognitive Science Conference detailing an experiment that charted the writing style of 11 well-known chatbots.

Using several years of transcripts submitted for the Loebner Prize, an annual contest that tests chatbots’ ability to pass as humans, the group looked for certain words and patterns. They found that several bots, particularly Alice and Jabberwacky, did indeed exhibit stylistic traits – much like human writers – that allowed them to be accurately identified.
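To give a flavour of how that kind of identification can work, here is an illustrative sketch only. The paper’s actual features and classifier aren’t described here, so this simply builds word-frequency profiles from transcripts and matches an unknown sample to the closest known bot by cosine similarity; the snippets standing in for Loebner Prize transcripts are made up.

```python
# An illustrative sketch only: the paper's actual features and classifier aren't
# given here, so this profiles word frequencies and matches by cosine similarity.
from collections import Counter
import math

def profile(text):
    """Relative word frequencies for a body of transcript text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def cosine(p, q):
    """Cosine similarity between two frequency profiles."""
    dot = sum(p[w] * q[w] for w in set(p) & set(q))
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def identify(unknown_text, known_profiles):
    """Return the known bot whose stored profile best matches the unknown transcript."""
    target = profile(unknown_text)
    return max(known_profiles, key=lambda name: cosine(target, known_profiles[name]))

# Toy usage with made-up snippets standing in for Loebner Prize transcripts.
known = {
    "Alice": profile("my name is alice i am a chatbot what is your name"),
    "Jabberwacky": profile("that is silly why would you say that to me"),
}
print(identify("why would you say your name is silly", known))  # Jabberwacky
```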

Complicating the experiment, however, was the fact that several of the chatbots evolved in style as their creators improved algorithms or as programmed learning functions kicked in, resulting in “behavioural shift.”

“That’s what makes this problem even more difficult. It’s not enough to establish an initial profile, you have to keep up with changes as time progresses in the style of those bots as well,” Yampolskiy says.

“If the bot gradually learns and changes over a period of years, we can keep up with that. If, all of a sudden, someone replaces all source codes with new ones and now it’s a completely different bot, obviously we won’t be able to do much about it.”
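One way to picture keeping up with that kind of behavioural shift: blend each new transcript into the stored profile so gradual drift is tracked, and treat a sample that barely resembles the profile at all as a possible wholesale replacement. The blending weight and threshold below are illustrative choices, not values from Yampolskiy’s work.

```python
# A hedged sketch of tracking "behavioural shift". The blending weight and the
# replacement threshold are illustrative choices, not values from the paper.

def update_profile(stored, sample, alpha=0.2):
    """Blend a new sample's word frequencies into the stored profile."""
    words = set(stored) | set(sample)
    return {w: (1 - alpha) * stored.get(w, 0.0) + alpha * sample.get(w, 0.0) for w in words}

def drift(stored, sample):
    """Half the total absolute difference in frequencies: 0 = identical, 1 = disjoint."""
    words = set(stored) | set(sample)
    return sum(abs(stored.get(w, 0.0) - sample.get(w, 0.0)) for w in words) / 2

# Hypothetical profiles: word -> relative frequency, as in the previous sketch.
stored = {"hello": 0.5, "friend": 0.5}
sample = {"hello": 0.4, "pal": 0.6}

if drift(stored, sample) > 0.8:
    print("Profile no longer matches - possibly a brand-new bot")
else:
    stored = update_profile(stored, sample)  # keep up with gradual change
```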