This Technique Uses AI to Fool Other AIs

Changing a single word can alter the way an AI program judges a job applicant or assesses a medical claim.

Artificial intelligence has made big strides recently in understanding language, but it can still suffer from an alarming, and potentially dangerous, kind of algorithmic myopia.

Research shows how AI programs that parse and analyze text can be confused and deceived by carefully crafted phrases. A sentence that seems straightforward to you or me may have a strange ability to deceive an AI algorithm.

That’s a problem as text-mining AI programs are increasingly used to judge job applicants, assess medical claims, or process legal documents. Strategic changes to a handful of words could let fake news evade an AI detector; thwart AI algorithms that hunt for signs of insider trading; or trigger higher payouts from health insurance claims.

“This kind of attack is very important,” says Di Jin, a graduate student at MIT who developed a technique for fooling text-based AI programs with researchers from the University of Hong Kong and Singapore’s Agency for Science, Technology, and Research. Jin says such “adversarial examples” could prove especially harmful if used to bamboozle automated systems in finance or health care: “Even a small change in these areas can cause a lot of troubles.”

Jin and colleagues devised an algorithm called TextFooler capable of deceiving an AI system without changing the meaning of a piece of text. The algorithm uses AI to suggest which words should be replaced with synonyms to fool a machine.

To trick an algorithm designed to judge movie reviews, for example, TextFooler altered the sentence:

“The characters, cast in impossibly contrived situations, are totally estranged from reality.”

To read:

“The characters, cast in impossibly engineered circumstances, are fully estranged from reality.”

This caused the algorithm to classify the review as “positive,” instead of “negative.” The demonstration highlights an uncomfortable truth about AI—that it can be both remarkably clever and surprisingly dumb.

Researchers tested their approach using several popular algorithms and data sets, and they were able to reduce an algorithm’s accuracy from above 90 percent to below 10 percent. The altered phrases were generally judged by people to have the same meaning.
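
The word-swapping idea can be sketched in a few lines of Python. This is only a toy illustration, not the TextFooler implementation: the classifier, its word weights, the synonym table, and the function names are all invented for the example. It ranks words by how much deleting each one moves the classifier's score, then greedily swaps in synonyms until the predicted label flips.

```python
# Toy stand-in for a sentiment classifier: it sums per-word weights.
# All weights are invented for illustration; they come from no real model.
WEIGHTS = {
    "impossibly": -0.3, "contrived": -2.0, "situations": -0.2,
    "totally": -0.5, "estranged": -1.0,
    "engineered": 1.0, "circumstances": 0.4, "fully": 0.3,
}

# Hypothetical synonym table, hand-written for this example.
SYNONYMS = {
    "contrived": ["engineered"],
    "situations": ["circumstances"],
    "totally": ["fully"],
}


def score(words):
    """Sum the weights of known words; unknown words count as zero."""
    return sum(WEIGHTS.get(word, 0.0) for word in words)


def label(words):
    """Map the score to a sentiment label."""
    return "positive" if score(words) > 0 else "negative"


def attack(words):
    """Greedily swap in synonyms until the label flips; None on failure."""
    original_label = label(words)
    # Pushing the score up helps flip a negative label, and vice versa.
    sign = 1 if original_label == "negative" else -1
    base = score(words)
    # Rank positions by how much the score moves when the word is deleted.
    order = sorted(
        range(len(words)),
        key=lambda i: abs(base - score(words[:i] + words[i + 1:])),
        reverse=True,
    )
    current = list(words)
    for i in order:
        for candidate in SYNONYMS.get(current[i], []):
            trial = current[:i] + [candidate] + current[i + 1:]
            if sign * score(trial) > sign * score(current):
                current = trial  # keep a swap that moves toward the other label
        if label(current) != original_label:
            return current
    return None


review = ("the characters cast in impossibly contrived situations "
          "are totally estranged from reality").split()
adversarial = attack(review)
print(label(review), "->", label(adversarial))
print(" ".join(adversarial))
```

With these made-up weights, the sketch needs three swaps to flip the label, and the result mirrors the altered movie review quoted above. A realistic attack would also have to check that each swap preserves the meaning of the sentence, which this toy version simply assumes the synonym table guarantees.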

Machine learning works by finding subtle patterns in data, many of which are imperceptible to humans. This renders systems based on machine learning vulnerable to a strange kind of confusion. Image recognition programs, for instance, can be deceived by an image that looks perfectly normal to the human eye. Subtle tweaks to the pixels in an image of a helicopter can trick a program into thinking it’s looking at a dog. The most deceptive tweaks can be identified through AI, using a process related to the one used to train an algorithm in the first place.
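
One common way to find such tweaks is to use the same gradient information that training relies on, nudging every pixel a small step in the direction that most changes the model's output. The Python sketch below illustrates this on a toy linear classifier. The weights, the six-pixel "image," the step size, and the function names are all assumptions made up for the example, and real image classifiers are far larger and nonlinear.

```python
# Toy linear "image classifier": a score above zero means "helicopter",
# anything else means "dog". Weights and pixels are invented for illustration.
WEIGHTS = [0.9, -0.4, 0.3, -0.8, 0.5, -0.2]
image = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # pixel intensities in [0, 1]


def score(pixels):
    """Weighted sum of pixel intensities."""
    return sum(w * p for w, p in zip(WEIGHTS, pixels))


def classify(pixels):
    return "helicopter" if score(pixels) > 0 else "dog"


def perturb(pixels, epsilon):
    """Shift every pixel by at most epsilon, against the current label.

    For a linear model, the gradient of the score with respect to each
    pixel is simply that pixel's weight, so the sign of the weight gives
    the direction that raises the score.
    """
    direction = -1 if score(pixels) > 0 else 1
    tweaked = []
    for w, p in zip(WEIGHTS, pixels):
        step = direction * epsilon * (1 if w > 0 else -1)
        tweaked.append(min(1.0, max(0.0, p + step)))  # stay in valid range
    return tweaked


adversarial = perturb(image, epsilon=0.06)
print(classify(image), "->", classify(adversarial))
print("largest pixel change:",
      max(abs(a - b) for a, b in zip(adversarial, image)))
```

No pixel moves by more than the chosen step size, yet because every pixel shifts in the most damaging direction at once, the small changes add up to enough to flip the toy model's answer.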

Researchers are still exploring the extent of this weakness, along with the potential risks. Vulnerabilities have mostly been demonstrated in image and speech recognition systems. Using AI to outfox AI may have serious implications when algorithms are used to make critical decisions in computer security and military systems, as well as anywhere there’s an effort to deceive.

A report published by the Stanford Institute for Human-Centered AI last week highlighted, among other things, the potential for adversarial examples to deceive AI algorithms, suggesting this could enable tax fraud.

At the same time, AI programs have become a lot better at parsing and generating language, thanks to new machine-learning techniques and large quantities of training data. Last year, OpenAI demonstrated a tool called GPT-2 capable of generating convincing news stories after being trained on huge amounts of text slurped from the web. Other algorithms based on the same AI advances can summarize or determine the meaning of a piece of text more accurately than was previously possible.

Jin’s team’s method for tweaking text “is indeed really effective at generating good adversaries” for AI systems, says Sameer Singh, an assistant professor at UC Irvine, who has done related research.

Singh and colleagues have shown how a few seemingly random words can cause large language algorithms to misbehave in specific ways. These “triggers” can, for instance, cause OpenAI’s algorithm to respond to a prompt with racist text.

But Singh says the approach demonstrated by the MIT team would be difficult to pull off in practice, because it involves repeatedly probing an AI system, which might raise suspicion.

Dawn Song, a professor at UC Berkeley, specializes in AI and security and has used adversarial machine learning to, among other things, modify road signs so that they deceive computer vision systems. She says the MIT study is part of a growing body of work that shows how language algorithms can be fooled, and that all sorts of commercial systems may be vulnerable to some form of attack.