AI Fails and What They Teach Us About Emerging Technology

These days, we’ve become all but desensitized to the miraculous convenience of AI. We’re not surprised when we open Netflix to find feeds immediately and perfectly tailored to our tastes, and we’re not taken aback when Facebook’s facial recognition tech picks our face out of a group-picture lineup. Ten years ago, we might have made a polite excuse and beat a quick retreat if we heard a friend asking an invisible person to dim the lights or report the weather. Now, we barely blink — and perhaps wonder if we should get an Echo Dot, too.

We have become so accustomed to AI quietly incorporating itself into almost every aspect of our day-to-day lives that we’ve stopped having hard walls on our perception of possibility. Rather than address new claims of AI’s capabilities with disbelief, we regard it with interested surprise and think — could I use that?

But what happens when AI doesn’t work as well as we expect? What happens when our near-boundless faith in AI’s usefulness is misplaced, and the high-tech tools we’ve begun to rely on start cracking under the weight of the responsibilities we delegate?

Let’s consider an example.

AI Can’t Cure Cancer — Or Can It? An IBM Case Study

When IBM’s Watson debuted in 2014, it charmed investors, consumers, and tech aficionados alike. Proponents boasted that Watson’s information-gathering capabilities would make it an invaluable resource for doctors who might otherwise not have the time or opportunity to keep up with the constant influx of medical knowledge. During a demo that same year, Watson dazzled industry professionals and investors by analyzing an eclectic collection of symptoms and offering a series of potential diagnoses, each ranked by the system’s confidence and linked to relevant medical literature. The AI’s clear knowledge of rare disease and its ability to provide diagnostic conclusions was both impressive and inspiring.

Watson’s positive impression spurred investment. Encouraged by the AI’s potential, MD Anderson, a cancer center within the University of Texas, signed a multi-million dollar contract with IBM to apply Watson’s cognitive computing capabilities to its fight against cancer. Watson for Oncology was meant to parse enormous quantities of case data and provide novel insights that would help doctors provide better and more effective care to cancer patients.

Unfortunately, the tool didn’t exactly deliver on its marketing pitch.

In 2017, auditors at the University of Texas submitted a caustic report claiming that Watson not only cost MD Anderson over $62 million but also failed to achieve its goals. Doctors lambasted the tool for its propensity to give bad advice; in one memorable case reported by the Verge, the AI suggested that a patient with severe bleeding receive a drug that would worsen their condition. Luckily the patient was hypothetical, and no real people were hurt; however, users were still understandably annoyed by Watson’s apparent ineptitude. As one particularly scathing doctor said in a report for IBM, “This product is a piece of s — . We bought it for marketing and with hopes that you would achieve the vision. We can’t use it for most cases.”

But is the project’s failure to deliver on its hype all Watson’s fault? Not exactly.

Watson’s main flaw was with implementation, not technology. When the project began, doctors entered real patient data as intended. However, Watson’s guidelines changed often enough that updating those cases became a chore; soon, users switched to hypothetical examples. This meant that Watson could only make suggestions based on the treatment preferences and information provided by a few doctors, rather than the actual data from an entire cancer center, thereby skewing the advice it provided.

Moreover, the AI’s ability to discern connections is only useful to a point. It can note a pattern between a patient with a given illness, their condition, and the medications prescribed, but any conclusions drawn from such analysis would be tenuous at best. The AI cannot definitively determine whether a link is correlation, causation, or mere coincidence — and thus risks providing diagnostic conclusions without evidence-based backing.

Given the lack of user support and the shortage of real information, is it any surprise that Watson failed to deliver innovative answers?

What Does Watson’s Failure Teach Us?

Watson’s problem is more human than it is technical. There are three major lessons that we can pull from the AI’s crash:

We Need to Check Our Expectations.

We tend to believe that AI and emerging technology can achieve what its developers say that it can. However, as Watson’s inability to separate correlation and causation demonstrates, the potential we read in marketing copy can be overinflated. As users, we need to have a better understanding and skepticism of emerging technology before we begin relying on it.

Tools Must Be Well-Integrated.

If doctors had been able to use the Watson interface without continually needing to revise their submissions for new guidelines, they might have provided more real patient information and used the tool more often than they did. This, in turn, may have allowed Watson to be more effective in the role it was assigned. Considering the needs of the human user is just as important as considering the technical requirements of the tool (if not more so).

We Must Be Careful

If the scientists at MD Anderson hadn’t been so careful, or if they had followed Watson blindly, real patients could have been at risk. We can never allow our faith in an emerging tool to be so inflated that we lose sight of the people it’s meant to help.

Emerging technology is exciting, yes — but we also need to take the time to address the moral and practical implications of how we bring that seemingly-capable technology into our lives. At the very least, it would seem wise to be a little more skeptical in our faith.