This Algorithm Doesn’t Replace Doctors—It Makes Them Better

An artificial intelligence system has outperformed physicians at diagnosing some skin lesions. The results are changing how one school trains dermatologists.

DERMATOLOGIST HARALD KITTLER draws on more than a decade of experience when he teaches students at the Medical University of Vienna how to diagnose skin lesions. His classes this fall will include a tip he learned only recently from an unusual source: an artificial intelligence algorithm.

That lesson originated in a contest Kittler helped organize, which showed that image analysis algorithms could outperform human experts in diagnosing some skin blemishes. After digesting 10,000 images labeled by doctors, the systems could distinguish among different kinds of cancerous and benign lesions in new images. One category in which they outstripped human accuracy was scaly patches known as pigmented actinic keratoses. Reverse engineering a similarly trained algorithm to see how it arrived at its conclusions showed that, when diagnosing those lesions, the system paid more attention than usual to the skin around a blemish.

Kittler was initially surprised but came to see wisdom in that pattern. The algorithm may detect sun exposure on surrounding skin, a known factor in such lesions. In January, he and colleagues asked a class of fourth-year medical students to think like the algorithm and look for sun damage.

The students’ accuracy at diagnosing pigmented actinic keratoses improved by more than a third, in a test where they had to identify several types of skin lesion. “Most people think of AI as acting in a different world that cannot be understood by humans,” Kittler says. “Our little experiment shows AI could widen our point of view and help us to make new connections.”

The Viennese experiment was part of a wider study by Kittler and more than a dozen others exploring how doctors can collaborate with AI systems that analyze medical images. Since 2017, a series of studies have found that machine learning models can outperform dermatologists in head-to-head contests. That has inspired speculation that skin specialists might be wholly replaced by a generation of AutoDerm 3000s.

Philipp Tschandl, an assistant professor of dermatology at the Medical University of Vienna who worked on the new study with Kittler and others, says it’s time to reframe the conversation: What if algorithms and doctors were colleagues, rather than competitors?

Skin specialists plan treatments, synthesize disparate data about a patient, and build relationships in addition to looking at moles, he says. Computers aren’t close to being able to do all that. “The chances these things are going to replace us are very low, sort of unfortunately,” he says. “Collaboration is the only way forward.”

Operators of paint shops, warehouses, and call centers have reached the same conclusion. Rather than replace humans, they employ machines alongside people to make them more efficient. The reason is not just sentimentality: many everyday tasks are too complex for existing technology to handle alone.

With that in mind, the dermatology researchers tested three ways doctors could get help from an image analysis algorithm that outperformed humans at diagnosing skin lesions. They trained the system with thousands of images of seven types of skin lesion labeled by dermatologists, including malignant melanomas and benign moles.

One design for putting that algorithm’s power into a doctor’s hands showed a list of diagnoses ranked by probability when the doctor examined a new image of a skin lesion. Another displayed only a probability that the lesion was malignant, closer to the vision of a system that might replace a doctor. A third retrieved previously diagnosed images the algorithm judged to be similar, to provide the doctor some reference points.

Tests with more than 300 doctors found that they became more accurate when using the ranked list of diagnoses: their rate of making the right call climbed by 13 percentage points. The other two approaches did not improve doctors’ accuracy. And not all doctors got the same benefit.

Less-experienced doctors, such as interns, changed their diagnosis based on AI advice more often, and were often right to do so. Doctors with lots of experience, like seasoned board-certified dermatologists, changed their diagnoses based on the software’s output much less frequently. These experienced doctors only benefited when they reported being less confident, and even then the benefit was marginal.

Tschandl says this suggests AI dermatology tools might be best targeted as assistants to specialists in training, or to physicians like general practitioners who don’t work intensively in the field. “If you have been doing this for more than 10 years, you don’t need to use it, or shouldn’t, because it might lead you to the wrong things,” he says. In some cases, experienced physicians abandoned a correct diagnosis and switched to the algorithm’s incorrect one.

Those findings and the experiment in Kittler’s dermatology class show how researchers might develop AI that elevates rather than eliminates doctors. Sancy Leachman, a melanoma specialist and professor of dermatology at Oregon Health & Science University, hopes to see more such studies—and not, she says, because she fears being replaced.

“This is not about who does the work, human or machine,” she says. “The question is how do you successfully use the best of both worlds to get the best outcomes.” AI that helps general practitioners catch more melanomas or other skin cancers could save many lives, she says, because skin cancers are highly treatable if detected early. Leachman adds that it will likely be easier to get doctors to embrace technology designed to enhance and build on their work than to replace it.

The new study also included an experiment that highlights the potential dangers of that embrace. It tested what happened when doctors worked with a version of the algorithm tweaked to give incorrect advice, simulating faulty software. Clinicians of all levels of experience proved vulnerable to being led astray.

“My hope was that physicians would be robust to that but we saw the trust they had in the AI model turned against them,” Tschandl says. He’s not sure what the answers might be, but says future work on medical AI needs to consider how to help doctors decide when to distrust what the computer tells them.