Nurse Dina Sarro didn’t know much about artificial intelligence when Duke University Hospital installed machine learning software to raise an alarm when a person was at risk of developing sepsis, a complication of infection that is the number one killer in US hospitals. The software, called Sepsis Watch, passed alerts from an algorithm Duke researchers had tuned with 32 million data points from past patients to the hospital’s team of rapid response nurses, co-led by Sarro.
But when nurses relayed those warnings to doctors, they sometimes encountered indifference or even suspicion. When doctors questioned why the AI thought a patient needed extra attention, Sarro found herself in a tough spot. “I wouldn’t have a good answer because it’s based on an algorithm,” she says.
Sepsis Watch is still in use at Duke—in no small part thanks to Sarro and her fellow nurses reinventing themselves as AI diplomats skilled in smoothing over human-machine relations. They developed new workflows that helped make the algorithm’s squawks more acceptable to people.
A new report from think tank Data & Society calls this an example of the “repair work” that often needs to accompany disruptive advances in technology. Coauthor Madeleine Clare Elish says that vital contributions from people on the frontline like Sarro are often overlooked. “These things are going to fail when the only resources are put towards the technology itself,” she says.
The human-machine mediation required at Duke illustrates the challenge of translating a recent surge in AI health research into better patient care. Many studies have created algorithms that perform as well as or better than doctors when tested on medical records, such as X-rays or photos of skin lesions. But how to usefully employ such algorithms in hospitals and clinics is not well understood. Machine learning algorithms are notoriously inflexible, and opaque even to their creators. Good results on a carefully curated research dataset don’t guarantee success in the messy reality of a working hospital.
A recent study on software for classifying moles found its recommendations sometimes persuaded experienced doctors to switch from a correct diagnosis to a wrong one. When Google put a system capable of detecting eye disease in diabetics with 90 percent accuracy into clinics in Thailand, the system rejected more than 20 percent of patient images due to problems like variable lighting. Elish recently joined the company, and says she hopes to keep researching AI in healthcare.
Duke’s sepsis project started in 2016, early in the recent AI healthcare boom. It was supposed to improve on a simpler system of pop-up sepsis alerts, which workers overwhelmed by notifications had learned to dismiss and ignore.
Researchers at the Duke Institute for Health Innovation reasoned that more targeted alerts, sent directly to the hospital’s rapid response nurses, who in turn informed doctors, might fare better. They used deep learning, the AI technique favored by the tech industry, to train an algorithm on 50,000 patient records, and built a system that scans patient charts in real time.
Sepsis Watch got an anthropological close-up because the Duke developers knew there would be unknowns in the hospital’s hurly-burly and asked Elish for help. She spent days shadowing and interviewing nurses and emergency department doctors and found the algorithm had a complicated social life.
The system threw up alerts on iPads monitored by the nurses, flagging patients deemed at moderate or high risk of sepsis, or judged to have already developed the deadly condition. Nurses were supposed to call an emergency department doctor immediately for patients flagged as high risk. But when the nurses followed that protocol, they ran into problems.
Some challenges came from disrupting the usual workflow of a busy hospital—many doctors aren’t used to taking direction from nurses. Others were specific to AI, like the times Sarro faced demands to know why the algorithm had raised the alarm. The team behind the software hadn’t built in an explanation function, because as with many machine learning algorithms, it’s not possible to pinpoint why it made a particular call.
One tactic Sarro and other nurses developed was to use alerts that a patient was at high risk of sepsis as a prompt to review that person’s chart so as to be ready to defend the algorithm’s warnings. The nurses learned to avoid passing on alerts at certain times of day, and to sense when a doctor wasn’t in the mood to hear the opinion of an algorithm. “A lot of it was figuring out the interpersonal communication,” says Sarro. “We would gather more information to arm ourselves for that phone call.”
Elish also found that in the absence of a way to know why the system flagged a patient, nurses and doctors developed their own, incorrect, explanations—a response to inscrutable AI. One nurse believed the system looked for keywords in a medical record, which it does not. One doctor advised coworkers that the system should be trusted because it was probably smarter than clinicians.
Mark Sendak, a data scientist and leader on the project, says that incorrect characterization is an example of how Elish’s findings were more eye-opening—and concerning—than expected. His team changed their training and documentation for the sepsis alert system as a result of feedback from Sarro and other nurses. Sendak says the experience has convinced him that AI healthcare projects should devote more resources to studying social as well as technical performance. “I would love to make it standard practice,” he says. “If we don’t invest in recognizing the repair work people are doing, these things will fail.” Sarro says the tool ultimately appeared to improve the hospital’s sepsis care.
Many more AI projects may soon enter the tricky territory Duke encountered. Amit Kaushal, an assistant professor at Stanford, says that in the past decade advances in machine learning and larger medical datasets have made it almost routine to do things researchers once dreamed of, like have algorithms make sense of medical images. But integrating them into patient care may prove more challenging. “For some fields technology is no longer the limiting factor, it’s these other issues,” Kaushal says.
Kaushal has contributed to a Stanford project testing camera systems that can alert health workers when they don’t sanitize their hands and says results are promising. Yet while it’s tempting to see AI as a quick fix for healthcare, proving a system’s worth comes down to conventional and often slow research. “The real proof is in the study that says ‘Does this improve outcomes for our patients?’” Kaushal says.
Results from a clinical trial completed last year should go some way to answering that question for Duke’s sepsis system, which has been licensed to a startup called Cohere Med. Sarro, now a nurse practitioner in a different health system, says her experience makes her open to working with more AI tools, but also wary of their limitations. “They’re helpful but just one part of the puzzle.”