The AI agent-augmented physician
What software engineering already knows... that medical education needs to learn ASAP!
We watched it happen in real time for coding
I can’t write code. Not really. I understood logic from when I coded in QBasic as a kid. I can fumble around in Python.
That all changed with Claude Code. Now I supervise an AI agent that writes code, debugs systems, manages infrastructure, and sends messages across platforms. It built a multi-channel chatbot. It architected a browser relay. I didn’t write any of that code. I described what I wanted, reviewed what it produced, caught the errors, and decided when to ship.
This is what software engineers now call “vibe coding”—describing intent and letting AI agents handle execution. When Andrej Karpathy coined the term in early 2025, he meant it half-playfully, especially when he said “the hottest new programming language is English”. Less than a year later, vibe coding is now agentic engineering: “you are not writing the code directly 99% of the time, you are orchestrating agents who do and acting as oversight.” The practice evolved fast. AI now writes 30% of Microsoft’s code and more than a quarter of Google’s. Claude Cowork was completely coded by Claude Code. Across the software industry, the developer’s role has undergone a fundamental shift: from writing code to supervising agents that write code.
The most productive developers in 2026 aren’t the ones who type the fastest. They’re the ones who are best at reviewing agent-generated code, catching subtle bugs, and knowing when the agent’s confident-sounding output is wrong. The World Economic Forum at Davos 2026 declared software developers “the vanguard” of how AI is redefining work, noting that 65% expect their role to be fundamentally redefined this year. Anthropic’s own internal research, published in December 2025, confirmed this: their engineers reported becoming more “full-stack”—able to succeed at tasks beyond their normal expertise—while worrying about the atrophy of the very skills needed to supervise effectively.
One Anthropic engineer put it bluntly: “I am for sure atrophying in my skills as a software engineer... But those skills could come back if they ever needed to, and I just don’t need them anymore.”
Does this de-skilling sound familiar? It should. Because medicine is next.
The transition has already started
Agents are excellent at execution. The clinical tasks that constitute the bulk of physician work (history-taking, differential diagnosis, care plan generation, documentation, follow-up coordination, guideline application) are increasingly within agent execution capability. This is no longer speculative.
In June 2025, Ferber and colleagues published a landmark study in Nature Cancer demonstrating an autonomous AI agent for clinical decision-making in oncology. Evaluated across 20 realistic multimodal patient cases, their agent autonomously selected appropriate clinical tools with 87.5% accuracy, reached correct clinical conclusions in 91% of cases, and accurately cited relevant oncology guidelines 75.5% of the time. Compared to a standalone large language model, the integrated agent improved decision-making accuracy from 30.3% to 87.2%.
Read that again. An agentic system (not a chatbot, but an autonomous agent that decides which tools to use and in what order) matched or exceeded the clinical reasoning of many oncology trainees. And that was eight months ago. Deloitte reports that 61% of healthcare executives are already building agentic AI initiatives, with 85% planning to increase investment in 2026.
The Lancet Regional Health published a perspective in October 2025 arguing that clinical AI now warrants an entirely new physician role. The authors argue we need physicians whose primary competency is the clinical deployment and oversight of AI systems, not physicians who happen to use AI on the side.
Meanwhile, 8VC’s Vision for Healthcare AI in America, published in December 2025, proposes three levels of clinical AI autonomy: AI that assists the physician, AI that acts under physician supervision, and fully autonomous AI. Their framework is useful but, I’d argue, insufficiently granular.
The question isn’t whether agents will do this work. They will. Assume it. Execution in broad strokes in medicine will be done by agents.
The question is whether physicians will transition from executors to supervisors the way developers did, and that is fundamentally an education problem.
Here’s how the transitions map:
Developer describes feature intent → agent writes code → developer reviews
Physician describes clinical intent (“manage this patient’s heart failure”) → agent generates care plan → physician reviews
Agent runs automated tests → developer reviews failures
Agent runs differential against evidence base → physician reviews edge cases
Agent handles boilerplate (CRUD, documentation) → developer focuses on architecture
Agent handles documentation, orders, follow-up scheduling → physician focuses on judgment calls
Developer catches agent hallucinations: subtle logic errors that compile but produce wrong behavior
Physician catches agent hallucinations: plausible-sounding plans that miss contextual factors
Agent flags anomalies in CI/CD → developer decides: real problem or noise?
Agent flags lack of progress in management plan → physician decides: signal or noise?
Developer manages multiple repos and agents simultaneously
Physician supervises multiple agents simultaneously
The cognitive shift is the same. From “do the thing” to “verify the thing was done correctly.”
The skill profile changes the same way. The best doctors in an agent-mediated world won’t be the ones who memorize the most. They’ll be the ones who catch the most errors and know when to override. This is analogous to what Gergely Orosz described in his blog The Pragmatic Engineer when he wrote about the “grief” developers feel watching AI write most of the code, and the realization that their value has shifted from production to quality assurance, architecture, and knowing what should be built.
The productivity gain is similar, too. One developer plus agents now does what a team of five did two years ago. One physician plus agents could potentially manage patient panels that would be impossible solo. This is already the implicit promise of agentic AI in healthcare—not replacing physicians, but amplifying what each physician can oversee.
Where the analogy breaks
It’s not a perfect analogy.
Software can be rolled back. Patients can’t. You can revert a git commit. You can’t easily revert a wrong medication. Simon Willison argued persuasively that hallucinations in code are the least dangerous form of LLM error—you catch them the moment you run the code. But even in software, this is proving optimistic. IEEE Spectrum reported in January 2026 that AI agents increasingly produce “silent failures”—code that appears to run successfully but quietly removes safety checks or produces subtly incorrect output. This is critical because in medicine, there is no software compiler. The silent failure may be noticed in hours, days, or never.
ECRI, the independent patient safety organization, named misuse of AI chatbots as the #1 health technology hazard for 2026. Not a theoretical risk: the top hazard, above cybersecurity, above surgical robotics, above everything else. Their concern isn’t that AI will make errors. It’s that clinicians will trust AI outputs without adequate verification, and that the systems lack the transparency needed for effective oversight. Again, this points to an AI education need for clinicians.
And then there’s the deskilling data from endoscopy that we’ve spoken about before. A study published in The Lancet Gastroenterology & Hepatology in August 2025 found that after endoscopists had been routinely using AI-assisted colonoscopy, their adenoma detection rate in non-AI-assisted procedures dropped from 28.4% to 22.4%: 6 percentage points in absolute terms, or roughly 20% in relative terms. This is the first study suggesting that AI exposure has a measurable negative impact on patient-relevant clinical endpoints. The AI didn’t just help. It made the doctors worse when the AI wasn’t there.
The moral stakes are categorically different. Software bugs are costly. Clinical errors harm people. The supervision model has to be qualitatively more rigorous.
The patient relationship resists automation. No one cares whether an agent wrote their app. Patients care deeply about who is making decisions about their health. The therapeutic alliance—trust, presence, cultural attunement, the hand on the shoulder in a difficult conversation—may be the one domain that cannot be supervised from a distance. It must remain irreducibly human.
There is no regulatory framework. Software has version control, audit trails, and blame-traceable commits. Medicine has nothing designed for agent-supervised care. A July 2025 article in Nature Medicine argued that autonomous AI agents in healthcare are already advancing beyond the scope of current US and European medical device regulations. The Lancet Digital Health has called for new frameworks. We’re building the plane while flying it.
The spectrum of autonomy is not binary
The 8VC framework’s three levels (AI assists, AI acts under supervision, AI acts autonomously) are a useful starting point, but my lived experience with coding agents suggests the reality is far more granular. In software right now:
Some code the agent writes, you glance at and merge. Low-risk, well-tested patterns. Boilerplate.
Some code you review carefully, line by line. Complex logic, security-sensitive areas.
Some code you wouldn’t let the agent near. Novel architecture, critical systems.
And crucially—you may not always know in advance which category a task falls into.
In medicine, this maps to:
Glance and approve: Refill a stable patient’s lisinopril → agent drafts, physician approves. Like merging boilerplate code.
Review carefully: Adjust an insulin regimen in a complex diabetic patient → agent proposes, physician examines every parameter. Like reviewing complex logic.
Fully human: Goals-of-care conversation with a dying patient’s family. Like novel system architecture—no agent should be anywhere near this.
Override and take control: A patient presenting with something the agent hasn’t seen before → physician takes over entirely. Like an edge case that breaks the framework.
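For readers who think in code, the four modes above can be sketched as a simple routing rule. This is a minimal illustration, not a clinical tool: the ClinicalTask fields (risk, human_only, novel) and the order of the checks are my own assumptions, and someone still has to assign those labels correctly, which is exactly the hard part.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    GLANCE_AND_APPROVE = "glance and approve"
    REVIEW_CAREFULLY = "review carefully"
    FULLY_HUMAN = "fully human"
    OVERRIDE = "override and take control"


@dataclass
class ClinicalTask:
    # All fields are hypothetical labels chosen for this illustration.
    description: str
    risk: str         # "low" or "high"
    human_only: bool  # e.g. a goals-of-care conversation
    novel: bool       # presentation unlike anything the agent has seen


def supervision_mode(task: ClinicalTask) -> Mode:
    """Map a task to one of the four supervision modes described above."""
    if task.human_only:
        return Mode.FULLY_HUMAN
    if task.novel:
        return Mode.OVERRIDE
    if task.risk == "high":
        return Mode.REVIEW_CAREFULLY
    return Mode.GLANCE_AND_APPROVE


refill = ClinicalTask("Refill lisinopril, stable patient", "low", False, False)
insulin = ClinicalTask("Adjust insulin, complex diabetic patient", "high", False, False)
goals = ClinicalTask("Goals-of-care conversation", "high", True, False)
unknown = ClinicalTask("Presentation the agent has not seen", "high", False, True)

for task in (refill, insulin, goals, unknown):
    print(f"{task.description}: {supervision_mode(task).value}")
```

The function itself is trivial. The difficulty lives in the inputs: deciding whether a task is really low-risk or really novel is the judgment call, and as noted above, you may not know the answer in advance.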
The critical new competency is knowing which mode you’re in—and shifting fluidly between them. This is what Steve Hasker, CEO of Thomson Reuters, was getting at in Fortune when he argued that AI demands a complete rethink of the apprenticeship model for knowledge professionals. The traditional model—learn by doing, supervised by someone who’s done it longer—breaks down when the “doing” is increasingly performed by agents!
The apprenticeship crisis
Brynjolfsson, Chandar, and Chen at Stanford’s Digital Economy Lab published “Canaries in the Coal Mine” in August 2025, documenting a 13% relative decline in employment for entry-level workers (ages 22–25) in AI-exposed occupations like software engineering, compared to stable or rising employment for older workers in the same roles. Stack Overflow’s analysis, “AI vs Gen Z,” published in December 2025, explored the same phenomenon: if agents can handle junior-level tasks, what happens to the pipeline that produces senior expertise?
Medicine should be paying very close attention. Even with more modern GME/PGME pedagogies like simulation, our entire training model is essentially still apprenticeship-based. Medical students learn by doing—taking histories, writing notes, generating differential diagnoses, formulating care plans—under graduated supervision. If agents start doing those tasks, the learning opportunities evaporate. You can’t develop clinical judgment if you never make clinical decisions. You can’t learn to catch AI errors if you never learned to do the task yourself.
This is Anthropic’s “paradox of supervision,” translated to medicine: effectively supervising clinical AI requires the very clinical skills that may atrophy from AI overuse.
Mehta and colleagues, writing in Perspectives on Medical Education in November 2025, argued that physicians must be “not replaced, but reinvented”—that medical education needs new pathways emphasizing systems thinking, interdisciplinary collaboration, and AI literacy. Stanford embedded AI across its entire medical curriculum starting fall 2025. The AMA adopted formal AI literacy policy at its November 2025 interim meeting. These are good starts. But I think they understate the urgency. The transition in software engineering happened in roughly 18 months. Medicine won’t have a decade to figure this out.
What medical education must do now
If the physician’s role is shifting from executor to supervisor, medical education needs to change in at least four concrete ways:
1. Teach “AI review.” Can the learner evaluate an agent-generated care plan? Can they spot the subtle error? Can they distinguish a confident hallucination from correct reasoning? This is a teachable, assessable skill. We need an EPA (Entrustable Professional Activity) for it, and fast.
2. Build simulated agent failures into training. In software, developers are tested on debugging, not just writing code. Medical education needs simulated scenarios where the agent gets it wrong and the learner has to catch it. Not obviously wrong, but subtly wrong. The kind of plausible error that a tired resident at 2 AM might wave through. This is where the real danger lives.
3. Define the boundary: “no-agent zone.” Even the most agent-reliant developer doesn’t let the agent handle authentication and security for critical systems. What are the clinical equivalents? Goals-of-care conversations. Breaking bad news. Navigating cultural complexity. The physical exam findings that require a human hand and human presence. We need to identify the minimum irreducible core of physician work that must remain fully human—and protect it fiercely.
4. Assess calibration, not just competence. The critical skill isn’t “can you do the task.” It’s “do you know when to trust the agent and when to override?” Over-trust and under-trust are both failure modes. We have no assessment framework for this. We need to build one before the technology outpaces our training.
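To make the calibration idea concrete, here is one way it could be scored in a simulation exercise. It is a rough sketch under my own assumptions: each simulated case has a known ground truth (the agent’s plan was correct or not) and a recorded learner decision (accepted or overrode). The names ReviewedCase and calibration_summary are hypothetical, not part of any existing assessment framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReviewedCase:
    agent_plan_correct: bool  # ground truth set by the scenario author
    learner_accepted: bool    # True if the learner approved the agent's plan


def calibration_summary(cases: list[ReviewedCase]) -> dict[str, Optional[float]]:
    """Return over-trust and under-trust rates for a set of reviewed cases.

    Over-trust: share of incorrect agent plans the learner accepted.
    Under-trust: share of correct agent plans the learner overrode.
    A rate is None when there are no cases of that kind to score.
    """
    wrong = [c for c in cases if not c.agent_plan_correct]
    right = [c for c in cases if c.agent_plan_correct]
    over = sum(c.learner_accepted for c in wrong) / len(wrong) if wrong else None
    under = sum(not c.learner_accepted for c in right) / len(right) if right else None
    return {"over_trust_rate": over, "under_trust_rate": under}


cases = [
    ReviewedCase(agent_plan_correct=False, learner_accepted=True),   # missed error
    ReviewedCase(agent_plan_correct=False, learner_accepted=False),  # caught error
    ReviewedCase(agent_plan_correct=True, learner_accepted=True),
    ReviewedCase(agent_plan_correct=True, learner_accepted=True),
    ReviewedCase(agent_plan_correct=True, learner_accepted=True),
    ReviewedCase(agent_plan_correct=True, learner_accepted=False),   # needless override
]
summary = calibration_summary(cases)
print(summary)
```

Reporting the two rates separately matters: a learner who overrides everything scores zero on over-trust but fails on under-trust, and the reverse is true of a learner who rubber-stamps every plan.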
Scarborough as a test bed
I’m not writing this from the sidelines. At Scarborough Health Network, we’re a clinical partner of SAMIH—the Scarborough Academy of Medicine and Integrated Health—a new medical school at the University of Toronto Scarborough, with inaugural classes launching in 2026 and an undergraduate medical program in 2027.
Medical training is LONG: four years of undergraduate education, then residency and possibly fellowship. The first cohort of SAMIH students will graduate into a world where agents are deeply embedded in clinical practice. We have the opportunity to train the first generation of “agent-supervised physicians”: doctors whose core competency is not just executing clinical tasks, but supervising agents that execute clinical tasks while maintaining the irreducibly human work of medicine.
At SHN we’ve already started. Last summer, through our RISE program’s Summer of Vibes, students with minimal coding experience built clinical AI prototypes using vibe coding—the same paradigm shift I described above. They weren’t writing code. They were managing agents that wrote code. And they produced working clinical tools: an AI-powered colonic growth classifier, a medical training evaluation system, a virtual standardized patient, an emergency department digital twin.
These students are the proof of concept. They demonstrated that the agent-supervision model works, that non-technical users can be effective supervisors of AI systems, and that the results can be clinically meaningful. The methodology has since been published in Medical Science Educator as a replicable framework for health professions education. Now we need to embed this into the formal curriculum.
The honest uncertainty
I’ll end where I always end: with what we don’t know.
We don’t know the minimum irreducible core of physician work. We don’t know which clinical muscles atrophy safely and which are load-bearing. We don’t know what happens to the physician-patient relationship when the doctor is primarily a supervisor of AI systems. We don’t know whether the “neverskilling” problem—trainees who never develop skills because agents did the work first—is catastrophic or manageable.
But we know the transition is happening, because we’ve already watched it happen in software engineering. The Fortune 500 is already worried about finding talent with enough critical thinking to oversee AI systems. The Anthropic engineers are already debating whether to deliberately practice without AI to keep their skills sharp. Cory Doctorow has already warned about “reverse centaurs”—humans reduced to rubber-stamp appendages for AI systems, clicking “approve” without meaningful review. He called it “automation blindness”: what happens when you’re asked to repeatedly examine the output of a generally correct machine and somehow remain vigilant for its errors. Humans aren’t built for that. And Stack Overflow’s 2025 Developer Survey confirms the tension: 80% of developers use AI tools, but more actively distrust their accuracy (46%) than trust it (33%). Only 3% report “high trust.” The number-one frustration, cited by 45%, is dealing with AI solutions that are “almost right, but not quite.” In software, “almost right” ships a bug. In medicine, “almost right” harms a patient.
Medicine can learn from software’s transition, or it can repeat every mistake with higher stakes. The question isn’t whether physicians become agent supervisors. It’s whether medical education leads that transition or gets dragged along behind it.
I know which one I’m betting on. We’re building it in Scarborough.



