Who trains the trainers?
What does faculty development for AI in Med Ed look like, when AI changes what teaching means?
The attending physicians, program directors, and clerkship coordinators (the people who deliver and govern medical education) were trained in a world where AI didn’t exist. Most of them have never used an AI agent. Many are just learning about LLMs and using them ad hoc. And we’re asking them to supervise learners who are using these tools daily, to assess competencies that didn’t exist two years ago, and to redesign curricula around technologies they’re still figuring out themselves.
This goes well beyond “AI literacy.” We need to develop faculty who can teach in an AI-forward world.
Given how fast AI is entering medical education, this is the elephant in the room, and it is an urgent faculty development problem.
Here are some (somewhat unstructured) thoughts on what needs to be addressed in training medical faculty on AI.
The learner almost certainly knows more about AI than the teacher
In most clinical teaching encounters in 2026, the learner knows more about AI than the teacher.
A multicentre survey of 293 health science faculty published last fall found that while 74.8% had positive attitudes toward AI, only 33.7% agreed on the reliability and accuracy of AI outputs. Knowledge was moderate at best. Formal training was rare. A separate survey of 128 faculty and 138 students found that most faculty self-identified as novice AI users with limited awareness and infrequent use. The top barriers: lack of knowledge, limited time, unclear benefits.
Meanwhile, their students are using ChatGPT to study for exams, Claude to draft notes, and NotebookLM to organize their clinical learning. The AMA’s November 2025 policy on AI literacy in medical education acknowledged this directly: peer-reviewed studies show trainees are already using AI tools and want formal training. The students are WAY ahead of the faculty.
Yet despite modern changes, medical education essentially remains apprenticeship-based: the expert teaches the novice by doing, then watching, then coaching. That model assumes the expert has more skill than the learner in the relevant domain. When the domain shifts to AI-mediated clinical work, that assumption collapses.
A Penn LDI panel with faculty from Stanford, Northwestern, and NYU put it bluntly: AI is moving so quickly that the generational knowledge gap between faculty and students is real and growing.
IMO we should design with this in mind. A training needs analysis at NYITCOM found students were significantly more familiar with generative AI than faculty (P < .001), yet both groups wanted the same thing: structured training and shared learning. The September 2025 Academic Medicine Macy supplement drives this home — a commentary in the proceedings argues that AI's pace outstrips the ability of medical schools and consensus organizations to adapt, so clinician-educators must engage with AI alongside their learners, not ahead of them. The Macy conference proceedings themselves recommend co-creating AI curricula with learners, not just for them. A student who built a NotebookLM workflow for ward prep has something to teach a clerkship coordinator. Clinical judgment still matters enormously, but the direction of learning about AI tools is, right now, bottom-up. The institutions that design for bidirectional learning will move fastest. (Josh Landy and I are trying to do some of this at SHN with Vibes 2).
The “double-supervision” problem
I think this highlights one of the core faculty development competencies that we need to build, which has no precedent in traditional medical education.
Here is the structure:
The learner is supervising the AI, functioning as human-in-the-loop.
The faculty member is now not just supervising the learner, but also supervising the learner’s oversight of, and interaction with, the AI.
Abdulnour and colleagues, writing in the New England Journal of Medicine last August, presented a framework for clinical supervision of AI use that identifies this as a structural problem. There are two layers of oversight, and the person at the top (the attending) may be the least equipped to evaluate the AI layer.
The framework describes three failure modes: deskilling (the learner’s abilities erode from overreliance), mis-skilling (the learner internalizes AI-generated errors as correct practice), and neverskilling (the learner never builds the ability in the first place because AI was always doing it). The faculty member has to detect all three of these failure modes in the learner, while simultaneously evaluating whether the AI output itself was correct.
Think about that for a moment. An attending who has never used an AI agent is now expected to determine whether a learner’s AI-generated differential diagnosis is good, whether the learner appropriately modified it, and whether the learner could have generated it independently if the AI were switched off. Without training.
It is not hypothetical. It is happening on clinical teaching units right now. Faculty are supervising learners who are using tools the faculty have never touched, producing outputs the faculty cannot independently evaluate.
The faculty development imperative here is concrete. Faculty need to be trained in the following:
When trainees need to modify AI outputs, and how they should do it. What does appropriate critical appraisal of an AI-generated care plan look like? What are the common failure patterns? Where does the AI tend to be confident and wrong?
How trust in AI develops and when to doubt it. Edge cases. Where hallucinations are more likely to occur. Why AI confidence does not equate to AI accuracy. How to probe how well calibrated the learner’s trust in AI is.
When to program AI-off time for learners, and where it is critical to learn specific skills in the absence of the AI: History-taking? Physical examination? Differential diagnosis generation? Procedural decision-making? Empathic communication? Faculty need frameworks for deciding when AI augmentation helps learning and when it short-circuits it.
How experiential use of AI can teach the learner on top of other methods (e.g. explainable AI), and when this should be facilitated. In endoscopy, which has been at the forefront of some of these studies, this may even improve outcomes.
The paradox of supervision
In my last article, I wrote about Anthropic’s “paradox of supervision”: effectively supervising AI requires the very skills that atrophy from AI overuse. If a faculty member is using AI all the time (for synthesis, for differential diagnosis, for documentation), how do they immunize themselves against the traps of AI trust miscalibration?
Anthropic’s randomized controlled trial on AI-assisted coding found that developers using AI scored 50% on an independent skills quiz versus 67% for those coding by hand, with the largest gap on debugging. The implication: the oversight skill is harder than the execution skill.
In medicine, this means it’s not enough for faculty to know what a correct care plan looks like. They need to know what a subtly incorrect AI-generated care plan looks like, how it differs from the correct one, and what patterns of learner interaction with AI are indicative of “appropriate” versus “inappropriate” trust.
When writing this, the closest example I had was the EMR copy-forward problem: trusting the prior notes in the EMR, so you cut and paste sections into discharge plans and the new history note without reading them, and sometimes without even questioning the history. The penicillin allergy no one clarified. The medication list from 8 years ago. The information looks right because it is in the chart, and the chart has an authority that discourages questioning. You should NOT trust the chart uncritically, and teachers have to teach learners why.
AI outputs have the same problem, and it is amplified. LLMs are fluent, confident, formatted like expert reasoning. They carry an implicit authority that makes them harder to question than a colleague's opinion (a colleague might say "I'm not sure"; AI almost never does). Teaching learners to maintain appropriate skepticism toward AI-generated content (without becoming so skeptical that they dismiss genuinely useful outputs) is, in my opinion, a critical clinical teaching competency. And it's one that faculty themselves need to develop through their own practice, training and reflection with these tools, not through a one-hour workshop.
What AI faculty development actually needs to look like
A didactic session on “AI in Healthcare” doesn’t build the competencies I’ve described. A policy document doesn’t change practice. And waiting for consensus will take years we don’t have.
Here’s what I think is needed:
1. Hands-on, experiential AI training for all teachers, not literacy modules
Faculty need to use AI tools, regularly, in their own work, before they can teach with them. Not read about them. Not watch demos. Use them. Write a clinic note with an ambient scribe. Generate a differential with an LLM and then critique it. Build a study guide with NotebookLM. Use an agent to do a literature search for a talk they’re preparing.
The NYITCOM training needs analysis found that both faculty and students wanted the same thing: structured, hands-on training. The barrier isn’t resistance; it is that we have yet to build the structured opportunities. The AAMC’s AI competencies for medical educators provide a framework. Now we need programs that operationalize it.
2. Co-learning environments that legitimize the knowledge inversion
We need to learn from the learners on AI. Design faculty development that explicitly pairs faculty with tech-fluent learners in structured co-learning. A resident who uses AI daily should be presenting at faculty development sessions. A medical student who built an AI workflow should be co-leading a workshop with the clerkship director.
This isn’t a radical idea. It’s how every other industry handles technology adoption: through reverse mentorship, paired learning, shared exploration. Medical education’s hierarchical culture makes this uncomfortable, but the Macy proceedings explicitly recommend it. The discomfort is the point.
3. AI supervision as a defined teaching competency
The double-supervision problem isn’t going away. It should be named, defined, and taught. Faculty need explicit training in:
Evaluating the quality of AI outputs in their clinical domain
Assessing whether a learner’s modifications to an AI output reflect sound clinical reasoning
Detecting the three failure modes (deskilling, mis-skilling, neverskilling) in observed clinical encounters
Designing assessment tasks that differentiate AI-assisted competence from independent competence
Calibrating their own trust in AI — modeling appropriate skepticism
This should be a defined domain of clinical teaching competency, not an afterthought. Promotion criteria, annual reviews, and teaching awards should reflect it.
4. Protected time and institutional commitment
Institutions that are serious about AI integration need to provide protected time for faculty to develop AI competencies — the same way they provide protected time for research or quality improvement. This means funded faculty development programs, not voluntary lunch-and-learns. It means making AI competency part of the academic mission, not an extracurricular interest.
5. Domain-specific AI failure libraries
Faculty need to know where AI fails in their specialty. I need to know why CADx may confuse hyperplastic polyps and serrated adenomas in endoscopy. A cardiologist needs to know that LLMs struggle with nuanced ECG interpretation. A surgeon needs to know that AI-generated operative plans can miss anatomical variants. A psychiatrist needs to know that AI can be confidently wrong about medication interactions in polypharmacy.
As part of quality assurance, we should be building specialty-specific libraries of AI failure cases. I am thinking of annotated, curated collections of cases where AI was wrong, where it was subtly wrong, and where it was right but for the wrong reasons. These are the teaching cases of the AI era. Faculty who’ve reviewed them can ask better questions when supervising AI-augmented learners.
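For readers who think in data structures, here is one rough way such a library could be organized. This is only a sketch under my own assumptions: the field names, the three category labels, and the helper cases_for are all hypothetical, and the two entries are placeholders rather than real cases.

```python
from dataclasses import dataclass, field
from enum import Enum


class FailureType(Enum):
    # The three kinds of failure described above (labels are illustrative).
    WRONG = "wrong"
    SUBTLY_WRONG = "subtly wrong"
    RIGHT_WRONG_REASON = "right for the wrong reasons"


@dataclass
class FailureCase:
    # One annotated teaching case; every field name here is hypothetical.
    specialty: str
    tool: str
    failure_type: FailureType
    summary: str
    teaching_points: list[str] = field(default_factory=list)


def cases_for(library, specialty, failure_type=None):
    """Return cases for one specialty, optionally narrowed to one failure type."""
    return [
        case for case in library
        if case.specialty == specialty
        and (failure_type is None or case.failure_type == failure_type)
    ]


# Placeholder entries only, not real cases.
library = [
    FailureCase(
        specialty="gastroenterology",
        tool="CADx (placeholder)",
        failure_type=FailureType.SUBTLY_WRONG,
        summary="Placeholder: a polyp classification the reviewer disagreed with.",
        teaching_points=["What would make you override the classification?"],
    ),
    FailureCase(
        specialty="cardiology",
        tool="LLM (placeholder)",
        failure_type=FailureType.WRONG,
        summary="Placeholder: an ECG interpretation that missed a key finding.",
    ),
]

gi_cases = cases_for(library, "gastroenterology")
```

The point of tagging each case with a failure type is that a faculty member preparing for a teaching session could pull, say, only the "subtly wrong" cases in their specialty, which are the ones hardest to catch at the bedside.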
6. Assessment reform that distinguishes AI-augmented from AI-independent competence
If we’re going to let learners use AI in clinical practice, we need to assess them both with and without it. This is a faculty development issue because faculty design and deliver assessments. They need to understand:
Which competencies should be assessed AI-free (and why)
Which competencies should be assessed with AI (because that’s how they’ll be practiced)
How to design assessments that probe clinical reasoning when the answer was AI-assisted
How to use OSCEs, oral examinations, and bedside assessments to evaluate the human contribution to an AI-augmented workflow
The Academic Medicine Macy supplement identifies the lack of assessment infrastructure as a key barrier. Faculty are the ones who have to build it, and they can’t build what they don’t understand. More on this to come in one of my next articles.
The urgency
Let me be direct about the timeline.
Stanford integrated AI across its medical curriculum starting fall 2025. The AMA adopted formal AI literacy policy in November 2025. The AAMC released AI competencies for educators. The NEJM published a supervision framework.
The field is moving. The frameworks exist. The evidence base is building.
What hasn’t caught up is the faculty. The 72.4% with no formal AI training. The majority who self-identify as novice users. The attendings on clinical teaching units right now who are supervising AI-augmented learners without any preparation for what that means.
Every month that passes without systematic faculty development widens the gap. Learners get more fluent with AI. Faculty fall further behind. The supervision mismatch grows.
This is not a problem we can workshop our way out of at an annual conference. It requires institutional investment, protected time, structured programs, and a willingness to acknowledge that the traditional hierarchy of expertise has, in this one domain, temporarily inverted.
The elephant in the room isn’t AI in medical education. It’s that the people responsible for delivering medical education haven’t prepared to teach in an AI world. And we need to fix that now.


