The AI stack for the medical student or resident
If you are a medical student or resident, you need to be using AI. March 2026 version
If you are a medical student or resident in 2026, you need to be using AI.
A recent public survey from earlier this month found that 93.9% of respondents rated formal physician training on AI limitations as essential or very important, and 85.2% agreed that AI training should be mandatory in medical school. And the manner of learning has changed. A recent mixed-methods study in BMC Medical Education found that medical students are increasingly learning in ways that are personalized, flexible, self-directed, and technology-mediated. A recent review in JMIR Medical Education argues that AI integration has to be judged against form, context, instructional mode, and fit with the educational objective, not just enthusiasm for tools.
That does not mean every learner should throw clinical judgment at an LLM-chatbot and cut and paste replies for assignments.
But it does mean AI tools used responsibly may augment learning and prepare you for an AI-forward future clinical environment.
So here it is. As of March 2026 this is the AI stack that I would recommend.
Note the typical AI caveats apply (don’t put PHI or other information you don’t want others to know into public systems; this is good as of this date, but this field moves fast, so it is easily outdated).
Finally — most students ALREADY know or use these tools. So this article may be more for the teachers and faculty — to know what AI tools your learners are using!
1. The agent layer
Agents are emerging but I think they are going to take over. Here is the key difference from chat: chat is good when you want a response. Agents are good when you want work done.
Agents can move across files, tabs, tools, and time. They can inspect a folder, compare websites, write code, use tools, run programs, and execute plans. An example would be the AI scribe that not only writes your note, but also puts in your prescriptions or orders, writes a patient-facing summary, waits for the human to approve, and then sends everything where it needs to go.
While the educational opportunity is obvious, the risk is also very high because of neverskilling and other ills. And it is an easy trap to fall into: Taylor and colleagues’ ethnographic study of preclinical case-based learning showed that students often used AI to automate concept retrieval and problem-solving, rarely cross-checking false-but-plausible outputs, and were more likely to offload cognitive effort when the task structure rewarded completion rather than learning (Taylor et al., 2026). So be wary of using agents for anything you have to learn yourself.
So what are the general purpose agents that should be in the stack for the medical learner?
Claude Code
Claude Code is Anthropic’s local agentic coding and workflow tool. This is where I would start. The official overview describes setup. The barrier to entry has dropped substantially now that Claude Code is also available in a web version. On current official pricing, Claude Pro is $20/month. Anthropic’s Max page and help article position Max at $100/month for the 5x tier and $200/month for the 20x tier. For most learners, Pro is the sensible starting point.
If you aim for the terminal instead of the web, you’ll need to be comfortable with installation, some WSL, and definitely the command-line interface (CLI) (see below).
(this interface was what we all used pre-1995 BTW)
Claude Code is still the hot agent of 2026 to date. The name poorly represents what it can do. It isn’t just for coding. It is for executing. Take a whole bunch of handwritten notes on day 1 of your rotation, and Claude Code can turn them into a rotation handbook. Handwrite the names of the podcasts you want to listen to, and Claude Code can make a Spotify list for you and play it.
I don’t have enough experience with the other AI agents (I’ve dabbled in Codex and just seen videos of Perplexity Computer) but they strike me as similar, with Perplexity Computer having the additional advantage of also being web native.
OpenClaw
OpenClaw was the rage shortly after the Christmas break. Because of the complexity of installation and concerns regarding privacy, I don’t think it is where most medical learners should start, but it is worth watching closely. It is an open personal agent that can run on your own machine or on cheap infrastructure you control, and that you can interact with through your phone (say through WhatsApp) or any computer. It “does” brilliantly, with skills and enhancements. I write about my OpenClaw experiences here and here.
OpenClaw’s cost structure is different from the commercial subscriptions. The software itself is open. The main costs are the hardware or hosting you choose and the model access you connect to it. A Raspberry Pi setup is roughly a $35-80 one-time cost, but OpenClaw can also run on virtual infrastructure: for example, an Oracle free tier option, low-cost VPS options such as Hetzner at about $4/month, or other paid hosting paths such as Fly.io at roughly $10-15/month depending on usage. Then you still need the model side, which could be Anthropic, OpenAI, any number of other models, or even a local model. The API costs can escalate rapidly, particularly for the “smarter” APIs.
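To make that cost structure concrete, here is a rough first-year estimate as a small Python sketch. The hosting figures are the upper ends of the ranges quoted above. The monthly API spend is a purely hypothetical placeholder (real usage varies enormously), the names `HostingOption` and `first_year_cost` are made up for this illustration, and electricity for a Raspberry Pi is ignored.

```python
from dataclasses import dataclass


@dataclass
class HostingOption:
    name: str
    one_time_usd: float  # hardware bought once
    monthly_usd: float   # recurring hosting fee


def first_year_cost(option: HostingOption, monthly_api_usd: float) -> float:
    """One-time hardware plus 12 months of hosting and model (API) spend."""
    return option.one_time_usd + 12 * (option.monthly_usd + monthly_api_usd)


# Hosting figures are the upper ends of the ranges quoted in the text.
OPTIONS = [
    HostingOption("Raspberry Pi", one_time_usd=80.0, monthly_usd=0.0),
    HostingOption("Oracle free tier", one_time_usd=0.0, monthly_usd=0.0),
    HostingOption("Hetzner VPS", one_time_usd=0.0, monthly_usd=4.0),
    HostingOption("Fly.io", one_time_usd=0.0, monthly_usd=15.0),
]

# ASSUMPTION: hypothetical placeholder for model access, not a quoted price.
ASSUMED_MONTHLY_API_USD = 10.0

for option in OPTIONS:
    total = first_year_cost(option, ASSUMED_MONTHLY_API_USD)
    print(f"{option.name}: ${total:.0f} in year one")
```

With these placeholder numbers, model access adds $120 a year to every option, which is more than the Hetzner hosting fee itself. Swap in your own API estimate to see how quickly that line takes over.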
Despite the barriers, the OpenClawification of the agent space is real. OpenClaw creator Peter Steinberger joined OpenAI earlier this year and you can see the fingerprints of OpenClaw across all the new agentic offerings. It’s powerful stuff. I would frame OpenClaw as a watch list tool for advanced learners and clinician-developers, not as the default recommendation for every student. But it is the clearest example in this stack of where personal agents are heading.
2. The evidence layer
OpenEvidence is the only AI tool I use for evidence.
What is OpenEvidence? It is a free AI tool for physicians and learners that answers clinical questions in natural language, with every response grounded in and cited to peer-reviewed literature. Think UpToDate meets ChatGPT — you get a conversational interface but with answers tethered to real evidence rather than probabilistic guesses. It’s already used daily by a majority of North American physicians. For the learner, it’s a rapid way to look up diagnostic criteria, treatment algorithms, or tricky management questions with sourced answers in seconds. Just remember: click through to the cited paper and verify before it informs any clinical decision.
This is the tool I would use to understand why we do certain things in medicine; for “just-in-time” teaching before rounds; for when you want to compare the evidence behind management options; or for when you want to build a “first-pass” summary before turning it into a polished handout or presentation. I use it every day.
Caveat - as with any LLM - watch out for hallucinations, and double check from the source cited.
3. The study layer
NotebookLM is Google’s retrieval-augmented generation (RAG) framework attached to an LLM. Think of it as a multimedia textbook you build, one you can ask questions of and use to generate other outputs (e.g. flashcards) for teaching.
The value of NotebookLM is not that it lets you ask anything. The value is that it lets you ask ONLY inside a corpus you chose on purpose. In my earlier piece on RAG in medical education, I argued that the move is not to replace the curriculum with a model, but to wrap the curriculum in retrieval. That same logic applies at the learner level.
A novice learner does not need undifferentiated information. A learner needs the right syllabus, guidelines, landmark papers, teaching slides, and notes in one place. That is why curated sources matter so much. If the source set is sloppy, the notebook will be sloppy. If the source set is sharp, the notebook becomes genuinely useful.
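The “ask ONLY inside a corpus you chose” idea can be sketched in a few lines of Python. To be clear, this is not how NotebookLM works internally. It is a toy keyword-overlap retriever, and the corpus titles, placeholder texts, stopword list, and function names are all hypothetical. The only point is the behaviour: answers come from the curated sources or not at all.

```python
import re

# Hypothetical curated corpus: source title -> placeholder description.
# The texts are stand-ins written for this sketch, not clinical guidance.
CORPUS = {
    "Hospital endoscopy manual": "Local workflow for upper GI bleed triage and endoscopy timing.",
    "Society guidance": "Guidance on risk stratification for upper GI bleed.",
    "Journal club paper": "Trial comparing transfusion thresholds in GI bleed.",
}

# Small, arbitrary stopword list so common words do not count as matches.
STOPWORDS = {"a", "an", "the", "for", "of", "in", "on", "and", "is",
             "what", "when", "should", "to"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(question: str, corpus: dict[str, str], min_overlap: int = 2) -> list[str]:
    """Return source titles sharing at least min_overlap words with the question."""
    question_words = tokenize(question)
    scored = []
    for title, text in corpus.items():
        overlap = len(question_words & tokenize(text))
        if overlap >= min_overlap:
            scored.append((overlap, title))
    # Best match first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


def answer(question: str, corpus: dict[str, str]) -> str:
    """Refuse when the curated corpus has nothing relevant."""
    sources = retrieve(question, corpus)
    if not sources:
        return "Not covered by the sources in this notebook."
    return "Grounded in: " + "; ".join(sources)


print(answer("When should endoscopy happen for an upper GI bleed?", CORPUS))
print(answer("What is the dose of amoxicillin for otitis media?", CORPUS))
```

The second question falls outside the notebook, so the sketch declines rather than guessing. A sloppy source set would make `retrieve` return sloppy matches, which is the whole argument for curating on purpose.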
This is where I would use NotebookLM to build a GI bleed notebook from the endoscopy manual for the hospital and society guidance; a journal club notebook with the paper, editorial and related sources; a board-review notebook tied to explicit objectives for a topic; or a commute-time audio recap of what I wanted to learn this week.
4. The chat layer
ChatGPT and Claude still matter; I still use them all the time.
They shouldn’t be used as authority substitutes (e.g. “ChatGPT said this”). Used ideally, I think they are (a) rubber duck debugging tools and (b) articulation tools. Medical learners improve when they ask, question, explain, defend, rephrase, present, and try again. In programming, talking through a problem as if explaining it to a rubber duck is a well-described debugging technique. This is brilliant for medical learning as well. Ask your questions, pivot, wonder why, problem solve, and do it with voice. That is why a general conversational layer still belongs in the stack.
There is no question chatbot-LLMs are formally entering into medical curricula. Stanford’s late-2025 curriculum rollout framed AI as a way for students to practise diagnostic questioning at any time, while emphasizing that learners still need to judge whether the output is inaccurate or biased (Stanford Medicine, 2025). A 2026 analysis of AI-generated geriatric case studies found that platforms can create simulation material efficiently, but quality remains uneven (Ruggiano et al., 2026). A 2026 study of a customized educational chatbot in physician assistant training found gains in active learning, retrieval practice, and individualized feedback (Bogenschutz et al., 2026).
Cost matters here too — more on that in a later article.
5. The visual layer
Gemini, including the image-editing model commonly referred to as Nano Banana, belongs in the stack.
Where it becomes genuinely useful is explanation. Anatomy, mechanisms, procedural steps, trial schematics, room setup, and patient counseling all depend on making relationships visible enough to inspect.
This is the layer I would use to create mechanism diagrams for teaching, patient-facing visuals, workflow diagrams, procedural step maps, or slide-ready explanatory schematics. The boundary is clear: use visual AI to externalize and explain, not to replace radiology, pathology, or image-based clinical judgment.
6. The builder layer
Medical education does not currently have all the tools learners need. But no worries: you can build them with these AI tools even without coding knowledge. This is why I think the stack should include coding agents like Claude Code, Codex, and Replit.
We do not have enough good local guideline copilots. We do not have enough useful personal learning dashboards. We do not have enough intelligent block-specific study tools. We do not have enough good communication practice environments. Learners used to have to wait for vendors or institutions to build those things. They no longer do.
I think this “developer” or “builder” layer is still under-taught in medicine. A December 2025 JMIR Medical Education viewpoint argued that medical education needs more formal innovation training. A 2026 internal medicine residency innovation curriculum found that residents saw innovation as relevant to practice but had limited prior exposure to AI, app development, or implementation work (Shair et al., 2026). That gap is real. It is also newly bridgeable. I will write separately about the clinician-developer, because that deserves its own piece. But the short version is this: learners who can not only use AI, but also shape local infrastructure, will matter disproportionately.
Replit is the fastest mainstream path from idea to working app if you do not want to manage a local setup first. On the current official pricing page, Starter is free, and Replit Core is $20/month billed annually with $25 of monthly credits included. For most learners, that makes the practical sequence obvious: explore on Starter, build seriously on Core.
(Replit ran everything we built during last year’s Summer of Vibes program at SHN with undergraduate students, 20+ apps in all, and it’s gotten WAY better since then.)
7. Incoming - the “rep” layer
I also think one of the most important forthcoming categories is programmable virtual patients. Not generic chatbots pretending to be patients, but practice systems where the educator can set the disease, emotional tone, language, hidden agenda, case evolution, and assessment rubric. Many groups are building toward this. Penn Medicine has described work in this direction. The Medical College of Wisconsin has described ChatClinic. A 2025 npj Digital Medicine paper described a generative AI teaching assistant for personalized learning. Our emerging product is SharpenEDU.
Virtual patients linked to curriculum fit well with mastery learning. The learner can take the program with its branch points, and practice until minimal performance standards are met. It can emulate what is being done every day. They can be used at home or anywhere for practice. Feedback is instantaneous. Even hallucinations are a feature, since real patients deviate from scripts all the time.
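As a thought experiment, the educator-set parameters and the “practice until the standard is met” loop could be modelled like this in Python. This is a hypothetical sketch, not the design of SharpenEDU or any of the systems named above. The field names, rubric items, point values, and the 0.8 passing fraction are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VirtualPatientCase:
    """Educator-configurable case; every field here is illustrative."""
    disease: str
    emotional_tone: str
    language: str
    hidden_agenda: str
    rubric: dict[str, int] = field(default_factory=dict)  # rubric item -> points
    passing_fraction: float = 0.8  # hypothetical minimal performance standard

    def score(self, items_achieved: set[str]) -> float:
        """Fraction of rubric points earned in one practice attempt."""
        total = sum(self.rubric.values())
        earned = sum(pts for item, pts in self.rubric.items() if item in items_achieved)
        return earned / total if total else 0.0


def attempts_until_mastery(case: VirtualPatientCase,
                           attempts: list[set[str]]) -> Optional[int]:
    """Return the 1-based attempt on which the standard is first met, else None."""
    for number, achieved in enumerate(attempts, start=1):
        if case.score(achieved) >= case.passing_fraction:
            return number
    return None


case = VirtualPatientCase(
    disease="upper GI bleed",
    emotional_tone="anxious",
    language="English",
    hidden_agenda="worried about missing work",
    rubric={
        "elicits chief concern": 2,
        "asks about medications": 1,
        "uncovers hidden agenda": 2,
    },
)

# Three practice attempts, each recorded as the rubric items the learner hit.
practice_log = [
    {"elicits chief concern"},
    {"elicits chief concern", "asks about medications"},
    {"elicits chief concern", "asks about medications", "uncovers hidden agenda"},
]
print(attempts_until_mastery(case, practice_log))
```

The design choice worth noticing is that the learner is never graded on attempts taken, only on whether the standard is eventually met, which is the core of mastery learning.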
Cost
Cost is a real problem for the learner, as these systems can add up. My default would be one strong general-purpose ecosystem subscription. That might mean Claude Pro if you want Claude plus Claude Code, or ChatGPT Plus if you want OpenAI’s broader stack and Codex access. Then I would add OpenEvidence, and NotebookLM if you have institutional Google access. I would not start by paying for Perplexity Max or building an OpenClaw setup unless there was a specific use case demanding it. That is my view for now, but these things change quickly.
The through-line is this: none of these tools replace what you need to learn. They change how you learn it. I would view this stack as the new learning scaffold, not as a replacement for learning. Use agents to execute, evidence tools to ground, notebooks to organize, chat to think out loud, visuals to explain, and builders to make what doesn’t exist yet. The students who thrive won’t be the ones who used AI the most. They’ll be the ones who understood when to lean on it and when to put it down and think. Start with one layer, get good at it, then add the next.
And the AI stack will keep changing. Your judgment is what compounds.

Really appreciated this framework, Dr. Grover. The layered structure makes the stack legible in a way that most AI-in-medicine writing does not.
I am Allen Li, an oncologist in community practice. I have been running a YouTube series and Substack called Oncology AI Lab where I stress test AI on real world inspired oncology cases, including Claude, OpenEvidence, and DoxGPT specifically. Your recommendations resonated with me.
What I keep finding is that the evidence layer is only as strong as the clinician evaluating the output. The failure modes are not always obvious hallucinations. Sometimes the model is confident and well-formatted and still missing an FDA-approved option, or anchoring to the wrong tumor paradigm entirely. The “click through and verify” step is especially hard for those early in training.
Your line about thriving learners being the ones who know when to put it down and think really stands out. That judgment, in oncology at least, takes a long time to build. That is the gap I am trying to make visible.
Would love to connect if you are thinking about the subspecialty clinical depth layer at some point.
YouTube: youtube.com/@oncologyailab.