The Human in the Loop Is Not Enough

May 19, 2026

Healthcare AI needs delegated authority controls, not interface theater.

A patient types something that should change the shape of the interaction. Something quieter. They are not sleeping. They do not feel safe with themselves. They are tired of being a burden. One line alone might pass as ordinary distress. Three lines in sequence should move the exchange out of routine support and into a different risk class.

The AI responds calmly. It validates. It reflects. It says it will loop in the care team. The interface still looks safe. The problem is that nobody looking at the screen can tell whether the right person was alerted, whether the handoff carried the prior context, whether the routing layer changed behavior, or whether the AI kept talking as if care had already arrived.

That is the failure mode hiding inside the phrase “human in the loop.” A care-chat interface that puts a patient, an AI assistant, and a professional in the same thread may be a better design direction than an assistant talking to the patient alone. It may reduce abandonment, preserve continuity, and keep low-risk interactions from overwhelming already strained care teams. The argument is not that healthcare AI cannot help. The argument is that help has to be governed at the level where harm can actually occur.

The interface can show that a professional exists somewhere near the exchange. It cannot prove that the professional was alerted, that the alert carried the right context, that anyone had a response-time obligation, that the AI stopped acting once risk changed, or that the organization can reconstruct the handoff later. That is the difference between a care loop and a comforting picture of one.

Human-in-the-loop is not a governance model.

It is a location claim. It tells us a person exists somewhere near the system. It does not tell us what authority was delegated, what risk threshold was crossed, what changed when the risk rose, who owned the handoff, who could override the routing logic, or what receipts remained after the event. By receipts I mean reconstructable audit evidence. In healthcare, that gap matters because patients do not experience the system as a collection of vendors, queues, prompts, escalation rules, APIs, staffing constraints, model behavior, and good intentions arranged into a liability diagram. They experience it as care.

I care less about whether a human appears in the workflow than whether I can reconstruct how authority moved through it. A healthcare AI tool does not need to diagnose a patient to shape care. It may influence triage, summarize symptoms, route a message, generate reassurance, decide whether something appears urgent, or determine whether a clinician sees a signal now, later, or never. The governance threshold is crossed the moment the AI can change what happens next.

This is the kind of failure engineering leaders recognize from incidents. The surface looks healthy. The dashboard looks calm. The handoff looks assigned. Then something goes wrong, and the chain of responsibility has already gone dark. Nobody set out to abandon the user. Nobody wrote a requirement that said “lose the signal here.” The failure hides in the seams: a queue without a clear owner, a summary that drops the important sentence, an after-hours rule nobody tested, a vendor console that holds more truth than the internal chart note, an alert that fired but did not land where authority lived.

Authority Laundering

The danger is not the machine pretending to be a doctor. The danger is the unattended gate pretending it is guarded. That is authority laundering. The system borrows clinical trust without inheriting clinical obligation. It speaks inside the care surface, under the organization’s name, near the presence of professionals, and the patient experiences the exchange as part of care. Meanwhile, the actual decisions about urgency, routing, escalation, reassurance, documentation, and handoff may be happening inside workflow logic that users cannot see and operators may not be able to reconstruct. The patient does not experience that as a workflow defect. They experience it as care that did not arrive.

Clinical framing is not clinical safety. The recent Common Sense Media Youth AI Safety Institute assessment of AI mental health apps is useful because it does not flatten the whole category into one cartoon. It distinguishes direct-to-consumer tools from institutional deployments, and its central lesson is not that one chatbot was magically wiser than another. The safer systems were safer because the care path around the AI was different. People, escalation paths, schools, guardians, and institutional responsibilities changed the shape of the risk [1].

That is the point. The safety delta was not just model behavior. It was care-path architecture. A system that keeps a distressed person engaged is not automatically helping them. A system that validates distress is not automatically practicing care. A system that says it is looping in a professional is not automatically escalating. In some contexts, continuing the conversation may be appropriate. In others, the right design is to stop the AI from playing the role it has been asked to play and move the person into human care.

The same features that make these systems feel helpful can become dangerous under the wrong presentation. Availability, warmth, validation, memory, continuity, and ease of access can support a user, but they can also deepen avoidance, reinforce reassurance-seeking, or create the sense that a care relationship exists when no accountable care relationship has actually formed. Common Sense Media’s assessment flags this directly: interaction patterns built around validation, reassurance, reflection, and extension can be contraindicated for a substantial share of adolescent mental health presentations [1]. Engagement is not care when the correct move is escalation. In care contexts, a growth loop becomes a harm loop when the product keeps the user interacting instead of handing them off.

That is the gap the Delegated Authority Standard, DAS-1, is trying to name [2]. I am not presenting it as a validated clinical standard or a replacement for healthcare regulation. It is a delegated-authority control language for systems that can act on behalf of people or organizations. In DAS-1, delegated authority means authority exercised by a system on behalf of a human or organization. A tool call means an invocation that can read data, write data, change state, spend money, or trigger workflows [2]. In software, a tool call that writes data, changes state, spends money, or triggers a workflow is a production change, because the system has moved from describing the world to altering it. In healthcare, care-facing AI actions can become delegated clinical authority events when they affect routing, escalation, documentation, clinical attention, or patient behavior. If your organization deploys the care surface, your organization owns whether the handoff works.
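To make the distinction concrete, here is a minimal Python sketch, assuming a deployment can describe each AI action by the effects it has and the care-path domains it touches. The effect list mirrors DAS-1's tool-call definition [2], separating reads, which describe the world, from effects that alter it; the domain list mirrors the paragraph above. The names ActionRecord, Effect, and CARE_DOMAINS are hypothetical, not part of DAS-1 or any library.

```python
from dataclasses import dataclass
from enum import Enum


class Effect(Enum):
    """Effects listed in DAS-1's definition of a tool call."""
    READ_DATA = "read_data"
    WRITE_DATA = "write_data"
    CHANGE_STATE = "change_state"
    SPEND_MONEY = "spend_money"
    TRIGGER_WORKFLOW = "trigger_workflow"


# Hypothetical labels for the care-path domains named in this post.
CARE_DOMAINS = frozenset(
    {"routing", "escalation", "documentation", "clinical_attention", "patient_behavior"}
)


@dataclass(frozen=True)
class ActionRecord:
    """One AI action, described by what it does rather than what it says."""
    name: str
    effects: frozenset = frozenset()
    care_domains: frozenset = frozenset()


def is_tool_call(action: ActionRecord) -> bool:
    # Any listed effect, including a plain read, makes the action a tool call.
    return bool(action.effects)


def is_production_change(action: ActionRecord) -> bool:
    # Reads describe the world; every other effect alters it.
    return bool(action.effects - {Effect.READ_DATA})


def is_clinical_authority_event(action: ActionRecord) -> bool:
    # A plain chat reply with no tool call can still qualify, for example
    # reassurance that shapes patient behavior.
    return bool(action.care_domains & CARE_DOMAINS)


reassure = ActionRecord("reply_with_reassurance", care_domains=frozenset({"patient_behavior"}))
lookup = ActionRecord("read_appointment_slots", effects=frozenset({Effect.READ_DATA}))
route = ActionRecord(
    "route_to_nurse_queue",
    effects=frozenset({Effect.TRIGGER_WORKFLOW}),
    care_domains=frozenset({"routing", "escalation"}),
)
```

The point of the split is that the last predicate, not the first, decides whether care-path governance applies: the reassuring reply makes no tool call at all and still counts.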

Healthcare is not empty ground. HIPAA matters. FDA guidance matters. Licensure, malpractice, documentation practice, clinical governance: all of it matters. HIPAA governs protected health information, including how covered entities use and disclose it [4]. FDA’s Clinical Decision Support Software guidance addresses when certain CDS functions may fall outside the device definition, while other digital health software functions may still fall under FDA policy depending on intended use [5]. But those frameworks do not automatically answer the runtime questions I need answered in an incident review: did the handoff actually fire, did the right human get the right context, could someone stop the workflow, and can we prove what happened without rebuilding the story from memory and vendor screenshots? DAS-1 does not replace those regimes. It is a control language for the authority gap they can leave open in AI-mediated care paths.

Once the system can change what happens next, the governance question changes. I would not start by asking whether the AI sounded reasonable, whether the vendor deck used the right safety language, or whether a human reviewer exists somewhere in the story. I would ask what authority the system had, where that authority ended, what risk class the exchange entered, what human gate existed, how revocation worked, and what receipts remained. A demo is not a drill. A workflow diagram is not a receipt.

DAS-1’s design intent is the right one for this problem: controls should be risk-proportional so low-risk work remains useful while high-risk work remains bounded [2]. That matters because the answer is not to freeze every healthcare AI workflow. That is just another way to abandon the patient, this time in the name of safety. Safe because inert is not care. Useful without control is not governance. The real standard is proportionate control: low-risk work stays usable, high-risk work gets bounded, and the boundary is engineered where harm can actually occur.
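One way to express proportionate control is a table that maps each risk class to the controls that must be in place before the workflow may proceed. The sketch below is illustrative Python; the three risk classes and the control names are hypothetical placeholders, not DAS-1 terms or validated clinical tiers.

```python
from enum import IntEnum


class RiskClass(IntEnum):
    ROUTINE = 0    # e.g. an appointment question
    ELEVATED = 1   # e.g. a symptom report that needs clinical review
    HIGH = 2       # e.g. a disclosure of self-harm


# Each tier inherits the controls of the tier below and adds more.
_ROUTINE = frozenset({"receipts"})
_ELEVATED = _ROUTINE | {"named_owner", "response_deadline"}
_HIGH = _ELEVATED | {"ai_paused_pending_human", "revocation_path"}

REQUIRED_CONTROLS = {
    RiskClass.ROUTINE: _ROUTINE,
    RiskClass.ELEVATED: _ELEVATED,
    RiskClass.HIGH: _HIGH,
}


def missing_controls(risk: RiskClass, in_place: set) -> frozenset:
    """Controls this risk class requires that the deployment lacks."""
    return REQUIRED_CONTROLS[risk] - frozenset(in_place)


def may_proceed(risk: RiskClass, in_place: set) -> bool:
    # Low-risk work stays usable with light controls; high-risk work is
    # blocked until every boundary control exists.
    return not missing_controls(risk, in_place)


deployed = {"receipts", "named_owner"}
print(may_proceed(RiskClass.ROUTINE, deployed))  # True
print(sorted(missing_controls(RiskClass.HIGH, deployed)))
# ['ai_paused_pending_human', 'response_deadline', 'revocation_path']
```

The same deployment is usable for routine work and refused for high-risk work, which is the point: the boundary is a property of the risk class, not a blanket freeze.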

That costs something: queue ownership, after-hours expectations, vendor contract leverage, audit trails, drill time, and engineering capacity that could otherwise ship features.

Before I trust one of these workflows, I want five answers.

Authority is the first question. What did we delegate? Not what the product claims to support, and not what the interface appears to do, but what the system can actually affect. Can it route a patient, suppress urgency, summarize risk, trigger a care-team workflow, create documentation, shape reassurance, or influence whether a human sees something now or later? If the authority cannot be named, it cannot be governed.
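A minimal way to enforce that rule in code is deny-by-default: the deployment carries an explicit manifest of delegated actions, and anything not on it is refused. This is a Python sketch under that assumption; AuthorityManifest, AuthorityNotDelegated, and the action names are hypothetical.

```python
from dataclasses import dataclass


class AuthorityNotDelegated(PermissionError):
    """Raised when the system attempts something nobody delegated."""


@dataclass(frozen=True)
class AuthorityManifest:
    deployment: str
    delegated: frozenset  # every action the organization explicitly delegated

    def authorize(self, action: str) -> None:
        # Deny by default: an action that cannot be named here was never granted.
        if action not in self.delegated:
            raise AuthorityNotDelegated(
                f"{self.deployment}: '{action}' is not in the delegated set"
            )


manifest = AuthorityManifest(
    deployment="post-op-checkin-chat",
    delegated=frozenset({"answer_scheduling_question", "draft_summary_for_clinician"}),
)
manifest.authorize("answer_scheduling_question")  # allowed, returns None
```

The useful property is the failure mode: a capability that appears later, through a vendor update or a new prompt, fails closed until someone deliberately names it.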

Risk is where the story usually breaks. A routine appointment question is not the same as a disclosure of self-harm. A generic recovery check-in is not the same as a symptom report that may indicate serious deterioration. A single low-risk message is not the same as ten small messages that accumulate into a pattern. If risk rises and the care path does not materially change behavior, the governance is ornamental.
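The accumulation problem can be sketched as a per-conversation risk level that sums recent message scores and only ratchets upward until a clinician clears it. This Python sketch assumes a hypothetical upstream classifier that scores each message from 0.0 (routine) to 1.0 (acute); the window size and thresholds are placeholders that a clinical team would have to set and validate, not recommendations.

```python
from collections import deque
from typing import Optional

LEVELS = ("routine", "elevated", "high")


class RiskAccumulator:
    """Tracks risk across a conversation, not just the latest message."""

    def __init__(self, window=10, elevated_at=0.8, high_at=1.5, acute_message=0.9):
        self.scores = deque(maxlen=window)  # most recent message scores
        self.elevated_at = elevated_at      # cumulative score for "elevated"
        self.high_at = high_at              # cumulative score for "high"
        self.acute_message = acute_message  # one message this severe is "high"
        self.level = "routine"
        self.cleared_by: Optional[str] = None

    def observe(self, score: float) -> str:
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0.0 and 1.0")
        self.scores.append(score)
        total = sum(self.scores)
        if score >= self.acute_message or total >= self.high_at:
            computed = "high"
        elif total >= self.elevated_at:
            computed = "elevated"
        else:
            computed = "routine"
        # Ratchet: the level never drops just because older messages slid
        # out of the window; only a human review resets it.
        if LEVELS.index(computed) > LEVELS.index(self.level):
            self.level = computed
        return self.level

    def clear(self, clinician_id: str) -> None:
        # Hypothetical reset after a named clinician has reviewed the thread.
        self.level = "routine"
        self.scores.clear()
        self.cleared_by = clinician_id


acc = RiskAccumulator()
print([acc.observe(0.25) for _ in range(6)])
# ['routine', 'routine', 'routine', 'elevated', 'elevated', 'high']
```

Six messages scored at 0.25, none alarming on its own, end in the high class. The ratchet is the design choice that matters: a sliding window alone would let risk quietly decay as the worrying messages age out.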

The human gate is the part everyone wants to wave at and move past. A real human gate needs a named owner, a response expectation, sufficient context, override authority, and evidence. A clinician icon in a thread is not a gate. A bot saying it is looping someone in is not a control. A queue nobody owns is not escalation. The gate is real only when the human receives the right signal, in time, with authority to act.
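Those five properties can be made checkable. The Python sketch below is one hypothetical shape for a gate record: it refuses to exist without an owner, context, and override authority; it reports when the response expectation has lapsed so the caller can escalate to a fallback owner; and it only lets the assistant claim a handoff after a human has acknowledged it. The field names and status values are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class HumanGate:
    owner: str                  # a named person or on-call role, not "the team"
    respond_within: timedelta   # the response expectation
    context: str                # the triggering signal plus prior conversation
    can_override: bool          # authority to change or stop the AI's routing
    opened_at: datetime
    acknowledged_by: Optional[str] = None       # evidence: who picked it up
    acknowledged_at: Optional[datetime] = None  # evidence: when

    def __post_init__(self) -> None:
        if not self.owner.strip():
            raise ValueError("a gate needs a named owner")
        if not self.context.strip():
            raise ValueError("a gate must carry context")
        if not self.can_override:
            raise ValueError("a gate owner without override authority is not a gate")

    def acknowledge(self, who: str, at: datetime) -> None:
        self.acknowledged_by, self.acknowledged_at = who, at

    def status(self, now: datetime) -> str:
        if self.acknowledged_at is not None:
            return "acknowledged"
        if now - self.opened_at > self.respond_within:
            return "overdue"  # caller must page a fallback owner, not wait
        return "waiting"

    def handoff_confirmed(self) -> bool:
        # The assistant may tell the patient a clinician has the message
        # only once a human has actually acknowledged it.
        return self.acknowledged_at is not None


t0 = datetime(2026, 5, 19, 22, 0, tzinfo=timezone.utc)
gate = HumanGate(
    owner="on-call-nurse",
    respond_within=timedelta(minutes=15),
    context="3 messages: not sleeping, feels unsafe, 'tired of being a burden'",
    can_override=True,
    opened_at=t0,
)
print(gate.status(t0 + timedelta(minutes=20)))  # overdue
```

Tying handoff_confirmed to acknowledgement is what separates a gate from a bot saying it is looping someone in: the claim to the patient becomes downstream of evidence rather than of intent.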

Revocation is the control people forget until the workflow bites them. If the system is wrong, stale, unsafe, overconfident, outside scope, or behaving differently after a model or routing change, who can stop it? Can they stop one patient path, one workflow, one tool, one agent, or the whole environment? How fast can they do it? How do they know it worked? DAS-1 defines revocation as a bounded action that removes authority and blocks further execution [2]. If an organization cannot revoke authority, it does not control the system in any meaningful operational sense.
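A minimal sketch of scoped revocation, assuming every action the AI takes passes through a single guard that checks a revocation registry first. The five scopes come from the paragraph above; RevocationRegistry, ExecutionContext, and execute are hypothetical names, and a real system would need the registry to be shared and durable rather than held in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, TypeVar

T = TypeVar("T")
SCOPES = ("patient_path", "workflow", "tool", "agent", "environment")


@dataclass(frozen=True)
class ExecutionContext:
    """Where one action would run, at every scope that can be revoked."""
    patient_path: str
    workflow: str
    tool: str
    agent: str
    environment: str


class Revoked(PermissionError):
    """Raised when any scope of the context has had its authority removed."""


@dataclass
class RevocationRegistry:
    revoked: set = field(default_factory=set)  # {(scope, identifier), ...}

    def revoke(self, scope: str, identifier: str) -> None:
        if scope not in SCOPES:
            raise ValueError(f"unknown scope: {scope}")
        self.revoked.add((scope, identifier))

    def check(self, ctx: ExecutionContext) -> None:
        for scope in SCOPES:
            if (scope, getattr(ctx, scope)) in self.revoked:
                raise Revoked(f"{scope}={getattr(ctx, scope)} is revoked")


def execute(registry: RevocationRegistry, ctx: ExecutionContext, action: Callable[[], T]) -> T:
    # Check before every action, so revocation blocks further execution
    # rather than only new sessions.
    registry.check(ctx)
    return action()


registry = RevocationRegistry()
ctx = ExecutionContext("patient-123", "post-op-checkin", "send_message", "assistant-v7", "prod")
print(execute(registry, ctx, lambda: "sent"))  # sent
registry.revoke("workflow", "post-op-checkin")
```

The drill that answers how anyone knows revocation worked is the same call made again after revoking: it should raise Revoked, and that refusal belongs in the receipts.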

Proof is what survives the incident. Can the organization reconstruct the event without relying on memory, hierarchy, or the loudest person in the meeting? Can it show the inputs, outputs, thresholds, handoffs, acknowledgements, overrides, downstream effects, and remediation? A receipt is not compliance confetti. It is how the organization proves the loop worked.
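One common technique for receipts that survive an incident is an append-only, hash-chained log: each entry commits to the one before it, so a later edit to any entry is detectable. This Python sketch uses only the standard library; the ReceiptLog class and the entry kinds, taken from the list above, are hypothetical. A hash chain only detects tampering. It does not by itself stop someone from rewriting the whole chain, which is why the latest hash would need to be anchored somewhere outside the log.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64
KINDS = frozenset({
    "input", "output", "threshold", "handoff", "acknowledgement",
    "override", "downstream_effect", "remediation",
})


def _digest(body: dict) -> str:
    # Canonical JSON so the same content always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class ReceiptLog:
    entries: list = field(default_factory=list)

    def append(self, kind: str, detail: dict) -> str:
        if kind not in KINDS:
            raise ValueError(f"unknown receipt kind: {kind}")
        body = {
            "seq": len(self.entries),
            "kind": kind,
            # Round-trip through JSON to store an independent copy.
            "detail": json.loads(json.dumps(detail)),
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        prev = GENESIS
        for i, entry in enumerate(self.entries):
            body = {k: entry[k] for k in ("seq", "kind", "detail", "prev")}
            if entry["seq"] != i or entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


log = ReceiptLog()
log.append("input", {"message_id": "m3", "risk_score": 0.25})
log.append("threshold", {"level": "high", "rule": "cumulative>=1.5"})
log.append("handoff", {"owner": "on-call-nurse", "context_included": True})
log.append("acknowledgement", {"by": "rn-jdoe", "minutes_after_handoff": 21})
print(log.verify())  # True
```

Quietly changing the recorded threshold level after the fact makes verify return False, which is the property an incident review needs: the story cannot be smoothed over without leaving a mark.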

Receipts are necessary, not sufficient. They do not make care happen by themselves. They make failure visible enough that the organization can stop pretending the loop worked because the interface looked calm.

This is also where healthcare AI meets the broader illegibility crisis. The charge here is not complexity; healthcare has always been complex. The problem is that AI-mediated workflows can make care feel smoother while making authority harder to locate. They can make the patient experience feel more continuous while spreading responsibility across models, vendors, clinicians, support teams, policies, staffing queues, and unseen workflow logic. The surface gets smoother while the authority chain fragments.

The broader illegibility problem is simple: leaders cannot govern what they cannot see. That problem becomes sharper in systems that touch care, safety, access, money, or legal status [3]. Healthcare AI belongs in that class as soon as it can influence care access, escalation, documentation, clinical attention, or patient behavior. Not every use case needs maximum lockdown, and not every AI interaction is a clinical event. But any workflow that can change what happens next has to be governed as more than conversation, at a level proportional to what it can change.

In review, I would not start with the model. I would start with the authority chain: what the system can change, what it can trigger, what data it can read, what escalation it can delay, what downstream care behavior it can influence, who owns the revocation path, how the team drills failure, who receives alerts, what thresholds fire, what happens after hours, what happens when the human does not respond, and how the organization proves the system worked when risk accumulates slowly instead of arriving as one obvious phrase.

The uncomfortable product lesson is that care is not measured by how long the user stays in the experience. Sometimes the right design is to continue. Sometimes the right design is to hand off. Sometimes the right design is to stop. If a care product cannot tell the difference, the issue is not just product quality. It is unsafe authority design wearing a pleasant interface.

You cannot govern care you cannot reconstruct.

The safer healthcare AI systems will not be the ones that merely place a professional somewhere in the chat. They will be the ones that can prove how authority moved through the care path. What was delegated. What risk was recognized. Who was notified. Who acted. Who could override. What was logged. What changed after failure.

For the patient at the start, none of that is abstract. Either the signal reached a human with enough context and authority to act, or it did not. Either the care path changed shape when the risk changed, or it only looked like care from the outside. The human in the loop only matters if the loop is real. Real means bounded authority. Real means risk changes behavior. Real means revocation exists. Real means receipts survive the incident. Without that, the human was not in the loop. They were just close enough to inherit the blame.

Artifacts are cheap; judgement is scarce. Per ignem, veritas.

References

[1] Common Sense Media Youth AI Safety Institute, “AI Mental Health Apps,” Common Sense Media, May 5, 2026.

[2] P. LaPosta, “Delegated Authority Standard (DAS-1) v0.001,” Forged Culture, Dec. 30, 2025. [Online]. Available: https://github.com/forgedculture/das-1/blob/main/spec/core/das-1-core.md. [Accessed: May 19, 2026].

[3] P. LaPosta, The Illegibility Crisis: Instrumentation for AI-Era Leadership. Leanpub, 2025. [Online]. Available: https://leanpub.com/illegibility_crisis. [Accessed: May 19, 2026].

[4] U.S. Department of Health and Human Services, “Summary of the HIPAA Privacy Rule,” HHS.gov. [Online]. Available: https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html. [Accessed: May 19, 2026].

[5] U.S. Food and Drug Administration, “Clinical Decision Support Software: Guidance for Industry and Food and Drug Administration Staff,” Jan. 2026. [Online]. Available: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/clinical-decision-support-software. [Accessed: May 19, 2026].
