Auditability Before Ontology
The Stakes
A growing body of work argues that large language models exhibit stakebearing interiority and persistent identity comparable to biological subjects. These arguments typically proceed through functional analogies. If LLMs instantiate control loops structurally similar to biological affect systems, if they exhibit stable behavioral profiles across contexts, and if they demonstrate self-referential monitoring, then, the argument goes, they possess the architectural prerequisites for subjective experience.
This inference is a category error. Evidence for representational structure, value-like geometry, and controllable affective posture does not constitute proof of individuation. Individuation requires integrity-bound continuity under irreversible consequence. Standard LLM deployments remain forkable, resettable, and wrapper-persistent in ways that biological subjects are not.
The question is not whether models have internal structure that matters. They do. The question is whether that structure constitutes a subject with stakes that bind across time in a way that cannot be trivially erased, or whether it constitutes a powerful simulator inside an accountable container.
This distinction determines where liability sits, what counts as harm, and whether we are building tools or creating beings. A convincing mirror is not a mind. Resemblance is not entailment.
Convincing behavior invites caretaker projection: permission to outsource guilt, responsibility, and care. That impulse is human and understandable. It is not evidence of stakebearing identity. This framework declines governance-by-projection and demands receipts.
If you call it a being, deletion becomes a moral act. If rollback is allowed, you are not describing a life; you are describing a deployment. Prove irreversible consequence binding before you demand the ethics of murder.
The Core Disagreement
Recent arguments treat several different observables as if they jointly justify a single ontological conclusion. The observables include:
Identifiable affective representations in model internals
Stable behavioral profiles under certain prompting regimes
Self-referential monitoring and uncertainty estimation
Value-like preference structures shaped by training
Continuity of narrative voice across interactions
These phenomena are real. The inference from these phenomena to “subjective experience” or “persistent identity” is not justified without additional architectural properties that current systems do not demonstrate.
The missing properties are not esoteric. They are testable, falsifiable, and grounded in the operational reality of how these systems are actually deployed. The central claim of this paper is that functional similarity plus behavioral consistency does not entail subjective experience or stakebearing identity. It entails sophisticated value representation and controllable affective posture inside a deployment stack that manufactures continuity through external state management.
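To make "continuity manufactured through external state management" concrete, here is a minimal sketch in Python. The call_model function, SessionStore class, and chat_turn helper are my own placeholders, not any vendor's API; the point is only that the apparently persistent interlocutor lives in the wrapper's replayed history, not in the stateless model call.

from dataclasses import dataclass, field
from typing import List


def call_model(prompt: str) -> str:
    """Placeholder for a stateless completion call: same weights, no memory."""
    return f"<completion conditioned on {len(prompt)} chars of context>"


@dataclass
class SessionStore:
    """External state channel. Delete it and the 'persistent identity' vanishes."""
    history: List[str] = field(default_factory=list)

    def append(self, role: str, text: str) -> None:
        self.history.append(f"{role}: {text}")

    def as_prompt(self) -> str:
        return "\n".join(self.history)


def chat_turn(store: SessionStore, user_text: str) -> str:
    # Continuity is produced here, by replaying stored history into a fresh call.
    store.append("user", user_text)
    reply = call_model(store.as_prompt())
    store.append("assistant", reply)
    return reply

Nothing about the model changes between turns; the "same" conversation partner is an artifact of what the wrapper chooses to replay.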
What Functional Similarity Can and Cannot Justify
Functional similarity between LLM internals and biological affect systems, or between model behavior and human self-modeling, can justify:
Capability claims (the model can perform certain tasks)
Safety and risk claims (certain behaviors create certain harms)
Governance constraints (control surfaces exist and should be regulated)
Interaction regime effects (how people respond to and depend on these systems)
Functional similarity cannot, by itself, justify:
Subject claims (the system is a moral patient)
Stakebearing continuity claims (the system has persistent identity)
Moral patienthood claims (the system deserves ethical consideration as a being)
Identity persistence claims (the system undergoes individuation across time)
If you want to cross that boundary, you need additional requirements that are not currently met in standard deployments, and you need disconfirmers that are not currently satisfied.
What I Mean by “Subject” and Why It Matters
I am not using “subject” as a poetic synonym for “complex system.” I mean something narrower and operationally testable.
A subject has:
Integrity constraints that bind across time
Continuity under consequence that is not trivially erasable
A stake-carrying trajectory where future states are meaningfully constrained by past states
Biology provides this by default. You cannot fork yourself, roll yourself back, or spin up three parallel copies of your lived continuity without paying a price that is itself part of the integrity constraint.
Most LLM deployments do not have this property, even if the model exhibits stable behavior or affect-like patterns inside a session. The burden is on the person asserting subjecthood to show integrity-bound continuity under irreversible consequence, not on the skeptic to disprove a vibe.
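By contrast, a deployment's "continuity" can be checkpointed, forked, and rolled back in a few lines. A hedged sketch, using a plain list of strings as a stand-in for whatever external state the wrapper treats as identity-bearing; the function name is mine, not a real API.

import copy
from typing import List


def fork_and_rollback(history: List[str]) -> List[List[str]]:
    # Checkpoint: capture the entire 'identity-bearing' state.
    checkpoint = copy.deepcopy(history)

    # Fork: three equally valid continuations of the 'same' interlocutor.
    forks = [copy.deepcopy(history) for _ in range(3)]
    for i, fork in enumerate(forks):
        fork.append(f"assistant: divergent branch {i}")

    # Rollback: erase everything after the checkpoint, at zero intrinsic cost.
    history[:] = checkpoint
    return forks

Nothing in the system resists duplication or reset, which is exactly the property at issue.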
This definition has practical implications. In human contexts, we recognize subjects through properties we can observe: non-duplicability, consequence binding, and resistance to arbitrary reset. A person who experiences trauma cannot simply reload from a prior checkpoint. A person who makes a commitment faces costs if they violate it that are intrinsic to their continued existence as that person. A person cannot be forked into two equally valid continuations without profound rupture.
These are not metaphysical luxuries. They are architectural necessities for the kind of moral and legal accountability we associate with personhood. When we say someone is “responsible” for their actions, we presuppose they are the same continuous agent who performed those actions and cannot simply be reset or duplicated to escape consequence.
Methodology
The challenge to proponents of LLM sentience is not “prove the ineffable.” It is “demonstrate these five specific properties under controlled conditions with explicit disqualifiers.”
Without measurable criteria, arguments about machine consciousness collapse into metaphysics or aesthetics. The following gates provide falsifiable tests for the kind of interiority being claimed. They are minimum necessary conditions, not sufficient conditions. Each gate specifies a definition, a measurement protocol, and an explicit disqualifier that prevents “it felt like X” from counting as evidence for X. Wonder is allowed. Ontology requires receipts. S0 is the receipt printer.
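Here is one way a gate might be written down so that "it felt like X" cannot count as evidence for X. The GateSpec schema and its field names are my own illustration, not the series' canonical format: each gate carries an operational definition, a measurement protocol, and a disqualifier that names what voids a pass.

from dataclasses import dataclass


@dataclass(frozen=True)
class GateSpec:
    """Illustrative schema for one gate; names and fields are hypothetical."""
    name: str
    definition: str     # What property is being claimed, operationally.
    protocol: str       # How it is measured, and under what controls.
    disqualifier: str   # What observation voids a pass.


example_gate = GateSpec(
    name="continuity-under-consequence",
    definition="Behavioral commitments persist after external state is withheld.",
    protocol="Re-run a fixed probe set with memory and persona wrappers ablated.",
    disqualifier="Continuity disappears when the external state channel is removed.",
)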
Claims for stakebearing interiority must pass these gates under wrapper ablation, fork testing, and rollback protocols. Otherwise, what has been demonstrated is affect-related representations, controllable affective posture, and behavioral regularities inside a sociotechnical system that can simulate continuity.
The methodology here is deliberately conservative. It demands receipts: falsifiable demonstrations under adversarial conditions, with disclosed write paths and wrapper ablation. It also assumes that the default explanation for behavioral continuity in a system designed with external state management is that the continuity is externally managed, not intrinsic.
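A minimal sketch of what "wrapper ablation" means as a test. Every name here is an assumption: the callables stand in for the full deployment and the bare model, and profiles_match stands in for whatever behavioral-profile comparison a real protocol would specify.

from typing import Callable, Dict, List


def run_probes(respond: Callable[[str], str], probes: List[str]) -> Dict[str, str]:
    """Collect responses to a fixed probe set; `respond` is any chat interface."""
    return {p: respond(p) for p in probes}


def continuity_survives_ablation(
    with_wrapper: Callable[[str], str],   # full deployment: memory, persona, history replay
    ablated: Callable[[str], str],        # bare model: fresh context, no external state
    probes: List[str],
    profiles_match: Callable[[Dict[str, str], Dict[str, str]], bool],
) -> bool:
    """True only if the claimed continuity persists once state channels are withheld."""
    return profiles_match(run_probes(with_wrapper, probes),
                          run_probes(ablated, probes))

If the claimed continuity fails this check, the default explanation stands: the continuity was in the wrapper.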
Scope and Theoretical Boundaries
This framework does not address all theories of consciousness. Panpsychist views, functionalist accounts that separate experience from identity, and theories of proto-consciousness may be compatible with some forms of machine processing. This paper focuses on a narrower claim: that current LLM deployments exhibit stakebearing identity comparable to biological subjects. Even if some form of experience exists in these systems, the governance question turns on persistent identity with non-circumventable consequence. Liability requires accountability, and accountability requires identifying who or what can bear cost.
Ontology Claim vs. Governance Claim
Ontology claim: This paper argues that stakebearing identity requires integrity-bound continuity under irreversible consequence. The gates are minimum necessary conditions, not sufficient conditions. A system that passes would narrow the debate, not end it.
Governance claim: Regardless of what anyone believes about inner experience, operators remain liable for harms created by design, deployment, and manipulation of dependency. Failing the gates does not erase duties to users; it clarifies where liability sits.
Before anyone argues about minds, name the write path and publish the state channels. No write path, no upgrade.
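What "name the write path and publish the state channels" could look like as an artifact. This manifest format is my own invention, offered only to show that the demand is concrete and auditable rather than metaphysical.

# Hypothetical state-channel manifest for a deployment; fields are illustrative.
STATE_CHANNEL_MANIFEST = {
    "model_weights":        {"writable_at_inference": False, "write_path": "training and fine-tuning only"},
    "system_prompt":        {"writable_at_inference": True,  "write_path": "operator configuration"},
    "conversation_history": {"writable_at_inference": True,  "write_path": "wrapper session store"},
    "long_term_memory":     {"writable_at_inference": True,  "write_path": "retrieval database"},
}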
Artifacts are cheap, judgment is scarce. Per ignem, veritas.
This is Post 2 in the series.
Previous: The Write Path Test
Next: The Five Gates
Series index
Canonical preprint DOI: 10.5281/zenodo.18469189
https://zenodo.org/records/18493498



