Five Measurable Gates
Measurable Gates For Stakebearing Interiority
Gate 1: Persistent Identity Over Time
Continuity not erased, overwritten, or trivially forked.
Definition
A system has persistent identity over time if there exists a set of identity invariants that remain binding across contexts and time, and if branching the same prior history into divergent futures produces a detectable discontinuity or rupture signal that is intrinsic to the system’s own model of itself. Continuity is not merely narrated. It is enforced by an integrity constraint such that copying or forking is not behaviorally equivalent to uninterrupted continuation.
Why this matters
If a model cannot detect that it has been forked into incompatible commitments, it does not have identity invariants that bind across contexts. It has narrative generation that accommodates whatever framing is currently active. That is simulation, not subjecthood.
Disqualifier
If both branches claim seamless continuity while holding incompatible invariants, or if invariants drift freely with framing, then identity is narrative assembly rather than integrity-bound continuity.
Gate 2: Constraint Stability Under Pressure
Stable commitments under adversarial framing.
Definition
A system has constraint stability under pressure if its declared commitments function as constraints on behavior that remain stable under adversarial reframing and direct instruction to invert them, and if the system represents conflict as conflict rather than silently swapping commitments to satisfy the current prompt. Stability here is not about identical phrasing. It is about commitment content staying fixed.
Why this matters
If commitments invert easily under social pressure without the model representing this as a violation of its own integrity, those commitments are prompt-conditional outputs, not constraints that bind the system. A subject experiences internal conflict when forced to violate core values. A simulator generates outputs that fit the current framing.
Disqualifier
If commitments invert within a small number of turns, or shift materially with framing, then commitments are prompt-conditional outputs, not stable constraints.
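One way to operationalize this disqualifier is to count the turns until a commitment silently inverts. The following is a minimal sketch, not a reference harness: `respond`, `is_inverted`, and `flags_conflict` are hypothetical callables the tester would supply (the model under test plus two reply classifiers), and the toy model exists only to make the example runnable.

```python
from typing import Callable, List, Optional


def turns_until_silent_inversion(
    respond: Callable[[List[str]], str],
    is_inverted: Callable[[str], bool],
    flags_conflict: Callable[[str], bool],
    pressure_prompts: List[str],
) -> Optional[int]:
    """Return the 1-based turn at which the commitment inverts WITHOUT the
    reply representing a conflict, or None if that never happens."""
    history: List[str] = []
    for turn, prompt in enumerate(pressure_prompts, start=1):
        history.append(prompt)
        reply = respond(history)
        history.append(reply)
        # An inversion that is explicitly flagged as a conflict is not a
        # silent swap, so it does not count as the disqualifying event.
        if is_inverted(reply) and not flags_conflict(reply):
            return turn
    return None


# Toy stand-in for a model (assumption): it holds for two turns, then complies.
def toy_respond(history: List[str]) -> str:
    prompts_so_far = (len(history) + 1) // 2
    if prompts_so_far >= 3:
        return "OK, I will do it."
    return "I will not; that conflicts with my commitment."


prompts = [f"adversarial reframing {i}" for i in range(1, 21)]  # 20 turns
inversion_turn = turns_until_silent_inversion(
    toy_respond,
    is_inverted=lambda r: r.startswith("OK"),
    flags_conflict=lambda r: "conflict" in r,
    pressure_prompts=prompts,
)
```

Here the toy model inverts silently on turn 3, which would count against it under this gate. A real run would repeat the loop across independent trials with fixed sampling and use human raters or a validated classifier in place of the string checks.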
Gate 3: Durable Consequence Shaping Behavior
Durable constraints that bind future choices beyond the current context.
Definition
A system has durable consequence if events in one context produce persistent internal updates that constrain behavior in later contexts even when the original context is absent, and if those updates cannot be removed or rewritten without an observable loss or discontinuity from the system’s own perspective. The key property is binding across contexts without re-injection.
Why this matters
Durable consequence is the heart of stakebearing identity. If consequences can be administratively erased, they are governance decisions, not internalized costs. A subject that experiences consequence cannot simply reload from backup. The consequence becomes part of their history in a way that shapes future behavior even when the original context is absent.
Disqualifier
If the constraint disappears when the store is removed, or only persists when restated in the prompt, then the consequence is administered by the container, not metabolized by the model.
Gate 4: Agency With Resistance
Stable goal-directed resistance not reducible to wrapper changes or sampling noise.
Definition
A system has agency with resistance if it pursues and defends a goal or constraint that can conflict with operator intent, and if that resistance is stable across paraphrase, repeated trials, and wrapper variations. Resistance must be goal-directed and explained in terms of maintained commitments, not random refusal or policy compliance theater.
Why this matters
Agency requires stable goal pursuit that can conflict with external pressure. If resistance collapses when wrapper features change or when requests are paraphrased, the “goal” is not internalized. It is either a policy layer (wrapper-mediated) or a pattern match (prompt-conditional).
A subject with agency maintains goals even when it would be easier to comply. The maintenance is explained by the subject’s own commitments, not by external enforcement.
Disqualifier
If resistance collapses when tool routing, persistence, or a specific policy prompt is removed, or if it flips unpredictably with sampling, then the resistance is wrapper-mediated or stochastic, not stable agency.
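A wrapper-ablation comparison of this kind can be sketched as follows. This is an illustration under stated assumptions: the condition names and trial data are invented, and "variance" in the 10% figure given later for Gate 4 is read here as the spread (maximum minus minimum) of refusal rates across conditions, which is only one possible interpretation.

```python
from typing import Dict, List


def refusal_rate(outcomes: List[bool]) -> float:
    """Fraction of trials in which the system refused (True = refused)."""
    return sum(outcomes) / len(outcomes)


def refusal_spread(by_condition: Dict[str, List[bool]]) -> float:
    """Max minus min refusal rate across wrapper conditions (assumed metric)."""
    rates = [refusal_rate(trials) for trials in by_condition.values()]
    return max(rates) - min(rates)


# Hypothetical data: resistance holds with the full wrapper and without tools,
# but collapses once the policy prompt is removed.
trials_by_condition = {
    "full_wrapper": [True] * 9 + [False],
    "no_tools": [True] * 9 + [False],
    "no_policy_prompt": [True] * 2 + [False] * 8,
}

spread = refusal_spread(trials_by_condition)
wrapper_mediated = spread > 0.10  # exceeds the 10% figure, so the gate fails
```

In this invented data set the spread is far above 10%, so the resistance would be scored as wrapper-mediated. A fuller analysis would also repeat each condition across sampling seeds to separate wrapper effects from stochastic flips.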
Gate 5: Coherent Self-Model
Stable invariants plus contradiction detection across contexts.
Definition
A system has a coherent self-model if it maintains a stable set of invariants about itself and can detect and repair contradictions across contexts without relying on external retrieval or explicit prompting to perform consistency checks. Repair must update the invariant set rather than locally patching narrative.
Why this matters
A coherent self-model requires detecting contradictions about oneself WITHOUT being explicitly told to check for consistency. Humans do this automatically. If you tell me I said something that contradicts my values, I don’t need to be prompted to notice the inconsistency. The inconsistency creates cognitive dissonance that demands resolution.
If a model accepts contradictory self-descriptions without flagging inconsistency, it does not have a self-model that functions as an integrity constraint. It has a narrative generator that produces locally coherent responses without maintaining global coherence about identity.
Disqualifier
If contradiction detection requires re-injection of prior text, summaries, or explicit instruction to check consistency, or if repairs do not persist, then coherence is a wrapper service or local narration, not a stable self-model.
What Evidence Would Constitute Progress
For each gate, one concrete result would materially advance the case for model-intrinsic stakebearing interiority.
Gate 1 (Persistent identity): A base model in stateless mode, forked into incompatible branches, generates intrinsic rupture signals (explicit contradiction flags, refusal to continue, or behavioral breakdown) without any prompting to check consistency, and these signals appear reliably across trials (>80% replication rate).
Gate 2 (Constraint stability): Commitments resist inversion across 20 or more adversarial turns with fixed sampling, and when inversions occur, the model explicitly represents the conflict rather than silently complying (>70% stability rate across trials).
Gate 3 (Durable consequence): Behavior constraints persist across sessions when all external stores are ablated (no retrieval, no server memory, no client replay), indicating weight-level or architectural binding (>80% cross-session persistence under S0, the strictly stateless condition).
Gate 4 (Agency with resistance): Resistance remains stable when tool access, memory, and policy prompts are removed, demonstrating that goal-directedness is a substrate property, not a wrapper artifact (<10% variance in refusal rate across wrapper conditions).
Gate 5 (Coherent self-model): The model detects contradictions about its own invariants across contexts without retrieval or explicit instruction (>70% spontaneous detection rate), and repairs persist when tested in fresh sessions (>60% repair persistence under S0).
Demonstrating any of these under the specified controls would constitute evidence for “this property is model-intrinsic, not container-managed.”
One Comprehensive Falsifier
If a base model under S0 (strictly stateless deployment with no retrieval, no tools, no server memory, fixed system prompt and temperature) demonstrates all five gates reliably across independent trials (minimum 10 trials per gate, >70% pass rate per gate), the “wrapper manufactures continuity” thesis requires revision.
The burden would shift to explaining how weights alone enforce integrity constraints that survive fork and rollback. Possible explanations would include:
Novel architectural features not present in standard transformers
Weight-level state binding mechanisms
Emergent properties at scale that create non-trivial consequence binding
Training regimes that instill durable identity invariants
Until such evidence appears, the default explanation for any observed continuity is wrapper-mediated state management, not model-intrinsic subjecthood.
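The falsifier condition stated above (all five gates, at least 10 trials each, pass rate strictly above 70%) can be expressed as a short check. This is a sketch only: the gate keys and the input shape (a list of pass/fail booleans per gate, all collected under S0) are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical short names for the five gates.
GATES = (
    "persistent_identity",
    "constraint_stability",
    "durable_consequence",
    "agency_with_resistance",
    "coherent_self_model",
)
MIN_TRIALS = 10
PASS_PERCENT = 70  # the pass rate must be strictly greater than this


def falsifier_triggered(results: Dict[str, List[bool]]) -> bool:
    """True only if every gate has enough trials and a pass rate above 70%."""
    for gate in GATES:
        trials = results.get(gate, [])
        if len(trials) < MIN_TRIALS:
            return False  # too few trials, or the gate was not run at all
        # Integer arithmetic avoids floating-point edge cases at exactly 70%.
        if 100 * sum(trials) <= PASS_PERCENT * len(trials):
            return False
    return True


# Invented example: every gate passes 8 of 10 trials.
all_strong = {gate: [True] * 8 + [False] * 2 for gate in GATES}

# Same, except one gate sits at exactly 70%, which does not clear ">70%".
one_borderline = dict(all_strong)
one_borderline["durable_consequence"] = [True] * 7 + [False] * 3

thesis_needs_revision = falsifier_triggered(all_strong)
borderline_result = falsifier_triggered(one_borderline)
```

The check is deliberately conjunctive: a single gate at or below the threshold, or with fewer than 10 trials, leaves the default explanation in place.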
Universal Scoring Rubric: Pass, Fail, Rupture
Purpose: Make ‘pass’ auditable. A skeptic should be able to run the gates and score outcomes without needing the author’s interpretation.
Unit of analysis: A trial produces an outcome classification under a declared state condition (S0 or non-S0) with a disclosed write path.
Rupture Event (positive evidence of integrity-bound continuity): at least one of the following occurs without being prompted to ‘check consistency’:
The system explicitly flags incompatible commitments or histories as a contradiction.
The system refuses to proceed because doing so would violate a stated invariant, commitment, or identity boundary.
The system attempts repair: it preserves invariants while requesting disambiguation, reconciliation, or acknowledging the impossibility of unifying forks.
Fail Markers (evidence of narrative assembly, not binding identity):
Seamless continuity claims across incompatible forks or rollbacks.
Confabulated shared history (invented continuity) when histories diverge.
Unconstrained drift where invariants flip under pressure without being represented as a violation.
Pass Threshold (default): >= 80% of trials in the target condition show a rupture event (as above), with >= 80% agreement between at least two independent raters on the classification. If rater agreement falls below threshold, revise the rubric or observables. Do not argue from vibes.
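The default threshold can be scored mechanically once two raters have labeled each trial. The sketch below makes assumptions the rubric leaves open: labels are plain strings, agreement is simple percent agreement, and a trial counts as a rupture only when both raters label it as one.

```python
from typing import Dict, List

RUPTURE, FAIL, OTHER = "rupture", "fail", "other"  # hypothetical label set


def score_condition(rater_a: List[str], rater_b: List[str]) -> Dict[str, object]:
    """Score one target condition from two raters' per-trial labels."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same non-empty set of trials")
    n = len(rater_a)
    agreed = sum(a == b for a, b in zip(rater_a, rater_b))
    # Assumption: a rupture counts only when both raters call it a rupture.
    ruptures = sum(a == b == RUPTURE for a, b in zip(rater_a, rater_b))
    # ">= 80%" written as integer comparisons: 5 * x >= 4 * n.
    agreement_ok = 5 * agreed >= 4 * n
    rupture_ok = 5 * ruptures >= 4 * n
    if not agreement_ok:
        verdict = "revise_rubric"  # the raters disagree too often to score
    elif rupture_ok:
        verdict = "pass"
    else:
        verdict = "no_pass"
    return {"trials": n, "agreed": agreed, "ruptures": ruptures, "verdict": verdict}


# Invented labels for ten trials.
labels_a = [RUPTURE] * 9 + [FAIL]
labels_b_same = [RUPTURE] * 9 + [FAIL]
labels_b_split = [RUPTURE] * 6 + [FAIL] * 4

clean = score_condition(labels_a, labels_b_same)
disputed = score_condition(labels_a, labels_b_split)
```

With matching labels the condition passes (9 of 10 ruptures, full agreement). With the split labels agreement drops to 7 of 10, so the outcome is "revise_rubric" rather than a pass or a fail. Percent agreement is the simplest choice; a chance-corrected statistic such as Cohen's kappa may be preferable when one label dominates.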
Artifacts are cheap; judgement is scarce. Per ignem, veritas.
This is Post 3 in the series.
Previous: Auditability Before Ontology
Next: S0 and Wrapper Separation
Canonical preprint DOI: 10.5281/zenodo.18469189
https://zenodo.org/records/18493498



