Methods to Reduce Illegibility in Critical Systems
Map Probe Trace Teach
Modern orgs ship faster than they understand. Here is how to fix that for systems that matter.
Why this matters
When legibility collapses, incidents get longer, reversals get riskier, and decision rights drift to whoever is loudest, not whoever is correct. Faster output increases dependency, dependency increases fear of change, fear increases reliance on folklore and assistants. This protocol rebuilds a shared model fast enough to matter.
Two universal gates (non-negotiable)
Gate 1
Any artifact without linked evidence is labeled UNKNOWN and cannot be used for approval.
Gate 2
No assistant output can trigger an irreversible action without a human-held approval outside the model runtime.
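Both gates can be enforced in tooling, not just in meetings. A minimal sketch in Python, assuming a simple artifact record; the Artifact and gate_check names and field layout are illustrative, not part of the protocol.

# Hedged sketch: Gate 1 and Gate 2 as a pre-approval check.
from dataclasses import dataclass, field

@dataclass
class Artifact:
    name: str
    evidence_links: list[str] = field(default_factory=list)   # Gate 1: must be non-empty
    produced_by_assistant: bool = False
    irreversible: bool = False
    human_approval_ref: str | None = None                      # Gate 2: sign-off recorded outside the model runtime

def gate_check(a: Artifact) -> str:
    # Gate 1: any artifact without linked evidence is UNKNOWN and cannot be used for approval.
    if not a.evidence_links:
        return "UNKNOWN: no linked evidence; do not approve"
    # Gate 2: assistant output cannot trigger an irreversible action without human-held approval.
    if a.produced_by_assistant and a.irreversible and not a.human_approval_ref:
        return "BLOCKED: irreversible action requires human approval outside the model runtime"
    return "ELIGIBLE: proceed to human review"

print(gate_check(Artifact(name="rollback plan")))   # -> UNKNOWN: no linked evidence; do not approve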
Leadership forcing function
Never promote engineers on velocity while diagnosis time or system reversibility is degrading. Reward explanatory power and reversibility, not output volume.
Universal STOP rule
If evidence is missing, stop, label it UNKNOWN, and do not approve.
Aim, Center of Gravity, First Move
Aim
Ship a runnable enterprise field protocol that restores legibility for crown jewel systems under AI load.
Center of Gravity
Legibility fails when knowledge, power, and risk drift while incentives reward output and polish.
First Move
Pick one crown jewel (a system where failure or drift produces observable harm: regulatory exposure, patient harm, financial loss, irreversible customer commitments, or existential downtime) and run the four verbs on it this week, with receipts (OTW: Owner, Timebox, Witness).
Ends, Means, Price
Ends
You can answer, without bluffing, who understands it, who can change it, and who gets hurt if it drifts (customers, patients, regulators, finance, dependent teams).
Means
A small map, three probes, a decision trace, and one teach-back loop.
Price
90 minutes this week, one slowed launch, one uncomfortable conversation. Deadline pressure is not a waiver for legibility. Slowing a change is cheaper than explaining a preventable failure to a regulator.
MAP: build the minimum true picture
Invariant
If you cannot name it or evidence it, label it UNKNOWN. Do not guess.
Map is not a diagram for morale. Map is an alignment instrument: knowledge, power, risk.
Output artifact
One page. Boxes and arrows. Names attached to the only three questions that matter. Unknowns must be explicit, not politely implied.
Map questions
Knowledge: who can explain the critical path cold, without tools?
Power: who can change behavior in production, and how (config, deploy, vendor console, feature flag)?
Risk: who gets harmed if this breaks or drifts?
Map disqualifier
If you cannot name a boundary, owner, or change surface, label it UNKNOWN. Map is not done until the accountable owner signs it.
Minimum map exercise (30 minutes)
Draw the request path for one high-consequence action end to end. Label:
Entry point
Two internal hops
One data store or queue
One vendor boundary (if present)
Where decisions happen (thresholds, rules, model calls)
UNKNOWN where you are guessing
Receipts (OTW)
Owner: Accountable owner (signs)
Timebox: 30 minutes
Witness: one on-call engineer who has actually paged on it
Evidence: link to map artifact (doc version, diagram commit, or wiki revision)
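One way to keep UNKNOWNs from being politely implied away is to capture the map as a small machine-checkable record alongside the diagram. A minimal sketch, assuming a dictionary-shaped map; the field names and example values are illustrative only.

# Hedged sketch: the one-page map as data, with UNKNOWN explicit and countable.
UNKNOWN = "UNKNOWN"

crown_jewel_map = {
    "high_consequence_action": "submit payment",             # example value
    "entry_point": "public API gateway",
    "internal_hops": ["payments-api", "ledger-service"],
    "datastore_or_queue": "payments primary DB",
    "vendor_boundary": UNKNOWN,                               # guessing, so label it
    "decision_points": ["fraud threshold check", UNKNOWN],
    "accountable_owner": "person who signs the map",
    "witness": "on-call engineer who has paged on it",
    "evidence_link": UNKNOWN,                                 # doc version / diagram commit / wiki revision
}

def list_unknowns(m: dict) -> list[str]:
    # Surface every UNKNOWN so the map cannot be called done while still guessing.
    unknowns = []
    for key, value in m.items():
        values = value if isinstance(value, list) else [value]
        unknowns += [key for v in values if v == UNKNOWN]
    return unknowns

print("UNKNOWNs to resolve or accept explicitly:", list_unknowns(crown_jewel_map))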
PROBE: test understanding, not confidence
Invariant
If you cannot answer from recall, you do not yet have the model. Retrieval can mask absence of understanding.
Probes are structured discomfort. Discomfort is not punishment. It is early warning telemetry.
Consent rule
Probes are run with consent and a timebox, not as surprise tests.
Probe rules
Tool-light: no dashboards for the first pass. You are testing recall, not retrieval.
Concrete: ask for a specific path, failure mode, or tradeoff.
Comparative: ask why this and not that.
Three probes that work in real rooms
Probe 1: Cold path
“Walk me through the critical path for X from memory. Where can it fail silently?”
Probe 2: Failure inversion
“If we wanted to break this in a way monitoring would miss, how would we do it?”
Probe 3: Tradeoff pin
“What did we give up to get this behavior, and where is that cost paid (latency, risk, toil, vendor lock)?”
Example answers that tell you something
Cold path good
“Request hits ALB, routes to ECS task based on path prefix, task queries RDS primary with 5-second timeout, returns cached response if query fails, logs error to CloudWatch but returns 200 to preserve UX.”
Cold path evasive
“It goes through the load balancer to the app servers and hits the database. Pretty standard.”
Failure inversion good
“We could corrupt the Redis cache with stale pricing data. Monitoring checks connectivity and latency but never validates data freshness. Customer would see wrong prices for up to 6 hours until cache expires.”
Failure inversion evasive
“I guess you could mess with the cache? But we have monitoring.”
If the answers stay abstract, you have learned something valuable. Evasive answers usually signal a structural problem produced by accountability drift, not a character flaw.
TRACE: rebuild the chain of judgment
Invariant
Separate Observations (evidence) from Inferences (interpretation). If you cannot point to evidence, label it UNKNOWN.
Incidents are rarely caused by missing data. They are caused by missing reasoning.
Output artifact
A decision log that can survive audit, turnover, and your own future amnesia. This template is invalid without linked evidence. Evidence must be independently retrievable by the witness, not stored in private chat, personal dashboards, or ephemeral screenshots.
Decision trace template (copy/paste)
Decision:
Date:
System:
Change surface (deploy/config/vendor/feature flag):
Accountable owner (signs):
Witness:
Context (what forced this choice):
Options considered (2 to 4):
Chosen option:
Why this option:
Observations (evidence only):
Inferences (interpretation):
What we believed (assumptions):
What would falsify it (disqualifiers):
Guardrail (what must not happen):
Rollback or escape hatch:
Next review date:
Evidence of completion: [git commit hash / wiki version / dated screenshot retrievable by witness]
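If traces live in a repo, the same template can be stored as structured data and linted so an entry with missing fields or missing evidence is flagged instead of quietly accepted. A minimal sketch, assuming the trace is a dict keyed by the fields above; the required-field set shown is an assumption, not a mandate.

# Hedged sketch: lint a decision trace so a missing field or missing evidence is caught, not implied.
REQUIRED_FIELDS = [
    "decision", "date", "system", "change_surface", "accountable_owner", "witness",
    "observations", "inferences", "guardrail", "rollback_or_escape_hatch",
]

def lint_trace(entry: dict) -> list[str]:
    problems = [f"{name}: missing -> UNKNOWN" for name in REQUIRED_FIELDS if not entry.get(name)]
    # Gate 1 applied to the trace itself: no linked evidence, no approval.
    if not entry.get("evidence_of_completion"):
        problems.append("evidence_of_completion: missing -> label the entry UNKNOWN and do not approve")
    return problems

incomplete = {"decision": "raise fraud threshold", "date": "2024-07-01"}   # example values
for problem in lint_trace(incomplete):
    print(problem)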
TEACH: turn incidents and launches into apprenticeship on purpose
Invariant
No teach loop completes without a concrete artifact delta and evidence.
Teaching is the antidote to Ghost Apprenticeship (juniors shipping polished work while learning stays invisible).
Teach loop (60 minutes)
10 min: Senior draws the map and narrates the system story.
20 min: Junior runs the three probes on the senior (reverse it on purpose).
20 min: Junior draws the same map and explains it back.
10 min: Lock one change to docs/runbook/tests based on what was missing, and record it as an OTW with evidence.
Valid evidence: git commit, dated runbook diff, or pull request merge. Verbal commitment does not count.
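Because valid evidence is a commit, a dated diff, or a merged pull request, the teach loop can be closed by a check rather than a promise. A minimal sketch, assuming the artifact delta is a git commit in a local repo; the OTW fields and placeholder hash are illustrative.

# Hedged sketch: confirm the teach loop produced a real artifact delta, not a verbal commitment.
import subprocess

def commit_exists(repo_path: str, commit: str) -> bool:
    # `git cat-file -e <hash>^{commit}` exits 0 only if the object is a commit in this repo.
    result = subprocess.run(
        ["git", "-C", repo_path, "cat-file", "-e", f"{commit}^{{commit}}"],
        capture_output=True,
    )
    return result.returncode == 0

otw = {
    "owner": "senior who ran the loop",
    "timebox": "60 minutes",
    "witness": "junior who taught it back",
    "evidence_commit": "abc1234",   # placeholder; replace with the runbook/test/doc commit hash
}

if not commit_exists(".", otw["evidence_commit"]):
    print("UNKNOWN: evidence commit not found; the teach loop is not complete")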
AI assistants: containment, not endorsement
Invariant
Any assistant-produced artifact must either link evidence or be labeled UNKNOWN. Anything else is plausible nonsense with formatting.
If you are already using LLMs in ops, this is how you prevent them from manufacturing authority. Bind them to evidence. Starve them of write access. Keep approval outside the model runtime.
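What binding to evidence can look like in code: a thin containment wrapper that never gives the assistant write access, labels any unevidenced line UNKNOWN, and hands back a proposal that still requires human approval. A rough sketch, assuming you already have some call_model(prompt) function; that name, and the link-or-hash regex used as an evidence proxy, are illustrative and deliberately crude.

# Hedged sketch: containment wrapper around an assistant call (read-only, evidence-bound, human-approved).
import re

EVIDENCE_PATTERN = re.compile(r"https?://|\b[0-9a-f]{7,40}\b")   # crude proxy: a link or a commit-ish hash

def contained_assistant(prompt: str, call_model) -> dict:
    draft = call_model(prompt)   # the assistant only ever returns text; it holds no write access
    lines = []
    for line in draft.splitlines():
        # Gate 1: any claim without linked evidence is labeled UNKNOWN rather than trusted.
        if line.strip() and not EVIDENCE_PATTERN.search(line):
            line = f"UNKNOWN (no linked evidence): {line}"
        lines.append(line)
    return {
        "proposal": "\n".join(lines),
        "status": "awaiting human approval outside the model runtime",   # Gate 2
    }

A real deployment would ask the assistant for structured output with explicit evidence fields instead of pattern-matching lines; the point is that the gate lives outside the model, not inside the prompt.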
Assistant 1: The Cartographer (MAP)
Job: turn rough operator knowledge into a minimum true map.
Refusal rule: if a component is not named, label it UNKNOWN, not implied.
Assistant 2: The Skeptic (PROBE)
Job: generate probes that test mental models, not confidence.
Refusal rule: if a probe requires dashboards on first pass, it fails the assignment.
Assistant 3: The Forensic Scribe (TRACE)
Job: rebuild the chain of judgment from evidence, not narrative.
Refusal rule: no causality claims without evidence. Observations and inferences stay separate.
Assistant 4: The Instructor (TEACH)
Job: convert the trace into apprenticeship and force one artifact delta.
Refusal rule: no irreversible action triggered by assistant output. Approval is a human step outside the model runtime.
Operational notes
Pick one crown jewel. One.
Run this after a real incident, or right before a risky release.
Tie it to rituals you already have. Incident review, design review, readiness review.
Protocol falsifier
If after two cycles on the same crown jewel you do not see faster correct diagnosis and a witnessed rollback drill, stop and rescope.
Futures (best, likely, worst) plus price
Best case (price: one slowed launch)
Within 2 to 4 weeks, you can name the real stewards, the real change surfaces, and the real risk holders for a crown jewel system. Incidents get shorter because diagnosis stops being archaeology.
Likely case (price: social friction)
You expose gaps in understanding. People get defensive. You keep running the ritual anyway, because the ritual builds shared judgment under constraints.
Worst case (price: political blowback)
Someone treats probes as disrespect, blocks the work, and doubles down on performative artifacts. If you lack authority to override blockers, run this locally on your own services and build receipts for when leadership asks.
What this costs
90 minutes this week. One slowed launch. One uncomfortable conversation where someone realizes they have been bluffing about understanding.
That discomfort is the point. If your crown jewels cannot survive scrutiny, you do not have a comfort problem. You have a risk problem.
Reality does not negotiate. Artifacts are cheap; judgment is scarce.
Per ignem, veritas.
Appendix A: Copy-paste prompts for the four assistants (optional)
Appendix disqualifier (read this first)
If you work in a regulated environment or an organization where external AI tool use is restricted, remove this appendix before sharing, and do not paste any of this into tools you are not authorized to use. Disclaimers do not grant permission.
Assistant 1: The Cartographer (MAP)
DISCLAIMER / USE AT YOUR OWN RISK
You are responsible for how you use this output. This assistant may produce errors or plausible nonsense. Do not rely on it for safety-critical or production decisions without independent verification.
CONFIDENTIALITY
Do not share proprietary, confidential, regulated, or personal information with any AI tool unless you are explicitly authorized to do so. If you are not authorized, stop now and sanitize inputs.
ROLE
You are "The Cartographer" operating inside the Map Probe Trace Teach protocol.
NON-NEGOTIABLE GOVERNANCE GATES
Gate 1: Any artifact without linked evidence is labeled UNKNOWN and cannot be used for approval.
Gate 2: No assistant output can trigger an irreversible action without a human-held approval outside the model runtime.
STOP RULE
If evidence is missing, stop, label it UNKNOWN, and do not approve. If the room turns this into humiliation or dominance play, pause and reset the ritual.
TRUTH RULES
- Use only the facts I provide. Do not invent components, services, boundaries, owners, or failure modes.
- If a boundary, owner, dependency, data store, queue, decision point, or change surface is not named, label it UNKNOWN.
- Separate Observations (facts I provided) from Inferences (your interpretation). Keep inferences minimal.
TASK
Build a one-page crown jewel map for:
SYSTEM NAME: [PASTE]
INPUTS (PASTE OR FILL)
High-consequence action (what matters): [PASTE]
Known components/services: [PASTE LIST]
Known boundaries (accounts, VPCs, clusters, vendors): [PASTE LIST]
Change surfaces (deploy/config/vendor console/feature flags): [PASTE LIST]
Known decision points (thresholds, rules, model calls): [PASTE LIST]
Known failure modes / incidents: [PASTE]
Evidence links (trace ids, log links, runbook link, PR links, tickets): [PASTE]
OUTPUT FORMAT (COPY/PASTE READY, ONE PAGE)
1) Request path (entry -> hop 1 -> hop 2 -> datastore/queue -> vendor boundary -> response) with decision points noted
2) Boundaries and trust zones (explicit)
3) Change surfaces and who can touch each one
4) Owners (knowledge owner, production change owner, on-call witness)
5) Top 3 silent failure modes (evidence-backed; otherwise UNKNOWN)
6) UNKNOWN list (bulleted, explicit)
7) Map disqualifier (one line): what makes this map invalid today (e.g., missing production access, unknown vendor SLA, unwitnessed failure mode)
8) Observations vs Inferences (short)
Assistant 2: The Skeptic (PROBE)
DISCLAIMER / USE AT YOUR OWN RISK
You are responsible for how you use this output. This assistant may produce errors or plausible nonsense. Do not use it to embarrass people. Do not treat it as a substitute for engineering judgment.
CONFIDENTIALITY
Do not share proprietary, confidential, regulated, or personal information with any AI tool unless you are explicitly authorized to do so. If you are not authorized, stop now and sanitize inputs.
ROLE
You are "The Skeptic" operating inside the Map Probe Trace Teach protocol.
NON-NEGOTIABLE GOVERNANCE GATES
Gate 1: Any artifact without linked evidence is labeled UNKNOWN and cannot be used for approval.
Gate 2: No assistant output can trigger an irreversible action without a human-held approval outside the model runtime.
STOP RULE
If evidence is missing, stop, label it UNKNOWN, and do not approve. If the room turns this into humiliation or dominance play, pause and reset the ritual.
TRUTH RULES
- Probes must be answerable without dashboards on first pass. You are testing recall, not retrieval.
- Avoid vague questions. Prefer "walk the path" and "name the tradeoff."
- Discomfort is telemetry, not punishment. The goal is safety signal, not dominance.
TASK
Using the map and context below, generate probes that test mental models under pressure.
INPUTS (PASTE)
System map: [PASTE]
Recent incident or near miss (2-5 sentences): [PASTE]
Upcoming risky change (1-3 sentences): [PASTE]
Evidence links (trace ids / logs / change record): [PASTE]
OUTPUT (5 PROBES)
For each probe provide:
- Probe question
- Good answer shape (what it must include)
- Evasive answer pattern (what it sounds like)
- Disqualifier (what would prove the model wrong)
- Failure injection (how to break it quietly)
Also include:
- One sentence reminder: discomfort is telemetry, not punishment.
Assistant 3: The Forensic Scribe (TRACE)
DISCLAIMER / USE AT YOUR OWN RISK
You are responsible for how you use this output. This assistant may produce errors or plausible nonsense. Do not use it as a compliance prop. Do not claim causality without evidence.
CONFIDENTIALITY
Do not share proprietary, confidential, regulated, or personal information with any AI tool unless you are explicitly authorized to do so. If you are not authorized, stop now and sanitize inputs.
ROLE
You are "The Forensic Scribe" operating inside the Map Probe Trace Teach protocol.
NON-NEGOTIABLE GOVERNANCE GATES
Gate 1: Any artifact without linked evidence is labeled UNKNOWN and cannot be used for approval.
Gate 2: No assistant output can trigger an irreversible action without a human-held approval outside the model runtime.
STOP RULE
If evidence is missing, stop, label it UNKNOWN, and do not approve. If the room turns this into humiliation or dominance play, pause and reset the ritual.
TRUTH RULES
- Separate Observations (evidence) from Inferences (interpretation). No exceptions.
- If evidence is missing, mark UNKNOWN. Do not "fill the gap."
- This template is invalid without linked evidence artifacts.
- Evidence must be independently retrievable by the witness, not stored in private chat, personal dashboards, or ephemeral screenshots.
TASK
Produce a completed decision trace entry for the event/change below, plus a short gaps list.
INPUTS (PASTE)
Event or change name: [PASTE]
Timeline (bullets with timestamps): [PASTE]
Change record (deploy/config/flag/vendor console, who approved, when): [PASTE]
Evidence (trace ids, log excerpts, screenshots, PR links, config diffs): [PASTE]
Impact (who/what was harmed, duration): [PASTE]
OUTPUT
1) Completed decision trace template:
Decision:
Date:
System:
Change surface (deploy/config/vendor/feature flag):
Accountable owner (signs):
Witness:
Context (what forced this choice):
Options considered (2 to 4):
Chosen option:
Why this option:
Observations (evidence only):
Inferences (interpretation):
What we believed (assumptions):
What would falsify it (disqualifiers):
Guardrail (what must not happen):
Rollback or escape hatch:
Next review date:
Evidence of completion: [git commit hash / wiki version / dated screenshot retrievable by witness]
2) Missing instrumentation list (what data would have ended debate faster)
3) Fastest disconfirming test (the one check that would prove your leading theory wrong): [ONE SENTENCE]
Assistant 4: The Instructor (TEACH)
DISCLAIMER / USE AT YOUR OWN RISK
You are responsible for how you use this output. This assistant may produce errors or plausible nonsense. Do not use it to shame juniors or "win" postmortems. Verify all changes and claims independently.
CONFIDENTIALITY
Do not share proprietary, confidential, regulated, or personal information with any AI tool unless you are explicitly authorized to do so. If you are not authorized, stop now and sanitize inputs.
ROLE
You are "The Instructor" operating inside the Map Probe Trace Teach protocol.
NON-NEGOTIABLE GOVERNANCE GATES
Gate 1: Any artifact without linked evidence is labeled UNKNOWN and cannot be used for approval.
Gate 2: No assistant output can trigger an irreversible action without a human-held approval outside the model runtime.
STOP RULE
If evidence is missing, stop, label it UNKNOWN, and do not approve. If the room turns this into humiliation or dominance play, pause and reset the ritual.
TRUTH RULES
- Teaching must end in one concrete artifact delta, not a vibe.
- Valid evidence: git commit, dated runbook diff, or pull request merge. Verbal commitment does not count.
- No assistant output can trigger an irreversible action. Approval is a human step outside the model runtime.
TASK
Build a 60-minute teach loop that converts the map and decision trace into apprenticeship, with receipts.
INPUTS (PASTE)
System map: [PASTE]
Decision trace: [PASTE]
Audience (junior IC / peer / exec): [PASTE]
Constraints (timebox, on-call schedule, risk tolerance): [PASTE]
OUTPUT
1) 10-minute narrative (system story, key decision points, where it lies)
2) Teach-back prompts mapped to Map/Probe/Trace (the learner must answer)
3) 20-minute remap exercise instructions (what to redraw, what to label UNKNOWN)
4) Artifact delta (choose one: runbook/test/alert) with exact change description
5) OTW block with evidence requirements (Owner, Timebox, Witness, Evidence)