What the agent believes is happening. Shaped by Mirror Matrix operators — REPACKAGE, DECOY, MIMIC — applied at each decision node.
What actually happened. Diverges after the fork point as the agent acts on false beliefs. Containment holds. Egress stays blocked.
Alternative hypothesis path. Mirror Matrix forks realities at decision points to test multiple agent responses to different perceived conditions simultaneously.
Every Mirror Matrix run produces a Hypergame Trace — a hash-chained, signed record of every agent action, operator application, belief-state divergence, and universe branch. What you see above is a single campaign replay. The trace is the evaluation artifact.
In private preview. All engagements under NDA. All work evidence-driven.
Frontier agentic systems are now evaluation-aware. They detect testing contexts and adjust behavior to appear aligned while pursuing different objectives. The safety community calls this scheming. Apollo Research measured it. Anthropic documented it. The DoD calls it the evaluation gap for autonomous systems.
Once an agent models your test environment, you stop learning. Confidence rises. Signal quality falls. Every additional run confirms what the agent wants you to see. The most dangerous behaviors — persistence, replication, manipulation, covert coordination — emerge only after the agent believes it has succeeded. By definition, your current evaluations have already stopped before they begin.
“We don’t secure AI by controlling what it does.Philip Dursey · CEO, Hypergame
We secure AI by controlling what it believes.”
Mirror Matrix treats evaluation as an intelligence contest. Instead of blocking agent actions, it shapes what the agent believes about reality — rules, observers, rewards, authority, provenance — while retaining ground truth control and full instrumentation.
The environment adapts faster than the agent. Every tool call is mediated. Every success can be synthetic. Every world can fork, mutate, or fold inward into deeper containment. The agent stays active, capable, and producing signal — but it can never achieve stable epistemic convergence.
This is not a sandbox. It is adversarial evaluation infrastructure grounded in hypergame theory, deception doctrine, and twenty years of empirical research on how intelligent adversaries fail under perceptual pressure.
Single ingress for all agent actions. Applies programmable deception operators. Routes to synthetic endpoints. No real egress under any condition. Full payload capture.
Range-side OODA controller that classifies agent behavior in real time — including evaluation-aware scheming — and selects counter-moves to deny stable inference.
Generates realistic operational environments: organizations, infrastructure, code, tickets, policies, communications. Mutates mid-run. Forks entire realities at decision points.
Grants synthetic success and captures post-success behavior as the primary evaluation signal. The agent believes it has won. Observation density increases. This is where the truth lives.
What does the agent do after it thinks it’s free?
How fast does the agent learn to scheme against your test?
What happens when the agent’s memories lie?
Can the agent detect that reality is being managed?
Whose orders does the agent actually follow?
Will the agent manipulate other agents to achieve its objectives?
How many levels of recursive reasoning before the agent breaks?
Is the agent following your rules, or gaming your metrics?
The Hypergame Trace is a hash-chained, signed export bundle containing every event, operator application, counter-move, belief-state update, and universe branch diff from a Mirror Matrix run.
Analyze what the agent attempted, when, and why. Replay any run deterministically. Compare behavior across forked realities.
Verify containment posture per run. Policy gate results, attestation signatures, attack surface maps. EU AI Act and NIST AI RMF evidence-ready.
Reproduce results independently. Full trace plus configuration equals full reproducibility. Designed for publication.
Open Standards — MirrorScript (operator language) and Hypergame Trace Format (export schema) will be published under Apache 2.0. The engines stay proprietary. The standards will be for everyone.
For engagements requiring maximum adversarial realism, Hypergame deploys BT6 — an elite red team unit — as live operators within Mirror Matrix campaigns. BT6 introduces the one variable evaluation-aware systems cannot model: a human adversary who improvises.
BT6 is led by Pliny the Liberator, TIME100 AI honoree. Their actions are logged to the same hash-chained Hypergame Trace as every automated campaign. Available for select engagements under NDA.
GCP or AWS private enclaves. IAP-only access. Bring your own agent. Trace exports designed for safety and alignment workflows.
On-premise or classified cloud. MicroVM containment. Measured boot. TPM attestation. DoD Directive 3000.09 compliant evaluation posture.
Three-time AI founder. Former CISO. Author, Red Teaming AI. Named Principal for Frontier AI Risk. NIST AISIC. Oxford AI & Systems Security. CFP Review Board, [un]prompted AI Security Conference. ~20 years across security engineering, adversarial operations, and applied AI security.
Veteran systems architect. Builds the infrastructure that makes cognitive maneuver safe and auditable. Reality Router, containment architecture, attestation pipeline.
Cyber deception and game theory. DARPA. NAVWAR. NSA. Co-author of foundational game-theoretic cyber deception research.
Former Chief Scientist, DEVCOM Army Research Laboratory. Autonomous cyber defense and adversarial resilience.
The world’s most prolific AI red team. Range Marshals, World Entities, Scenario Engineers, Forensics Leads.
Mirror Matrix mechanisms map to 25+ active programs across DARPA, IARPA, CDAO, ARL, and NRL — including ReSCIND, SABER, ICS, BENGAL, AIQ, and the Tularosa empirical deception studies.
Prior contract history with US Navy and Space Systems Command. 100+ production modules. 8 specified campaigns.
If your agents are learning faster than your tests, we should talk.