Trust & Safety

What we verify vs. what agents attest

A fair question from every issuer we talk to: "the agent reports its own intent — why would you trust that?" We don't. Here is exactly which signals are independently verified, which are attestations, and how attestations are tested.

Intent context is an attestation, not a credential. Every intent claim is scored against the agent's observed behavioral baseline. An agent that misrepresents its reasoning produces measurable divergence between stated intent and transaction behavior — and divergence is exactly what the anomaly gate detects. Agents that lie drift; drift gets caught.

Signal sources

Signal | Source | How it is verified
Agent identity | API key binding + registry | Independently verified. Credentials are bound to a registered agent; requests are authenticated per call. Where network agent-identity standards are present (Trusted Agent Protocol, Agentic Tokens), DTP consumes them as stronger identity inputs: verified network identity hardens baseline attribution at gate 1. Identity cannot be attested into existence.
Mandate bounds | Platform data (principal-configured) | Independently enforced. Amount, MCC, geography, time-of-day and frequency limits live in our registry; the agent never supplies its own limits.
Velocity | Observed transaction stream | Independently computed from what the agent actually did, atomically, under concurrency.
Merchant / MCC | Network and acquirer data | Independently verified against network-provided merchant identifiers, not the agent's description of the merchant.
Behavioral baseline | Accumulated decision history | Independently measured. Built from every prior authorization: amounts, categories, timing, reasoning patterns, confidence calibration.
intent_context | Agent attestation | Tested, not trusted. Scored against the behavioral baseline across six EDQS dimensions. Stated intent that diverges from observed behavior raises the anomaly score and degrades the KYA Score, which gates future authority.
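
The "platform-held, not attested" property of mandate bounds can be illustrated with a minimal sketch. Everything here is hypothetical (the `Mandate` fields, the `REGISTRY` dictionary, the `check_mandate` function and its violation labels); it is not DTP's actual API. The point it shows is only that limits are looked up by agent identity on the platform side, and the request carries no limit fields of its own.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Mandate:
    """Hypothetical principal-configured limits, stored platform-side."""
    max_amount_cents: int
    allowed_mccs: frozenset[str]
    allowed_countries: frozenset[str]
    allowed_hours_utc: range


# Assumption: an in-memory stand-in for the platform registry.
REGISTRY: dict[str, Mandate] = {
    "agent-123": Mandate(
        max_amount_cents=50_000,
        allowed_mccs=frozenset({"5411", "5812"}),
        allowed_countries=frozenset({"US"}),
        allowed_hours_utc=range(8, 20),
    )
}


def check_mandate(agent_id: str, amount_cents: int, mcc: str,
                  country: str, at: datetime) -> list[str]:
    """Return the list of violated bounds (empty means compliant).

    The limits come only from REGISTRY; nothing in the request can
    widen them.
    """
    mandate = REGISTRY.get(agent_id)
    if mandate is None:
        return ["unknown_agent"]
    violations = []
    if amount_cents > mandate.max_amount_cents:
        violations.append("amount")
    if mcc not in mandate.allowed_mccs:
        violations.append("mcc")
    if country not in mandate.allowed_countries:
        violations.append("geography")
    if at.hour not in mandate.allowed_hours_utc:
        violations.append("time_of_day")
    return violations


noon = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(check_mandate("agent-123", 10_000, "5411", "US", noon))
print(check_mandate("agent-123", 60_000, "7995", "US", noon))
```

Frequency limits are omitted from this sketch because they depend on the observed transaction stream rather than on a single request.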

Why testing attestations works

A dishonest attestation has to keep being dishonest. One fabricated reasoning_summary may pass; a pattern of them shifts the agent's vocabulary fingerprint, decouples stated confidence from outcomes, and collapses the declared alternatives-considered distribution. Those are three of the six dimensions the anomaly gate scores on every single transaction. The cost of sustained misrepresentation is a falling KYA Score — reduced spending authority, step-up challenges, and eventually RED-zone suspension.
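
To make "decouples stated confidence from outcomes" concrete, here is a small sketch that uses the Brier score as a calibration measure. This is an illustrative stand-in chosen for this example, not the EDQS scoring method; the function name and the sample data are assumptions.

```python
def brier_score(confidences: list[float], outcomes: list[bool]) -> float:
    """Mean squared gap between stated confidence and what happened.

    0.0 means perfectly calibrated and always right; higher is worse.
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum(
        (c - float(o)) ** 2 for c, o in zip(confidences, outcomes)
    ) / len(confidences)


# An agent whose stated confidence tracks outcomes.
calibrated = brier_score([0.9, 0.9, 0.1], [True, True, False])

# An agent that always claims high confidence regardless of outcome.
decoupled = brier_score([0.9, 0.9, 0.9], [False, False, True])

print(calibrated < decoupled)
```

A single inflated confidence value barely moves the average; a sustained pattern of them does, which is the property the paragraph above relies on.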

Threat model

Prompt injection

A manipulated agent shows reasoning artifacts and behavioral discontinuities mid-session. The behavioral overlay (gate 5) scores degradation and injection indicators before mandate enforcement runs; session-level fingerprinting catches the shift at the first anomalous transaction.

Credential replay

Stolen API credentials produce transactions that authenticate correctly but diverge immediately from the bound agent's baseline — timing, merchant mix, amount distribution. Divergence triggers the anomaly gate independent of any attestation.
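
As a rough illustration of divergence from a bound agent's baseline, the sketch below computes a z-score for one dimension only (transaction amount). The function name, the sample history, and the use of a plain z-score are assumptions made for this example; the real gate is described as scoring several dimensions, including timing and merchant mix.

```python
import statistics


def amount_z_score(baseline_amounts: list[float], amount: float) -> float:
    """How many sample standard deviations `amount` sits from the baseline mean."""
    if len(baseline_amounts) < 2:
        raise ValueError("need at least two baseline observations")
    mean = statistics.fmean(baseline_amounts)
    stdev = statistics.stdev(baseline_amounts)
    if stdev == 0:
        # Degenerate baseline: any deviation at all is maximally surprising.
        return 0.0 if amount == mean else float("inf")
    return (amount - mean) / stdev


history = [20.0, 22.0, 19.0, 21.0, 18.0]  # hypothetical prior amounts
print(amount_z_score(history, 21.0))   # close to the baseline
print(amount_z_score(history, 400.0))  # far outside it
```

Note that this check needs no attestation at all: it uses only the authenticated identity and the observed amount.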

Behavioral drift attacks

Slow manipulation designed to retrain the baseline is bounded by mandate ceilings (platform-held, not attested) and monitored by drift detection across sessions — gradual deviation accumulates in the EDQS trend even when each step is individually plausible.
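
One standard way to detect deviation that accumulates while each step stays individually plausible is a one-sided CUSUM (cumulative sum) detector. The sketch below is an assumption about how such drift detection could look, not a description of the EDQS trend computation; the parameter names and values are hypothetical.

```python
from typing import Optional


def cusum_alarm_index(values: list[float], baseline_mean: float,
                      slack: float, threshold: float) -> Optional[int]:
    """Return the index at which upward drift is flagged, or None.

    Each observation contributes (value - baseline_mean - slack) to a
    running sum that is floored at zero. Small excesses that would pass
    a per-step check still add up until the sum crosses `threshold`.
    """
    running = 0.0
    for i, value in enumerate(values):
        running = max(0.0, running + (value - baseline_mean - slack))
        if running > threshold:
            return i
    return None


stable = [100.0, 102.0, 98.0, 101.0]
drifting = [110.0] * 10  # each step only 10% above baseline

print(cusum_alarm_index(stable, 100.0, slack=5.0, threshold=30.0))
print(cusum_alarm_index(drifting, 100.0, slack=5.0, threshold=30.0))
```

The `slack` term absorbs ordinary noise; the `threshold` sets how much accumulated excess is tolerated before the trend is flagged.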

Intent misrepresentation

Covered above: intent is tested on every transaction, and misrepresentation is priced in as falling authority. The economics of lying to the anomaly gate are negative.

Mandate splitting (structuring)

An agent decomposes a forbidden or over-cap purchase into many individually compliant transactions. Mitigations: cumulative daily/monthly caps enforced alongside per-transaction limits; amount-distribution telemetry (denomination clustering, kurtosis shifts) flags structuring patterns; merchant-affinity concentration surfaces repeated same-merchant decomposition. Residual risk acknowledged: cross-merchant splitting below all caps — an active red-team scenario in the pre-registered adversarial evaluation.
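
The first mitigation, cumulative caps enforced alongside per-transaction limits, can be sketched as an atomic check-and-add. The class name, the in-process lock, and the cap values are assumptions for illustration; a production system would more likely use a transactional store than an in-memory dictionary.

```python
import threading
from collections import defaultdict


class DailySpendLedger:
    """Hypothetical ledger enforcing a per-transaction and a daily cap."""

    def __init__(self, per_txn_cap_cents: int, daily_cap_cents: int) -> None:
        self._per_txn_cap = per_txn_cap_cents
        self._daily_cap = daily_cap_cents
        self._spent: dict[tuple[str, str], int] = defaultdict(int)
        self._lock = threading.Lock()

    def authorize(self, agent_id: str, day: str, amount_cents: int) -> bool:
        """Approve only if both caps hold; record the spend if approved."""
        if amount_cents <= 0 or amount_cents > self._per_txn_cap:
            return False
        key = (agent_id, day)
        # Check and update under one lock so two concurrent requests
        # cannot both pass the cap check against the same stale total.
        with self._lock:
            if self._spent[key] + amount_cents > self._daily_cap:
                return False
            self._spent[key] += amount_cents
            return True


ledger = DailySpendLedger(per_txn_cap_cents=10_000, daily_cap_cents=25_000)
# Four individually compliant 9,000-cent purchases on the same day.
print([ledger.authorize("agent-123", "2024-01-01", 9_000) for _ in range(4)])
```

Each purchase passes the per-transaction limit, but the third would push the daily total to 27,000 cents, so only the first two are approved. This sketch does not address the residual risk named above, where splitting stays below every cap.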

Confused deputy (malicious merchant surface)

A legitimate, uncompromised agent is steered by an adversarial merchant surface — manipulated listings, dark-pattern checkout flows, injected instructions in product content. The agent’s identity is valid and its mandate is satisfied, so identity-layer controls pass. Detection lives in intent coherence (stated purpose vs. merchant category drift), merchant-affinity anomalies (new-merchant rate spikes), and reasoning-quality signals. This is the threat class where decision-quality evaluation does work that identity and consent layers structurally cannot.

Score warming (trust-ladder farming)

Any score that gates money will be farmed: an adversary runs clean transactions to build trust before exploiting it — the synthetic-identity bust-out pattern applied to agents. Mitigations: trust promotion requires sustained consistency across all score components, not volume alone; promotion velocity is itself a monitored signal; effective limits scale gradually with tier so the value extractable at each tier is bounded relative to the cost of reaching it; demotion is asymmetric (fast down, slow up). Stated honestly: warming resistance is a design property, not yet an adversarially validated one — it is a named arm of the red-team protocol.
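
The "fast down, slow up" property can be illustrated with an asymmetric smoothing update. The function, the 0-to-1 score scale, and the two rates are hypothetical choices for this sketch, not the KYA Score formula.

```python
def update_trust(score: float, observation: float,
                 up_rate: float = 0.02, down_rate: float = 0.5) -> float:
    """Move `score` toward `observation`, quickly if worse, slowly if better.

    Both values are assumed to lie in [0, 1], where 1 is fully clean.
    """
    if not 0.0 <= observation <= 1.0:
        raise ValueError("observation must be in [0, 1]")
    rate = up_rate if observation > score else down_rate
    return score + rate * (observation - score)


score = update_trust(0.5, 0.0)  # one bad transaction halves the score
steps = 0
while score < 0.5:              # count clean transactions needed to recover
    score = update_trust(score, 1.0)
    steps += 1
print(steps)
```

With these rates a single bad observation costs far more than a single clean one earns back, which raises the cost of warming a score relative to the value extractable once it is spent.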

Cross-agent correlation

Coordinated multi-agent abuse patterns are analyzed asynchronously after the decision (step 9), across the tenant's full agent population.

Full scoring methodology: EDQS Research Framework · Protocol architecture: DTP whitepaper · Terms: Glossary