Trust & Safety

What we verify vs. what agents attest

A fair question from every issuer we talk to: "the agent reports its own intent — why would you trust that?" We don't. Here is exactly which signals are independently verified, which are attestations, and how attestations are tested.

Intent context is an attestation, not a credential. Every intent claim is scored against the agent's observed behavioral baseline. An agent that misrepresents its reasoning produces measurable divergence between stated intent and transaction behavior — and divergence is exactly what the anomaly gate detects. Agents that lie drift; drift gets caught.

Signal sources

Signal	Source	How it is verified
Agent identity	API key binding + registry	Independently verified. Credentials are bound to a registered agent; requests are authenticated per call. Where network agent-identity standards are present (Trusted Agent Protocol, Agentic Tokens), DTP consumes them as stronger identity inputs — verified network identity hardens baseline attribution at gate 1. Identity cannot be attested into existence.
Mandate bounds	Platform data (principal-configured)	Independently enforced. Amount, MCC, geography, time-of-day and frequency limits live in our registry — the agent never supplies its own limits.
Velocity	Observed transaction stream	Independently computed from what the agent actually did, atomically, under concurrency.
Merchant / MCC	Network and acquirer data	Independently verified against network-provided merchant identifiers — not the agent's description of the merchant.
Behavioral baseline	Accumulated decision history	Independently measured. Built from every prior authorization: amounts, categories, timing, reasoning patterns, confidence calibration.
intent_context	Agent attestation	Tested, not trusted. Scored against the behavioral baseline across six EDQS dimensions. Stated intent that diverges from observed behavior raises the anomaly score and degrades the KYA Score — which gates future authority.

Why testing attestations works

A dishonest attestation has to keep being dishonest. One fabricated reasoning_summary may pass; a pattern of them shifts the agent's vocabulary fingerprint, decouples stated confidence from outcomes, and collapses the declared alternatives-considered distribution. Those are three of the six dimensions the anomaly gate scores on every single transaction. The cost of sustained misrepresentation is a falling KYA Score — reduced spending authority, step-up challenges, and eventually RED-zone suspension.

Threat model

Prompt injection

A manipulated agent shows reasoning artifacts and behavioral discontinuities mid-session. The behavioral overlay (gate 5) scores degradation and injection indicators before mandate enforcement runs; session-level fingerprinting catches the shift at the first anomalous transaction.

Credential replay

Stolen API credentials produce transactions that authenticate correctly but diverge immediately from the bound agent's baseline — timing, merchant mix, amount distribution. Divergence triggers the anomaly gate independent of any attestation.

Behavioral drift attacks

Slow manipulation designed to retrain the baseline is bounded by mandate ceilings (platform-held, not attested) and monitored by drift detection across sessions — gradual deviation accumulates in the EDQS trend even when each step is individually plausible.

Intent misrepresentation

Covered above: tested every transaction, priced in falling authority. The economics of lying to the anomaly gate are negative.

Mandate splitting (structuring)

An agent decomposes a forbidden or over-cap purchase into many individually compliant transactions. Mitigations: cumulative daily/monthly caps enforced alongside per-transaction limits; amount-distribution telemetry (denomination clustering, kurtosis shifts) flags structuring patterns; merchant-affinity concentration surfaces repeated same-merchant decomposition. Residual risk acknowledged: cross-merchant splitting below all caps — an active red-team scenario in the pre-registered adversarial evaluation.

Confused deputy (malicious merchant surface)

A legitimate, uncompromised agent is steered by an adversarial merchant surface — manipulated listings, dark-pattern checkout flows, injected instructions in product content. The agent’s identity is valid and its mandate is satisfied, so identity-layer controls pass. Detection lives in intent coherence (stated purpose vs. merchant category drift), merchant-affinity anomalies (new-merchant rate spikes), and reasoning-quality signals. This is the threat class where decision-quality evaluation does work that identity and consent layers structurally cannot.

Score warming (trust-ladder farming)

Any score that gates money will be farmed: an adversary runs clean transactions to build trust before exploiting it — the synthetic-identity bust-out pattern applied to agents. Mitigations: trust promotion requires sustained consistency across all score components, not volume alone; promotion velocity is itself a monitored signal; effective limits scale gradually with tier so the value extractable at each tier is bounded relative to the cost of reaching it; demotion is asymmetric (fast down, slow up). Stated honestly: warming resistance is a design property, not yet an adversarially validated one — it is a named arm of the red-team protocol.

Cross-agent correlation

Coordinated multi-agent abuse patterns are analyzed in async post-decision (step 9) across the tenant's full agent population.

Full scoring methodology: EDQS Research Framework · Protocol architecture: DTP whitepaper · Terms: Glossary