Why AI audit trails break at the architecture, not the policy

27 May 2026
9 mins read

Banks keep adding logging layers and governance policies to fix AI audit trails, but the break happens before any of that matters. When AI agents run across fragmented systems with no unified customer context and no shared authority model, the audit gaps are a structural output of the architecture itself. Backbase's argument is that auditability has to be built into the execution layer, not bolted on as a reporting layer afterward.

Why audit trails break before the AI even runs

Most banks treat incomplete audit trails as a documentation problem. They add logging layers, tighten model governance policies, and assign accountability owners. None of that fixes the actual failure point. The trail breaks earlier - at the substrate level - before the AI agent takes a single action.

When AI agents run across fragmented systems, they have no unified view of the customer. Each system holds a partial picture. Rules differ across platforms. Write-backs go to separate data stores with no shared record of what happened or why. The result, as Backbase's Banking OS value proposition puts it directly, is that agents "operate on partial data, follow inconsistent rules, and write back to different systems - the result is not automation, it is chaos at higher speed." That chaos produces audit gaps as a structural output, not a compliance oversight.

This is the root cause that governance frameworks cannot reach. A policy that says "all AI decisions must be explainable" only works if the execution layer can produce a coherent, end-to-end record of what the agent knew, what it was authorized to do, and where it acted. Fragmented foundations cannot produce that record. The architecture makes it impossible - regardless of how well-written the policy is.

The guard function is the actual prerequisite for AI at scale

Most banks treat compliance as a condition they manage around AI deployment. Backbase founder Jouk Pleiter argues it's the condition that determines whether deployment happens at all. His warning is direct: "If you don't solve the guard function, I don't see AI at scale in banks at all. I basically see the risk and compliance argument paralyzing innovation." That's not a caution about moving too fast. It's a structural claim about viability. Pleiter laid out the full argument on the agentic banking podcast.

When risk and compliance teams can't verify what an AI agent did, why it did it, and under what authority, they block the rollout. That's a rational response. The problem isn't their caution - it's that most AI deployments give them nothing solid to evaluate. Without a governed record of every decision an agent took, audit readiness is a documentation exercise, not a real capability. Regulators don't accept that. Neither should boards.

This reframes the stakes entirely. Solving the guard function isn't about satisfying a checklist before launch. It's about whether agentic AI in banking is operationally sustainable at all. Banks that skip this step don't move faster - they accumulate liability until someone stops them. The guard function has to be solved first, not retrofitted later.

Fragmented foundations produce chaos at higher speed, not automation

Most banks deploying AI agents today are running them across a patchwork of systems that were never designed to work together. Each agent pulls from partial data, follows rules that differ by system, and writes results back to wherever it happens to have access. That is not automation. It is chaos at higher speed. The volume of decisions increases while the coherence of those decisions decreases.

The audit problem that follows is structural, not behavioral. When no single system holds the full context of a customer interaction, no audit trail can be genuinely end-to-end. Regulators may ask for a complete record of how an AI agent reached a decision. What they get instead is fragments scattered across CRM systems, origination platforms, and middleware layers that log nothing useful. A governance framework sitting above this substrate cannot repair it. It can only describe the disorder more precisely. McKinsey's 2023 survey of financial services executives ranked data fragmentation as the single most cited barrier to AI scale - ahead of talent and regulation.

The fix requires precision at the execution layer itself. Banks need to define what each agent is permitted to do and where that permission stops. That definition needs to be enforced at runtime, not documented after the fact. Running every banker, agent, and customer interaction through the same execution substrate, against the same data, is the only way to make those boundaries hold. Without that, compliance is a reporting exercise built on incomplete inputs. The records will always have gaps, because the system producing them was never designed to close them.
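
As a rough illustration of what runtime enforcement means here, the Python sketch below checks an agent's permission boundary at the moment of execution, before anything is written. Every name in it (Delegation, enforce, run_action, the ledger list) is hypothetical and chosen for illustration. It is not the Banking OS API, only one way the idea could look under simple assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Delegation:
    """Hypothetical grant: what one agent may do and where that permission stops."""
    agent_id: str
    allowed_actions: frozenset[str]
    max_amount: float  # upper bound for any single action, in account currency


class AuthorizationError(Exception):
    """Raised when a requested action falls outside the agent's delegation."""


def enforce(delegation: Delegation, action: str, amount: float) -> None:
    """Check the boundary at execution time, before any side effect occurs."""
    if action not in delegation.allowed_actions:
        raise AuthorizationError(f"{delegation.agent_id} may not perform {action}")
    if amount > delegation.max_amount:
        raise AuthorizationError(
            f"{action} of {amount} exceeds the limit of {delegation.max_amount}"
        )


def run_action(delegation: Delegation, action: str, amount: float, ledger: list) -> None:
    """Write to the ledger (a stand-in for a system of record) only if permitted."""
    enforce(delegation, action, amount)
    ledger.append((delegation.agent_id, action, amount))
```

In this sketch a request outside the delegation fails before the write, so the boundary is a property of how the action runs rather than a statement in a policy document.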

Compliance as an architectural property, not a reporting layer

Most banks treat compliance as something you add after an AI system runs. Logs get written. Reports get generated. Audit packets get assembled. That model worked when humans made decisions and systems recorded them. It doesn't work when AI agents act autonomously across fragmented platforms where no single system holds the full decision context. The audit trail isn't incomplete because the AI is opaque. It's incomplete because the operational substrate was never built to produce coherent end-to-end records in the first place. Gartner's analysis of AI governance identifies this execution-layer weakness as the primary compliance risk for enterprise AI deployments.

The structural fix isn't better logging. It's runtime enforcement. Banks need to authorize what each agent is entitled to do and where that authority stops. That authorization must live in the same execution layer where work happens. When governed authority is defined and enforced at runtime, the decision record isn't reconstructed after the fact. It's produced as a native output of how the action ran.

That's the design principle behind Decision Tokens in the Banking OS runtime. Every decision the system executes carries a token that captures the action, the authority behind it, and the limits it operated within. Auditability isn't a compliance add-on sitting above the execution layer. It's a property of the execution layer itself. Regulators don't receive a report about what happened. They get a record that traveled with the decision from the moment it was made. The implications extend well beyond audit trails into model risk management frameworks and the wider field of AI compliance and banking regulation.
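
To make the idea concrete, here is a minimal sketch of a decision record produced by the same call that performs the work. It is an illustration built on assumptions, not Backbase's implementation: the DecisionToken fields, the execute function, and the example identifiers are all invented here, guided only by the three things the text says a token captures (the action, the authority behind it, and the limits it ran within).

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class DecisionToken:
    """Hypothetical record that travels with one executed decision."""
    token_id: str
    action: str             # what was done
    actor: str              # who or what did it
    authority: str          # the grant or policy the actor acted under
    limits: dict[str, Any]  # the boundaries in force at execution time
    outcome: str
    issued_at: str          # UTC timestamp in ISO 8601 format


def execute(action: str, actor: str, authority: str,
            limits: dict[str, Any], work: Callable[[], Any]) -> tuple[Any, DecisionToken]:
    """Run `work` and return its result together with the token for that run."""
    try:
        result = work()
        outcome = "completed"
    except Exception as exc:  # a failed action still leaves a record
        result = None
        outcome = f"failed: {exc}"
    token = DecisionToken(
        token_id=str(uuid.uuid4()),
        action=action,
        actor=actor,
        authority=authority,
        limits=dict(limits),  # copy so later changes do not alter the record
        outcome=outcome,
        issued_at=datetime.now(timezone.utc).isoformat(),
    )
    return result, token


# Example call with made-up identifiers.
result, token = execute(
    action="raise_card_limit",
    actor="agent:card-assistant",
    authority="policy:card-limits-v2",
    limits={"max_increase": 500},
    work=lambda: "limit raised",
)
```

The design choice worth noticing is that the caller cannot get the result without also getting the token. There is no separate logging step to forget or to run later.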

How a unified control plane makes coherent audit records structurally possible

Most audit record problems aren't explainability problems. They're architecture problems. When customers, employees, and AI agents each operate through separate systems, no single layer holds the full picture of what happened, who authorized it, or what context drove the decision. Those system boundaries are where audit coherence breaks down. BCG's work on responsible AI treats this as a data architecture challenge before it is ever a governance challenge.

Banking OS addresses this at the structural level. It sits above systems of record as a Control Plane that coordinates all three frontline actors - customers, employees, and AI agents - under one operating model. That model carries governed authority and defined delegation limits. Every action runs through the same substrate, against the same unified customer context. That's the condition that makes a coherent, end-to-end audit record achievable in the first place. Banks that want to understand the full scope of this approach can start with the "Banking OS explained" overview.

The mechanism is Decision Tokens. Each token travels with the action it represents, capturing the authority behind it, the context that triggered it, and the boundary it operated within. Compliance isn't assembled after the fact from scattered logs. It's embedded in how work runs. Regulators asking what happened and why get a record that was produced by the execution layer itself - not reconstructed from system fragments after the audit request arrives.

Decision Tokens and what auditable AI looks like in practice

Every decision the Banking OS runtime executes carries a Decision Token. That token travels with the action - recording what happened, who or what authorized it, and under which delegation limits. The audit record isn't assembled after the fact. It's produced as work runs. That distinction matters to regulators, because it means the record doesn't carry the gaps that fragmented systems routinely produce.

The structural reason this works is the Control Plane. Banking OS sits above systems of record and coordinates all three frontline actors - customers, employees, and AI agents - under a single operating model. Every actor operates within governed authority. That unified context is what makes a coherent, end-to-end decision record structurally possible in the first place. Without it, you're stitching logs together across systems that were never designed to talk to each other. The Sentinel capability extends this further, providing continuous monitoring across the agent layer.

For a regulator-facing audit, this changes the conversation. Continuous compliance monitoring and anomaly detection aren't outputs of a reporting layer assembled after the fact. They're outputs of how the work itself runs. When an AI agent acts on a customer signal, the token records it. When a human employee overrides that action, the token records that too. The full chain of authority is native to the runtime - not reconstructed from scattered system logs weeks later when an examiner asks for it. Banks exploring AI compliance for banking operations will find that runtime-native auditability is increasingly the standard regulators expect.
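
The agent-then-override sequence described above can be sketched as a chain of linked records. This is a hypothetical model rather than the Banking OS data format: the Token class, its overrides field, and the chain_of_authority function are assumptions made for the example, as are the actor and action names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical decision record; `overrides` points at the record it replaces."""
    token_id: str
    actor: str
    action: str
    overrides: Optional[str] = None


def chain_of_authority(tokens: list[Token], final_id: str) -> list[Token]:
    """Walk back from the final decision to the original one, oldest first."""
    by_id = {t.token_id: t for t in tokens}
    chain: list[Token] = []
    seen: set[str] = set()
    current = by_id.get(final_id)
    while current is not None and current.token_id not in seen:
        seen.add(current.token_id)  # guards against accidental cycles
        chain.append(current)
        current = by_id.get(current.overrides) if current.overrides else None
    chain.reverse()
    return chain


# An agent acts on a customer signal, then an employee overrides that action.
tokens = [
    Token("t1", "agent:retention", "offer_fee_waiver"),
    Token("t2", "employee:branch-manager", "withdraw_fee_waiver", overrides="t1"),
]
history = chain_of_authority(tokens, "t2")
```

Because each record names the one it overrides, an examiner's question about who changed what resolves to a lookup rather than a reconstruction across system logs.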

Banks that treat auditability as an architectural requirement - built into how every AI action is authorized, executed, and recorded - will be the ones that can scale agentic AI without the compliance argument grinding innovation to a halt.

Frequently asked questions

Why do AI audit trails in banking so often end up incomplete even when governance policies are in place?

The problem is architectural, not documentary. AI agents deployed across fragmented systems pull from partial data, follow inconsistent rules, and write results to separate stores. No governance policy can produce a coherent end-to-end record from a substrate never designed to generate one. The gaps are structural outputs, not compliance oversights.

What is the difference between a compliance layer assembled after the fact and auditability built into the execution layer?

A layer assembled after the fact reconstructs what happened by stitching together scattered logs. Auditability built into the execution layer produces the record as work runs. Decision Tokens in Banking OS travel with every action, capturing authority and context at the moment of execution, not weeks later when an examiner asks.

How does a banking AI system demonstrate explainability to regulators without exposing proprietary model logic?

Explainability for regulators does not require revealing model internals. It requires a verifiable record of what the agent was authorized to do, what customer context triggered the action, and what limits it operated within. Decision Tokens provide exactly that chain of authority without disclosing the underlying model architecture.

What does it mean for an AI agent to operate under governed authority and delegated limits in a banking context?

It means the agent's permissions are defined and enforced at runtime, not described in a policy document. The execution substrate specifies what actions the agent can take, under which conditions, and within what boundaries. Those constraints are active during execution, so every decision is traceable to a specific authorization rather than assumed compliance.

How should banks assess whether their current infrastructure can support continuous compliance monitoring for AI-driven decisions?

The test is straightforward: can the infrastructure produce a complete, end-to-end decision record natively, without reconstructing it from multiple system logs? If audit readiness depends on assembling fragments after the fact, the foundation is not adequate. Continuous compliance monitoring requires a single execution substrate where every agent action generates its own governed record.

About the author
Backbase
Backbase pioneered the Unified Frontline category for banks.

Backbase built the AI-native Banking OS - the operating system that turns fragmented banking operations into a Unified Frontline. Customers, employees, and AI agents work as one across digital channels, front-office, and operations.

Backbase was founded in 2003 by Jouk Pleiter and is headquartered in Amsterdam, with teams across North America, Europe, the Middle East, Asia-Pacific, Africa and Latin America. 120+ leading banks run on Backbase across Retail, SMB & Commercial, Private Banking, and Wealth Management.
