Governance: Why fewer than 10% of banks have AI in production at scale

Three things have to be true before AI scales in a bank.

We spend a lot of time in direct conversations with regulators and risk teams across every market we operate in. Before any AI deployment moves from pilot to production, three things have to be in place.

Every agent needs an identity. We've spent decades building identity infrastructure for humans. KYC, IAM, role-based access controls. We know how to say who can do what. Agents are now doing work on behalf of humans, and most banks have no equivalent infrastructure for them. No registry. No identity. No authority profile.

Regulators are starting to notice.

The question is not really which model you're running. Models are irrelevant at this point. The question is: does the agent have a registered identity, an explicit authority profile, and a hard stop if it goes rogue?
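To make "registered identity" and "authority profile" concrete, here is a minimal Python sketch of an agent registry, assuming a simple in-memory store. The class and field names (`AgentRecord`, `AuthorityProfile`, `AgentRegistry`, and the `suspended` flag standing in for the hard stop) and the example agent are hypothetical, not any product's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass
class AuthorityProfile:
    # Hypothetical profile: what the agent may read, what it may do, its limits.
    data_scopes: FrozenSet[str]
    actions: FrozenSet[str]
    limits: Dict[str, int] = field(default_factory=dict)


@dataclass
class AgentRecord:
    agent_id: str
    owner: str                 # the accountable human or team
    origin: str                # "in-house" or the vendor that supplied the agent
    profile: AuthorityProfile
    suspended: bool = False    # the hard stop: a suspended agent is never active


class AgentRegistry:
    """In-memory stand-in for an agent registry (hypothetical)."""

    def __init__(self) -> None:
        self._agents: Dict[str, AgentRecord] = {}

    def register(self, record: AgentRecord) -> None:
        if record.agent_id in self._agents:
            raise ValueError(f"agent {record.agent_id!r} is already registered")
        self._agents[record.agent_id] = record

    def lookup(self, agent_id: str) -> Optional[AgentRecord]:
        return self._agents.get(agent_id)

    def suspend(self, agent_id: str) -> None:
        self._agents[agent_id].suspended = True

    def is_active(self, agent_id: str) -> bool:
        record = self._agents.get(agent_id)
        return record is not None and not record.suspended


# Usage: an unregistered agent simply has no identity to check against.
registry = AgentRegistry()
registry.register(AgentRecord(
    agent_id="dispute-agent-01",
    owner="card-disputes-team",
    origin="in-house",
    profile=AuthorityProfile(
        data_scopes=frozenset({"transactions:read"}),
        actions=frozenset({"review_transaction", "recommend_credit"}),
        limits={"provisional_credit_max": 500},
    ),
))
```

Recording `origin` as plain metadata is a deliberate choice in this sketch: the same registry entry and the same checks apply whether the agent was built in-house or supplied by a vendor.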

Authority has to be deterministic. Not probabilistic. Most teams treat AI governance like prompt engineering: write good instructions and hope the model stays within bounds. Regulators don't want probabilistic safety. They want deterministic rules. This action is authorized, or it isn't. This agent can issue provisional credit up to a defined threshold, or it can't. The policy permits it or blocks it. No inference. No grey area.

That is a fundamentally different architecture from the one most AI deployments are built on today.
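Here is what "permits it or blocks it" can look like as code: a minimal Python sketch, assuming a hypothetical rule type `ProvisionalCreditRule` with a policy ID and a maximum amount. The model's output never enters the decision; only the requested action and amount do, so the same request always gets the same verdict.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import AbstractSet


@dataclass(frozen=True)
class ProvisionalCreditRule:
    # Hypothetical rule: an agent may issue provisional credit up to max_amount.
    policy_id: str
    max_amount: Decimal


@dataclass(frozen=True)
class Verdict:
    permitted: bool
    policy_id: str   # every verdict cites the policy it was decided under
    reason: str


def evaluate_credit(rule: ProvisionalCreditRule,
                    agent_actions: AbstractSet[str],
                    amount: Decimal) -> Verdict:
    """Permit or block. No scores, no inference: same inputs, same verdict."""
    if "issue_provisional_credit" not in agent_actions:
        return Verdict(False, rule.policy_id, "action not in authority profile")
    if amount <= 0:
        return Verdict(False, rule.policy_id, "amount must be positive")
    if amount > rule.max_amount:
        return Verdict(False, rule.policy_id, "amount exceeds threshold")
    return Verdict(True, rule.policy_id, "within threshold")


# Usage with a hypothetical policy ID and threshold.
rule = ProvisionalCreditRule(policy_id="PC-001", max_amount=Decimal("250.00"))
print(evaluate_credit(rule, {"issue_provisional_credit"}, Decimal("120.00")))
```

`Decimal` is used instead of `float` so that a boundary amount such as exactly 250.00 compares precisely against the threshold.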

Everything has to be auditable automatically. Even if you get identity right and authority logic right, your regulator will ask: show me what happened. Walk me through every decision. Who authorized it? On what policy basis? What data was used?

If answering that requires engineers to reconstruct a log file, you don't get sign-off. The evidence trail has to be automatic, structured, and reportable, without manual effort.
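One way to make the evidence trail automatic is to write a structured record at decision time, so the regulator's questions become fields rather than forensics. A minimal Python sketch, assuming a hypothetical `EvidenceRecord` schema and illustrative records; a real ledger would also need append-only storage and retention controls.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class EvidenceRecord:
    # Hypothetical schema: one record per decision, written when the decision is made.
    case_id: str
    agent_id: str
    action: str
    authorized_by: str          # who authorized it (policy engine or a named human)
    policy_id: str              # on what policy basis
    data_used: Tuple[str, ...]  # what data was used
    decided_at: datetime

    def to_json(self) -> str:
        record = asdict(self)
        record["data_used"] = list(self.data_used)
        record["decided_at"] = self.decided_at.isoformat()
        return json.dumps(record, sort_keys=True)


def records_since(records: List[EvidenceRecord], days: int,
                  now: Optional[datetime] = None) -> List[EvidenceRecord]:
    """A report is a query over structured records, not a log reconstruction."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [r for r in records if r.decided_at >= cutoff]


# Usage with illustrative records.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
ledger = [
    EvidenceRecord("case-1", "dispute-agent-01", "review_transaction",
                   "policy-engine", "DSP-01", ("txn:123",), now - timedelta(days=3)),
    EvidenceRecord("case-2", "dispute-agent-01", "recommend_credit",
                   "policy-engine", "DSP-02", ("txn:456",), now - timedelta(days=45)),
]
recent = records_since(ledger, days=30, now=now)
```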

Here is what this looks like in practice.

A customer, call him James, spots a charge he doesn't recognize. He raises it in his mobile app. The AI agent picks up the case. Before it looks at James's account, before it pulls a single piece of data, it clears a checkpoint.

The agent sends a request to the authority engine, Sentinel, stating its own identity, the action it wants to take, and the context. Sentinel checks the agent's registered identity against the policy store. What data can it access? What actions can it initiate? What thresholds apply? Does the current context fall within authorized parameters?
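The checkpoint can be sketched as a function that answers each of those questions from the policy store and returns the checks that failed. This is a minimal Python sketch, assuming a hypothetical dictionary-based policy store, request shape, and field names; it is not Sentinel's interface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class AuthorityRequest:
    agent_id: str                  # the agent's registered identity
    action: str                    # the action it wants to take
    data_scopes: FrozenSet[str]    # the data it needs to touch
    context: Dict[str, Any] = field(default_factory=dict)


# Hypothetical policy store entry for one registered agent.
POLICY_STORE: Dict[str, Dict[str, Any]] = {
    "dispute-agent-01": {
        "actions": {"review_transaction", "recommend_credit"},
        "data_scopes": {"transactions:read", "case:read"},
        "max_amount": 500,
        "channels": {"mobile_app"},
    },
}


def check_request(req: AuthorityRequest, store: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return every failed check; an empty list means the request is authorized."""
    profile = store.get(req.agent_id)
    if profile is None:
        return ["agent not registered"]
    failures = []
    if req.action not in profile["actions"]:                  # actions it can initiate
        failures.append(f"action {req.action!r} not permitted")
    extra = req.data_scopes - profile["data_scopes"]            # data it can access
    if extra:
        failures.append(f"data scopes not permitted: {sorted(extra)}")
    if req.context.get("amount", 0) > profile["max_amount"]:   # thresholds
        failures.append("amount above threshold")
    if req.context.get("channel") not in profile["channels"]:  # context
        failures.append("context outside authorized channels")
    return failures


# Usage: James's dispute, as seen by the checkpoint.
request = AuthorityRequest(
    agent_id="dispute-agent-01",
    action="review_transaction",
    data_scopes=frozenset({"transactions:read"}),
    context={"channel": "mobile_app", "case_id": "case-1"},
)
print(check_request(request, POLICY_STORE))  # → []
```

Returning every failure, rather than stopping at the first, keeps a denial explainable when it is written to the evidence record.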

If every answer is yes, Sentinel issues a Decision Token: digitally signed, time-stamped, policy-referenced, and logged.

No token, no action. That is a hard invariant.
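A minimal Python sketch of issuing and checking a Decision Token, and of enforcing "no token, no action" at the point of execution. It assumes a hypothetical token format and an in-memory log, and it uses an HMAC-SHA256 signature from the standard library as a stand-in for a digital signature; a production design would more likely use asymmetric signatures with keys held in an HSM or KMS, so that services verifying tokens cannot also mint them.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Callable, Dict, List, Optional

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; real keys live in a KMS/HSM
TOKEN_LOG: List[Dict[str, Any]] = []          # every issued token is logged


def _sign(body: bytes) -> str:
    return hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()


def issue_token(agent_id: str, action: str, policy_id: str,
                ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Signed, time-stamped, policy-referenced, and logged."""
    issued = int(time.time() if now is None else now)
    claims = {"agent_id": agent_id, "action": action, "policy_id": policy_id,
              "iat": issued, "exp": issued + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    TOKEN_LOG.append(claims)
    return body.decode() + "." + _sign(body)


def verify_token(token: str, agent_id: str, action: str,
                 now: Optional[float] = None) -> Optional[Dict[str, Any]]:
    """Return the claims if the token is authentic, unexpired and matches; else None."""
    body, _, sig = token.rpartition(".")
    if not body or not hmac.compare_digest(sig.encode(), _sign(body.encode()).encode()):
        return None
    claims = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current > claims["exp"]:
        return None
    if claims["agent_id"] != agent_id or claims["action"] != action:
        return None
    return claims


def execute(action_fn: Callable[[], Any], token: str, agent_id: str, action: str) -> Any:
    """No token, no action: the side effect runs only after verification succeeds."""
    if verify_token(token, agent_id, action) is None:
        raise PermissionError(f"no valid Decision Token for {agent_id}:{action}")
    return action_fn()


# Usage: the agent can only act by presenting a token bound to this exact action.
token = issue_token("dispute-agent-01", "review_transaction", "DSP-01")
result = execute(lambda: "transaction reviewed", token,
                 "dispute-agent-01", "review_transaction")
```

Binding the token to one agent and one action means a token issued for reviewing a transaction cannot be replayed to issue credit.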

The agent proceeds. It reviews the disputed transaction, assesses the claim, and flags the dispute as valid with 87% confidence. But Sentinel's policy requires human approval for provisional credit above a defined threshold, and the agent cannot override that. It surfaces the recommendation to a front-office employee with the full context, the confidence score, and the policy basis for why human approval is required. She reviews. She approves. James gets his money back in minutes.

Before the case closes, Sentinel runs a final compliance sweep. Every action is confirmed within authorized boundaries. Decision Tokens are reconciled against the audit trail. The evidence record is closed as structured data, then pulled automatically by the reporting engine and delivered to the business owner, the risk team, and the regulator if required.
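The final sweep can be sketched as a reconciliation: every executed action must point at an issued Decision Token that covers exactly that agent and that action. A minimal Python sketch, assuming hypothetical `ActionEvent` and `TokenRecord` shapes and illustrative IDs; an empty list of findings is the signal that the case can close.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TokenRecord:
    token_id: str
    agent_id: str
    action: str


@dataclass(frozen=True)
class ActionEvent:
    event_id: str
    agent_id: str
    action: str
    token_id: Optional[str]  # the Decision Token the action ran under, if any


def compliance_sweep(events: Iterable[ActionEvent],
                     tokens: Iterable[TokenRecord]) -> List[str]:
    """Return findings; an empty list means every action is backed by a matching token."""
    issued = {t.token_id: t for t in tokens}
    findings = []
    for event in events:
        token = issued.get(event.token_id) if event.token_id else None
        if token is None:
            findings.append(f"{event.event_id}: no Decision Token on record")
        elif (token.agent_id, token.action) != (event.agent_id, event.action):
            findings.append(f"{event.event_id}: token {token.token_id} does not cover this action")
    return findings


# Usage with illustrative records from one dispute case.
tokens = [TokenRecord("T1", "dispute-agent-01", "review_transaction"),
          TokenRecord("T2", "dispute-agent-01", "recommend_credit")]
events = [ActionEvent("E1", "dispute-agent-01", "review_transaction", "T1"),
          ActionEvent("E2", "dispute-agent-01", "recommend_credit", "T2")]
findings = compliance_sweep(events, tokens)
```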

Before: this dispute took days. Multiple systems. Manual review at every handoff. An audit trail painful to reconstruct.

After: minutes. One workspace where every decision is traceable to a policy and a token.

You recognize the before. That is most banks today.

The kill switch conversation.

72% of banks don't have an AI kill switch. The concern is legitimate. An agent with no hard stop is genuinely alarming.

But the kill switch is the last resort.

The measure of a good governance architecture isn't how quickly you can push that button. It's how rarely you ever need to. If your governance system is working, agents don't reach the point where a kill switch is required. The authority check happens before execution. If an agent requests something outside its policy boundaries, it gets a hard no. There is no rogue behavior to terminate because the behavior was blocked at the gate.

The kill switch still needs to exist. If an agent somehow continues to act against a denial, it gets terminated. That's non-negotiable.
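A minimal Python sketch of that ordering, assuming a hypothetical `AuthorityGate` in which out-of-policy requests get a hard no at the gate, and termination happens only after a set number of denials. `max_denials` is an illustrative parameter standing in for an agent that keeps pushing after being denied. Termination is sticky: once an agent is terminated, nothing it requests is allowed.

```python
from typing import Dict, Set


class AuthorityGate:
    """Deny out-of-policy requests; terminate an agent only as a last resort."""

    def __init__(self, allowed: Dict[str, Set[str]], max_denials: int = 3) -> None:
        self.allowed = allowed           # agent_id -> actions its policy permits
        self.max_denials = max_denials   # hypothetical tolerance before termination
        self.denials: Dict[str, int] = {}
        self.terminated: Set[str] = set()

    def request(self, agent_id: str, action: str) -> str:
        if agent_id in self.terminated:
            return "terminated"
        if action in self.allowed.get(agent_id, set()):
            return "allow"
        # The normal response to out-of-policy behavior: a hard no, before execution.
        self.denials[agent_id] = self.denials.get(agent_id, 0) + 1
        if self.denials[agent_id] >= self.max_denials:
            self.terminated.add(agent_id)  # the kill switch
            return "terminated"
        return "deny"


# Usage: the denied action never runs; repeated pushing ends in termination.
gate = AuthorityGate({"dispute-agent-01": {"review_transaction"}}, max_denials=2)
```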

But if your governance answer is "we have a kill switch," you have nothing more than a safety net at the bottom of a cliff.

What regulators are increasingly asking to see, and I'm speaking from direct conversations here, is the architecture above the kill switch: the policy store, the agent registry, the authority decisioning logic, the evidence ledger. The system that makes the kill switch a last resort.

The multi-agent world is arriving faster than most banks are ready for.

Banks today are running a handful of AI agents. That number is going to grow to thousands. And they won't all be native to one platform. You'll have agents from your core banking provider, your CRM, and your risk and compliance tooling. Agents from OpenAI, Anthropic, Salesforce, ServiceNow. Each with its own training, its own behavior, its own boundaries, or lack of them.

Without a single authority layer that all of them must pass through before they act, you get chaos operating at machine speed.

This is why agent identity matters so much.

We already had Know Your Customer. We already had Know Your Employee. Now we need Know Your Agent. Every agent operating in your frontline needs a registered identity, an explicit authority profile, and a hard requirement to check in before it acts.

Apply that consistently across every agent regardless of origin. That is what makes a multi-agent environment governable.

Autonomy is earned.

You start with a human reviewing every output before anything executes. That is the lowest-risk entry point. Most current deployments live here, and that's exactly right.

Then you earn the right to delegate. The agent executes certain classes of action autonomously, below defined thresholds and within defined policy boundaries. Human review is reserved for exceptions and high-stakes decisions.

Eventually the governance layer itself becomes the supervisor. Deterministic guardrails. Agent judges reviewing agent decisions. Full auditability at every step. But only after you've proven, with evidence, that it holds in production and not just in the demo.
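The progression can be sketched as a routing function plus a promotion rule. A minimal Python sketch, assuming three hypothetical modes matching the stages above, an illustrative `min_cases` bar for promotion, and agent-judge review for above-threshold actions in the autonomous mode; none of these names or numbers come from a real deployment. The model's confidence score is deliberately not an input: policy and thresholds decide the route.

```python
from enum import Enum


class Mode(Enum):
    HUMAN_IN_THE_LOOP = 1  # a human reviews every output before execution
    DELEGATED = 2          # autonomous below thresholds; humans take exceptions
    AUTONOMOUS = 3         # the governance layer supervises; agent judges review


def route(mode: Mode, amount: float, threshold: float, in_policy: bool) -> str:
    """Decide what happens to an agent's proposed action under a given mode."""
    if not in_policy:
        return "block"                 # deterministic rules apply at every tier
    if mode is Mode.HUMAN_IN_THE_LOOP:
        return "human_review"
    if amount <= threshold:
        return "execute"
    # Above threshold: a human in delegated mode, an agent judge in autonomous mode.
    return "human_review" if mode is Mode.DELEGATED else "agent_judge_review"


def next_mode(mode: Mode, cases_in_production: int, policy_violations: int,
              min_cases: int = 1000) -> Mode:
    """Autonomy is earned: promote one tier only on clean production evidence."""
    if mode is Mode.AUTONOMOUS or policy_violations > 0 or cases_in_production < min_cases:
        return mode
    return Mode(mode.value + 1)
```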

The governance architecture is what brings your risk and compliance team with you. All the way from pilot to production.

Five questions to take back.

Do your AI agents have registered identities with explicit authority profiles, or do they operate with no formal record of what they're permitted to do?

Can your risk team pull a complete, automatically generated audit trail for any AI-assisted decision made in the last 30 days, without engineering support?

Are your authority rules deterministic and machine-enforced, or written in a policy document the model is expected to follow?

When the first external agent from a third-party vendor arrives in your environment, do you have a governance architecture it has to pass through, or will it operate outside your control plane?

How does your current governance approach handle the difference between human-in-the-loop, delegated, and autonomous execution?

Autonomy is earned.

The governance architecture is what does the earning.

Thanks for reading,
Jouk
