AI in banking

Mythos is here: is your architecture ready?

10 June 2026
5 mins read

Every few months, a new frontier model lands and the same ritual plays out. The board asks the CIO: what's our strategy for this? The CIO commissions a task force. The task force runs a pilot. And eighteen months later, the pilot is still a pilot.

We saw it with GPT-4. We saw it with the wave of reasoning models that followed. And now, with Mythos-class models arriving - genuinely more capable than anything that came before - I'm watching banks gear up to repeat the exact same cycle.

Before we go further, let's get one important fact out of the way: the model was never your bottleneck. If your bank couldn't get the full value out of GPT-4, it will hit the same walls with Mythos. A frontier model dropped into a fragmented architecture and operating systems designed for humans won't turn your bank into a frontier bank. It becomes the world's most articulate intern, locked out of every room that matters.

And this time, the cost of getting it wrong isn't just a stalled pilot. Because while you're running your task force, someone else is already running Mythos-class intelligence against you.

You need serious AI to defend against AI

The fraudsters attacking your bank are AI-native. They have no legacy core. No procurement cycle. No model risk committee. The moment a new frontier model becomes available, they're using it - to generate flawless synthetic identities, to clone voices for social engineering, to probe your onboarding flows at machine speed, to write phishing campaigns personalized to individual customers. Their adoption time is measured in days. Yours is measured in quarters. Sometimes years.

That gap is the real threat here. A scam call used to be detectable because the script was clumsy and the accent didn't match. Now the voice is your customer's voice, the story is coherent, and the documents are so perfect they will get past even the most experienced human eyes. The only thing that can catch an AI-generated attack operating at machine speed is AI-powered defense operating at machine speed - agents that can look at a transaction, a session, a device, a behavioral history, and a relationship in real time and say: this doesn't fit this customer.

Not "this account." Not "this card." This customer - the whole human, across every product, every channel, every interaction they've ever had with you.

The walls the older models hit are the walls the new models will hit

Think about what happened with the last generation of models inside your bank.

The pilots that worked were the ones that didn't need your systems: summarizing documents, drafting emails, answering policy questions from a knowledge base. Genuinely useful. Also genuinely marginal.

The pilots that mattered - the ones touching real customer journeys, real decisions, real money - died. And they all died the same death. The model needed to know the customer's full picture, and the full picture lived across 50 to 100 backend systems, each calling the customer entity something slightly different, each holding a fragment of the truth, none of them built to hand context to anything else. The agent could reason brilliantly about the fragment it was given. It just never had the whole.

It's never been about the model. You could swap GPT-4 for Mythos tomorrow and the wall doesn't move an inch. A smarter model reasoning from 20% of the truth gives you more confident wrong answers, faster. In a fraud context, that's a massive liability. An agent that blocks a legitimate customer's mortgage payment because it couldn't see the salary deposit sitting in another system has the same problem as an agent that misses important details during an attack or a fraud attempt: both are reasoning from a fragment.

Fraud lives in the whitespace between your systems - the same whitespace where your handoffs, exceptions, and manual coordination live. The attackers know your fragmentation better than you do. They engineer for it. A synthetic identity passes onboarding precisely because the system checking the ID can't see the system holding the device fingerprint, which in turn can't see the system tracking application velocity across your other brands.

The model will do its work, while your architecture leaves the back door wide open.

If you can't govern older models, you can't govern new models

There's a second wall, and it's the one your risk and compliance teams will build for you if you don't get ahead of it.

Take the GPT-4-class pilot your bank ran in 2024 or 2025 - the one that's still a pilot. Ask why it never reached production. I can tell you the answer before you ask, because I've heard it in dozens of boardrooms, in almost identical words: we couldn't get it past risk. Nobody could say with precision what data the model saw, what policy bounded its decisions, who was accountable when it acted, or how to reconstruct its reasoning six months later. So compliance said no. Because they were asked to sign off on something nobody could explain.

Now ask yourself what changes when you swap in Mythos. Nothing. The regulator's questions are model-agnostic. The PRA, the ECB, the OCC - none of them care which frontier lab trained your model or how many benchmarks it tops. They care whether you can demonstrate control: what the system was authorized to do, what it actually did, on what basis, and how you'd detect and correct it if something went wrong.

This is the trap I see institutions walking into right now. They've concluded that their pilots stalled because the models weren't good enough - too many hallucinations, not enough reasoning. So they're waiting for the next generation to solve it. But a more capable model makes the governance problem harder, not easier. Mythos can take more consequential actions, chain longer sequences of decisions, and operate with more autonomy than anything before it. Greater capability without governance simply gets you a bigger sandbox with a taller fence - and a compliance officer who is even more nervous, for even better reasons.

Governance is an architectural capability you have to build: policy expressed in a form an agent can be bound by, authority that's explicit and tiered, every decision landing in a ledger a supervisor can replay. Build it once, properly, and it governs every model you'll ever deploy. Skip it, and every frontier model you adopt will live and die in pilot, hitting the same wall with the same regulators, forever.

Stop and watch - the deepfake call, two ways

A customer's "daughter" calls the contact center on a Friday afternoon. The voice is perfect - cloned from forty seconds of social media video. She knows the mother's address, her date of birth, the name of her dog. She needs an urgent payment released; mum is in hospital.

In the fragmented bank, the agent on the line sees the servicing system. The voice biometrics flag - if one fired at all - lives in the fraud platform the contact center can't access. The fact that the account password was reset two hours ago from a new device sits in the digital channel logs. The fact that three other customers received near-identical calls this week sits in a case management tool in another department. Each system holds a fragment. Nobody connects the dots. The payment goes out. The bank finds out it was fraud on Tuesday, when the real daughter calls.

Now run it again with a unified customer truth layer and a governed defensive agent in the loop. The moment the call connects, the agent - operating silently alongside the human - has already assembled the full picture from the state graph: the password reset, the new device, the unusual payee, the pattern match against this week's other cases, the fact that this customer has never once made a payment of this type in eleven years. Confidence that this is an attack: high. Within its governed authority, the agent holds the payment, routes a verification challenge through a channel the attacker doesn't control, and writes every step into the decision ledger. The human agent gets one clear screen instead of seven systems. Total elapsed time: ninety seconds.
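
The difference between the two runs can be sketched in a few lines of Python. This is a minimal illustration, not a production fraud model: the `CallContext` fields, the 24-hour window, and the three-signal `HOLD_THRESHOLD` are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CallContext:
    """Signals that the fragmented bank keeps in separate systems."""
    password_reset_hours_ago: Optional[float]  # digital channel logs; None = not visible
    new_device: bool                           # digital channel logs
    payee_seen_before: bool                    # payment history
    similar_cases_this_week: int               # case management tool
    prior_payments_of_this_type: int           # payment history


def attack_signals(ctx: CallContext) -> List[str]:
    """Return the risk signals that are visible in this context."""
    signals = []
    if ctx.password_reset_hours_ago is not None and ctx.password_reset_hours_ago <= 24:
        signals.append("password reset in the last 24 hours")
    if ctx.new_device:
        signals.append("new device")
    if not ctx.payee_seen_before:
        signals.append("unusual payee")
    if ctx.similar_cases_this_week > 0:
        signals.append("matches other cases this week")
    if ctx.prior_payments_of_this_type == 0:
        signals.append("first payment of this type")
    return signals


# Assumed rule for illustration: three or more independent signals trigger a hold.
HOLD_THRESHOLD = 3


def assess(ctx: CallContext) -> Tuple[str, List[str]]:
    """Decide whether to hold the payment, and say why."""
    signals = attack_signals(ctx)
    action = "hold_and_verify" if len(signals) >= HOLD_THRESHOLD else "proceed"
    return action, signals


# Unified view: every system's fragment is available at once.
full_view = CallContext(2.0, True, False, 3, 0)
# Fragmented view: the reset, the device and the other cases are invisible.
fragment_view = CallContext(None, False, False, 0, 0)

print(assess(full_view))
print(assess(fragment_view))
```

The same `assess` function runs in both cases. Only the amount of context it is handed changes, which is the point of the scenario.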

Same model intelligence available in both scenarios. Completely different outcome. The only variable is architecture.

What governed intelligence needs to function

This is where banks consistently confuse two different things: model intelligence and operational authority. Intelligence is now abundant - you can rent the frontier by the token. What's scarce, what's hard, and what separates one bank from another, is the ability to give that intelligence three things.

Full context. A single, always-updated customer state graph - a unified truth layer that sits above your systems of record. An ontology that defines, once, across all your backend systems, what a customer is, what a relationship is, what normal looks like for this human. You don't get this by replacing your cores. You get it by binding your existing systems to a common business language and persisting the customer state in the middle. In the agentic era, the bank without a persistent customer truth layer is the bank whose agents make expensive guesses.
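
As a rough sketch of what "binding your existing systems to a common business language" could look like, the Python below maps each system's own field names onto one shared vocabulary and merges the result into a single customer state. The system names, field names, and the assumption that customer identifiers have already been reconciled across systems are all hypothetical.

```python
from typing import Any, Dict

# Hypothetical bindings: each backend's field names mapped to one shared vocabulary.
FIELD_BINDINGS: Dict[str, Dict[str, str]] = {
    "core_banking": {"cust_no": "customer_id", "sal_credit_dt": "last_salary_deposit"},
    "digital_channel": {"userRef": "customer_id", "lastPwdReset": "last_password_reset"},
    "fraud_platform": {"party_id": "customer_id", "device_fp": "device_fingerprint"},
}


def to_canonical(system: str, record: Dict[str, Any]) -> Dict[str, Any]:
    """Translate one system's record into the shared vocabulary, dropping unbound fields."""
    bindings = FIELD_BINDINGS[system]
    return {bindings[key]: value for key, value in record.items() if key in bindings}


def merge_state(state: Dict[str, Dict[str, Any]], system: str,
                record: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Fold one system's record into the persisted state for that customer."""
    canonical = to_canonical(system, record)
    if "customer_id" not in canonical:
        raise ValueError(f"record from {system} carries no customer identifier")
    customer_id = canonical.pop("customer_id")
    state.setdefault(customer_id, {}).update(canonical)
    return state


state: Dict[str, Dict[str, Any]] = {}
merge_state(state, "core_banking", {"cust_no": "C-1", "sal_credit_dt": "2026-05-28"})
merge_state(state, "digital_channel", {"userRef": "C-1", "lastPwdReset": "2026-06-05T13:02"})
merge_state(state, "fraud_platform", {"party_id": "C-1", "device_fp": "abc123"})
print(state["C-1"])
```

The cores are untouched in this sketch. What changes is that there is one place where the fragments meet under one set of names.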

Governed authority. Explicit, machine-readable boundaries on what each agent may decide alone, what it must escalate, and what it may never touch. Authority that scales with confidence and shrinks with risk. This is what turns "we have a chatbot" into "we have a workforce."
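
One way to picture machine-readable authority is a small policy table that an agent must consult before acting. The sketch below is an assumption-laden illustration: the action names, confidence floors, and amount ceilings are invented, and a real policy would be far richer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class Outcome(Enum):
    ACT_ALONE = "act_alone"
    ESCALATE = "escalate"
    FORBIDDEN = "forbidden"


@dataclass(frozen=True)
class AuthorityRule:
    min_confidence: float  # below this confidence, the agent must escalate
    max_amount: float      # above this amount, the agent must escalate


# Hypothetical policy: holding a payment is low risk, releasing one is high risk.
POLICY: Dict[str, AuthorityRule] = {
    "hold_payment": AuthorityRule(min_confidence=0.80, max_amount=50_000),
    "release_payment": AuthorityRule(min_confidence=0.95, max_amount=1_000),
}

# Actions the agent may never take, whatever its confidence.
NEVER: FrozenSet[str] = frozenset({"close_account"})


def authorize(action: str, confidence: float, amount: float) -> Outcome:
    """Return what the agent is allowed to do with this action."""
    if action in NEVER:
        return Outcome.FORBIDDEN
    rule = POLICY.get(action)
    if rule is None:
        # Anything not explicitly granted goes to a human.
        return Outcome.ESCALATE
    if confidence >= rule.min_confidence and amount <= rule.max_amount:
        return Outcome.ACT_ALONE
    return Outcome.ESCALATE


print(authorize("hold_payment", confidence=0.92, amount=4_000))
print(authorize("release_payment", confidence=0.92, amount=4_000))
print(authorize("close_account", confidence=0.99, amount=0))
```

The design choice worth noting is the default: an action missing from the table escalates rather than proceeds, so authority has to be granted explicitly.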

A decision ledger. A persistent, time-series record of every decision made in every customer journey - the score, the policy, the context, the rationale. Banks have discarded this for decades, recording only outcomes. For defense, the ledger is doubly valuable: it's the audit trail that lets compliance say yes to production deployment, and it's the training signal that lets your defensive agents learn what attacks actually look like in your institution. Every fraud attempt you've ever caught - and every one you've missed - is sitting in that history, waiting to be an asset.
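
A decision ledger can be sketched as an append-only list of records that captures the score, the policy, the context, and the rationale, and can be replayed per customer. The class and field names below are assumptions for illustration, and the in-memory list stands in for whatever durable store a bank would actually use.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Decision:
    customer_id: str
    action: str
    score: float
    policy: str
    context: Dict[str, Any]
    rationale: str
    decided_at: str  # ISO 8601 timestamp


class DecisionLedger:
    """Append-only record of decisions; entries are never updated or deleted."""

    def __init__(self) -> None:
        self._entries: List[Decision] = []

    def record(self, customer_id: str, action: str, score: float, policy: str,
               context: Dict[str, Any], rationale: str,
               decided_at: Optional[str] = None) -> Decision:
        entry = Decision(
            customer_id=customer_id,
            action=action,
            score=score,
            policy=policy,
            context=dict(context),  # snapshot, so later changes don't rewrite history
            rationale=rationale,
            decided_at=decided_at or datetime.now(timezone.utc).isoformat(),
        )
        self._entries.append(entry)
        return entry

    def replay(self, customer_id: str) -> List[Decision]:
        """Return one customer's decisions in the order they were made."""
        return [e for e in self._entries if e.customer_id == customer_id]

    def to_jsonl(self) -> str:
        """Serialize the ledger as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self._entries)


ledger = DecisionLedger()
ctx = {"new_device": True, "password_reset_hours_ago": 2}
ledger.record("C-1", "hold_payment", 0.93, "fraud-policy-v1", ctx,
              "recent reset, new device, unusual payee", "2026-06-05T15:04:00+00:00")
ledger.record("C-2", "proceed", 0.04, "fraud-policy-v1", {}, "no risk signals",
              "2026-06-05T15:05:00+00:00")
ctx["new_device"] = False  # mutating the caller's dict does not alter the ledger

for decision in ledger.replay("C-1"):
    print(decision.decided_at, decision.action, decision.rationale)
```

Because each entry stores the context as it was at decision time, a supervisor replaying it later sees what the agent saw, not what the systems say today.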

Give Mythos those three things and you have something no attacker can match: frontier intelligence with institutional memory and legal authority. Give it none of them and you have an expensive pilot with a press release.

The window is shorter than you think

Here's what I think happens over the next 36 months.

Frontier models keep commoditizing. Mythos-class capability that feels extraordinary today will be basic capability by 2028, the way GPT-4-class capability already is. Every bank will have access to roughly the same intelligence - and so will every attacker. The fraud arms race goes fully machine-versus-machine, at machine speed, and the institutions still routing exceptions through seven systems and four-day handoffs simply won't be able to play. Their loss rates will climb, their false-positive rates will climb, and their customers - the legitimate ones being blocked by half-blind controls - will leave.

Meanwhile, the banks that did the unglamorous work - the ontology, the state graph, the decision ledger, the governance framework - will be able to adopt each new frontier model in weeks, not years. Deploy it into the same governed slots, point it at the same unified truth, and let it run. Model upgrades become a configuration change, instead of a transformation program. That's what AI-native means: not that you use AI, but that your operating model is built so that every improvement in AI flows straight through to the frontline.
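
To make "a configuration change" concrete, here is a minimal sketch in which a governed slot looks up its model by name. The two scoring functions are stand-in stubs rather than calls to any real model, and the slot name, threshold, and registry are hypothetical.

```python
from typing import Any, Callable, Dict

ScoreFn = Callable[[Dict[str, Any]], float]


def older_model(context: Dict[str, Any]) -> float:
    """Stub for an earlier-generation model: weighs only the device signal."""
    return 0.6 if context.get("new_device") else 0.1


def newer_model(context: Dict[str, Any]) -> float:
    """Stub for a later-generation model: weighs more of the same context."""
    score = 0.0
    if context.get("new_device"):
        score += 0.5
    if context.get("recent_password_reset"):
        score += 0.4
    return score


MODEL_REGISTRY: Dict[str, ScoreFn] = {
    "older-model": older_model,
    "newer-model": newer_model,
}

# The governed slot: which model runs and the threshold it is bound by.
SLOT_CONFIG: Dict[str, Dict[str, Any]] = {
    "fraud_triage": {"model": "older-model", "hold_above": 0.8},
}


def run_slot(slot: str, context: Dict[str, Any],
             config: Dict[str, Dict[str, Any]] = SLOT_CONFIG) -> str:
    """Run whichever model the slot is configured with, under the slot's threshold."""
    settings = config[slot]
    score = MODEL_REGISTRY[settings["model"]](context)
    return "hold" if score >= settings["hold_above"] else "proceed"


context = {"new_device": True, "recent_password_reset": True}
print(run_slot("fraud_triage", context))

# The upgrade: one value changes; the slot, threshold and context stay the same.
upgraded = {"fraud_triage": {"model": "newer-model", "hold_above": 0.8}}
print(run_slot("fraud_triage", context, upgraded))
```

Nothing around the model moves in this sketch: the context it reads and the threshold it is held to stay in place while the model name changes.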

The fragmented banks will buy the same models. They'll hit the same walls they hit in 2024, and they'll write the same internal memos explaining why the pilot didn't scale. The mistake won't be visible in any single quarter. Then there will be a tipping point - the way there was for the retailers who told themselves e-commerce was a niche - and the economics will break fast.

You don't prepare for Mythos by evaluating Mythos. You prepare for Mythos by making your bank the kind of place where Mythos can see the truth, act with authority, and leave a trail the regulators can sign off on. The model is the easy part. It always was.

The practical questions to take back

  1. If an AI-generated attack hit one of your customers today, how many of your systems would each see a fragment of it - and which system, if any, would see all of it?
  2. How long did it take your bank to move the last frontier model from pilot to production in a customer-facing journey? What, specifically, stopped it?
  3. Could your defensive AI explain to a regulator - decision by decision - why it blocked a transaction 18 months ago?
  4. Do your agents have explicitly governed authority - written down, machine-readable, risk-tiered - or does every meaningful action still route to a human queue?
  5. When the next model generation arrives, is adopting it a configuration change for your bank, or another transformation program?

If you can't answer these comfortably, ignore the AI hype: you're behind on architecture. The good news is that this is fixable, and many banks out there are already fixing it.


About the author
Jouk Pleiter
Founder & CEO, Backbase

Jouk Pleiter is founder and CEO of Backbase, the AI-native Banking OS he built from a small startup in Amsterdam in 2003 into a profitable, global fintech leader. Under his founder-led stewardship, Backbase scaled organically to hundreds of millions in annual revenue, grew to a 2,000-person team, and built a client roster of 120+ leading banks and financial institutions serving tens of millions of end users worldwide. Known for a pragmatic, long-term leadership style, Jouk stresses close customer partnership and relentless product innovation. He partners closely with bank C-suite leaders to turn strategy into measurable customer and business outcomes.
