
Operator Playbook · Customer Service

Customer Service AI Without the Klarna Problem

How to deploy AI customer service in a small-cap or mid-cap business without ending up like the company that fired 700 humans, said it on stage, and quietly hired them back.

April 29, 2026 · 8 min read
Distribution & Logistics · Industrial Services · Financial Services · Customer Service & Support · Operations · Sebastian Siemiatkowski (Klarna) · Air Canada

The Klarna lesson, in one sentence

In February 2024, Sebastian Siemiatkowski stood on a stage and told the world that Klarna’s AI customer-service agent had taken the workload of 700 humans, was running 35% faster, and was saving the company $40 million a year. Eighteen months later, Klarna was hiring humans back.

That is not a story about AI failing. It’s a story about a CEO confusing the easy 70% of customer service with the hard 30%: the edge cases where customer retention is actually decided.

If you run a small-cap or mid-cap business and you are about to put AI into customer service, this is the playbook that keeps you out of the same trap. Not “should you do it.” You should. Where you do it, and what you measure, and what stays human no matter what.

What AI customer service actually does well

The 70% of customer-service work that AI handles cleanly looks like this. It is bounded. It is repetitive. The customer wants a piece of information that already exists somewhere in your system and the question is whether they can find it before they get angry. Order status. Return policy. Reset a password. Look up an invoice. Tell me when the part will ship.

For these inquiries, AI is unambiguously better than what most small-cap and mid-cap businesses currently do, which is to route the call into an IVR maze, hold the customer for nine minutes, and connect them to a Tier-1 agent who reads from a script anyway. The AI answers in eight seconds, accurately, twenty-four hours a day, and never gets snippy.

Klarna got this part right. Their headline saving, the work of 700 people, was real because two-thirds of their volume was exactly this bounded, repetitive, look-up-and-answer kind of work. Banks call it Tier-1. Most distribution and service businesses at this scale have something similar, and most of them are overstaffed for it.

You should automate this. It is genuinely the right call.

What AI customer service does badly

The other 30% — the part Klarna got publicly burned on — is the part where the customer’s question is unbounded, the answer requires judgment, and the cost of getting it wrong is the customer relationship.

The customer who has been with you for eleven years and whose order is two weeks late because of a dock strike. The contractor who is on a job site with the wrong part and needs you to overnight a substitute SKU and credit the difference. The hospital purchasing manager who is being audited and needs a corrected invoice in twenty minutes or she’s going to lose her job. The grower whose pivot stopped twelve hours into a heat dome and who is, frankly, terrified.

These conversations have three things in common. They are high-stakes for the customer. They are unscripted. And the right answer almost always involves the agent saying something the company’s stated policy doesn’t permit, and then making it work anyway.

That is where AI loses, today, and probably for the next three years. Not because the model can’t read the customer’s email. Because the model cannot make a $4,000 commercial decision on behalf of your business at 9pm on a Sunday. The agent who gets handed that judgment authority is the moat in your customer service organization. Your competitors do not have her. She is the reason a customer has been with you for eleven years.

Air Canada learned the hard version of this in 2024. Its chatbot had told a grieving passenger he could buy a full-fare ticket and apply for the bereavement-fare refund afterward. He did. Air Canada denied the refund. He took the airline to British Columbia’s Civil Resolution Tribunal, where Air Canada argued that the chatbot was a “separate legal entity” responsible for its own statements. The tribunal disagreed and ordered Air Canada to pay. The reputational cost was a multiple of the refund.

The lesson is not that chatbots are dangerous. The lesson is that anyone wearing your logo speaks with your authority, and AI that speaks with your authority makes commercial commitments you are obligated to honor. If you cannot live with the worst commitment your AI might make on a Sunday at 9pm, you do not have a deployable AI customer service product. You have a liability.

The architecture that survives both tests

Here is the shape of an AI customer service deployment that handles the 70% without dying on the 30%.

Tier 0 — fully automated, bounded. Order status. Tracking. Invoices. Returns initiation. Hours. Stocking levels. The AI answers, end of conversation. Make the model’s scope explicit and narrow. Do not let it answer questions outside that scope. If the question goes outside scope, hand it off cleanly with full context preserved.
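A minimal sketch of that scope rule in Python, assuming an upstream classifier has already labeled each message with an intent. The intent names, `TIER0_INTENTS`, `Handoff`, and `handle_tier0` are hypothetical, not any vendor’s API. The point is the shape: an explicit allowlist, and a handoff that carries the whole transcript so the customer never has to repeat themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical Tier 0 scope: the only intents the AI may answer on its own.
TIER0_INTENTS = frozenset({
    "order_status", "tracking", "invoice_lookup",
    "return_initiation", "hours", "stock_level",
})


@dataclass
class Handoff:
    """Context passed to a human so the customer does not start over."""
    customer_id: str
    intent: str
    transcript: list[str] = field(default_factory=list)
    reason: str = "out_of_scope"


def handle_tier0(customer_id: str, intent: str, transcript: list[str]) -> str | Handoff:
    """Answer allowlisted intents; hand everything else to a human with context."""
    if intent in TIER0_INTENTS:
        # Placeholder for the bounded lookup-and-answer path (order system, ERP, etc.).
        return f"AUTO:{intent}"
    # Out of scope: no attempt at an answer, just a clean handoff with the full transcript.
    return Handoff(customer_id=customer_id, intent=intent, transcript=list(transcript))


if __name__ == "__main__":
    print(handle_tier0("C-1042", "order_status", ["Where is my order?"]))
    print(handle_tier0("C-1042", "credit_request", ["Wrong part arrived", "Can you credit me?"]))
```

The design choice that matters is failing closed: anything not on the list goes to a human. A denylist would let new, unanticipated questions through by default.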

Tier 1 — AI assists, human commits. A human agent takes the call or message. AI is in the agent’s screen, not the customer’s. It surfaces account history, prior tickets, suggested responses, the contract terms, the relevant inventory. Drafts the email reply. Pre-fills the credit memo. The agent reviews, edits, presses send. Throughput per agent goes up forty to seventy percent. Quality goes up because the AI knows account history the human had to look up. The customer never knows the AI was there.
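One way to make “the agent reviews, edits, presses send” structural rather than a training point is to give AI drafts no send path of their own. A minimal sketch with hypothetical `Draft` and `send` names; help-desk platforms each model this differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Draft:
    """An AI-prepared reply shown on the agent's screen, never the customer's."""
    ticket_id: str
    body: str
    approved_by: str | None = None  # agent ID; stays None until a human reviews it

    def approve(self, agent_id: str, edited_body: str | None = None) -> None:
        """Record the agent's review, optionally replacing the AI's wording."""
        if edited_body is not None:
            self.body = edited_body
        self.approved_by = agent_id


def send(draft: Draft) -> str:
    """Refuse to send anything a human has not approved."""
    if draft.approved_by is None:
        raise PermissionError(f"ticket {draft.ticket_id}: no agent has approved this draft")
    # Placeholder for the real outbound channel (email, chat, SMS).
    return f"sent {draft.ticket_id} (approved by {draft.approved_by})"
```

The check lives in `send`, not in the agent’s UI, so no integration can route around it by accident.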

Tier 2 — human only, AI invisible. The high-stakes account. The complaint that’s headed for a lawsuit. The strategic customer with a $4M annual contract. AI does not interact with these conversations. It can listen and post a summary to your CRM later, but it does not draft responses and does not auto-send anything. These conversations are why your company has retention and why the agent who handles them earns ninety thousand dollars a year. Do not put a model between her and the customer.
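The three tiers can be written down as a routing rule plus a table of what the model may do in each. A sketch under stated assumptions: the `STRATEGIC_ACV` cutoff, the `legal_risk` flag, and the capability names are hypothetical placeholders for whatever your CRM actually tracks.

```python
from enum import Enum


class Tier(Enum):
    T0 = "fully automated, bounded"
    T1 = "AI assists, human commits"
    T2 = "human only, AI invisible"


# What the model may do in each tier (capability names are illustrative).
CAPABILITIES = {
    Tier.T0: {"answer_customer": True, "draft_for_agent": False, "post_crm_summary": True},
    Tier.T1: {"answer_customer": False, "draft_for_agent": True, "post_crm_summary": True},
    Tier.T2: {"answer_customer": False, "draft_for_agent": False, "post_crm_summary": True},
}

STRATEGIC_ACV = 1_000_000  # hypothetical annual-contract-value cutoff for Tier 2


def route(annual_contract_value: float, legal_risk: bool, in_tier0_scope: bool) -> Tier:
    """Pick the tier before any model touches the conversation."""
    # Tier 2 is checked first: a strategic account asking "where's my order?" still gets a human.
    if legal_risk or annual_contract_value >= STRATEGIC_ACV:
        return Tier.T2
    if in_tier0_scope:
        return Tier.T0
    return Tier.T1


def allowed(tier: Tier, action: str) -> bool:
    """Actions not listed for a tier are denied by default."""
    return CAPABILITIES[tier].get(action, False)
```

Checking Tier 2 first means a strategic account never reaches the model, even for a question Tier 0 could answer.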

The mistake Klarna made was telling the market the AI had replaced the people. The operating model that survives in production looks different: the AI replaces the boring parts of every conversation, and the people keep the conversation. That distinction is the difference between a CEO who looks like a hero in 2026 and one who looked like a hero in 2024 and is quietly walking it back in 2026.

What to measure

Most operators measure AI customer service on two things: containment rate (percentage of inquiries the AI handled without human escalation) and CSAT score on AI-handled tickets. Both are useful. Neither is sufficient.
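For reference, here is roughly how those two metrics fall out of a ticket log. A minimal sketch; the `Ticket` fields are assumptions about what your help desk exports. Notice that neither function looks at whether the customer was still a customer a quarter later.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Ticket:
    entered_ai: bool         # conversation started with the AI
    escalated: bool          # a human was pulled in
    csat: int | None = None  # 1-5 survey score, if the customer answered


def containment_rate(tickets: list[Ticket]) -> float:
    """Share of AI-entered tickets closed without human escalation."""
    ai = [t for t in tickets if t.entered_ai]
    if not ai:
        return 0.0
    return sum(not t.escalated for t in ai) / len(ai)


def ai_csat(tickets: list[Ticket]) -> float | None:
    """Mean CSAT on tickets the AI handled end to end."""
    scores = [t.csat for t in tickets if t.entered_ai and not t.escalated and t.csat is not None]
    return mean(scores) if scores else None
```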

The metric that actually separates good deployments from Klarna deployments is net retention of customers who interacted with the AI in the prior 90 days, segmented by ticket complexity.

Go pull this number ninety days after launch. Sort customers by how many times they hit the AI and what kind of questions they asked. The Tier 0 customers — order status, simple lookups — should retain at the same rate as your baseline or better. If they don’t, the AI is annoying them and you should adjust scope or hand-off triggers.

The Tier 1 customers — the ones whose tickets touched both AI and a human — should retain at the same rate or better. If they retain worse, your hand-off is broken and customers are getting recycled through the model when they need a human.

The Tier 2 customers should never have hit AI in the first place. If your data shows they did, your routing is broken.

This is the dashboard your board should see, not the one the vendor will hand you. The vendor will hand you containment. Containment is the metric that lets a vendor look good while your churn quietly creeps up.
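A minimal sketch of that board dashboard, assuming you can export one row per customer with the most complex ticket tier they hit in the 90-day window, whether they touched the AI, and whether they were retained. It uses plain customer-count retention; if you track net revenue retention, weight each row by revenue instead. `CustomerRecord`, `retention_by_tier`, and `routing_breaches` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    customer_id: str
    tier: int        # most complex ticket tier in the window: 0, 1, or 2
    hit_ai: bool     # touched the AI at least once in the prior 90 days
    retained: bool   # still a customer at the measurement date


def retention_by_tier(records: list[CustomerRecord], baseline: float) -> dict[int, dict]:
    """Retention rate per tier for AI-touched customers, flagged against a baseline rate."""
    counts: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # tier -> [retained, total]
    for r in records:
        if r.hit_ai:
            counts[r.tier][0] += int(r.retained)
            counts[r.tier][1] += 1
    report = {}
    for tier, (kept, total) in sorted(counts.items()):
        rate = kept / total
        report[tier] = {"customers": total, "rate": rate, "below_baseline": rate < baseline}
    return report


def routing_breaches(records: list[CustomerRecord]) -> list[str]:
    """Tier 2 customers who reached the AI at all; any entry means routing is broken."""
    return [r.customer_id for r in records if r.tier == 2 and r.hit_ai]
```

A Tier 2 row appears in the report only when there are breaches; in a healthy deployment it is absent.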

The non-negotiable

One rule, no exceptions. The AI never makes a commercial commitment the company is not willing to honor. No refund offers. No expedited shipping promises. No SLA exceptions. No “we’ll waive the late fee” without a human in the loop. If the AI’s draft reply contains a number (dollars, days, percentages), a human signs off before it is sent.

This is not because AI is unreliable. It’s because the cost of a single wrong commitment, multiplied by the speed at which AI sends it, is more than the savings of the whole deployment. Air Canada paid the refund. It also paid its lawyers and spent two news cycles as the cautionary tale.

An operating leader at a small-cap or mid-cap business cannot afford a single Air Canada moment. Set the rule on day one. Audit it monthly. Do not move the line.
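The day-one rule can be backed by a mechanical check on every AI draft. A minimal sketch, assuming drafts are plain text; the patterns and the `needs_human_signoff` name are hypothetical. It flags numbers that read as dollars, days, or percentages plus the commitment words the rule names, while letting plain lookup data such as an order number through. A pattern list will miss paraphrased promises (“we’ll take care of it”), so treat it as a floor under the scoping rules, not a replacement for them.

```python
import re

# Illustrative patterns for the commitments the rule names; not exhaustive.
PATTERNS = {
    "money": re.compile(r"[$£€]\s?\d|\b\d[\d,]*(?:\.\d+)?\s?(?:dollars|usd|cad)\b", re.I),
    "time": re.compile(r"\b\d+\s?(?:business\s)?(?:days?|hours?|weeks?)\b", re.I),
    "percent": re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|percent\b)", re.I),
    "commitment": re.compile(r"\b(?:refund\w*|waive\w*|expedit\w*|SLA)\b", re.I),
}


def needs_human_signoff(draft: str) -> list[str]:
    """Return the reasons a draft must go to a human; an empty list means it may send."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(draft)]
```

Anything with a non-empty reason list goes to the agent queue instead of the customer. Logging those reasons also gives you the raw material for the monthly audit.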

How we do this for clients

The Ground-Up Workshop is the structured version of the analysis in this playbook. Two days in person with your customer service leadership and a small group of front-line agents. We map your inquiry mix into the three tiers, identify the customer-losing moments AI must not touch, and walk you out with a target operating model and a 90-day plan that ships Tier 0 fast and protects Tier 2 absolutely.

