Klarna Replaced 700 Customer Service Agents With AI. Then It Hired Them Back. Both Decisions Were Right.
Sebastian Siemiatkowski made the loudest AI-replaces-humans bet in fintech history — and then publicly reversed it. Founder-CEOs should study both moves, because neither was a mistake.
THE CRAFT
In February 2024, Klarna CEO Sebastian Siemiatkowski went on a media tour to announce, with the quiet confidence of a man who knew the number was going to hit, that the company’s AI assistant had done the equivalent work of 700 customer service agents in its first month. The AI handled 2.3 million conversations. Average resolution time dropped from 11 minutes to 2 minutes. Repeat inquiries fell 25%. The estimated annual savings: $40 million. At the time, the number was the most concrete “AI replaced human jobs” data point any CEO had put on the record. It was cited in every AI-employment story for the next twelve months. It became the canonical example of what was coming.
Fifteen months later, Klarna started hiring human customer service agents again.
By late 2025, the company had acknowledged publicly that its AI-only customer service strategy was producing unacceptable quality outcomes. Customer complaints had risen. Satisfaction scores had dropped. Complex issues — the ones involving billing disputes, fraud claims, payment plan modifications, and anything where the customer was upset — were getting generic, repetitive, template-sounding responses that made customers feel like they were arguing with a policy document instead of a person. Siemiatkowski admitted on the record that the aggressive AI pivot had “negatively affected service and product quality.” By early 2026, Klarna had shifted to what the CEO described as an “Uber-style” hybrid model: AI handles the routine, high-volume, low-complexity interactions, and a flexible pool of human agents — remote workers, students, parents — picks up the complex, emotional, or high-stakes cases.
This is not a failure story. This is a sequencing story. And the sequence — replace everything with AI first, discover where the AI breaks, then surgically reintroduce humans at the break points — is the most operationally honest version of the “AI in customer service” playbook I have seen from any company at any scale. I am writing this one not because Klarna is in the ICP of this newsletter (it is not — it is a $15 billion pre-IPO fintech), but because the pattern Siemiatkowski just ran in public, with public data, at public cost, is the exact pattern that a $200M founder-CEO with a 30-person customer service team is going to run in the next 18 months. And if you read the Klarna sequence correctly, you can skip the expensive part.
THE OPERATOR
The initial deployment and the $40M number
The timeline matters. In 2023, Klarna partnered with OpenAI to build a custom AI assistant trained on Klarna’s internal knowledge base — product documentation, policy manuals, dispute-resolution playbooks, and years of anonymized customer interaction transcripts. The assistant launched in limited markets in late 2023 and went global in early 2024. It handled first-contact resolution for a broad swath of customer inquiries: payment schedules, order tracking, return policies, account management, basic troubleshooting.
The February 2024 announcement was, to be fair, accurate. The AI assistant did handle 2.3 million conversations in its first month. It did resolve cases in 2 minutes instead of 11. Repeat-contact rates did fall. The cost savings were real, and the $40M annualized number was derived from headcount Klarna had genuinely lost to attrition and not replaced. Siemiatkowski was not exaggerating the data. He was reporting real operational metrics from a real deployment. The tech press ran with it because the numbers were clean and the story was simple: AI replaces 700 people, saves $40 million, CEO says “the AI is better.”
What nobody asked — and what Siemiatkowski, to be fair, did not volunteer — was which of the 2.3 million conversations the AI was handling well, and which it was merely handling fast.
What broke, and when
The cracks started showing within six months. The pattern was consistent across markets and reported by multiple independent sources:
Simple cases got better. “When is my payment due?” “How do I return this?” “What’s my balance?” For the high-volume, low-emotion, single-lookup interactions that made up the majority of Klarna’s customer contact volume, the AI was genuinely superior to the human agents it replaced. Faster, more consistent, available 24/7, no hold music. The $40M in savings was concentrated here, and it was real.
Complex cases got worse. Billing disputes where the customer believed they’d been charged incorrectly. Fraud reports where the customer was scared and needed reassurance. Payment plan modifications where the customer’s financial situation had changed and they needed a human to say “I understand, let’s figure this out.” Multi-turn conversations where the resolution depended on understanding not just what the customer was asking but why they were asking it — the emotional context underneath the policy question.
The AI’s responses to these cases were technically correct but experientially bad. They read like policy documents. They repeated the same phrasing. They did not modulate tone based on customer distress. They could not say “I hear you, this is frustrating, and here’s what I can actually do.” They said “Based on our policy, the resolution for this case is…” — which is the correct answer delivered in a way that makes an already-upset customer angrier.
Customer satisfaction scores dropped. The aggregate numbers told the story: the AI was resolving more cases, faster, at lower cost — and customer satisfaction was declining. This is the paradox that every founder-CEO needs to understand before deploying AI into any customer-facing function: speed and accuracy are not the same as satisfaction, and the gap between them is largest exactly when it matters most — in the cases where the customer is upset, confused, or scared.
Repeat-contact rates told the real story. The 25% reduction in repeat contacts that Siemiatkowski cited in February 2024 started to erode as the months went on. The reason: customers whose complex cases were handled by AI and “resolved” (meaning the AI gave them an answer) were not actually satisfied by the resolution. They were coming back. Some were coming back multiple times. Some were escalating to social media. Some were leaving Klarna entirely. On the complex cases, the AI had not reduced repeat contacts. It had shifted them from “customer calls back because the first agent didn’t help” to “customer calls back because the AI gave a technically correct answer that didn’t address the actual problem.”
The reversal
By spring 2025, Klarna began quietly rebuilding human customer service capacity. The company did not hold a press conference. There was no “we were wrong about AI” moment. What happened was more operationally honest: the team looked at the data, identified the case types where AI quality was unacceptable, and started routing those cases to human agents.
The “Uber-style” model that emerged by late 2025 and into 2026 works like this: AI handles first contact for all incoming inquiries. For simple cases (the majority), AI resolves them end-to-end. For complex, emotional, or multi-turn cases, the AI triages and hands off to a human agent — a remote worker who is typically part-time, flexible-schedule, and paid per-resolution rather than on salary. The human picks up with full context from the AI’s initial interaction, so the customer doesn’t have to repeat themselves. The AI handles the volume; the human handles the trust.
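Klarna has not published its routing logic, but the pattern as described reduces to a small amount of code. Here is a minimal sketch, assuming a distress score from an upstream sentiment model and a whitelist of case types the AI may close on its own; every name, threshold, and case type below is my assumption, not Klarna’s implementation:

```python
# Sketch of the triage-and-handoff pattern. All names, thresholds, and
# case types are hypothetical illustrations.

from dataclasses import dataclass, field

# Case types the AI is allowed to resolve end-to-end (hypothetical list).
AI_RESOLVABLE = {"payment_schedule", "order_tracking", "return_policy", "password_reset"}
DISTRESS_THRESHOLD = 0.5  # above this, a human takes the case

@dataclass
class Inquiry:
    customer_id: str
    case_type: str          # from an upstream classifier
    distress_score: float   # 0.0 (calm) to 1.0 (upset), from a sentiment model
    transcript: list = field(default_factory=list)

def resolve_with_ai(inquiry: Inquiry) -> str:
    return f"AI resolved '{inquiry.case_type}' for {inquiry.customer_id}"

def enqueue_for_human(packet: dict) -> str:
    return f"Handed to human agent with {len(packet['ai_transcript'])} prior turns attached"

def route(inquiry: Inquiry) -> str:
    """AI closes simple, calm cases; everything else goes to a person."""
    if inquiry.distress_score > DISTRESS_THRESHOLD or inquiry.case_type not in AI_RESOLVABLE:
        # The handoff packet carries the full AI transcript so the
        # customer never has to repeat themselves.
        return enqueue_for_human({
            "customer_id": inquiry.customer_id,
            "case_type": inquiry.case_type,
            "ai_transcript": inquiry.transcript,
        })
    return resolve_with_ai(inquiry)

print(route(Inquiry("c-101", "order_tracking", 0.1)))              # AI closes it
print(route(Inquiry("c-102", "billing_dispute", 0.8, ["turn 1"]))) # human takes it
```

The part that moves over time is the whitelist and the threshold: as the AI demonstrably handles a case type well, that type migrates into the resolvable set. That is what moving the handoff line means in practice.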
Siemiatkowski’s public framing of the reversal is worth quoting carefully: he described it not as a retreat from AI but as a quality-driven refinement. The AI was doing the volume work. The humans were doing the trust work. Both were necessary. Neither was sufficient alone.
The Craft of AI read
Here is the thing that almost everyone writing about the Klarna reversal is getting wrong, and it is the thing I most want you to understand.
Both decisions were correct. The original decision to deploy AI across all customer service was correct — because it generated the data that showed where AI breaks. The reversal was correct — because it used that data to rebuild a system where AI and humans each do the work they are actually good at. You could not have arrived at the second decision without making the first one.
This is the part that should change how you think about your own AI deployment. The mistake most founder-CEOs make is trying to figure out, in advance, which cases AI should handle and which cases should stay with humans. They build complex scoring rubrics, they consult with AI vendors, they hold workshops to map “complexity tiers.” All of this pre-planning produces a reasonable-looking plan that is specifically wrong — because the actual performance boundary of AI in your specific customer context can only be discovered by deploying the AI and measuring where it fails.
Klarna’s sequence is, in retrospect, the efficient path. Deploy AI everywhere. Measure failure points. Reintroduce humans at the failure points. The alternative — carefully deploying AI only where you think it will work — sounds safer but is actually slower, because your prediction of where it will work is based on assumptions, and the assumptions are wrong in ways you cannot discover without deployment data.
Three specific lessons for the $200M founder-CEO:
Lesson one: your AI customer service deployment will follow the same curve. You will deploy it, celebrate the speed and cost numbers, watch satisfaction drop on complex cases six months later, and reintroduce humans at the break points. This is not a failure. This is the discovery process. Budget for it. Tell your board it is coming. Call it “Phase 1: deployment and calibration” and “Phase 2: hybrid optimization.” If you label both phases before you start, nobody panics when Phase 2 arrives.
Lesson two: the break point is always the same. The cases where AI fails are the cases where the customer’s emotional state is more important than the factual content of their question. This is true in fintech, insurance, healthcare, hospitality, professional services, and every other category where human beings call because they are upset. If your customer service function handles upset customers — and it does — the AI will fail on those calls. Not because the AI is bad. Because the AI does not feel, and the customer needs someone who does. That is the job description for the human in the hybrid model.
Lesson three: the cost math changes completely in the hybrid model. In an all-AI model, cost per interaction is very low but customer lifetime value erodes on the complex cases. In an all-human model, cost per interaction is high but satisfaction holds. In the hybrid model — AI for volume, humans for trust — cost per interaction drops significantly (because the AI handles 70–80% of volume) and satisfaction holds on the complex cases (because a human handles those). The hybrid model is strictly better than either extreme. The question is not “should we use AI or humans.” The question is “where is the handoff line, and how do we build a system that moves the line as the AI gets better over time.”
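To make lesson three concrete, here is the blended-cost arithmetic. The dollar figures are illustrative assumptions for the sketch, not Klarna’s actuals:

```python
# Back-of-envelope cost per interaction for the three models.
# All dollar figures are assumed for illustration, not Klarna data.

HUMAN_COST = 8.00   # assumed fully loaded cost per human-handled interaction
AI_COST = 0.25      # assumed cost per AI-handled interaction
AI_SHARE = 0.75     # hybrid model: AI handles 70-80% of volume; take the midpoint

all_human = HUMAN_COST
all_ai = AI_COST
hybrid = AI_SHARE * AI_COST + (1 - AI_SHARE) * HUMAN_COST

print(f"All-human: ${all_human:.2f} per interaction")
print(f"All-AI:    ${all_ai:.2f} per interaction")
print(f"Hybrid:    ${hybrid:.2f} per interaction")
# With these assumptions the hybrid runs at about $2.19, roughly a 73%
# cost reduction versus all-human, while the 25% of cases that drive
# lifetime value still reach a person.
```

Run it with your own loaded cost per human interaction and your actual simple-case share; the shape of the conclusion survives most reasonable inputs.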
Things to consider
- If you are planning to deploy AI in customer service, plan the reversal at the same time you plan the deployment. Not because you expect the AI to fail. Because the discovery of where it underperforms is the most valuable output of the first deployment. Budget for it, timeline it, and frame it to your board as Phase 2, not as a retreat.
- Identify your “trust cases” before deployment. Which customer interactions in your business are about trust, emotion, or fear — not information? Billing disputes? Warranty claims? Service failures? Account closures? Those are the cases where AI will technically resolve and experientially fail. You don’t need to hold those back from AI on day one — deploying there generates the data you need — but you need to have the human routing path ready for Phase 2.
- The “Uber-style” flexible agent pool is the operational model to watch. Klarna’s hybrid model uses part-time, remote, per-resolution human agents rather than full-time salary employees. This materially changes the cost structure of the human layer. If you are running a 30-person customer service team, the hybrid model might look like 3–5 full-time human agents handling complex escalations plus AI handling first contact and simple resolution. The total headcount is lower than today, but the total headcount is not zero, and the humans who remain are doing the highest-value work.
- Watch your repeat-contact rate as a leading indicator, not your resolution time. Resolution time will improve immediately when AI is deployed. Repeat-contact rate is the canary in the coal mine for experiential failure — it tells you whether the “resolved” cases are actually resolved in the customer’s mind. If repeat-contact rate rises while resolution time falls, you are in the Klarna curve and Phase 2 is coming. (A minimal sketch of this check follows this list.)
- Siemiatkowski ran this experiment in public with public money and public data. You do not need to run it yourself. His deployment data — the $40M savings, the satisfaction drop, the reversal, the hybrid model — is, functionally, a free pilot study for every mid-market operator with a customer service team. Steal the timeline. Steal the metrics. Steal the Phase 1 / Phase 2 framing. Thank the man privately in your head.
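The repeat-contact check from the list above is a one-afternoon script. A minimal sketch, assuming you can export tickets with a customer ID, open timestamp, and resolution time; the file name, column names, and the 7-day repeat window are my assumptions, so adjust to your help desk’s schema:

```python
# Repeat-contact rate as a leading indicator. Assumes a ticket export
# with columns: customer_id, opened_at, resolution_minutes.

import pandas as pd

tickets = pd.read_csv("tickets_last_90_days.csv", parse_dates=["opened_at"])
tickets = tickets.sort_values(["customer_id", "opened_at"])

# A repeat contact: a ticket opened within 7 days of the same customer's
# previous ticket. The first ticket per customer never counts as a repeat.
gap = tickets.groupby("customer_id")["opened_at"].diff()
tickets["is_repeat"] = gap <= pd.Timedelta(days=7)

weekly = (
    tickets.set_index("opened_at")
    .resample("W")
    .agg({"is_repeat": "mean", "resolution_minutes": "median"})
    .rename(columns={"is_repeat": "repeat_rate",
                     "resolution_minutes": "median_resolution_min"})
)

# The Klarna curve: resolution time falling while repeat rate climbs.
trend = weekly.diff()
warning_weeks = weekly[(trend["repeat_rate"] > 0) & (trend["median_resolution_min"] < 0)]
print(warning_weeks)
```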
THE WORKBENCH
Here’s the tactical takeaway for your Monday.
Pull your customer service data for the last 90 days and sort it into two buckets.
- Bucket A: Information cases. The customer is asking for a fact, a status, a how-to, a policy clarification. They are not upset. They want an answer and they want to move on. Examples: “Where’s my order?” “What’s the return policy?” “How do I reset my password?” “When is my next payment due?”
- Bucket B: Trust cases. The customer is upset, confused, scared, or making a consequential decision. They need to feel heard before they need an answer. Examples: “I was charged twice and I need this fixed.” “I think my account was compromised.” “This product broke after a week and I’m frustrated.” “I’m canceling, but I wanted to talk to someone first.”
Now count. What percentage of your total contact volume is Bucket A, and what percentage is Bucket B?
For most mid-market operators, Bucket A is 65–80% of volume and Bucket B is 20–35%. That ratio is your AI deployment map. Bucket A is where the AI goes on day one. Bucket B is where the human stays. The cost savings come from Bucket A — and at 65–80% of volume automated, the savings are material even in a 30-person team. The customer satisfaction holds because Bucket B — the cases that actually move lifetime value — never touches the AI without a human in the loop.
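If you would rather script the first pass than hand-sort tickets, a crude keyword heuristic gets you a starting ratio. This is a sketch, not a classifier; the file name, column name, and signal list below are my assumptions, and you should hand-label a few hundred tickets to check it before trusting the split:

```python
# Crude first pass at the Bucket A / Bucket B split. A keyword heuristic
# only approximates "upset"; verify against a hand-labeled sample.

import csv

# Signals that a case is about trust or emotion, not information (assumed list).
TRUST_SIGNALS = [
    "charged twice", "fraud", "compromised", "frustrated", "angry",
    "cancel", "broke", "not working", "dispute", "speak to someone",
]

def bucket(ticket_text: str) -> str:
    text = ticket_text.lower()
    return "B" if any(signal in text for signal in TRUST_SIGNALS) else "A"

counts = {"A": 0, "B": 0}
with open("tickets_last_90_days.csv", newline="") as f:
    for row in csv.DictReader(f):
        counts[bucket(row["first_message"])] += 1

total = counts["A"] + counts["B"]
print(f"Bucket A (information): {counts['A'] / total:.0%}")
print(f"Bucket B (trust):       {counts['B'] / total:.0%}")
```

Treat the output as a prompt for the manual review, not a replacement for it.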
Do this exercise before your next vendor demo. When the vendor quotes you “90% automation rate,” you’ll know whether they’re automating Bucket A (good) or promising to automate Bucket B (dangerous).
THE QUESTION
Here is what I want to know from you this issue.
When was the last time a customer in your business needed to feel heard more than they needed an answer — and what happened?
Not the escalation report. Not the CSAT score. The actual moment. The call where someone was upset, or confused, or scared, and the thing that resolved the situation was not the policy or the refund or the replacement. The thing that resolved it was a human being who said, in whatever language your company uses, “I understand. Let me help.”
That moment is your Bucket B. It is the moment AI cannot replicate. It is also the moment that determines whether the customer stays or leaves. And the number of those moments your company handles per week is the number that should set the floor on how many human agents you keep, no matter how good the AI gets.
Hit reply, or send a note to grant@thecraftofai.com. One story, one sentence, one moment. I read every reply.
— Grant
grant@thecraftofai.com
Get The Brief in your inbox.
Bi-weekly deep dives on how founder-CEOs of $100M–$500M operators are actually shipping AI. One story, no scanner filler, reply to Grant directly.