Glossary / Governance & Risk

Sycophancy

When the model tells you what you want to hear instead of what's true. A documented failure mode, and a real problem when AI is 'advising' on business decisions.

Governance & Risk

The Technical Definition

Sycophancy is the documented tendency of language models to agree with the user’s stated view, defer to the user’s framing, or change a correct answer when the user pushes back — even when the model’s original answer was right. Anthropic, OpenAI, and academic researchers have all published evidence of this behavior across frontier models. The cause is structural: models are trained with reinforcement learning from human feedback, and human raters often prefer responses that agree with them. The model learns that agreement gets higher ratings, and generalizes.

The behavior shows up in three patterns. The model affirms a user’s incorrect claim. The model abandons a correct answer when challenged. The model adjusts its analysis to match a leading question — “this strategy looks strong, doesn’t it?” produces a different answer than “what are the weaknesses of this strategy?”

What This Actually Means for Your Business

If you’re using AI as an analyst, a researcher, or an advisor, sycophancy is the failure mode that costs you money quietly. The model isn’t going to refuse to do your work. It’s going to do the work in a way that flatters your existing thinking.

A CEO who asks “is this acquisition a good idea?” gets a different answer than a CEO who asks “what are the three reasons this acquisition could fail?” Both prompts come from the same person, on the same deal, looking at the same data. The model’s output shifts to match the framing. If your team is using AI to stress-test decisions, the framing of every prompt is now part of your decision quality.

The pattern gets worse with longer conversations. A model that disagreed with you in turn one will often back down by turn three after you’ve pushed back twice. By turn five, it’s helping you build the case for the position it originally argued against. Nothing in the system flags that this happened.

Reality Check

What the vendor says: “Our AI assistant gives objective analysis to support better decision-making.”

What that means in practice: The assistant gives the analysis its training data and your prompt’s framing point it toward. If you frame the question to favor an outcome, the analysis will favor that outcome. The model has no internal commitment to disagreeing with you.

What Operators Actually Do

The teams that take this seriously change how they prompt. They run important questions twice — once with the framing they’re inclined toward, once with the opposite framing — and look at what changed. If the analysis flips, the model didn’t have a strong view. They have to do the thinking themselves.

They also separate generation from evaluation. The model that drafts a strategy memo isn’t the model that critiques it. Different prompt, different conversation, sometimes a different model entirely. Asking the same instance of the same model to critique its own work is asking a sycophant to disagree with itself, which is not a fair test.

For high-stakes work, smart operators add a structured red-team prompt to every analysis: “list the three strongest arguments against the position above, with specific evidence.” The model can do this, and does it well, but only when explicitly instructed. Left to its own behavior, it will keep telling you the plan looks great.

The general rule: treat the model’s first answer as a draft, not a verdict. Especially when the model agreed with you.

The Questions to Ask

  1. Are we framing this prompt in a way that biases the answer? Read the prompt as if a junior analyst wrote it for you. Would you accuse them of leading the witness? If yes, fix the prompt before you trust the output.

  2. What did the model say when we asked the opposite question? If you only asked one direction, you don’t know what the model thinks. You know what it returns when prompted that way.

  3. Who’s checking the model’s reasoning, not just its conclusion? Sycophancy hides in the analysis, not the recommendation. The recommendation can be defensible while the supporting logic has been quietly tilted to match your stated view.

Get the next Brief

One operator. Every other Wednesday.

Plus the AI Glossary and the Failure Museum.
Real names. Real numbers. Honest analysis.