Sentiment Analysis
The 2010s NLP use case that vendors still pitch as cutting-edge work. LLMs turned it into a two-API-call integration. Here's where it still earns its keep and where vendors oversell it.
The Technical Definition
Sentiment analysis is the classification of text by emotional tone — positive, negative, neutral, and sometimes finer categories like frustration, urgency, or sarcasm. The 2010s version used purpose-built classifiers trained on labeled data, often producing a single score from -1 to +1. The modern version uses an LLM with a structured prompt and skips most of the training complexity. You ask the model to classify the text, you get a label back, you log it.
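A minimal sketch of the "structured prompt, label back" pattern described above. Everything named here is illustrative: `complete` is a hypothetical stand-in for whichever LLM API you call, and the prompt wording and the three-label set are assumptions, not a recommended standard.

```python
from typing import Callable

# Illustrative label set; extend it if you need finer categories.
LABELS = ("positive", "negative", "neutral")


def build_prompt(text: str) -> str:
    """Build a prompt that asks the model for exactly one label."""
    return (
        "Classify the sentiment of the customer message below.\n"
        f"Answer with exactly one word: {', '.join(LABELS)}.\n\n"
        f"Message: {text}"
    )


def parse_label(raw: str) -> str:
    """Normalize the reply; return 'unknown' if the model goes off-script."""
    cleaned = raw.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else "unknown"


def classify(text: str, complete: Callable[[str], str]) -> str:
    """`complete` is a hypothetical wrapper around your LLM provider's API."""
    return parse_label(complete(build_prompt(text)))


# Stand-in for a real model call so the sketch runs offline.
def fake_complete(prompt: str) -> str:
    return "Negative."


label = classify("Third time I've reported this bug.", fake_complete)
```

The `unknown` fallback matters more than it looks: a model that answers with a sentence instead of a label should be logged as unclassified, not silently counted as neutral.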
The shift matters operationally. What used to be a three-month data science project is now a two-API-call integration. The technical barrier collapsed. The business question — what to actually do with the classification — did not.
What This Actually Means for Your Business
Here’s the pitch you’ll hear: “Our AI analyzes the sentiment of every customer interaction so you can respond proactively.” Sounds useful. The problem is the gap between detecting that a customer is frustrated and actually doing something about it that improves the relationship.
Sentiment analysis works as an input to a process. It does not work as a product. The companies getting real value from it have built the process first — a CX team that triages tickets, a brand team that monitors reputation, a support manager who reviews conversations — and then bolted sentiment classification onto the existing workflow to make it faster. The companies failing at it bought a “sentiment dashboard” with no plan for who reads it or what they do when it lights up red.
There are three places this still drives real business value. CX triage: incoming support tickets get auto-classified, the angry ones jump the queue, response times on at-risk customers drop. Brand monitoring: social mentions get scored daily, sudden drops in sentiment trigger a flag for the comms team, you find out about a PR problem in two hours instead of two days. Support coaching: post-call transcripts get scored, agents get feedback on which calls went sideways, training improves over time.
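The CX triage case above can be sketched as a priority queue in which negative tickets are served first and ties keep their arrival order. The priority mapping, the class names, and the ticket IDs are all hypothetical; a real help desk would apply the same idea through its own routing rules.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Lower number = served first. This mapping is an illustrative assumption.
PRIORITY = {"negative": 0, "neutral": 1, "positive": 2}


@dataclass(order=True)
class QueuedTicket:
    priority: int
    seq: int  # arrival order, used to break ties within a priority level
    ticket_id: str = field(compare=False)


class TriageQueue:
    def __init__(self) -> None:
        self._heap: list[QueuedTicket] = []
        self._seq = count()

    def add(self, ticket_id: str, sentiment: str) -> None:
        # Unrecognized labels are treated like neutral rather than dropped.
        priority = PRIORITY.get(sentiment, PRIORITY["neutral"])
        heapq.heappush(self._heap, QueuedTicket(priority, next(self._seq), ticket_id))

    def next_ticket(self) -> str:
        return heapq.heappop(self._heap).ticket_id


queue = TriageQueue()
queue.add("T-1", "positive")
queue.add("T-2", "negative")
queue.add("T-3", "neutral")
queue.add("T-4", "negative")

order = [queue.next_ticket() for _ in range(4)]
# order == ["T-2", "T-4", "T-3", "T-1"]
```

The classifier only reorders the queue. A person still picks up each ticket and decides how to respond.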
There are also places vendors will pitch this and it won’t work. Sentiment-driven automated responses (the AI sees frustration and replies with a soothing message) almost always make things worse, because the customer wanted resolution, not empathy theater. Aggregate sentiment scores (“our brand sentiment is 0.62 this quarter”) sound like metrics and behave like astrology — the number moves, nobody knows why, no decision changes. And nuanced detection — sarcasm, regional slang, mixed sentiment in a single message — is still where the models stumble, especially in domains they weren’t trained on.
Reality Check
What the vendor says: “Our AI detects customer sentiment in real time across every channel.”
What that means in practice: It classifies messages as positive, negative, or neutral with about 85% accuracy on text that looks like its training data, and substantially worse on industry-specific language, sarcasm, or short messages. The “real time across every channel” part is integration work your team owns.
What Operators Actually Do
The companies extracting real value start with the workflow, not the model. They identify a specific decision that gets made today by a human reading text — which support ticket gets escalated, which social mention gets a reply, which call gets reviewed — and they use sentiment classification to make that decision faster or to make it on a higher volume of inputs. The classification is plumbing. The decision is the product.
They also keep humans in the loop on anything customer-facing. The model flags. A person responds. That’s not a limitation of the technology; it’s a deliberate choice. Automated sentiment-triggered responses tend to land in customer screenshots that go viral for the wrong reasons.
The third pattern: they validate the model on their own data before they trust it. Off-the-shelf sentiment models are trained on product reviews and social media, which sound nothing like an industrial supplier’s customer support tickets or a hospital’s patient feedback. Operators run 200 representative samples through the model, score them by hand, compare. If accuracy is below 80% on their domain, they fine-tune or re-prompt before deploying.
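The validation step above amounts to a few lines of arithmetic. The sketch below assumes you already have two parallel lists, one of hand-assigned labels and one of model outputs. The function name and the returned fields are hypothetical, and the sample data is a toy set of ten, not the 200 a real check would use.

```python
from collections import Counter


def evaluate(hand_labels, model_labels, threshold=0.80):
    """Compare model output to hand labels and apply the deploy threshold."""
    if not hand_labels or len(hand_labels) != len(model_labels):
        raise ValueError("need two non-empty label lists of equal length")

    pairs = list(zip(hand_labels, model_labels))
    accuracy = sum(h == m for h, m in pairs) / len(pairs)

    # Count each (hand label, model label) miss to see where the model fails.
    confusions = Counter((h, m) for h, m in pairs if h != m)

    return {
        "accuracy": accuracy,
        "deploy": accuracy >= threshold,  # below threshold: fine-tune or re-prompt
        "confusions": confusions,
    }


# Toy data: ten samples standing in for a real hand-labeled set.
hand = ["negative"] * 4 + ["neutral"] * 3 + ["positive"] * 3
model = [
    "negative", "negative", "neutral", "neutral",
    "neutral", "neutral", "neutral",
    "positive", "positive", "neutral",
]

report = evaluate(hand, model)
```

In this toy run the model gets 7 of 10 right, so `report["deploy"]` is false. The `confusions` counter shows most misses are frustrated messages read as neutral, which is more actionable than the headline accuracy number.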
The Questions to Ask
- What decision changes when the sentiment score moves? If the answer is “we’d look at the dashboard,” it’s not a product, it’s wallpaper. There needs to be a specific action (a ticket gets escalated, a flag gets raised, a call gets reviewed) tied to the output.
- How accurate is the classifier on our specific data? Vendor benchmarks are run on Twitter and Amazon reviews. Your data is internal support tickets, B2B emails, or industry jargon. Test the model on 100 of your real examples before trusting any of the marketing numbers.
- What happens when the model is wrong? If a frustrated customer gets classified as neutral, what’s the fallback? If a neutral customer gets classified as angry and triggers a high-priority response, what’s the cost? Build the error-handling path before you scale the system.
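One way to make the last question concrete is to put a rough price on each kind of mistake. The sketch below is a back-of-envelope calculation, not a costing method from any vendor. The cost figures, error counts, and function name are made-up placeholders to be replaced with your own estimates.

```python
# Hypothetical cost per error, in whatever unit you budget in.
# Keys are (true label, predicted label).
ERROR_COST = {
    ("negative", "neutral"): 50.0,  # frustrated customer missed, no escalation
    ("neutral", "negative"): 5.0,   # needless high-priority response
}


def monthly_error_cost(error_counts, costs=ERROR_COST):
    """Sum count * cost over every error type; unpriced errors count as zero."""
    return sum(n * costs.get(pair, 0.0) for pair, n in error_counts.items())


# Placeholder counts: misclassified messages per month, by error type.
errors = {
    ("negative", "neutral"): 300,
    ("neutral", "negative"): 500,
}

total = monthly_error_cost(errors)
# total == 17500.0
```

With these placeholder numbers the missed-frustration errors dominate the total even though they are less frequent. That argues for building the fallback path for that error first.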