Parameters

The weights inside the model. 7B vs 70B vs 1T sounds like horsepower — but bigger is not always better, and vendors know you don't know the difference.

Models & Architecture

The Technical Definition

Parameters are the numerical weights inside a neural network — the values the model learned during training that determine how it processes input. A model with 7 billion parameters has 7 billion individual numbers tuned through training. A model with 70 billion has 70 billion. The numbers sit inside the layers of the network and get multiplied against your input every time you run a query.
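To make "7 billion individual numbers" concrete, here is a minimal sketch of how a parameter count is tallied for a toy network of fully connected layers. The function names and the toy layer sizes are illustrative assumptions. Real language models use transformer layers with a more involved breakdown, but the bookkeeping idea is the same: every weight and every bias counts as one parameter.

```python
def dense_layer_params(n_inputs: int, n_outputs: int) -> int:
    """One weight per input-output pair, plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


def mlp_params(layer_sizes: list[int]) -> int:
    """Total parameters in a stack of dense layers with the given widths."""
    return sum(
        dense_layer_params(n_in, n_out)
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical toy network: 784 inputs, one hidden layer of 128, 10 outputs.
toy_total = mlp_params([784, 128, 10])
print(toy_total)  # 101770
```

A 7B model is this same tally carried out over far wider and far more numerous layers.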

Parameter count is the closest thing AI has to a horsepower spec. It’s the number vendors put on the brochure. It’s also one of the most misleading numbers in the industry.

What This Actually Means for Your Business

The shorthand the industry runs on: more parameters generally means more capability, more cost, more memory required, and slower inference. A 7B (billion) model fits on a single decent GPU and runs fast. A 70B model needs serious hardware. A 400B+ model runs on a cluster.
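The hardware jump follows from simple arithmetic: each parameter has to be stored in memory at some numeric precision. The sketch below estimates the memory needed for the weights alone. The function name and the precision table are illustrative assumptions, and real deployments need extra headroom for activations and caches, so treat these figures as a floor.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str = "fp16") -> float:
    """Memory for the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for label, n_params in [("7B", 7e9), ("70B", 70e9), ("400B", 400e9)]:
    fp16 = weight_memory_gb(n_params, "fp16")
    int4 = weight_memory_gb(n_params, "int4")
    print(f"{label}: {fp16:.1f} GB at fp16, {int4:.1f} GB at int4")

# 7B: 14.0 GB at fp16, 3.5 GB at int4
# 70B: 140.0 GB at fp16, 35.0 GB at int4
# 400B: 800.0 GB at fp16, 200.0 GB at int4
```

At 16-bit precision a 7B model fits within a single 16–24 GB GPU, a 70B model does not fit on any single mainstream card without compression, and a 400B model has to be split across many devices.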

But “more capable” is not a straight line. A 70B model from 2026 is wildly more capable than a 175B model from 2022, because architecture and training data improved. A well-trained 7B model can outperform a poorly trained 70B model on specific tasks. Mistral 7B was reported at release to beat Llama 2 13B across the benchmarks its authors tested, and the older Llama 1 34B on many of them. Phi-3-mini, at 3.8B parameters, was reported to rival much larger models such as Mixtral 8x7B and GPT-3.5 on reasoning benchmarks.

What this means for you: when a vendor brags about parameter count, they’re often padding the spec sheet. Ask what the model actually does on your task instead. A salesperson telling you “our model has 200 billion parameters” is the AI equivalent of telling you their car has eight cylinders. Useful context. Not the answer.

Parameter count drives cost in a direct, painful way. Bigger models cost more per token to run, require more memory to host, and run slower under load. If you self-host, the difference between a 7B model and a 70B model is roughly the difference between $5,000 of GPU and $50,000+ of GPU, for the same use case. If you call an API, the bigger models can cost 5–20x as much per token.
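The per-token multiplier compounds quickly at volume. Here is a back-of-the-envelope sketch; the monthly token volume and both prices are hypothetical placeholders chosen to sit inside the 5–20x range, not quotes from any vendor.

```python
def monthly_api_cost(tokens_per_month: int, price_per_million_tokens: float) -> float:
    """Monthly spend in dollars for a given token volume and per-million price."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


# Hypothetical workload and prices, for illustration only.
TOKENS_PER_MONTH = 500_000_000
SMALL_MODEL_PRICE = 0.20  # dollars per million tokens (assumed)
LARGE_MODEL_PRICE = 2.00  # dollars per million tokens (assumed, 10x the small model)

small_cost = monthly_api_cost(TOKENS_PER_MONTH, SMALL_MODEL_PRICE)
large_cost = monthly_api_cost(TOKENS_PER_MONTH, LARGE_MODEL_PRICE)

print(f"small model: ${small_cost:,.2f} per month")
print(f"large model: ${large_cost:,.2f} per month")
print(f"multiplier:  {large_cost / small_cost:.1f}x")
```

Plug in your own volume and the prices on your vendor's rate card. The multiplier stays the same, but the dollar gap scales with every token you send.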

The right question isn’t “what’s the biggest model we can use.” It’s “what’s the smallest model that handles our task well enough.” Operators who get this right can save as much as 80% on inference costs. Operators who don’t end up running a 1T parameter model to classify support tickets, and then complaining about the bill.

Reality Check

What the vendor says: “Our model has over 500 billion parameters — the largest in the industry.”

What that means in practice: They’re hoping you’ll equate parameter count with quality. Ask them to show you head-to-head performance on your actual task against a 30B or 70B model. If they can’t, or if the gap is small, you’re paying for parameters you don’t need.

What Operators Actually Do

The teams that deploy AI well treat parameter count as a cost dial, not a quality dial. They start with the smallest model that might plausibly work, test it against real examples from their workflow, and only scale up if the results aren’t good enough. This is the opposite of what most vendors want you to do.

A common pattern: a 70B model handles user-facing reasoning where mistakes are visible and expensive. A 7B or 8B model handles internal classification, summarization, and bulk processing. Some teams use 1B–3B models for the most routine tasks (formatting, extraction, simple routing) where the work is so structured that a small model is plenty. The router decides which model gets the request based on the task.
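A router like this can start out as a plain lookup table. The sketch below is one minimal way to express the pattern; the task labels, tier names, and the fallback rule are assumptions for illustration, and a production router would usually also classify the incoming request before looking it up.

```python
from enum import Enum


class Tier(Enum):
    SMALL = "1B-3B"
    MEDIUM = "7B-8B"
    LARGE = "70B"


# Hypothetical task labels mapped to the model tier that handles them.
ROUTES = {
    "formatting": Tier.SMALL,
    "extraction": Tier.SMALL,
    "simple_routing": Tier.SMALL,
    "classification": Tier.MEDIUM,
    "summarization": Tier.MEDIUM,
    "bulk_processing": Tier.MEDIUM,
    "user_facing_reasoning": Tier.LARGE,
}


def route(task_type: str) -> Tier:
    """Pick a model tier for a task.

    Unknown task types fall back to the largest tier, a conservative
    assumption that trades cost for safety until the task is classified.
    """
    return ROUTES.get(task_type, Tier.LARGE)


print(route("extraction").value)             # 1B-3B
print(route("summarization").value)          # 7B-8B
print(route("user_facing_reasoning").value)  # 70B
```

Keeping the table explicit makes the cost decision visible: moving a task from one tier to another is a one-line change that can be reviewed and tested.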

The other pattern that works: testing open-source small models alongside closed frontier models. A fine-tuned Llama 8B running on your own hardware can match GPT-4 quality on a narrow task at 1/100th the inference cost. That’s not theoretical — companies like Klarna and Walmart have published case studies showing exactly this. The frontier model is the prototype. The smaller fine-tuned model is the production system.

The Questions to Ask

  1. What’s the smallest model that meets the quality bar on our task? Test three sizes (small, medium, large) on the same 50 real examples. Look at the cost-quality curve. The right answer is rarely the biggest model.

  2. What hardware does this need to run, and what does that cost annually? Parameter count drives GPU memory, which drives infrastructure cost. A 70B model hosted on-premises might require $200K of hardware and a full-time engineer to keep healthy.

  3. Why this size and not a smaller one? If the vendor can’t articulate a reason — beyond “bigger is better” — they haven’t done the work of right-sizing the model to the task. You’ll pay for that gap forever.
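The first question lends itself to a simple selection rule: run each candidate on the same real examples, record how many outputs were acceptable and what the model costs, then take the cheapest one that clears the bar. The sketch below assumes you have already scored the outputs. The class and field names are hypothetical, and the numbers are placeholder inputs, not measured results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    cost_per_1k_requests: float  # dollars, from your own pricing or hosting math
    passed: int                  # examples whose output was judged acceptable
    total: int                   # examples tested

    @property
    def pass_rate(self) -> float:
        return self.passed / self.total


def cheapest_passing(candidates: list[Candidate], quality_bar: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose pass rate meets the bar, if any."""
    passing = [c for c in candidates if c.pass_rate >= quality_bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_1k_requests)


# Placeholder scores for three sizes tested on the same 50 examples.
results = [
    Candidate("small", cost_per_1k_requests=0.50, passed=41, total=50),
    Candidate("medium", cost_per_1k_requests=2.00, passed=46, total=50),
    Candidate("large", cost_per_1k_requests=15.00, passed=48, total=50),
]

choice = cheapest_passing(results, quality_bar=0.90)
print(choice.name if choice else "no candidate meets the bar")  # medium
```

With these placeholder inputs the medium model wins: the large model scores slightly higher, but the extra quality is not needed to clear a 90% bar.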
