Model Card
A one-page document that tells you more about a model than the entire vendor pitch deck. If the vendor doesn’t have one, that is itself the answer.
The Technical Definition
A model card is a standardized document that describes a machine learning model’s intended use, training data, evaluation results, performance across different subgroups, known limitations, and ethical considerations. The format was proposed in a 2018 paper by Margaret Mitchell and colleagues at Google (“Model Cards for Model Reporting”). It has since become standard practice at Google, Hugging Face, OpenAI, Anthropic, and most serious model providers.
A complete model card answers a short list of blunt questions. What is this model? What was it trained on? What is it for? What is it not for? How well does it perform, and on what benchmarks? Where does it fail? Who tested it, and who is responsible for it?
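As a rough illustration, those questions can be treated as fields in a record, so that an empty field is a question the card never answered. This is a minimal sketch, not a standard schema: the field names, the `ModelCard` class, and `unanswered_questions` are all hypothetical and chosen only to mirror the questions above.

```python
from dataclasses import dataclass, fields

# Hypothetical field names; each one maps to a question a complete card answers.
QUESTIONS = {
    "description": "What is this model?",
    "training_data": "What was it trained on?",
    "intended_use": "What is it for?",
    "out_of_scope_use": "What is it not for?",
    "evaluation": "How well does it perform, and on what benchmarks?",
    "limitations": "Where does it fail?",
    "owners": "Who tested it, and who is responsible for it?",
}


@dataclass
class ModelCard:
    description: str = ""
    training_data: str = ""
    intended_use: str = ""
    out_of_scope_use: str = ""
    evaluation: str = ""
    limitations: str = ""
    owners: str = ""


def unanswered_questions(card: ModelCard) -> list[str]:
    """Return the questions whose section is empty or whitespace-only."""
    return [
        QUESTIONS[f.name]
        for f in fields(card)
        if not getattr(card, f.name).strip()
    ]


# Made-up example: a card that covers only three of the seven questions.
card = ModelCard(
    description="Hypothetical text summarizer",
    intended_use="Summarizing English news articles",
    evaluation="ROUGE-L on a public news benchmark",
)
for question in unanswered_questions(card):
    print("Unanswered:", question)
```

A card that leaves the training data, out-of-scope use, limitations, and ownership fields blank has answered fewer than half the questions, whatever its benchmark numbers say.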
What This Actually Means for Your Business
Your AI vendor has a pitch deck, a website, and a sales engineer. None of those tell you anything about the model. The model card does.
If a vendor cannot produce a model card for the model behind their product, one of three things is true. They don’t know what model they’re using (they’re reselling someone else’s API and haven’t checked). They know but don’t want to share (the model is older or weaker than the pitch implies). Or they know and don’t think it matters (which tells you something about how they handle every other question they don’t want to answer). All three are signals.
If they do produce one, go straight to the limitations section. That’s the part written by the people who know the model best, often grudgingly. Anthropic’s model cards say their models can refuse appropriate requests, hallucinate facts, and be jailbroken. OpenAI’s GPT-4 system card describes the model’s specific weaknesses on math, citations, and certain reasoning tasks. Those statements aren’t disclaimers. They’re a map of where the model will fail in your business.
A good model card also tells you what the model wasn’t tested on. Performance metrics are usually reported on standard benchmarks. Your business is not a standard benchmark. If the model card reports excellent performance on academic question answering and you’re deploying it for medical claims review, the model card hasn’t told you the model is good for your use case. It has told you the vendor doesn’t know.
Reality Check
What the vendor says: “We can share a detailed performance overview during procurement.”
What that means in practice: They don’t have a model card, or the one they have is worse than they want you to see. A model card is a public-facing document by definition. If it’s NDA-only, it’s not a model card. It’s a custom slide.
What Operators Actually Do
Procurement teams that buy AI seriously start every evaluation with the model card. Before the demo, before the reference call, before the proof of concept. Fifteen minutes with the document tells you whether the rest of the process is worth the calendar time.
They look for four specific things. A clear statement of intended use, written in plain English, that matches what the vendor is trying to sell you. Quantitative performance numbers on benchmarks they’ve heard of, reported with confidence intervals or version numbers attached. A subgroup analysis — performance broken out by demographic, geography, language, or domain — because aggregate numbers hide the failures that get you sued. And a limitations section longer than two sentences.
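The four checks can be written down as a first-pass screen. The sketch below assumes the card has already been transcribed by hand into a plain dictionary; the key names (`intended_use`, `metrics`, `subgroups`, `limitations`) and both functions are hypothetical. It can only confirm that an intended-use statement exists; whether that statement matches what the vendor is selling remains a human judgment.

```python
import re


def count_sentences(text: str) -> int:
    """Rough count: split on ., !, or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return len([p for p in parts if p])


def screen_card(card: dict) -> dict:
    """Apply the four checks; True means the card passes that check."""
    metrics = card.get("metrics", [])
    return {
        # Presence only; matching it against the sales pitch is a manual step.
        "intended_use_stated": bool(card.get("intended_use", "").strip()),
        # Every reported number needs a confidence interval or a benchmark version.
        "metrics_qualified": bool(metrics) and all(
            m.get("confidence_interval") or m.get("benchmark_version")
            for m in metrics
        ),
        "subgroup_analysis": bool(card.get("subgroups")),
        "limitations_substantive": count_sentences(card.get("limitations", "")) > 2,
    }


# Made-up card contents for illustration only.
card = {
    "intended_use": "Summarizing English-language support tickets",
    "metrics": [
        {"benchmark": "ExampleBench", "score": 0.81, "benchmark_version": "v2"},
        {"benchmark": "OtherBench", "score": 0.77},
    ],
    "subgroups": {},
    "limitations": "May hallucinate facts. Weak on long documents.",
}
result = screen_card(card)
for check, passed in result.items():
    print(f"{check}: {'pass' if passed else 'FAIL'}")
```

A failed check is not a verdict on the model. It is a question to bring to the next vendor call.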
They also use the model card to write their own acceptance tests. If the card says the model performs well on English-language summarization and your business runs in Spanish, your acceptance test runs in Spanish. If the card says the model has limited performance on long context, your acceptance test uses your longest documents. The model card tells you where the cliff is. Your job is to walk to the edge before you sign the contract.
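One way to make that habit mechanical is to compare what the card claims against what your deployment needs and turn every gap into a test. The following is a sketch under stated assumptions: `CardClaims`, `Deployment`, and `plan_acceptance_tests` are hypothetical names, and the two gaps it covers (language and long context) are simply the two examples above.

```python
from dataclasses import dataclass


@dataclass
class CardClaims:
    # Hypothetical summary of what the card says, transcribed by hand.
    evaluated_languages: set[str]
    limited_on_long_context: bool


@dataclass
class Deployment:
    languages: set[str]
    longest_documents: list[str]


def plan_acceptance_tests(claims: CardClaims, deployment: Deployment) -> list[str]:
    """Turn each gap between the card and the deployment into a test description."""
    tests = []
    # Languages you run in that the card never reports results for.
    for lang in sorted(deployment.languages - claims.evaluated_languages):
        tests.append(f"Run the task in {lang}: the card reports no results for it.")
    # A stated long-context weakness means testing on your longest real inputs.
    if claims.limited_on_long_context:
        for doc in deployment.longest_documents:
            tests.append(f"Run the task on {doc}: the card flags long context as a weakness.")
    return tests


# Made-up example: an English-evaluated model deployed for Spanish contracts.
claims = CardClaims(evaluated_languages={"English"}, limited_on_long_context=True)
deployment = Deployment(
    languages={"Spanish"},
    longest_documents=["master_services_agreement.pdf"],
)
tests = plan_acceptance_tests(claims, deployment)
for test in tests:
    print(test)
```

The output is a list of test descriptions, not the tests themselves. Running them still needs your own documents and your own pass criteria.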
The operators who skip this step pay for it later, usually in front of a customer.
The Questions to Ask
- Can I have the model card? If yes, read it before the next call. If no, you’ve already learned the most important thing about how this vendor operates.
- What’s in the limitations section? Read it out loud on a call with the vendor. Watch how they react to their own document. Discomfort is information.
- What testing did you do for our specific use case? The published model card is generic. Your use case is not. If they haven’t tested for your domain, your acceptance tests are the audit.