LoRA & Parameter-Efficient Fine-Tuning (PEFT)

Customizing a model by training a tiny add-on instead of the whole thing. Often a hundred times cheaper than full fine-tuning, and the reason your team can now afford to specialize a model at all.


The Technical Definition

Parameter-Efficient Fine-Tuning (PEFT) is a family of techniques for customizing a large model by training only a small fraction of its parameters, while leaving the rest frozen. LoRA — Low-Rank Adaptation — is the most widely used method in the family.

The intuition is this. A standard fine-tune of a 70-billion-parameter model means updating all 70 billion weights, which requires a multi-GPU cluster and produces an output file of roughly 280 gigabytes at 32-bit precision (about 140 gigabytes at 16-bit). A LoRA fine-tune freezes the original weights and trains a small set of additional low-rank matrices that sit alongside the original: sometimes only a few million parameters in total, less than a hundredth of a percent of the model. At inference time, the frozen model and the LoRA adapter combine to produce the customized behavior.
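To make the mechanism concrete, here is a minimal pure-Python sketch of a single LoRA-adapted layer. It is an illustration under stated assumptions, not any library's implementation: the class name `LoRALinear`, the matrix names `down` and `up`, and the `alpha / rank` scaling are choices made for this example. The frozen weight is never modified; only the two small matrices would be trained.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """One linear layer with a frozen weight plus a trainable low-rank update.

    Hypothetical illustration: output = x @ W + (alpha / rank) * (x @ down @ up)
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        self.rank = rank
        self.scale = alpha / rank
        # Trainable adapter: `down` starts random and `up` starts at zero,
        # so the adapted layer initially behaves exactly like the base layer.
        self.down = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        self.up = [[0.0] * d_out for _ in range(rank)]

    def trainable_params(self):
        d_in, d_out = len(self.weight), len(self.weight[0])
        return self.rank * (d_in + d_out)

    def forward(self, x):
        base = matmul(x, self.weight)
        delta = matmul(matmul(x, self.down), self.up)
        return [[b + self.scale * d for b, d in zip(brow, drow)]
                for brow, drow in zip(base, delta)]


# Toy example: a 6x6 layer has 36 frozen weights; a rank-1 adapter adds 12.
weight = [[float(i == j) for j in range(6)] for i in range(6)]  # identity matrix
layer = LoRALinear(weight, rank=1)
x = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
print(layer.forward(x) == matmul(x, weight))  # True before any training
print(layer.trainable_params())
```

The saving grows with layer size: the adapter adds rank × (d_in + d_out) parameters, while the frozen weight holds d_in × d_out.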

The adapter file for a serious LoRA fine-tune is often under a gigabyte. Sometimes under a hundred megabytes. You can train it on a single GPU in hours instead of a cluster in weeks. You can ship it as a small file that plugs into the base model. You can keep dozens of them around, one per use case, and swap them in and out without touching the underlying weights.
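The size claims follow from simple arithmetic. The sketch below uses illustrative assumptions, not the specification of any particular model: 80 layers, two adapted 8192-by-8192 weight matrices per layer, and 16-bit storage for the adapter. The function names are made up for this example.

```python
def full_model_bytes(n_params, bytes_per_param):
    """Size of a checkpoint that stores every weight."""
    return n_params * bytes_per_param


def lora_adapter_params(d_in, d_out, rank, matrices_per_layer, n_layers):
    """Parameters added by LoRA: rank * (d_in + d_out) per adapted matrix."""
    return rank * (d_in + d_out) * matrices_per_layer * n_layers


BASE_PARAMS = 70_000_000_000  # the 70B model from the text

print(f"full model, 32-bit: {full_model_bytes(BASE_PARAMS, 4) / 1e9:.0f} GB")
print(f"full model, 16-bit: {full_model_bytes(BASE_PARAMS, 2) / 1e9:.0f} GB")

# Assumed shapes: 80 layers, two adapted 8192x8192 matrices per layer.
for rank in (2, 16):
    params = lora_adapter_params(8192, 8192, rank, matrices_per_layer=2, n_layers=80)
    size_mb = params * 2 / 1e6  # 16-bit storage
    share = 100 * params / BASE_PARAMS
    print(f"rank {rank}: {params:,} params, {size_mb:.1f} MB, {share:.4f}% of base")
```

Under these assumptions a rank-2 adapter comes to about 5.2 million parameters (roughly 10 MB) and a rank-16 adapter to about 42 million (roughly 84 MB). Real adapters vary with the rank chosen and with how many matrices are adapted.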

What This Actually Means for Your Business

Until LoRA-style techniques became standard, fine-tuning a frontier-class model was a project measured in hundreds of thousands of dollars and weeks of GPU time. The economics meant that customization was something only the model labs and their largest customers did. Everyone else used the off-the-shelf model and engineered prompts to compensate.

That has flipped. A LoRA fine-tune of an open-source 70B model on a domain-specific corpus is now a project that a small internal team can complete in a week, on cloud GPU spend that would not surprise a senior engineering manager. The output is a small adapter file that captures your domain, your terminology, your style, your decision rules — without modifying the underlying model.

The commercial consequence: customization stopped being a procurement decision and became an engineering decision. You no longer have to ask the vendor for a fine-tuned version. Your team can build one. The model labs know this, which is why the better closed-source providers have started offering managed fine-tuning services that deliver LoRA-class economics through an API. The open-source ecosystem has been doing it natively for two years.

For a CEO, the practical question becomes: what specific knowledge or behavior is unique enough to your business that a fine-tune is justified, versus what can be handled with prompting and retrieval? The answer is rarely “everything.” But for the workflows where it does apply (internal style, regulated domain language, repeatable decision logic), a small adapter trained on good data will usually outperform a generic model with a clever prompt.

Reality Check

What the vendor says: “We’ll fine-tune the model on your company’s data.”

What that means in practice: They will train a LoRA or similar adapter on whatever data you give them, with however much quality control you specified, which is usually less than you assumed. The result is real customization, but it is only as good as the data and the evaluation discipline behind it. Fine-tuning on bad data produces a confidently wrong specialized model.

What Operators Actually Do

Teams running serious customization treat the adapter as a product, not a one-time training event. They version it. They evaluate each new version against a held-out test set. They roll out new versions the way they roll out application code — through staging, with rollback. The fine-tune is software, and it deserves the same release discipline.
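The release discipline described above might be sketched as a small registry that only promotes a new adapter version when its held-out score clears the current production version, and that can roll back. Everything here is hypothetical: `AdapterVersion`, `AdapterRegistry`, and the `min_gain` threshold are names invented for illustration, and a real setup would sit on top of an existing deployment system.

```python
from dataclasses import dataclass, field


@dataclass
class AdapterVersion:
    version: str
    eval_score: float  # score on the held-out test set (higher is better)


@dataclass
class AdapterRegistry:
    """Hypothetical registry tracking the production history of one adapter."""

    min_gain: float = 0.0  # required improvement over current production
    history: list = field(default_factory=list)  # promoted versions, oldest first

    @property
    def production(self):
        return self.history[-1] if self.history else None

    def promote(self, candidate):
        """Promote only if the candidate clears production's score plus min_gain."""
        current = self.production
        if current is not None and candidate.eval_score < current.eval_score + self.min_gain:
            return False
        self.history.append(candidate)
        return True

    def rollback(self):
        """Revert to the previous production version, if there is one."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.production


registry = AdapterRegistry(min_gain=0.01)
print(registry.promote(AdapterVersion("v1", 0.80)))   # first version is accepted
print(registry.promote(AdapterVersion("v2", 0.805)))  # gain too small, rejected
print(registry.promote(AdapterVersion("v3", 0.85)))   # clears the bar
print(registry.rollback().version)                    # back to v1
```

The design choice worth noting is that promotion is gated by a number from the held-out set, not by the fact that training finished.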

They also keep the data pipeline separate from the training pipeline. The corpus the adapter trains on gets curated, deduplicated, and labeled before any GPU spins up. The model labs that have fine-tuned thousands of customer adapters will tell you, off the record, that data curation is where 80 percent of the quality comes from. The training itself is the easy part.
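As a rough idea of what a curation step can look like before any training starts, the sketch below drops unlabeled, near-empty, and duplicate examples. It is a minimal illustration, not a complete pipeline: the record keys `"text"` and `"label"`, the `min_chars` threshold, and the function names are assumptions, and it only catches duplicates that match after whitespace and case normalization.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def curate(records, min_chars=20):
    """Keep labeled, non-trivial examples, dropping duplicates after normalization.

    `records` is a list of dicts with hypothetical keys "text" and "label".
    """
    seen = set()
    kept = []
    for rec in records:
        norm = normalize(rec.get("text", ""))
        if len(norm) < min_chars or not rec.get("label"):
            continue  # too short or unlabeled
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # same content already kept
        seen.add(digest)
        kept.append(rec)
    return kept


raw = [
    {"text": "The refund policy allows returns within 30 days.", "label": "policy"},
    {"text": "the refund  policy allows returns within 30 days. ", "label": "policy"},
    {"text": "Escalate disputed invoices to the finance lead.", "label": ""},
    {"text": "ok", "label": "policy"},
]
print(len(curate(raw)))  # 1
```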

The pattern that fails: treating fine-tuning as a substitute for retrieval. If your problem is “the AI doesn’t know our current product catalog,” fine-tuning is the wrong tool — your catalog will be out of date the moment you ship the adapter, and you’ll be retraining every quarter. That’s a job for retrieval against a maintained knowledge base. Fine-tuning is the right tool for stable behaviors: tone of voice, document structure, decision logic, regulated terminology. The split between “what changes often” (retrieval) and “what stays stable” (fine-tune) is the cleanest way to think about it.

The Questions to Ask

What specific behavior or knowledge are we fine-tuning for, and why is retrieval not the right answer? Most fine-tuning projects should not be fine-tuning projects. The first question is whether RAG against a curated knowledge base would solve the same problem with less ongoing maintenance.
What’s the evaluation setup? A LoRA adapter that improves benchmark performance can still degrade behavior in subtle ways your evaluation didn’t catch. Insist on a held-out test set drawn from real workloads, scored against the un-fine-tuned baseline, before any adapter goes to production.
Who owns the data curation? The quality of a fine-tune is the quality of its training data. If no one is named as the owner of that corpus — its scope, its curation, its versioning — the adapter is being built on whatever someone happened to export. That’s the project that ships, then fails quietly six months later.
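For the evaluation question above, one way to catch an adapter that improves the headline number while quietly degrading part of the workload is to score it per slice against the un-fine-tuned baseline. The sketch below is a toy under stated assumptions: the example keys (`"slice"`, `"prompt"`, `"expected"`), the exact-match scoring, and the function names are hypothetical, and the two "models" are stand-in lookup tables.

```python
from collections import defaultdict


def accuracy_by_slice(examples, predict):
    """Return {slice_name: accuracy} for a predict(prompt) -> answer callable."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["slice"]] += 1
        if predict(ex["prompt"]) == ex["expected"]:
            correct[ex["slice"]] += 1
    return {name: correct[name] / total[name] for name in total}


def find_regressions(examples, baseline_predict, adapter_predict, tolerance=0.0):
    """List slices where the adapter scores worse than the un-fine-tuned baseline."""
    base = accuracy_by_slice(examples, baseline_predict)
    tuned = accuracy_by_slice(examples, adapter_predict)
    return sorted(name for name in base if tuned[name] < base[name] - tolerance)


# Held-out examples drawn from real workload (toy stand-ins here).
held_out = [
    {"slice": "contracts", "prompt": "q1", "expected": "a"},
    {"slice": "contracts", "prompt": "q2", "expected": "b"},
    {"slice": "contracts", "prompt": "q3", "expected": "c"},
    {"slice": "general", "prompt": "q4", "expected": "d"},
    {"slice": "general", "prompt": "q5", "expected": "e"},
]
baseline_answers = {"q1": "x", "q2": "x", "q3": "c", "q4": "d", "q5": "e"}
adapter_answers = {"q1": "a", "q2": "b", "q3": "c", "q4": "d", "q5": "x"}

# Overall the adapter wins (4 of 5 versus 3 of 5), yet one slice got worse.
print(find_regressions(held_out, baseline_answers.get, adapter_answers.get))
```

Here the adapter beats the baseline overall but regresses on the "general" slice, which is the kind of result a single aggregate score would hide.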

