Generative AI Fine-Tuning Models
A pre-trained large language model knows a lot about the world in general. But for specific industries, tasks, or communication styles, a general model may produce generic outputs. Fine-tuning solves this by training an existing model further on a smaller, targeted dataset — making it an expert in a particular domain without rebuilding it from scratch.
What Is Fine-Tuning?
Fine-tuning takes a pre-trained base model and continues training it on a curated, domain-specific dataset. The model retains all its general knowledge but adjusts its weights to prioritize patterns from the new training data.
Pre-trained base model
  (trained on ~1 trillion tokens of general web data)
  → Knows: language, facts, reasoning, coding, science...
        │
        ▼
Fine-tuning on 10,000 medical Q&A examples
        │
        ▼
Fine-tuned medical model
  → Knows: everything above + medical terminology, clinical tone,
    diagnosis-style reasoning, drug interactions
When to Fine-Tune
| Situation | Fine-Tune? | Reason |
|---|---|---|
| Need a specific writing tone or style consistently | Yes | Tone and style are hard to enforce with prompts alone |
| Working with specialized vocabulary (legal, medical, technical) | Yes | Base model may use incorrect terminology |
| Repeated task with same format every time | Yes | Fine-tuning encodes the format into the model |
| Need to answer questions about recent events | No | Use RAG instead — fine-tuning doesn't add real-time facts reliably |
| Want to change the model's behavior on one task only | Maybe | Prompt engineering may be sufficient and cheaper |
Types of Fine-Tuning
Full Fine-Tuning
All of the model's weights are updated during training. This produces the most specialized result but requires significant GPU memory and compute time — often too expensive for smaller teams.
Parameter-Efficient Fine-Tuning (PEFT)
Only a small subset of the model's parameters is updated, or small additional layers are added, while the rest of the model stays frozen. This dramatically reduces compute requirements while still producing strong specialization.
The most popular PEFT method is LoRA (Low-Rank Adaptation).
LoRA — Low-Rank Adaptation
Standard Fine-Tuning:
  Update all 7 billion parameters
  → Very expensive (requires 80GB+ VRAM)

LoRA:
  Add small "adapter" matrices alongside original weights
  Train only these tiny adapter matrices (~0.1% of total parameters)
  Original model weights remain unchanged
  → Runs on consumer hardware (16GB VRAM or less)

Result: Nearly identical performance to full fine-tuning at a fraction of the cost
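The savings above come from simple arithmetic. The sketch below uses hypothetical (but typical) dimensions: for one weight matrix W of shape (d, k), LoRA freezes W and trains only two small matrices A (d, r) and B (r, k) with rank r much smaller than d and k. Note this gives the per-matrix fraction; because adapters are usually attached to only a subset of the model's matrices, the whole-model trainable fraction drops further, toward the ~0.1% quoted.

```python
# Why LoRA trains so few parameters: instead of updating W (d x k),
# it learns a low-rank update A @ B, where A is (d x r) and B is (r x k).

d, k = 4096, 4096   # illustrative hidden dimensions for a 7B-class model
r = 8               # LoRA rank (a common default)

full_params = d * k             # parameters updated in full fine-tuning
lora_params = d * r + r * k     # parameters updated with LoRA

fraction = lora_params / full_params
print(f"Full: {full_params:,}  LoRA: {lora_params:,}  fraction: {fraction:.4%}")
```

Raising the rank r trades a larger adapter for more capacity; the fraction grows linearly in r but stays tiny for practical ranks.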
The Fine-Tuning Process Step by Step
Step 1: Choose a base model
→ Select a suitable open-source model (e.g., Llama 3, Mistral, Phi-3)
│
▼
Step 2: Prepare training data
→ Collect 500–10,000 high-quality instruction-response pairs
│
▼
Step 3: Format data correctly
→ Structure data in the model's expected chat template
Example:
{"instruction": "Classify this contract clause as standard or non-standard.",
"input": "The liability cap shall not exceed $500.",
"output": "Standard — this is a typical limitation of liability clause."}
│
▼
Step 4: Configure training
→ Set learning rate, batch size, LoRA rank, number of epochs
│
▼
Step 5: Train the model
→ Fine-tune using a framework (Hugging Face Transformers, Axolotl, LLaMA Factory)
│
▼
Step 6: Evaluate
→ Test on held-out examples; compare to base model performance
│
▼
Step 7: Deploy
→ Host the fine-tuned model via API or inference server
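Step 3 above, formatting data into a template, can be sketched in plain Python. The template here is a hypothetical Alpaca-style layout for illustration; a real run should use the chosen base model's own chat template.

```python
# Hypothetical Alpaca-style template; substitute the base model's
# actual chat template before training.
TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

def format_example(record: dict) -> str:
    """Render one instruction-response pair into a single training string."""
    # Fail fast on malformed records rather than training on bad data.
    for key in ("instruction", "input", "output"):
        if key not in record:
            raise ValueError(f"missing field: {key}")
    return TEMPLATE.format(**record)

record = {
    "instruction": "Classify this contract clause as standard or non-standard.",
    "input": "The liability cap shall not exceed $500.",
    "output": "Standard — this is a typical limitation of liability clause.",
}
text = format_example(record)
print(text.splitlines()[0])  # prints: ### Instruction:
```

Validating every record before training is cheap insurance: a handful of malformed examples can noticeably degrade a fine-tune of only a few thousand pairs.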
Training Data Requirements for Fine-Tuning
| Data Size | Expected Outcome |
|---|---|
| 100–500 examples | Style or tone shift, basic behavior change |
| 500–5,000 examples | Task specialization, consistent output format |
| 5,000–50,000 examples | Domain expertise, complex reasoning in specific field |
| 100,000+ examples | Deep specialization equivalent to instruction tuning a full model |
Fine-Tuning vs Prompt Engineering vs RAG
| Approach | Best For | Cost / Effort |
|---|---|---|
| Prompt Engineering | Quick experiments, general tasks | Low |
| Fine-Tuning | Consistent style, domain vocabulary | Medium–High |
| RAG | Current or private knowledge | Medium |
Popular Tools and Frameworks for Fine-Tuning
- Hugging Face Transformers: The standard library for loading, training, and exporting models
- Axolotl: Simplified fine-tuning framework with LoRA/QLoRA support
- LLaMA Factory: User-friendly interface for fine-tuning Llama-family models
- OpenAI Fine-Tuning API: Fine-tune GPT-3.5 / GPT-4o-mini via API without managing infrastructure
- Together AI / Replicate: Cloud services for fine-tuning open-source models
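As one illustration of how little glue code these frameworks require, a LoRA run in Axolotl is driven by a single YAML file. The config below is a sketch: the key names follow Axolotl's documented schema as of this writing, but the model name, dataset path, and hyperparameter values are placeholders to verify against the current Axolotl docs before use.

```yaml
# Illustrative Axolotl-style LoRA config (values are placeholders)
base_model: mistralai/Mistral-7B-v0.1
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
datasets:
  - path: data/contract_review.jsonl
    type: alpaca
num_epochs: 3
learning_rate: 0.0002
micro_batch_size: 2
gradient_accumulation_steps: 8
output_dir: ./outputs/legal-lora
```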
Real-World Fine-Tuning Example
Company: A legal tech startup
Base model: Mistral 7B (open-source)
Training data: 3,000 contract review Q&A pairs from senior lawyers
Before fine-tuning:
Prompt: "Is this indemnification clause unusual?"
Output: "Indemnification clauses protect parties from certain losses.
Whether this is unusual depends on the contract type..."
Problem: Generic, hedging, not legally specific
After fine-tuning on legal data:
Prompt: "Is this indemnification clause unusual?"
Output: "Yes. This clause lacks a mutual indemnification provision,
shifts full liability to the vendor, and has no cap —
three non-standard elements for an enterprise SaaS agreement.
Recommend negotiating mutual indemnification and a liability cap."
Result: Specific, expert-level, actionable legal analysis
Fine-tuning gives models domain expertise. But it does not give them access to new information. For that, the next technique — Retrieval-Augmented Generation — connects a model to live, searchable knowledge sources.
