Option 1
Prompt Engineering
Zero training cost. Works today.
Wins when base model is already capable.
Setup: hours to days. Monthly: API tokens only.
Failure mode: model lacks domain knowledge entirely.
Accuracy lift: 10–40% over naive prompting with good system prompts + few-shot examples.
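A minimal sketch of what "good system prompt + few-shot examples" means in practice, assuming the common role/content chat-message schema; the triage task and all strings are illustrative, not from any specific product.

```python
# Assemble a chat-style prompt: system instructions, few-shot
# input/output pairs, then the live query. The role/content dict
# shape mirrors the widely used chat-completions message format.

def build_prompt(system: str, examples: list[tuple[str, str]], query: str) -> list[dict]:
    """Return a message list: system prompt, few-shot pairs, then the user query."""
    messages = [{"role": "system", "content": system}]
    for user_msg, assistant_msg in examples:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": query})
    return messages

msgs = build_prompt(
    system="You are a support triage bot. Answer with one of: BILLING, BUG, OTHER.",
    examples=[("I was charged twice", "BILLING"),
              ("The app crashes on login", "BUG")],
    query="My invoice shows the wrong amount",
)
```

The few-shot pairs are where most of the 10–40% lift comes from: they pin down output format and decision boundaries without any training.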
Option 2
RAG (Retrieval-Augmented Generation)
Adds a vector DB + retrieval layer to ground answers in your documents.
Wins when knowledge is the bottleneck — model doesn't know your data.
Setup: $500–$5K depending on doc volume. Monthly: $50–$500 (vector DB + embeddings).
Failure modes: retrieval quality, chunking strategy, context window overflow.
Accuracy lift: 40–80% on factual Q&A tasks versus base prompting.
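A toy sketch of the retrieval step, using bag-of-words cosine similarity in place of a real embedding model and vector DB. In production you would embed chunks and query an index; this only shows the shape of the pipeline, and every string here is made up.

```python
# Retrieve the top-k chunks for a query, then ground the prompt in them.
# Cosine over token counts stands in for embedding similarity.
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks: list[str], query: str, k: int = 2) -> list[str]:
    q = Counter(query.lower().split())
    ranked = sorted(chunks, key=lambda c: cosine(Counter(c.lower().split()), q), reverse=True)
    return ranked[:k]

chunks = [
    "Refunds are processed within 5 business days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday through Friday.",
]
top = retrieve(chunks, "what is the api rate limit", k=1)
prompt = "Answer using only this context:\n" + "\n".join(top) + "\nQ: what is the api rate limit"
```

Note that the listed failure modes all live in this layer: bad chunk boundaries or a weak similarity function mean the right context never reaches the model.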
Option 3
LoRA / QLoRA Fine-Tuning
Parameter-efficient fine-tuning. Trains ~1% of weights. Cheap relative to full fine-tuning.
Wins when style, tone, or task format is the bottleneck.
Setup: $100–$2K training on cloud GPU. Monthly: self-host $50–$300/mo or managed API.
Needs 500–50K clean, labeled examples at minimum, depending on task complexity.
Accuracy lift: 20–60% on style/format tasks; modest on factual tasks.
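A pure-Python sketch of the LoRA idea behind "trains ~1% of weights": instead of updating a full d_out x d_in matrix W, you train two small matrices B (d_out x r) and A (r x d_in) with rank r much smaller than d, and merge W + (alpha/r) * B @ A at inference. Real LoRA lives in libraries such as Hugging Face PEFT; the numbers below are a toy illustration.

```python
# Rank-r LoRA update on a single linear layer, plain Python lists.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_merge(W, B, A, alpha, r):
    """Return W + (alpha / r) * B @ A, leaving the base weights W frozen."""
    delta = matmul(B, A)
    s = alpha / r
    return [[W[i][j] + s * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

d, r, alpha = 4, 1, 2
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base layer
B = [[1.0], [0.0], [0.0], [0.0]]                                    # trainable, d x r
A = [[0.0, 0.5, 0.0, 0.0]]                                          # trainable, r x d
W_merged = lora_merge(W, B, A, alpha, r)

# Trainable params: d*r + r*d = 8 vs d*d = 16 for this toy layer;
# the ratio shrinks toward ~1% as d grows into the thousands and r stays small.
```

QLoRA applies the same trick on top of a quantized base model, which is what pushes training cost down to the hundreds-of-dollars range quoted above.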
Option 4
Full Fine-Tuning
Updates all weights. Requires serious compute and a large clean dataset.
Wins when the task requires fundamentally new capabilities not in the base model.
Setup: $5K–$100K+ (GPU hours). Monthly: high-end hosting or managed fine-tune API.
Needs 50K–1M+ labeled examples. Risk: catastrophic forgetting.
Rarely the right answer for production teams in 2026. Try LoRA first.
Option 5
Hybrid: RAG + Fine-Tuning
RAG handles knowledge retrieval. Fine-tuning handles behavior and style.
Wins when you need both: proprietary knowledge AND specific output format/tone.
Setup: $2K–$20K combined. Monthly: $200–$1K.
Failure mode: complexity — two systems to debug and maintain.
Best for: enterprise Q&A assistants, domain-specific coding agents, regulated industries.
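The division of labor above can be sketched as a two-stage pipeline. `call_model` is a stub standing in for a hypothetical fine-tuned model endpoint, and the one-entry knowledge dict stands in for the RAG layer; both are assumptions for illustration.

```python
# Hybrid flow: retrieval supplies grounding context, the fine-tuned
# model (stubbed here) supplies behavior and output format.

def retrieve(query: str) -> list[str]:
    # Stand-in for the RAG layer (Option 2).
    knowledge = {"refund": "Refunds are processed within 5 business days."}
    return [v for k, v in knowledge.items() if k in query.lower()]

def call_model(messages: list[dict]) -> str:
    # Stub for a LoRA-tuned model (Option 3) that would enforce your
    # house style; here it just extracts and echoes the grounded context.
    return messages[0]["content"].split("Context:\n", 1)[1].split("\n\nQ:")[0]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    system = f"Answer only from the context below.\nContext:\n{context}\n\nQ: {query}"
    return call_model([{"role": "system", "content": system}])
```

The stated failure mode is visible even in this sketch: a wrong answer can now come from either stage, so you need separate evaluation for retrieval hits and for model output.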