What is DSPy?
DSPy is a framework from Stanford NLP that lets you program language models instead of prompting them. You write Python code describing what you want, and DSPy's optimizers automatically figure out how to make it happen.
TL;DR
DSPy replaces manual prompt engineering with programmatic optimization. Instead of crafting prompts by trial and error, you define signatures and let DSPy's optimizers find the best approach.
GEPA is DSPy's newest optimizer (July 2025). It uses natural language reflection to improve prompts, beating reinforcement learning approaches while using up to 35x fewer rollouts.
The Problem: Prompt Engineering is Broken
Here's what building with LLMs looks like today:
1. Write a prompt
2. Test it
3. It fails on edge cases
4. Add more instructions
5. Now it's too long and expensive
6. Simplify
7. It fails differently
8. Repeat forever
The core issues:
| Problem | Reality |
|---|---|
| Brittleness | Prompts break when you change models, add features, or scale up |
| No composability | You can't easily combine prompts like you combine functions |
| Manual optimization | Every improvement requires human intuition and trial-and-error |
| No portability | Prompts optimized for GPT-4 don't transfer to Claude or Llama |
DSPy: Programming, Not Prompting
DSPy separates what you want from how to achieve it:
- You define: "Given a question and context, produce an answer"
- DSPy figures out: The exact prompt, examples, and structure to make it work
A Simple Example
Traditional prompting:
```python
prompt = """You are a helpful assistant. Given the following context and question,
provide a comprehensive answer. Be concise but thorough.

Context: {context}
Question: {question}

Answer:"""

response = llm.complete(prompt.format(context=ctx, question=q))
```
DSPy:
```python
import dspy

# Assumes an LM has already been configured via dspy.configure(lm=...)
# (see Getting Started below).
qa = dspy.ChainOfThought("context, question -> answer")
response = qa(context=ctx, question=q)
```
No prompt template. No manual engineering. DSPy handles the rest.
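Because the program is decoupled from any particular prompt, switching models is a one-line configuration change. A small sketch (the model name is just an example):

```python
import dspy

# The same program runs unchanged against a different model.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
response = qa(context=ctx, question=q)
```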
How DSPy Works
DSPy has three core concepts: Signatures, Modules, and Optimizers.
1. Signatures: What You Want
Signatures declare input/output behavior:
```python
# Simple signature
"question -> answer"

# With types
"question: str -> answer: float"

# Multiple inputs and outputs
"context: list[str], question: str -> reasoning: str, answer: str"
```
2. Modules: How to Execute
Modules implement signatures with different strategies:
| Module | What It Does |
|---|---|
| dspy.Predict | Basic prediction |
| dspy.ChainOfThought | Adds step-by-step reasoning |
| dspy.ProgramOfThought | Generates code to solve problems |
| dspy.ReAct | Agent with tool use |
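To make the module idea concrete, here is a hedged sketch of dspy.ReAct wired to a single tool; the search_web function is a placeholder you would replace with a real search or retrieval call:

```python
import dspy

def search_web(query: str) -> str:
    """Placeholder tool: return text relevant to the query."""
    # Swap in a real search client or retriever here.
    return "DSPy is a framework for programming language models."

# ReAct alternates between reasoning and calling the provided tools
# until it is ready to produce an answer.
agent = dspy.ReAct("question -> answer", tools=[search_web])
result = agent(question="What is DSPy?")
print(result.answer)
```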
3. Optimizers: Automatic Improvement
Optimizers tune your program to maximize a metric:
```python
from dspy.teleprompt import MIPROv2

# Metrics receive a gold example, a prediction, and an optional trace.
def metric(example, prediction, trace=None):
    return prediction.answer.lower() == example.answer.lower()

optimizer = MIPROv2(metric=metric, auto="medium")
optimized_qa = optimizer.compile(qa, trainset=examples)
```
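After compiling, you typically score the result on a held-out set and save it for reuse. A sketch, assuming devset is a list of dspy.Example objects labeled with question and answer:

```python
from dspy.evaluate import Evaluate

# Score the compiled program on held-out examples with the same metric.
evaluate = Evaluate(devset=devset, metric=metric, num_threads=8, display_progress=True)
evaluate(optimized_qa)

# Persist the optimized instructions/demos for later reuse.
optimized_qa.save("optimized_qa.json")
```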
Why GEPA Changes Everything
GEPA (Genetic-Pareto) is DSPy's breakthrough optimizer from July 2025.
The Key Insight
Instead of treating optimization as an RL problem, GEPA treats it as a reflection problem:
- Sample trajectories: Run the program, collect traces
- Reflect in language: Ask the LLM to diagnose what went wrong
- Propose improvements: Generate new prompt variations
- Test and combine: Use Pareto optimization to find the best
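In DSPy, this loop is packaged as the dspy.GEPA optimizer. A hedged sketch of what a run might look like; the exact constructor arguments may differ in the current release, and the feedback-returning metric shown here is the pattern the reflection step is built around:

```python
import dspy

def metric_with_feedback(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = float(pred.answer.lower() == gold.answer.lower())
    feedback = "Correct." if score else f"Expected '{gold.answer}' but got '{pred.answer}'."
    # Returning feedback text gives the reflection step something to learn from.
    return dspy.Prediction(score=score, feedback=feedback)

optimizer = dspy.GEPA(
    metric=metric_with_feedback,
    auto="light",                             # budget preset: light / medium / heavy
    reflection_lm=dspy.LM("openai/gpt-4.1"),  # example choice of a strong model for reflection
)
optimized_qa = optimizer.compile(qa, trainset=trainset, valset=valset)
```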
The Results
| Comparison | GEPA Improvement |
|---|---|
| vs GRPO (RL-based) | +10-20% better with up to 35x fewer rollouts |
| vs MIPROv2 | +10% across multiple LLMs |
Real-World Impact
DSPy is in production at JetBlue, Replit, Databricks, Sephora, VMware, and Moody's.
Benchmark Improvements
| Task | Before | After | Gain (pts) |
|---|---|---|---|
| RAG (SemanticF1) | 42% | 61% | +19 |
| ReAct Agent | 24% | 51% | +27 |
| Multi-hop QA | 31% | 59% | +28 |
Getting Started
Installation
```bash
pip install -U dspy
```
Basic Setup
```python
import dspy

# Configure your LM
lm = dspy.LM('anthropic/claude-sonnet-4-5-20250929', api_key='YOUR_KEY')
dspy.configure(lm=lm)

# Simple QA with reasoning
qa = dspy.ChainOfThought("question -> answer")
result = qa(question="What is 15% of 80?")

print(result.reasoning)  # Shows step-by-step math
print(result.answer)     # 12
```
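To see the prompt DSPy actually sent under the hood, dump the most recent LM call; this is also a convenient way to inspect what an optimizer changed:

```python
# Print the last prompt/response exchange sent to the configured LM.
dspy.inspect_history(n=1)
```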
DSPy and Claude Skills
DSPy and Claude Skills solve different problems:
| Aspect | DSPy | Claude Skills |
|---|---|---|
| Focus | Optimizing LLM programs | Teaching workflows |
| When to use | Production systems, complex pipelines | Developer productivity, reusable instructions |
| Optimization | Automatic via algorithms | Manual via writing |
They're complementary. Use Skills for developer workflows and DSPy for production ML pipelines.
Resources
- Documentation: dspy.ai
- GitHub: github.com/stanfordnlp/dspy (30k+ stars)
- Papers: DSPy (ICLR 2024), GEPA (July 2025)
Conclusion
DSPy represents a paradigm shift from artisanal prompt crafting to systematic AI engineering. Combined with GEPA's efficient optimization, it's the foundation for building reliable LLM applications.
The teams still hand-crafting prompts in 2025 are competing against people with better tools.