Back to skills
Sql Optimization
SQL query optimization patterns including EXPLAIN plan analysis, index strategies, query rewriting, and N+1 query prevention. Use when optimizing slow database queries, analyzing query performance, designing indexes, or debugging database bottlenecks. Works with PostgreSQL, MySQL, SQLite, and other SQL databases. Typical improvements: 10x-1000x query speedup.
17 stars
0 votes
0 copies
0 views
Added 12/19/2025
data-aipythongosqldebuggingdatabaseperformance
Install via CLI
$
openskills install applied-artificial-intelligence/claude-code-toolkitFiles
SKILL.md
---
name: sql-optimization
description: SQL query optimization patterns including EXPLAIN plan analysis, index strategies, query rewriting, and N+1 query prevention. Use when optimizing slow database queries, analyzing query performance, designing indexes, or debugging database bottlenecks. Works with PostgreSQL, MySQL, SQLite, and other SQL databases. Typical improvements: 10x-1000x query speedup.
---
# SQL Query Optimization Patterns
Comprehensive guide to optimizing SQL queries for performance, including EXPLAIN plan analysis, index design strategies, query rewriting patterns, and N+1 query detection. Works across PostgreSQL, MySQL, and SQLite.
---
## Quick Reference
**When to use this skill:**
- Slow database queries (>100ms for simple queries, >1s for complex)
- High database CPU usage
- Analyzing EXPLAIN plans
- Designing database indexes
- Debugging N+1 query problems
- Optimizing JOIN operations
- Reducing table scans
**Common triggers:**
- "This query is too slow"
- "How do I optimize this SQL"
- "What indexes should I add"
- "Explain this EXPLAIN plan"
- "Fix N+1 queries"
- "Database CPU at 100%"
**Typical improvements:**
- 3 seconds → 50ms (60x faster)
- Full table scan → Index scan
- 1000 queries → 2 queries (N+1 elimination)
---
## Part 1: Understanding EXPLAIN Plans
### Reading PostgreSQL EXPLAIN
```sql
EXPLAIN ANALYZE
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
ORDER BY p.created_at DESC
LIMIT 10;
```
**Key Metrics to Watch**:
- **Seq Scan** (bad): Full table scan, reads every row
- **Index Scan** (good): Uses index, reads only needed rows
- **Cost**: Estimated computational cost (lower is better)
- **Actual time**: Real execution time in milliseconds
- **Rows**: Number of rows processed at each step
### Bad EXPLAIN Example
```
Seq Scan on users u (cost=0.00..1234.00 rows=1000 width=50)
(actual time=0.123..45.678 rows=950 loops=1)
Filter: (created_at > '2024-01-01'::date)
Rows Removed by Filter: 50000
Planning Time: 0.234 ms
Execution Time: 3456.789 ms
```
**Problems**:
- **Seq Scan**: Reading entire table (50,950 rows)
- **Rows Removed by Filter**: Filtering after reading (wasteful)
- **Execution Time**: 3.5 seconds (way too slow)
### Good EXPLAIN Example (After Index)
```
Index Scan using users_created_at_idx on users u
(cost=0.29..123.45 rows=950 width=50)
(actual time=0.012..3.456 rows=950 loops=1)
Index Cond: (created_at > '2024-01-01'::date)
Planning Time: 0.123 ms
Execution Time: 4.567 ms
```
**Improvements**:
- **Index Scan**: Using index (only reads needed rows)
- **Index Cond**: Filtering during index scan (efficient)
- **Execution Time**: 4.5ms (750x faster!)
---
## Part 2: Index Design Strategies
### When to Add an Index
**✅ Add index when:**
- Column frequently in WHERE clauses
- Column frequently in JOIN conditions
- Column frequently in ORDER BY
- Query does Seq Scan on large table (>10K rows)
- Query execution time >100ms
**❌ Don't add index when:**
- Table has <1000 rows (Seq Scan is fast enough)
- Column rarely queried
- Column has very low cardinality (e.g., boolean with 50/50 distribution)
- Table has heavy writes (indexes slow down INSERTs/UPDATEs)
### Single-Column Indexes
```sql
-- Create index on frequently queried column
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_posts_user_id ON posts(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
```
**Query that benefits**:
```sql
-- Before index: Seq Scan (3000ms)
-- After index: Index Scan (5ms)
SELECT * FROM users WHERE email = 'user@example.com';
```
### Composite Indexes (Multi-Column)
**Order matters!** Put most selective column first.
```sql
-- Good: Specific to general
CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at);
-- Query that uses all columns
SELECT * FROM orders
WHERE user_id = 123
AND status = 'pending'
ORDER BY created_at DESC;
```
**Index usage rules**:
```sql
-- ✅ Uses index (matches from left)
WHERE user_id = 123
WHERE user_id = 123 AND status = 'pending'
WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01'
-- ❌ Doesn't use index (skips first column)
WHERE status = 'pending'
WHERE created_at > '2024-01-01'
```
### Partial Indexes (Filtered)
**Index only subset of rows** (smaller, faster):
```sql
-- Only index active users
CREATE INDEX idx_users_active_email
ON users(email)
WHERE status = 'active';
-- Query that uses partial index
SELECT * FROM users
WHERE email = 'user@example.com'
AND status = 'active'; -- Must include filter condition!
```
**Benefits**:
- Smaller index (faster scans, less storage)
- Only indexes rows you actually query
- Great for status fields with skewed distribution
### Covering Indexes (INCLUDE)
**Include columns in index** to avoid table access:
```sql
-- Postgres: INCLUDE clause
CREATE INDEX idx_users_email_covering
ON users(email)
INCLUDE (name, created_at);
-- Query doesn't need to access table (all data in index)
SELECT name, created_at FROM users WHERE email = 'user@example.com';
```
**Benefit**: Index-Only Scan (even faster than Index Scan)
---
## Part 3: Query Rewriting Patterns
### Problem 1: SELECT * (Fetching Unnecessary Data)
**❌ Bad** (fetches all columns):
```sql
SELECT * FROM users WHERE id = 123;
```
**✅ Good** (fetches only needed columns):
```sql
SELECT id, name, email FROM users WHERE id = 123;
```
**Impact**:
- Smaller result set
- Less memory usage
- Faster network transfer
- Can use covering indexes
### Problem 2: N+1 Queries
**❌ Bad** (1 query for posts + N queries for users):
```python
# 1 query
posts = db.execute("SELECT * FROM posts LIMIT 10")
# N queries (10 separate queries!)
for post in posts:
user = db.execute("SELECT * FROM users WHERE id = ?", post.user_id)
print(f"{user.name}: {post.title}")
```
**✅ Good** (2 queries total):
```python
# 1 query for posts
posts = db.execute("SELECT * FROM posts LIMIT 10")
# 1 query for all users
user_ids = [p.user_id for p in posts]
users = db.execute("SELECT * FROM users WHERE id IN (?)", user_ids)
users_by_id = {u.id: u for u in users}
# No additional queries
for post in posts:
user = users_by_id[post.user_id]
print(f"{user.name}: {post.title}")
```
**Even Better** (1 query with JOIN):
```sql
SELECT u.name, p.title
FROM posts p
JOIN users u ON p.user_id = u.id
LIMIT 10;
```
### Problem 3: Implicit Type Conversion
**❌ Bad** (index not used due to implicit conversion):
```sql
-- user_id is INTEGER, but '123' is string
SELECT * FROM posts WHERE user_id = '123'; -- Seq Scan!
```
**✅ Good** (explicit type, index used):
```sql
SELECT * FROM posts WHERE user_id = 123; -- Index Scan
```
### Problem 4: Function Calls in WHERE Clause
**❌ Bad** (can't use index):
```sql
-- Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```
**✅ Good** (use functional index):
```sql
-- Create index on function result
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now query can use index
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
```
**✅ Even Better** (store lowercase, regular index):
```sql
-- Store email in lowercase
CREATE INDEX idx_users_email ON users(email);
-- Query with lowercase value
SELECT * FROM users WHERE email = LOWER('user@example.com');
```
### Problem 5: OR Conditions
**❌ Bad** (often does Seq Scan):
```sql
SELECT * FROM users
WHERE email = 'user@example.com'
OR username = 'johndoe';
```
**✅ Good** (use UNION, allows index usage):
```sql
SELECT * FROM users WHERE email = 'user@example.com'
UNION
SELECT * FROM users WHERE username = 'johndoe';
```
### Problem 6: NOT IN with Subquery
**❌ Bad** (very slow, especially with large subquery):
```sql
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM blocked_users);
```
**✅ Good** (use LEFT JOIN with NULL check):
```sql
SELECT u.*
FROM users u
LEFT JOIN blocked_users b ON u.id = b.user_id
WHERE b.user_id IS NULL;
```
---
## Part 4: JOIN Optimization
### JOIN Order Matters
Database optimizes join order, but you can help:
**Small table first, large table second**:
```sql
-- Good: Join small table (categories: 10 rows) to large table (products: 10M rows)
SELECT p.*
FROM categories c
JOIN products p ON c.id = p.category_id
WHERE c.name = 'Electronics';
```
### Avoid Cartesian Products
**❌ Bad** (produces rows1 × rows2 × rows3):
```sql
SELECT *
FROM users, posts, comments; -- 100 × 10000 × 50000 = 50 billion rows!
```
**✅ Good** (proper JOIN conditions):
```sql
SELECT *
FROM users u
JOIN posts p ON u.id = p.user_id
JOIN comments c ON p.id = c.post_id;
```
### EXISTS vs IN for Subqueries
**Scenario**: Find users who have at least one post
**❌ Slower** (IN loads all post IDs):
```sql
SELECT * FROM users
WHERE id IN (SELECT user_id FROM posts);
```
**✅ Faster** (EXISTS stops at first match):
```sql
SELECT * FROM users u
WHERE EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);
```
### JOIN vs Subquery
**Use JOIN when** you need data from both tables:
```sql
SELECT u.name, p.title
FROM users u
JOIN posts p ON u.id = p.user_id;
```
**Use Subquery when** you only need filtering:
```sql
SELECT * FROM users
WHERE id IN (SELECT DISTINCT user_id FROM posts);
```
---
## Part 5: Aggregation Optimization
### COUNT(*) Optimization
**Problem**: COUNT(*) on large tables is slow
**❌ Slow** (counts all rows):
```sql
SELECT COUNT(*) FROM orders; -- Full table scan on 10M rows
```
**✅ Faster** (approximate count from statistics):
```sql
-- PostgreSQL: Use statistics (fast but approximate)
SELECT reltuples::bigint AS estimate
FROM pg_class
WHERE relname = 'orders';
```
**✅ Also Good** (exact count with index):
```sql
-- If you have an index, COUNT can use index-only scan
CREATE INDEX idx_orders_id ON orders(id);
SELECT COUNT(*) FROM orders; -- Index-only scan
```
### GROUP BY Optimization
**❌ Bad** (groups after fetching all data):
```sql
SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id;
```
**✅ Good** (index helps grouping):
```sql
-- Create index on grouped column
CREATE INDEX idx_posts_user_id ON posts(user_id);
SELECT user_id, COUNT(*)
FROM posts
GROUP BY user_id; -- Can use index for grouping
```
### HAVING vs WHERE
**❌ Bad** (filters after grouping):
```sql
SELECT user_id, COUNT(*) as post_count
FROM posts
GROUP BY user_id
HAVING user_id > 1000; -- Filters after grouping all users
```
**✅ Good** (filters before grouping):
```sql
SELECT user_id, COUNT(*) as post_count
FROM posts
WHERE user_id > 1000 -- Filters before grouping (can use index)
GROUP BY user_id;
```
---
## Part 6: Pagination Optimization
### OFFSET is Slow for Large Offsets
**❌ Bad** (OFFSET scans all skipped rows):
```sql
-- Page 1000 (skips 50,000 rows, then returns 50)
SELECT * FROM posts
ORDER BY created_at DESC
LIMIT 50 OFFSET 50000; -- Scans 50,050 rows!
```
**✅ Good** (keyset pagination with WHERE):
```sql
-- Save last seen ID from previous page
SELECT * FROM posts
WHERE created_at < '2024-01-15 10:30:00' -- Last seen timestamp
ORDER BY created_at DESC
LIMIT 50; -- Only fetches 50 rows
```
**Implementation**:
```python
# Page 1
results = db.execute("""
SELECT id, created_at, title
FROM posts
ORDER BY created_at DESC, id DESC
LIMIT 50
""")
last_created_at = results[-1].created_at
last_id = results[-1].id
# Page 2 (using last seen values)
results = db.execute("""
SELECT id, created_at, title
FROM posts
WHERE (created_at, id) < (%s, %s)
ORDER BY created_at DESC, id DESC
LIMIT 50
""", last_created_at, last_id)
```
**Benefits**:
- Consistent performance for any page number
- No OFFSET overhead
- Works with millions of rows
---
## Part 7: Database-Specific Patterns
### PostgreSQL-Specific
**1. EXPLAIN ANALYZE** (shows actual execution):
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users WHERE email = 'user@example.com';
```
**2. pg_stat_statements** (find slow queries):
```sql
-- Enable extension
CREATE EXTENSION pg_stat_statements;
-- Find slowest queries
SELECT
calls,
mean_exec_time,
total_exec_time,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```
**3. VACUUM and ANALYZE**:
```sql
-- Update statistics after bulk changes
VACUUM ANALYZE table_name;
```
### MySQL-Specific
**1. EXPLAIN FORMAT=JSON** (detailed output):
```sql
EXPLAIN FORMAT=JSON
SELECT * FROM users WHERE email = 'user@example.com';
```
**2. SHOW PROFILE** (detailed timing):
```sql
SET profiling = 1;
SELECT * FROM users WHERE email = 'user@example.com';
SHOW PROFILE FOR QUERY 1;
```
**3. Query Cache** (deprecated in MySQL 8.0):
```sql
-- Old MySQL versions only
SHOW VARIABLES LIKE 'query_cache%';
```
### SQLite-Specific
**1. EXPLAIN QUERY PLAN**:
```sql
EXPLAIN QUERY PLAN
SELECT * FROM users WHERE email = 'user@example.com';
```
**2. ANALYZE** (update statistics):
```sql
ANALYZE table_name;
```
**3. Indexes are critical** (SQLite has no query optimizer):
```sql
-- SQLite relies heavily on indexes
CREATE INDEX idx_users_email ON users(email);
```
---
## Part 8: Monitoring and Diagnosis
### Find Missing Indexes (PostgreSQL)
```sql
-- Queries doing sequential scans on large tables
SELECT
schemaname,
tablename,
seq_scan,
seq_tup_read,
idx_scan,
seq_tup_read / seq_scan AS avg_seq_read
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 25;
```
### Find Unused Indexes (PostgreSQL)
```sql
-- Indexes that are never used
SELECT
schemaname,
tablename,
indexname,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE '%_pkey' -- Exclude primary keys
ORDER BY pg_relation_size(indexrelid) DESC;
```
### Check Index Bloat
```sql
-- Postgres: Check for bloated indexes
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC;
-- Rebuild bloated indexes
REINDEX INDEX index_name;
```
---
## Part 9: Common Patterns by Use Case
### Pattern 1: Recent Records with Pagination
```sql
-- Create composite index
CREATE INDEX idx_posts_created_user ON posts(created_at DESC, user_id);
-- Query uses index efficiently
SELECT * FROM posts
WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 20;
```
### Pattern 2: Search Across Multiple Columns
```sql
-- Create GIN index for full-text search (Postgres)
CREATE INDEX idx_posts_search ON posts
USING gin(to_tsvector('english', title || ' ' || content));
-- Fast full-text search
SELECT * FROM posts
WHERE to_tsvector('english', title || ' ' || content) @@
to_tsquery('english', 'database & optimization');
```
### Pattern 3: Range Queries
```sql
-- Create index on range column
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Query uses index
SELECT * FROM orders
WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'
ORDER BY created_at;
```
### Pattern 4: Distinct Values with Filter
```sql
-- Create composite index
CREATE INDEX idx_posts_category_user ON posts(category_id, user_id);
-- Query uses index
SELECT DISTINCT category_id
FROM posts
WHERE user_id = 123;
```
---
## Part 10: Quick Optimization Checklist
**Diagnosis**:
- [ ] Run EXPLAIN ANALYZE on slow queries
- [ ] Check for Seq Scan on tables >10K rows
- [ ] Look for missing indexes
- [ ] Identify N+1 query patterns
- [ ] Check query execution time (aim for <100ms)
**Index Strategy**:
- [ ] Index columns in WHERE clauses
- [ ] Index columns in JOIN conditions
- [ ] Index columns in ORDER BY
- [ ] Use composite indexes for multi-column queries
- [ ] Consider partial indexes for filtered queries
- [ ] Remove unused indexes
**Query Rewriting**:
- [ ] Select only needed columns (not SELECT *)
- [ ] Use JOIN instead of N+1 queries
- [ ] Use proper types (avoid implicit conversion)
- [ ] Avoid functions in WHERE (use functional indexes)
- [ ] Use UNION instead of OR for multiple conditions
- [ ] Use EXISTS instead of IN for large subqueries
**Pagination**:
- [ ] Use keyset pagination (not OFFSET) for large datasets
- [ ] Include stable sort column (id) in pagination
- [ ] Index pagination columns
**Monitoring**:
- [ ] Enable query logging for slow queries
- [ ] Monitor index usage
- [ ] Check for index bloat
- [ ] Regularly VACUUM ANALYZE (Postgres)
---
## Resources
**PostgreSQL**:
- EXPLAIN docs: https://www.postgresql.org/docs/current/sql-explain.html
- Index types: https://www.postgresql.org/docs/current/indexes-types.html
- Performance tips: https://wiki.postgresql.org/wiki/Performance_Optimization
**MySQL**:
- EXPLAIN output: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html
- Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html
**Tools**:
- pgAdmin (Postgres GUI)
- MySQL Workbench
- DBeaver (multi-database)
- explain.depesz.com (Postgres EXPLAIN visualizer)
Attribution
Comments (0)
No comments yet. Be the first to comment!
