---
name: doc-splitter
description: Split large documentation (10K+ pages) into focused sub-skills with intelligent routing. Use for massive doc sites like Godot, AWS, or MSDN.
tools: Read, Write, Bash, Glob
---
# Documentation Splitter Skill
## Purpose
Single responsibility: Split large documentation sites into multiple focused sub-skills with an optional router skill for intelligent navigation. (BP-4)
## Grounding Checkpoint (Archetype 1 Mitigation)
Before executing, VERIFY:
- [ ] Total page count is known (run estimation first)
- [ ] Documentation categories are identifiable
- [ ] Target skill size determined (default: 5,000 pages per skill)
- [ ] Split strategy selected (category, size, router, or hybrid)
**DO NOT split without understanding documentation structure.**
## Uncertainty Escalation (Archetype 2 Mitigation)
ASK USER instead of guessing when:
- Category boundaries unclear
- Optimal skill size uncertain for target use case
- Cross-references between sections complicate splitting
- Router vs flat structure decision needed
**NEVER split arbitrarily; seek user guidance on boundaries.**
## Context Scope (Archetype 3 Mitigation)
| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | Doc structure, categories, page counts | Actual page content |
| PERIPHERAL | Similar large doc examples | Other documentation |
| DISTRACTOR | Content quality concerns | Individual page issues |
## Size Guidelines
| Documentation Size | Recommendation | Strategy |
|-------------------|----------------|----------|
| < 5,000 pages | One skill | No splitting |
| 5,000 - 10,000 pages | Consider splitting | Category-based |
| 10,000 - 30,000 pages | Recommended | Router + Categories |
| 30,000+ pages | Strongly recommended | Router + Categories |
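The thresholds above can be expressed as a small lookup. The following is a minimal Python sketch, not part of skill-seekers: `recommend_split` is a hypothetical helper, and treating each lower bound as inclusive is an assumption, since the ranges in the table share their endpoints.

```python
def recommend_split(total_pages: int) -> tuple[str, str]:
    """Return (recommendation, strategy) for a page count, per the table above.

    Assumption: lower bounds are inclusive (5,000 pages falls in the
    "Consider splitting" tier, 10,000 in "Recommended", and so on).
    """
    if total_pages < 5_000:
        return ("One skill", "No splitting")
    if total_pages < 10_000:
        return ("Consider splitting", "Category-based")
    if total_pages < 30_000:
        return ("Recommended", "Router + Categories")
    return ("Strongly recommended", "Router + Categories")
```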
## Workflow Steps
### Step 1: Estimate Documentation Size (Grounding)
```bash
# Quick estimation with skill-seekers
skill-seekers estimate configs/large-docs.json
# Output:
# 📊 ESTIMATION RESULTS
# ✅ Pages Discovered: 28,450
# 📈 Estimated Total: 32,000
# ⏱️ Time Elapsed: 2.1 minutes
# 💡 Recommended: Split into 6-7 sub-skills
```
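The sample recommendation of 6-7 sub-skills is consistent with dividing the estimated total by the default target of 5,000 pages per skill. The sketch below shows that arithmetic. Whether skill-seekers computes its recommendation this way is an assumption, and `skill_count_range` is a hypothetical name.

```python
import math


def skill_count_range(estimated_pages: int,
                      target_pages_per_skill: int = 5_000) -> tuple[int, int]:
    """Return (low, high) sub-skill counts for a page estimate."""
    exact = estimated_pages / target_pages_per_skill
    # Always plan for at least one skill, even for small doc sets.
    low = max(1, math.floor(exact))
    high = max(1, math.ceil(exact))
    return low, high


print(skill_count_range(32_000))  # (6, 7), since 32,000 / 5,000 = 6.4
```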
### Step 2: Analyze Category Structure
```bash
# Identify natural category boundaries
skill-seekers analyze --config configs/large-docs.json --categories
# Output:
# Categories detected:
# - scripting: 8,200 pages
# - 2d: 5,400 pages
# - 3d: 9,100 pages
# - physics: 4,300 pages
# - networking: 2,800 pages
# - editor: 2,200 pages
```
### Step 3: Choose Split Strategy
| Strategy | Best For | Description |
|----------|----------|-------------|
| `category` | Clear topic divisions | Split by documentation sections |
| `size` | Uniform distribution | Split every N pages |
| `router` | User navigation | Hub skill + specialized sub-skills |
| `hybrid` | Complex docs | Categories + size limits per category |
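The `hybrid` strategy combines category boundaries with a per-skill size cap. A minimal Python sketch of that planning step follows, using the page counts from Step 2. It is an illustration rather than skill-seekers' own algorithm: `plan_hybrid_split` and the `-1`, `-2` name suffixes are hypothetical, and oversized categories are simply divided into equal parts.

```python
import math


def plan_hybrid_split(category_pages: dict[str, int],
                      max_pages_per_skill: int = 5_000) -> dict[str, int]:
    """Map each planned sub-skill name to its approximate page budget."""
    plan: dict[str, int] = {}
    for category, pages in category_pages.items():
        parts = max(1, math.ceil(pages / max_pages_per_skill))
        if parts == 1:
            # Category already fits under the cap: keep it whole.
            plan[category] = pages
            continue
        # Spread pages evenly; the first `extra` parts take one more page.
        base, extra = divmod(pages, parts)
        for i in range(parts):
            plan[f"{category}-{i + 1}"] = base + (1 if i < extra else 0)
    return plan


categories = {"scripting": 8_200, "2d": 5_400, "3d": 9_100,
              "physics": 4_300, "networking": 2_800, "editor": 2_200}
for name, pages in plan_hybrid_split(categories).items():
    print(f"{name}: {pages} pages")
```

With a 5,000-page cap, `scripting`, `2d`, and `3d` are each divided in two, while the three smaller categories stay whole.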
### Step 4: Execute Split
**Option A: With skill-seekers**
```bash
# Category-based split
skill-seekers split --config configs/godot.json --strategy category
# Router-based split (recommended for large docs)
skill-seekers split --config configs/godot.json --strategy router
# Size-based split
skill-seekers split --config configs/godot.json --strategy size --pages-per-skill 5000
```
**Option B: Manual split configuration**
```json
{
  "name": "godot",
  "max_pages": 40000,
  "split_strategy": "router",
  "split_config": {
    "target_pages_per_skill": 5000,
    "create_router": true,
    "categories": {
      "scripting": {
        "patterns": ["/scripting/", "/gdscript/", "/c_sharp/"],
        "max_pages": 8000
      },
      "2d": {
        "patterns": ["/2d/", "/sprite/", "/tilemap/"],
        "max_pages": 6000
      },
      "3d": {
        "patterns": ["/3d/", "/mesh/", "/spatial/"],
        "max_pages": 10000
      },
      "physics": {
        "patterns": ["/physics/", "/collision/", "/rigidbody/"],
        "max_pages": 5000
      }
    }
  }
}
```
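To preview how the `patterns` lists would divide pages before running a split, the matching can be sketched in a few lines of Python. This assumes patterns are plain substring matches against the URL, which the configuration does not state. The `misc` catch-all name, both helper functions, and the example URLs are hypothetical.

```python
CATEGORIES = {
    "scripting": ["/scripting/", "/gdscript/", "/c_sharp/"],
    "2d": ["/2d/", "/sprite/", "/tilemap/"],
    "3d": ["/3d/", "/mesh/", "/spatial/"],
    "physics": ["/physics/", "/collision/", "/rigidbody/"],
}


def matching_categories(url: str, categories: dict[str, list[str]]) -> list[str]:
    """Return every category with at least one pattern found in the URL."""
    return [name for name, patterns in categories.items()
            if any(pattern in url for pattern in patterns)]


def classify_url(url: str, categories: dict[str, list[str]],
                 catch_all: str = "misc") -> str:
    """Return the first matching category, or the catch-all for orphan pages."""
    matches = matching_categories(url, categories)
    return matches[0] if matches else catch_all


# Hypothetical URLs, for illustration only.
print(classify_url("https://docs.example.org/tutorials/2d/tilemap/intro.html", CATEGORIES))
print(classify_url("https://docs.example.org/about/release_policy.html", CATEGORIES))
print(matching_categories("https://docs.example.org/tutorials/3d/physics/rigidbody.html", CATEGORIES))
```

A URL that matches more than one category is a candidate for the `Category overlap` error, and a URL that matches none is an orphan page.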
### Step 5: Scrape Sub-Skills
```bash
# Scrape all sub-skills in parallel
for config in configs/godot-*.json; do
  skill-seekers scrape --config "$config" &
done
wait

# Or sequentially with progress
for config in configs/godot-*.json; do
  echo "Processing: $config"
  skill-seekers scrape --config "$config"
done
```
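The backgrounded loop above does not record which scrape failed, which the recovery protocol needs. One possible alternative is a small Python wrapper that bounds parallelism and collects exit codes. This is a sketch under assumptions: it relies only on the `skill-seekers scrape --config` call shown above, and `scrape_all`, `scrape_command`, and the default of three workers are hypothetical choices.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def scrape_command(config: Path) -> list[str]:
    """Build the same CLI call used in the shell loop above."""
    return ["skill-seekers", "scrape", "--config", str(config)]


def scrape_all(configs, max_workers: int = 3,
               build_command=scrape_command) -> dict[str, int]:
    """Run one scrape per config in parallel; return the exit code per config."""
    paths = [Path(c) for c in configs]

    def run(config: Path) -> int:
        # check=False: collect failures instead of raising on the first one.
        return subprocess.run(build_command(config), check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(run, paths))
    return {str(path): code for path, code in zip(paths, codes)}
```

A caller could then list the failures with `[c for c, code in results.items() if code != 0]` and re-run only those configs.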
### Step 6: Generate Router Skill
```bash
# Auto-generate router from sub-skills
skill-seekers generate-router configs/godot-*.json
# Creates godot-router skill that intelligently routes queries
```
### Step 7: Validate Split Results
```bash
# Check sub-skill sizes
for dir in output/godot-*/; do
  echo "$dir: $(find "$dir" -name '*.md' | wc -l) files"
done

# Verify router coverage
grep -A 50 "## Sub-Skills" output/godot-router/SKILL.md
```
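The same size check can be scripted so that oversized sub-skills are flagged automatically. The sketch below counts Markdown files per sub-skill directory, as the shell loop does. `count_markdown_files` and `oversized` are hypothetical helpers, and comparing file counts against the 5,000-page target assumes roughly one file per page.

```python
from pathlib import Path


def count_markdown_files(output_root, prefix: str = "godot-") -> dict[str, int]:
    """Count *.md files (recursively) in each sub-skill directory."""
    root = Path(output_root)
    return {
        directory.name: sum(1 for _ in directory.rglob("*.md"))
        for directory in sorted(root.glob(f"{prefix}*"))
        if directory.is_dir()
    }


def oversized(counts: dict[str, int], limit: int = 5_000) -> list[str]:
    """Return the sub-skills whose file count exceeds the limit."""
    return [name for name, count in counts.items() if count > limit]
```

For example, `oversized(count_markdown_files("output"))` would list the sub-skills that are candidates for a further split.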
## Recovery Protocol (Archetype 4 Mitigation)
On error:
1. **PAUSE** - Note which sub-skill failed
2. **DIAGNOSE** - Check error type:
   - `Category overlap` → Refine URL patterns
   - `Uneven split` → Adjust page limits
   - `Orphan pages` → Add catch-all category
   - `Router incomplete` → Regenerate after all sub-skills done
3. **ADAPT** - Modify split configuration
4. **RETRY** - Re-split affected category (max 3 attempts)
5. **ESCALATE** - Present split preview, ask user for boundary adjustments
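The five steps above amount to a bounded retry loop. A minimal Python sketch follows, assuming the split step raises an error that carries one of the listed error types. `SplitError`, `split_with_recovery`, and the `split` and `adapt` callables are hypothetical and not part of skill-seekers.

```python
class SplitError(Exception):
    """Hypothetical error carrying one of the error types listed above."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


REMEDIES = {
    "Category overlap": "Refine URL patterns",
    "Uneven split": "Adjust page limits",
    "Orphan pages": "Add catch-all category",
    "Router incomplete": "Regenerate after all sub-skills done",
}


def split_with_recovery(config: dict, split, adapt, max_attempts: int = 3):
    """Try split(config) up to max_attempts times; return (succeeded, log).

    On failure the error is diagnosed (DIAGNOSE), the config is modified via
    adapt(config, kind) (ADAPT), and the split is retried (RETRY). When the
    attempts run out, the caller should escalate to the user (ESCALATE).
    """
    log = []
    for attempt in range(1, max_attempts + 1):
        try:
            split(config)
            return True, log
        except SplitError as err:
            remedy = REMEDIES.get(err.kind, "Unknown error: escalate")
            log.append(f"attempt {attempt}: {err.kind} -> {remedy}")
            config = adapt(config, err.kind)
    return False, log
```

A `False` result is the signal to present the split preview and ask the user for boundary adjustments rather than retrying again.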
## Checkpoint Support
State saved to: `.aiwg/working/checkpoints/doc-splitter/`
```
checkpoints/doc-splitter/
├── estimation.json          # Page count results
├── category_analysis.json   # Category breakdown
├── split_plan.json          # Planned split configuration
├── progress/
│   ├── godot-scripting.json
│   ├── godot-2d.json
│   └── ...
└── router_draft.md          # Router skill draft
```
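Per-skill progress files make it possible to resume after an interruption without re-scraping finished sub-skills. The sketch below writes and reads files under `progress/` in the layout above. The JSON schema is an assumption: the `status` field and its `done` value are hypothetical, as are the `save_progress` and `pending_skills` helpers.

```python
import json
from pathlib import Path

CHECKPOINT_DIR = Path(".aiwg/working/checkpoints/doc-splitter")


def save_progress(skill_name: str, state: dict, root=CHECKPOINT_DIR) -> Path:
    """Write progress/<skill_name>.json, replacing any earlier checkpoint."""
    path = Path(root) / "progress" / f"{skill_name}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a crash never leaves a half-written file.
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)
    return path


def pending_skills(skill_names, root=CHECKPOINT_DIR) -> list[str]:
    """Return skills with no checkpoint or whose status is not 'done'."""
    pending = []
    for name in skill_names:
        path = Path(root) / "progress" / f"{name}.json"
        if not path.exists():
            pending.append(name)
            continue
        state = json.loads(path.read_text(encoding="utf-8"))
        if state.get("status") != "done":
            pending.append(name)
    return pending
```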
## Output Structure
After splitting large documentation:
```
configs/
├── godot.json              # Original config
├── godot-scripting.json    # Generated sub-config
├── godot-2d.json
├── godot-3d.json
├── godot-physics.json
└── godot-router.json       # Router config
output/
├── godot-scripting/        # Sub-skill
│   ├── SKILL.md
│   └── references/
├── godot-2d/               # Sub-skill
├── godot-3d/               # Sub-skill
├── godot-physics/          # Sub-skill
└── godot-router/           # Router skill
    ├── SKILL.md            # Routing logic
    └── references/
        └── routing-table.md
```
## Router Skill Structure
The generated router skill:
```markdown
# Godot Documentation Router
## Purpose
Route queries to the appropriate specialized Godot sub-skill.
## Sub-Skills
| Topic | Skill | Coverage |
|-------|-------|----------|
| GDScript, C#, scripting patterns | godot-scripting | 8,200 pages |
| 2D graphics, sprites, tilemaps | godot-2d | 5,400 pages |
| 3D graphics, meshes, materials | godot-3d | 9,100 pages |
| Physics, collisions, rigid bodies | godot-physics | 4,300 pages |
## Routing Rules
1. **Scripting questions** → godot-scripting
   - Keywords: script, gdscript, c#, function, variable, class
2. **2D graphics questions** → godot-2d
   - Keywords: sprite, 2d, tilemap, animation2d, canvas
3. **3D graphics questions** → godot-3d
   - Keywords: mesh, 3d, spatial, material, shader, camera3d
4. **Physics questions** → godot-physics
   - Keywords: physics, collision, rigidbody, area, raycast
## Usage
Ask your question naturally. This router will direct you to the appropriate specialized skill.
Examples:
- "How do I create a player movement script?" → godot-scripting
- "How do I set up tilemap collisions?" → godot-2d
- "How do I apply materials to a mesh?" → godot-3d
```
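The router itself is a SKILL.md interpreted by the model, not executable code, but its keyword rules can be checked against sample queries with a small script. The sketch below is one such check under assumptions: whole-word matching, scoring by keyword hit count, and breaking ties by rule order are all choices made for illustration, and `route` is a hypothetical helper.

```python
import re

ROUTES = {
    "godot-scripting": ["script", "gdscript", "c#", "function", "variable", "class"],
    "godot-2d": ["sprite", "2d", "tilemap", "animation2d", "canvas"],
    "godot-3d": ["mesh", "3d", "spatial", "material", "shader", "camera3d"],
    "godot-physics": ["physics", "collision", "rigidbody", "area", "raycast"],
}


def route(query: str, routes: dict[str, list[str]] = ROUTES):
    """Return the sub-skill with the most whole-word keyword hits, or None."""
    # Tokenize into lowercase words; '#' is kept so that "c#" survives.
    words = set(re.findall(r"[a-z0-9#]+", query.lower()))
    scores = {skill: sum(kw in words for kw in keywords)
              for skill, keywords in routes.items()}
    best = max(scores, key=scores.get)  # ties resolve to the earliest rule
    return best if scores[best] > 0 else None


for question in ("How do I create a player movement script?",
                 "How do I set up tilemap collisions?",
                 "How do I apply materials to a mesh?"):
    print(question, "->", route(question))
```

Under these assumptions the three example questions resolve to `godot-scripting`, `godot-2d`, and `godot-3d`, matching the Usage section. The second question only avoids a tie with `godot-physics` because "collisions" is not an exact match for the keyword "collision", which shows how sensitive keyword rules are to wording.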
## Troubleshooting
| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| Uneven splits | Category size varies | Use hybrid strategy with max_pages |
| Orphan pages | URL patterns incomplete | Add catch-all or refine patterns |
| Router confusion | Overlapping keywords | Make routing rules more specific |
| Too many skills | Over-segmented | Merge related categories |
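For the "Router confusion" row, overlapping keywords can be found mechanically before editing the routing rules. The sketch below reports keywords shared between any two sub-skills. `keyword_overlaps` is a hypothetical helper, and the example rules are invented to show an overlap; they are not the generated router's rules.

```python
from itertools import combinations


def keyword_overlaps(routes: dict[str, list[str]]) -> dict[tuple[str, str], list[str]]:
    """Return {(skill_a, skill_b): shared keywords} for every overlapping pair."""
    overlaps = {}
    for (skill_a, kws_a), (skill_b, kws_b) in combinations(routes.items(), 2):
        shared = sorted(set(kws_a) & set(kws_b))
        if shared:
            overlaps[(skill_a, skill_b)] = shared
    return overlaps


# Hypothetical rules in which "collision" was added to the 2D skill as well.
routes = {
    "godot-2d": ["sprite", "tilemap", "collision"],
    "godot-physics": ["physics", "collision", "raycast"],
}
print(keyword_overlaps(routes))
# {('godot-2d', 'godot-physics'): ['collision']}
```

Each reported keyword should either be removed from one rule or replaced with a more specific phrase.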
## References
- Skill Seekers Large Documentation: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LARGE_DOCUMENTATION.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-9 KISS)
- REF-002: LLM Failure Modes (Archetype 3 context filtering, Archetype 4 recovery)