Llms Txt Support

Name: Llms Txt Support
Author: jmagly
Detect and use llms.txt files for LLM-optimized documentation. Use when checking if a site has LLM-ready docs before scraping.
44 stars
0 votes
0 copies
1 views
Added 12/21/2025
data-aigobashgitapidocumentation
Works with

api
Install via CLI
$openskills install jmagly/ai-writing-guide
Files
SKILL.md
---
name: llms-txt-support
description: Detect and use llms.txt files for LLM-optimized documentation. Use when checking if a site has LLM-ready docs before scraping.
tools: Read, Write, WebFetch
---

# llms.txt Support Skill

## Purpose

Single responsibility: Detect, fetch, and utilize llms.txt files that provide LLM-optimized documentation, enabling 10x faster documentation ingestion. (BP-4)

## Background

The llms.txt standard (https://llmstxt.org/) provides a convention for websites to expose LLM-friendly documentation. Instead of scraping entire sites, check for llms.txt first.

**File hierarchy (check in order):**
1. `llms-full.txt` - Complete documentation (largest)
2. `llms.txt` - Standard documentation
3. `llms-small.txt` - Condensed documentation (smallest)

## Grounding Checkpoint (Archetype 1 Mitigation)

Before executing, VERIFY:

- [ ] Base URL is accessible
- [ ] Check all three llms.txt variants in order
- [ ] Validate file content is actual documentation (not error page)
- [ ] Confirm file size is reasonable for the documentation scope

**DO NOT assume llms.txt exists. Always probe first.**

## Uncertainty Escalation (Archetype 2 Mitigation)

ASK USER instead of guessing when:

- Multiple llms.txt variants found - which size to use?
- llms.txt content appears partial or outdated
- File returns but content seems like error page
- Site has llms.txt but content doesn't match expected documentation

**NEVER assume llms.txt quality without verification.**

## Context Scope (Archetype 3 Mitigation)

| Context Type | Included | Excluded |
|--------------|----------|----------|
| RELEVANT | Target base URL, llms.txt content | Full site scraping |
| PERIPHERAL | llms.txt spec reference | Other sites' llms.txt |
| DISTRACTOR | Previous scraping attempts | Unrelated documentation |

## Workflow Steps

### Step 1: Detect llms.txt (Grounding)

```bash
# Check for llms.txt variants (in order of preference)
curl -I https://example.com/llms-full.txt
curl -I https://example.com/llms.txt
curl -I https://example.com/llms-small.txt

# Check common alternate locations
curl -I https://example.com/.well-known/llms.txt
curl -I https://docs.example.com/llms.txt
```

### Step 2: Validate Content

```bash
# Fetch and inspect first 100 lines
curl -s https://example.com/llms.txt | head -100

# Check file size
curl -sI https://example.com/llms.txt | grep -i content-length

# Verify it's not an error page
curl -s https://example.com/llms.txt | grep -i "not found\|error\|404" && echo "WARNING: May be error page"
```

### Step 3: Choose Variant

| Variant | Size | Use Case |
|---------|------|----------|
| `llms-full.txt` | Large (1MB+) | Complete documentation, full API reference |
| `llms.txt` | Medium | Standard use, balanced coverage |
| `llms-small.txt` | Small (<100KB) | Quick reference, limited context windows |

**Decision tree:**
1. If context window is limited → `llms-small.txt`
2. If need complete coverage → `llms-full.txt`
3. Default → `llms.txt`

### Step 4: Fetch and Process

```bash
# Download llms.txt
curl -o docs/llms.txt https://example.com/llms.txt

# Convert to skill format (if using skill-seekers)
skill-seekers scrape --llms-txt docs/llms.txt --name myskill

# Or process manually
# llms.txt is already LLM-optimized markdown
cp docs/llms.txt output/myskill/references/complete.md
```

### Step 5: Validate Output

```bash
# Check content structure
head -50 output/myskill/references/complete.md

# Verify sections
grep "^#" output/myskill/references/complete.md | head -20

# Check for code examples
grep -c '```' output/myskill/references/complete.md
```

## Recovery Protocol (Archetype 4 Mitigation)

On error:

1. **PAUSE** - Note which variant failed
2. **DIAGNOSE** - Check error type:
   - `404 Not Found` → Try next variant or alternate location
   - `403 Forbidden` → May need authentication or user-agent
   - `Timeout` → Retry with longer timeout
   - `Invalid content` → Fall back to traditional scraping
3. **ADAPT** - Try alternate approach
4. **RETRY** - Next variant (max 3 attempts per variant)
5. **ESCALATE** - Inform user llms.txt unavailable, suggest scraping

## Checkpoint Support

State saved to: `.aiwg/working/checkpoints/llms-txt-support/`

```
checkpoints/llms-txt-support/
├── detection_results.json    # Which variants found
├── selected_variant.txt      # Which was chosen
└── content_hash.txt          # For cache validation
```

## llms.txt Format Reference

Standard llms.txt structure:

```markdown
# Project Name

> Brief description of the project

## Overview
[High-level explanation]

## Installation
[Setup instructions]

## Quick Start
[Getting started guide]

## API Reference
[Detailed API documentation]

## Examples
[Code examples]

## FAQ
[Common questions]
```

## Detection Results Output

```json
{
  "base_url": "https://example.com",
  "detected": {
    "llms-full.txt": {
      "found": true,
      "url": "https://example.com/llms-full.txt",
      "size": 1523456,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms.txt": {
      "found": true,
      "url": "https://example.com/llms.txt",
      "size": 245678,
      "last_modified": "2025-01-15T10:30:00Z"
    },
    "llms-small.txt": {
      "found": false
    }
  },
  "recommended": "llms.txt",
  "reason": "Standard size, good for most use cases"
}
```

## Known Sites with llms.txt

Sites known to support llms.txt (verify before use):

- Anthropic documentation
- Many modern API documentation sites
- Framework documentation following the standard

**Always verify - this list may be outdated.**

## Troubleshooting

| Issue | Diagnosis | Solution |
|-------|-----------|----------|
| No llms.txt found | Site doesn't support | Fall back to doc-scraper |
| Content seems wrong | Error page or redirect | Check actual content, verify URL |
| File too large | llms-full.txt overwhelming | Use llms.txt or llms-small.txt |
| Outdated content | llms.txt not maintained | Consider scraping + llms.txt merge |

## Integration with doc-scraper

If llms.txt is incomplete or outdated, combine approaches:

```bash
# 1. Fetch llms.txt as base
curl -o base.md https://example.com/llms.txt

# 2. Scrape for additional/updated content
skill-seekers scrape --config config.json --skip-covered-by base.md

# 3. Merge results
# llms.txt provides structure, scraping fills gaps
```

## References

- llms.txt Standard: https://llmstxt.org/
- Skill Seekers llms.txt Detection: https://github.com/jmagly/Skill_Seekers/blob/main/docs/LLMS_TXT_SUPPORT.md
- REF-001: Production-Grade Agentic Workflows (BP-4, BP-9)
- REF-002: LLM Failure Modes (Archetype 1-4 mitigations)
Llms Txt Support

Works with

Attribution

Comments (0)