Token Savings Benchmark

Real API measurements comparing three methods of having an LLM triage 20 documents. No estimates — every number comes from actual claude-haiku-4-5 API responses.

  • 20 documents across 4 categories: AI/ML, web tech, business, general topics
  • Average document size: ~784 tokens
  • Task: “Find the 3 most relevant documents about AI agent protocols and how they communicate with external tools”
  • Model: Claude Haiku 4.5
  • Date: April 15, 2026

| Method | What the LLM receives |
| --- | --- |
| A: Raw documents | All 20 full document bodies (~15,000 tokens) |
| B: ACP frontmatter + deep read | 20 frontmatter blocks for triage, then 3 full documents for confirmation |
| C: ACP frontmatter only | 20 frontmatter blocks only (~200 tokens each) |
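
As a rough sanity check on the three methods, the expected input size of each can be estimated from the stated averages alone. This is only a back-of-envelope sketch: it ignores the task prompt and any message overhead, assumes every document is exactly average size, and the function and constant names are illustrative rather than part of the benchmark script.

```typescript
// Back-of-envelope input-size estimate per method, from the stated averages.
// These are NOT the measured API token counts.
const DOC_COUNT = 20;
const AVG_BODY_TOKENS = 784; // stated average document size
const FRONTMATTER_TOKENS = 200; // stated approximate frontmatter size
const DEEP_READ_COUNT = 3; // documents read in full by method B

function estimateInputTokens(method: "A" | "B" | "C"): number {
  const allBodies = DOC_COUNT * AVG_BODY_TOKENS;
  const allFrontmatter = DOC_COUNT * FRONTMATTER_TOKENS;
  switch (method) {
    case "A":
      return allBodies; // every full body
    case "B":
      return allFrontmatter + DEEP_READ_COUNT * AVG_BODY_TOKENS; // triage + deep read
    case "C":
      return allFrontmatter; // frontmatter only
  }
}

for (const method of ["A", "B", "C"] as const) {
  console.log(`${method}: ~${estimateInputTokens(method)} input tokens`);
}
```

The estimates (15,680, 6,352 and 4,000 tokens) will not match measured counts exactly, since real documents vary in size and each request also carries the task prompt.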

| Method | Input tokens | Output tokens | Total cost | Latency |
| --- | --- | --- | --- | --- |
| A: Raw documents | 13,682 | 244 | $0.0037 | 4.9s |
| B: ACP frontmatter + deep read | 6,323 | 516 | $0.0022 | 7.6s |
| C: ACP frontmatter only | 4,415 | 206 | $0.0014 | 3.6s |

| Comparison | Token reduction | Cost reduction | Speed |
| --- | --- | --- | --- |
| A → B (frontmatter + deep read) | 51% fewer | 40% cheaper | Slower (2 API calls) |
| A → C (frontmatter only) | 67% fewer | 64% cheaper | 26% faster |
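
The token-reduction figures can be reproduced from the measured counts. The sketch below assumes the percentages are computed on total tokens (input plus output) per method; that is an inference from the numbers, not something the benchmark states, and the helper names are illustrative.

```typescript
// Measured token counts copied from the results table.
interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
}

const measured: Record<"A" | "B" | "C", TokenCounts> = {
  A: { inputTokens: 13682, outputTokens: 244 },
  B: { inputTokens: 6323, outputTokens: 516 },
  C: { inputTokens: 4415, outputTokens: 206 },
};

function totalTokens(m: TokenCounts): number {
  return m.inputTokens + m.outputTokens;
}

// Percentage reduction relative to a baseline, rounded to a whole percent.
function percentReduction(baseline: number, value: number): number {
  return Math.round((1 - value / baseline) * 100);
}

const aToB = percentReduction(totalTokens(measured.A), totalTokens(measured.B));
const aToC = percentReduction(totalTokens(measured.A), totalTokens(measured.C));

console.log(`A -> B: ${aToB}% fewer tokens`); // 51
console.log(`A -> C: ${aToC}% fewer tokens`); // 67
```

Computed on input tokens alone, the same formula gives 54% and 68%, so the reported 51% and 67% appear to include output tokens.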

All three methods selected the exact same 3 documents.

Documents selected: MCP overview (doc 1), AI Agent Frameworks (doc 3), REST vs GraphQL vs tRPC (doc 8 — discusses MCP as an agent protocol).

In this benchmark, ACP’s structured metadata (summary, tags, entities, classification) carried enough signal to triage accurately without reading the full documents.

| Metric | Value |
| --- | --- |
| One-time enrichment cost (20 docs) | $0.024 |
| Cost per document | ~$0.0012 |
| Savings per triage (A vs C) | $0.0024 |
| Break-even | 11 triages |

After 11 triage operations, the enrichment has paid for itself. Every subsequent triage saves $0.0024. For a corpus that gets triaged daily, enrichment ROI is reached in under 2 weeks.
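
The break-even count is the one-time enrichment cost divided by the savings per triage, rounded up to a whole triage. The sketch below is illustrative (the function name is not from the SDK). Because the published figures are rounded, it also shows the range the count can take: dividing the rounded values gives roughly 10, and the reported 11 is consistent with unrounded savings slightly below $0.0024.

```typescript
// Smallest whole number of triages whose cumulative savings cover the enrichment cost.
function breakEvenTriages(enrichmentCostUsd: number, savingsPerTriageUsd: number): number {
  if (savingsPerTriageUsd <= 0) {
    throw new Error("savingsPerTriageUsd must be positive");
  }
  return Math.ceil(enrichmentCostUsd / savingsPerTriageUsd);
}

const ENRICHMENT_COST_USD = 0.024;

// A value reported as $0.0024 lies somewhere in [0.00235, 0.00245) before rounding.
const fewestTriages = breakEvenTriages(ENRICHMENT_COST_USD, 0.00245);
const mostTriages = breakEvenTriages(ENRICHMENT_COST_USD, 0.00235);

console.log(`Break-even after ${fewestTriages} to ${mostTriages} triages`); // 10 to 11
```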

| Corpus size | Enrichment cost | Savings per triage | Break-even |
| --- | --- | --- | --- |
| 20 docs | $0.024 | $0.0024 | 11 triages |
| 100 docs | $0.12 | $0.012 | 11 triages |
| 1,000 docs | $1.20 | $0.12 | 11 triages |
| 10,000 docs | $12.00 | $1.20 | 11 triages |

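The break-even point stays constant because enrichment cost and per-triage savings both scale linearly with corpus size. The rows beyond 20 documents are linear projections from the measured 20-document run. On that basis, a 10,000-document knowledge base costs $12 to enrich and saves $1.20 on every triage pass.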
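
The scaling table can be regenerated from the 20-document figures. This is a minimal sketch assuming strictly linear per-document cost and savings, and assuming every triage pass covers the whole corpus; the constant and function names are illustrative.

```typescript
// Per-document figures derived from the 20-document run.
const PER_DOC_ENRICHMENT_USD = 0.024 / 20; // about $0.0012 per document
const PER_DOC_SAVINGS_USD = 0.0024 / 20; // about $0.00012 per document per triage

interface Projection {
  docCount: number;
  enrichmentUsd: number;
  savingsPerTriageUsd: number;
  costToSavingsRatio: number;
}

// Linear projection: both quantities grow with docCount, so their ratio does not.
function project(docCount: number): Projection {
  const enrichmentUsd = docCount * PER_DOC_ENRICHMENT_USD;
  const savingsPerTriageUsd = docCount * PER_DOC_SAVINGS_USD;
  return {
    docCount,
    enrichmentUsd,
    savingsPerTriageUsd,
    costToSavingsRatio: enrichmentUsd / savingsPerTriageUsd,
  };
}

for (const size of [20, 100, 1000, 10000]) {
  const p = project(size);
  console.log(
    `${p.docCount} docs: enrich $${p.enrichmentUsd.toFixed(3)}, ` +
      `save $${p.savingsPerTriageUsd.toFixed(4)} per triage`
  );
}
```

Because `docCount` cancels in the ratio, the break-even count is the same for every row regardless of corpus size.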

ACP enrichment creates a ~200 token frontmatter layer for each document containing:

  • Summary — 2-sentence overview (replaces reading the full body for triage)
  • Tags — keyword classification (enables filtering before reading)
  • Key entities — typed entities with confidence scores (enables structured queries)
  • Classification — content type (reference, tutorial, analysis, etc.)
  • Token counts — exact size, so agents can budget context windows
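
To make the fields above concrete, here is one way an agent might use the tags and token counts together: filter by tag first, then pick documents for a deep read while staying inside a context budget. The interface and field names are hypothetical and chosen for illustration; they are not the actual ACP schema.

```typescript
// Hypothetical frontmatter shape; field names are illustrative, not the ACP schema.
interface Entity {
  name: string;
  type: string;
  confidence: number; // 0..1
}

interface Frontmatter {
  id: string;
  summary: string;
  tags: string[];
  keyEntities: Entity[];
  classification: string;
  bodyTokens: number; // size of the full document body
}

// Keep documents carrying the tag, in order, skipping any whose body
// would push the running total past the token budget.
function selectWithinBudget(
  candidates: Frontmatter[],
  tag: string,
  budgetTokens: number
): Frontmatter[] {
  const selected: Frontmatter[] = [];
  let usedTokens = 0;
  for (const doc of candidates) {
    if (!doc.tags.includes(tag)) continue;
    if (usedTokens + doc.bodyTokens > budgetTokens) continue;
    selected.push(doc);
    usedTokens += doc.bodyTokens;
  }
  return selected;
}
```

Only the frontmatter is consulted during selection, so the full bodies are loaded just for the documents that pass both checks.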

An agent triaging with frontmatter reads about 200 tokens per document instead of 500–5,000 tokens. In this benchmark the selections were identical across all three methods, because the metadata captured the signals the task needed.

git clone https://github.com/atomic-content-protocol/sdk.git
cd sdk && npm install
export ANTHROPIC_API_KEY=sk-ant-...
npx tsx examples/benchmark-token-savings.ts

The benchmark generates 20 documents, enriches them, runs all three methods, and produces a report. Total cost: ~$0.05.