Cost Evidence¶
This page presents cost data from a controlled comparison between raw Claude Code and Lore on the same task. All conditions have N=10 runs. Platform: Claude Code (CLI).
If you're skeptical, good. Read the limitations first, then run the test yourself.
Methodology¶
Task: Determine whether current inventory can fill all orders from last week. Two mock APIs (orders and inventory) with deliberate gotchas — version traps, required headers, non-obvious query parameters.
Five conditions, same task, same day:
| Condition | N | Framework | Orchestrator | Workers | Prior Knowledge |
|---|---|---|---|---|---|
| Raw Cold | 10 | None | Opus 4.6 | — (inline) | None |
| Lore Cold | 10 | Lore v0.11.0 | Opus 4.6 | Haiku 4.5 | None |
| Lore Warm | 10 | Lore v0.11.0 | Opus 4.6 | Haiku 4.5 | Skills + env docs from cold |
| Lore Hot | 10 | Lore v0.11.0 | Opus 4.6 | varies | Skills + env docs, writes runbook |
| Lore Runbook | 10 | Lore v0.11.0 | Opus 4.6 | Haiku 4.5 | Full knowledge + runbook |
Each Lore condition ran on a separate instance (lore-01 through lore-10), one session per condition per instance. "Cold" = no prior knowledge. "Warm" = knowledge captured from cold. "Hot" = knowledge from cold+warm, writes a runbook. "Runbook" = full knowledge stack, pure execution.
Pricing basis: Cache-aware API rates. Tests ran on Claude Max (flat-rate subscription) — cost figures represent API-equivalent spend.
| Model | Input | Output | Cache Read | Cache Create |
|---|---|---|---|---|
| Opus 4.6 | $5.00/MTok | $25.00/MTok | $0.50/MTok | $6.25/MTok |
| Haiku 4.5 | $1.00/MTok | $5.00/MTok | $0.10/MTok | $1.25/MTok |
Source: Anthropic API Pricing
Timing: Execution time from durationMs — Claude Code's per-turn active processing timer, which measures actual model computation excluding idle time between turns.
Cost split: Each session's cost is split into "getting the answer" (API exploration, reasoning, final output) and "capture" (writing skills, environment docs, runbooks). This isolates the one-time knowledge investment from the recurring answer cost.
Results¶
Summary Table¶
| Condition | N | Answer (median) | Capture (mean) | Total (median) | vs Raw Cold | Answer Time | Capture Time |
|---|---|---|---|---|---|---|---|
| Raw Cold | 10 | $0.4532 | — | $0.4532 | — | 1m 12s | — |
| Lore Cold | 10 | $0.3226 | $0.0814 | $0.3941 | -29% | 1m 22s | 0m 35s |
| Lore Warm | 10 | $0.2429 | $0.0087 | $0.2429 | -46% | 1m 4s | — |
| Lore Hot | 10 | $0.2446 | $0.0795 | $0.3389 | -46% | 0m 57s | 0m 37s * |
| Lore Runbook | 10 | $0.1870 | — | $0.1870 | -59% | 0m 40s | — |
* Lore Hot capture time is operator-initiated runbook creation — the operator explicitly asked the agent to write a runbook after getting the answer. This is not automatic framework behavior.
Progression: Each Knowledge Layer Reduces Cost¶
| Transition | Answer (median) | Reduction | What Changed |
|---|---|---|---|
| Raw Cold → Lore Cold | $0.45 → $0.32 | -29% | Delegation to Haiku workers |
| Lore Cold → Lore Warm | $0.32 → $0.24 | -25% | Skills + env docs eliminate API discovery |
| Lore Warm → Lore Runbook | $0.24 → $0.19 | -23% | Runbook provides step-by-step procedure |
| Raw Cold → Lore Runbook | $0.45 → $0.19 | -59% | Full stack: delegation + knowledge + runbook |
Key Findings¶
Delegation alone is cheaper. On a cold run with zero prior knowledge, the answer-only median ($0.32) is 29% cheaper than raw Claude ($0.45). The savings come from routing API exploration to Haiku workers ($0.10/MTok cache read) instead of running it inline on Opus ($0.50/MTok cache read). No knowledge base needed.
Knowledge compounds. Each layer — skills, environment docs, runbook — removes another chunk of exploration. By the runbook stage, workers don't explore at all. They execute known procedures with known endpoints.
The steady state is 59% cheaper and 44% faster. Lore Runbook ($0.19 median, 0m 40s) vs Raw Cold ($0.45 median, 1m 12s). This is the cost of a task after knowledge has been captured — which is the normal operating mode after initial setup.
Runbook is the most predictable. Standard deviation drops from $0.11 (raw cold) to $0.04 (runbook). The cost range tightens from 2.0x ($0.30–$0.61) to 1.9x ($0.14–$0.27). Structured knowledge reduces both cost and variance.
Capture ROI¶
Capture is a one-time investment that pays off on every subsequent run.
| Phase | Mean Capture Cost | What Was Captured |
|---|---|---|
| Cold | $0.0814 | Skills (API gotchas), environment docs (endpoints, params, headers) |
| Warm | $0.0087 | Minor environment doc updates (1 of 10 sessions) |
| Hot | $0.0795 | Runbook (step-by-step procedure for the full task) |
| Total investment | $0.17 |
| Metric | Value |
|---|---|
| Steady-state savings per run (runbook vs raw cold) | $0.27 |
| Break-even | 1st runbook run (saves $0.27, cost $0.17 to capture) |
| Net savings after 5 runs | $1.16 |
| Net savings after 10 runs | $2.49 |
The capture investment pays for itself on the very first reuse.
Where the Savings Come From¶
| Mechanism | First Run | Steady State | Impact |
|---|---|---|---|
| Model tiering | Yes | Yes | Workers on Haiku ($0.10/MTok cache read) instead of Opus ($0.50/MTok) — 5x cheaper per token |
| Context isolation | Yes | Yes | Workers run in fresh contexts instead of accumulating in one growing Opus thread |
| Knowledge reuse | No | Yes | Workers skip exploration — endpoints and gotchas are documented |
| Runbook procedure | No | Yes | Orchestrator follows a known procedure instead of reasoning from scratch |
Model tiering and context isolation are structural — they apply from the first run. Knowledge reuse and runbook procedures compound over time.
For how these mechanisms work, see How Delegation Works and Configuration.
Limitations¶
Read before citing these numbers
- One task type. API exploration with mock services. Results may differ for code generation, refactoring, debugging, or other task patterns.
- One platform. Tested on Claude Code (CLI) with Claude models (Opus 4.6 orchestrator, Haiku 4.5 workers). Other platforms and model combinations may produce different results.
- Subscription pricing. Tests ran on Claude Max (flat-rate). Cost figures are API-equivalent estimates, relevant for understanding relative efficiency.
- No cross-tool validation. Knowledge captured on Claude Code should benefit Cursor and OpenCode sessions — not yet tested.
Reproduce It¶
The test services and verification scripts are open source:
git clone https://github.com/lorehq/lore-gotcha-demo.git
cd lore-gotcha-demo
# Start services
node orders-service.js &
node inventory-service.js &
# Run the test prompt on raw Claude Code
# Then install Lore and run the same prompt
npx create-lore
# Verify results match ground truth
bash verify-fulfillment.sh
See the repo README for full setup, ground truth values, and the verification scripts.