Does Caveman Mode Actually Work? I Benchmarked It (Properly This Time).

TL;DR: Caveman is a Claude plugin that adds a ~700-token system prompt telling the model to talk tersely. I benchmarked it with n=3 runs × 15 prompts × 6 conditions × 2 models via the direct Anthropic API (540+ calls). Then I QA’d my own analysis and found it was riddled with methodology errors. Here’s the actually-honest story.

| Task type | Caveman vs terse (“Answer concisely.”) | Statistical significance |
| --- | --- | --- |
| Short-answer, 4-7 (math, code Q&A) | +2.5% to +10% (caveman costs MORE) | Not significant |
| Short-answer, 4-6 | -3% to -16% | Not significant |
| Long-form, 4-7 (tutorials, docs) | -5% to +10% (caveman barely helps or hurts) | Not significant |
| Long-form, 4-6 | -55% to -59% | Large effect; paired t-test p≈0.012-0.028 (n=5 prompts) |

Caveman’s effect is concentrated almost entirely in one cell: long-form generation on claude-opus-4-6. Everywhere else it’s either noise or a slight cost increase. ...

April 20, 2026 · 9 min · npow
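
For readers unfamiliar with the paired t-test cited in the excerpt above, here is a minimal sketch of how such a comparison over n=5 prompts could be computed. The token counts are hypothetical placeholders, not the benchmark's data, and the function and variable names are illustrative assumptions rather than the author's code. The post's actual analysis may differ. To stay in the standard library, the sketch compares the t statistic against the standard two-sided 5% critical value for 4 degrees of freedom (2.776) instead of computing a p-value.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for two paired samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Design size stated in the excerpt: runs x prompts x conditions x models.
total_calls = 3 * 15 * 6 * 2

# HYPOTHETICAL per-prompt mean output tokens (placeholders, NOT benchmark data).
terse_tokens = [1000, 1200, 900, 1100, 1300]
caveman_tokens = [450, 500, 400, 480, 520]

t_stat, dof = paired_t_statistic(caveman_tokens, terse_tokens)

# Two-sided 5% critical value of Student's t with 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4

print(f"calls={total_calls} t={t_stat:.2f} df={dof} significant={significant}")
```

With only five pairs the test has 4 degrees of freedom, so a difference has to be large and consistent across prompts to clear the threshold.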

We Benchmarked MCP Against Code Generation. MCP Won (Mostly).

TL;DR: For a small, well-designed API (10 tools), structured MCP tool calls consistently outscore code generation on correctness — 0.99 vs 0.97 — and the gap concentrates in tasks where domain-specific logic matters. Adding a reference document to MCP tools costs 6–14% more tokens with zero accuracy gain. Cloudflare’s search+execute pattern matches MCP accuracy but uses more tokens. With MCP tools, Haiku is within 1% of Opus at 1/12 the cost. ...

March 18, 2026 · 12 min · npow

Finding the Human in the Machine

50+ open-source orgs are rebuilding how they evaluate contributors. Here’s what’s emerging.

March 6, 2026 · 8 min · npow

Hidden Technical Debt in Agentic Systems

Agentic systems are replaying hidden technical debt from early ML, and the missing control plane is where the biggest risk accumulates.

March 5, 2026 · 8 min · npow

The Workflow Orchestration Landscape — March 2026

A comparative map of workflow orchestration platforms as of March 2026, covering execution maturity, product vision, and market positioning.

March 4, 2026 · 13 min · npow