
Controlling Claude Code Costs: A Budget-First Approach

Practical strategies to keep Claude Code automation costs under control—without sacrificing capability.

OpenHelm Team · Engineering · 10 min read

Claude Code is powerful precisely because it's agentic—it reads your codebase, runs commands, learns from feedback, and iterates toward a goal. But power has a cost. An open-ended goal, a large codebase, a session that hangs: these are all ways a Claude Code task becomes expensive quickly.

The key insight is that most of the cost isn't in the successful runs—it's in the failures and inefficiencies. A goal so vague it needs three iterations instead of one. A codebase so large Claude Code spends 30,000 tokens just reading files. A hanging session that runs for four hours when it should have taken twenty minutes.

This is a practical guide to keeping costs reasonable. Not by being cheap with capability, but by being intentional about scope, goals, and failure modes.

How Claude Code Costs Are Calculated

Claude Code billing follows the Anthropic API model: you pay per token, for input and output. Input tokens (code it reads, context it gets) are cheaper than output tokens (responses it generates).

A typical Claude Code session costs money in three places:

  1. Initial context. Reading your codebase to understand what it's working with. A 50,000-line codebase with good structure: ~20,000 tokens. The same codebase with poor structure or all in one file: ~40,000 tokens.
  2. Reasoning and iteration. Running commands, getting output, deciding what to do next. Each iteration costs tokens for input (the command output) and output (the next action).
  3. Failure recovery. A misunderstanding, a wrong approach, something unexpected. The session continues, exploring alternatives. Unbounded exploration is expensive.

A typical 30-minute session that succeeds cleanly costs £2–4. A 30-minute session that fails, backtracks, and retries costs £8–15. A 4-hour session that should have been 30 minutes costs £40–60.

The economic insight: most of the cost is variance, not baseline. Controlling variance is the highest-leverage place to focus.
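That arithmetic can be sketched as a toy cost model. The per-million-token prices below are illustrative placeholders, not Anthropic's actual rates; check the current pricing page before forecasting real spend.

```python
# Toy cost model. The prices below are illustrative placeholders,
# NOT Anthropic's actual rates -- check the current pricing page.
INPUT_PRICE_PER_MTOK = 3.0    # assumed £ per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.0  # assumed £ per million output tokens

def session_cost(input_tokens, output_tokens):
    """Rough session cost in £ under the assumed rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000

# Variance, not baseline: a clean run vs one that backtracks and
# retries through roughly four times the iterations.
clean = session_cost(input_tokens=300_000, output_tokens=60_000)
retry_heavy = session_cost(input_tokens=1_200_000, output_tokens=240_000)
```

Plugging in your own observed token counts from the Anthropic Console gives a quick sanity check on whether a task's cost is baseline or variance.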

Strategy 1: Be Specific About Scope

The single biggest cost lever is codebase size. Claude Code reads code to understand what it's working with. Smaller codebases are cheaper to understand, faster to navigate, and more likely to succeed on the first try.

Instead of:

Goal: Improve the authentication system

Try:

Goal: Add input validation to the three functions in src/auth/handlers.ts (login, register, logout). Ensure all existing tests pass. Commit the changes.

The second goal is more expensive to state (the prompt itself is longer), but dramatically cheaper overall, because Claude Code isn't wasting tokens understanding modules you didn't ask it to touch.

Quantified benefit: Scoping a 200,000-line codebase to a single module reduces initial context reading by 60–70%. On a task that iterates 3–4 times, that's often the difference between a £4 task and a £12 task.

Strategy 2: Success Criteria, Not Open-Ended Goals

Open-ended goals are the silent killer of API budgets.

"Refactor the API layer"

This has no clear finish line. Claude Code will iterate, exploring optimisations, debating structure, trying ideas. After 15 iterations, it might still think there's more to improve. The cost is linear in the number of iterations.

Instead:

"Consolidate the three separate database connection functions (connectPrimary, connectReplica, connectCache) into a single parameterised function. Ensure all existing tests pass. Done when npm test exits with code 0."

Clear success criterion: tests passing. Clear scope: three functions. Clear finish line: test output.

Quantified benefit: Tasks with explicit success criteria usually finish in 2–3 iterations. Open-ended tasks average 6–8 iterations. That's a 3–4x difference in cost.

Strategy 3: Silence Detection (Or Timeouts)

A session that hangs (stuck waiting for input, caught in an infinite loop, blocked on a network timeout) is an uncontrolled cost. It'll run until your API credits are exhausted or the system forcibly kills it.

OpenHelm includes silence detection: if no output is generated for 10 minutes, the run stops. This is a hard cost ceiling. An OpenHelm task cannot cost more than the cost of 10 minutes of inactivity plus the cost of whatever it was actually doing.

If you're not using OpenHelm, set a timeout at the OS level:

timeout 1800 claude-code < prompt.txt  # kills after 30 minutes

Or if you're using a cloud scheduler, set a function timeout (Lambda, Cloud Functions, etc. all support this).
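If you want output-based silence detection rather than a flat timeout, here is a minimal sketch: it wraps a command, watches stdout, and kills the process once nothing has been printed for a configurable window. This is an illustrative stand-in for OpenHelm's monitoring, not its actual implementation.

```python
import subprocess
import sys
import threading
import time

def run_with_silence_detection(cmd, silence_limit=600):
    """Run cmd, killing it if stdout goes quiet for silence_limit seconds.

    Returns the process exit code (negative if it was killed).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    last_output = time.monotonic()

    def watchdog():
        # Poll once a second; kill the process once the quiet window
        # exceeds the limit. This is the hard cost ceiling.
        while proc.poll() is None:
            if time.monotonic() - last_output > silence_limit:
                proc.kill()
                return
            time.sleep(1)

    threading.Thread(target=watchdog, daemon=True).start()
    for line in proc.stdout:            # blocks until output or EOF
        last_output = time.monotonic()  # any output resets the clock
        sys.stdout.write(line)
    return proc.wait()
```

You'd invoke it with your actual command, e.g. `run_with_silence_detection(["claude-code"], silence_limit=600)` for a 10-minute window.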

Quantified benefit: Without silence detection, a hanging session can cost 100x more than expected. With it, your cost is bounded.

Strategy 4: Codebase Structure Matters

The way you organise your code affects how expensive Claude Code is to reason about.

Well-structured codebases with clear separation of concerns: Claude Code reads the relevant module, understands it, and acts. It doesn't spend tokens exploring dead ends.

Poorly-structured codebases with circular dependencies, unclear naming, or all code in one file: Claude Code spends time reading and rereading, trying to construct a mental model.

You can't always refactor the codebase just to save on Claude Code costs. But it's worth knowing that this is one lever. If you're running the same automation repeatedly and costs are creeping up, codebase structure might be the reason.

Strategy 5: Pilot Before You Automate

The most expensive Claude Code runs are the ones you schedule without testing.

Before scheduling any goal, test it manually first, in the same environment and with the same codebase state. Observe:

  • How long does it take?
  • What's the actual cost (check Anthropic Console)?
  • Does it hang anywhere?
  • Does it make mistakes?

Once you've tested it manually and observed the behaviour, you can schedule it with confidence and accurate cost forecasting.

Quantified benefit: A manually tested goal that succeeds on schedule costs the expected amount. An untested goal that fails and needs a retry costs 3–4x more. Manual testing costs about an hour of developer time and saves £20–50 in unnecessary API costs.

Strategy 6: Automated Retry with Context

When a scheduled Claude Code task fails, OpenHelm can automatically retry, passing the failure output back as context. This increases the chance of eventual success without human intervention.

Initial run fails: "Database connection timeout"


Automatic retry (OpenHelm): Same goal, but includes the failure output in the context, so Claude Code tries a different approach (maybe with retry logic, or a different connection string, or waiting for the service to be healthy).

This is cheaper than human debugging because it happens immediately, within the same session. The alternative (human discovers failure in the morning, manually retries) is slower and more expensive.
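As a sketch, the retry loop looks something like this. Here `run_claude_code` is a hypothetical callable standing in for however you invoke Claude Code; OpenHelm's actual mechanism may differ.

```python
def retry_with_context(goal, run_claude_code, max_attempts=3):
    """Retry a failed run, feeding failure output back into the prompt.

    run_claude_code is a hypothetical callable: it takes a prompt
    string and returns a (success, output) pair. Swap in your real
    invocation.
    """
    prompt = goal
    for attempt in range(1, max_attempts + 1):
        success, output = run_claude_code(prompt)
        if success:
            return output
        # Append the failure output so the next attempt can take a
        # different approach instead of repeating the same mistake.
        prompt = (
            f"{goal}\n\n"
            f"Attempt {attempt} failed with:\n{output}\n"
            f"Try a different approach."
        )
    raise RuntimeError(f"Still failing after {max_attempts} attempts")
```

The key design choice is that the failure output lands in the next prompt, so attempt two starts from evidence rather than from scratch.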

A Worked Example: Daily Linting Job

A team wants to automate daily linting across their codebase:

Initial approach (expensive):

Goal: Run ESLint and fix all violations.

Cost per run: £15–25 (varies wildly; sometimes it gets stuck exploring "improvements").

Optimised approach:

Goal: Run npm run lint against src/ only. For each ESLint violation reported, apply the recommended fix. Run npm test. If tests pass, commit. If tests fail, report which test failed and why, and do not commit.

Cost per run: £3–5 (specific scope, clear success criterion—tests passing or reporting failure).

With scheduled runs: They schedule this daily via OpenHelm at 2 AM.

Weekly cost: £15–25 (5 successful runs × £3–5 each).

Alternative without optimisation: £75–125 (5 runs × £15–25 each).

Savings: £50–100/week, or roughly £2,600–5,200/year, achieved entirely through scope reduction and clear success criteria. No loss of capability; just intentionality.
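The savings arithmetic checks out with the midpoints of the per-run ranges above:

```python
# Per-run costs from the example above (midpoints of the stated ranges).
runs_per_week = 5
optimised = (3 + 5) / 2      # £3-5 per run  -> £4 midpoint
unoptimised = (15 + 25) / 2  # £15-25 per run -> £20 midpoint

weekly_saving = runs_per_week * (unoptimised - optimised)  # £80/week
yearly_saving = weekly_saving * 52                         # £4,160/year
```

The £80/week midpoint sits inside the stated £50–100/week range; the yearly figure simply scales it by 52 weeks.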

FAQ

Can I monitor costs in real-time?

Yes. The Anthropic Console shows per-token costs for each session. Check it weekly when you're starting out—it builds intuition for what different tasks cost. Once you have a few runs, you'll calibrate expectations and can check less frequently.

What's the difference between input and output token costs?

Input tokens (code Claude Code reads) are cheaper per token than output tokens (responses it generates). This is why codebase scoping matters: smaller initial context means cheaper input tokens. But iteration cost is often the dominant factor: each iteration generates output tokens, and that is where the real expense lives.

Should I use gpt-4 instead of Claude to save money?

Model pricing varies and changes over time, but the choice isn't really yours here: Claude Code is built on Claude, not GPT-4. If you're comparing costs against a different workflow entirely, make sure you're comparing like for like.

What's a reasonable budget for scheduled Claude Code automation?

For a solo developer running 1–2 nightly tasks: £5–15/month is reasonable. For a team running 5–10 scheduled jobs per day: £100–300/month is typical. It depends entirely on your task scope and frequency. Monitor your actual spend and adjust.

Is silence detection available without using OpenHelm?

Not directly; silence detection is built into OpenHelm. For other approaches (cron, systemd, cloud schedulers), you can implement timeouts at the OS or function level. It's not as precise as OpenHelm's output-stream monitoring, but it prevents runaway sessions.
