Controlling AI Coding Costs: Budget Management for Long-Running Jobs

Stop paying for runaway AI agent costs. Practical strategies for monitoring, limiting, and optimising Claude Code spending.

OpenHelm Team · Product · 8 min read

Your first overnight Claude Code job runs perfectly. Your second one costs £40. By the third, you're getting a bill that makes you uncomfortable.

The problem: Claude Code is an agent. It iterates. Each iteration costs tokens. An unattended agent with no spending guardrails can rack up API costs that surprise you the next morning.

This guide is about preventing that surprise.

Why AI Coding Agent Costs Spiral

Claude Code costs are straightforward: you pay per token, just like any Anthropic API usage. The issue is volume. A single interactive session where you guide Claude Code for 20 minutes might cost £2–5. But an unattended overnight job where Claude Code loops, backtracks, retries, and explores without human judgment? The same task that takes 20 guided minutes can cost £20–50 when it runs unattended.

Three factors drive this:

1. Iteration loops. When Claude Code gets stuck, it doesn't know it's stuck. It backtracks, tries a different approach, reads files again, re-examines the error. Each iteration re-sends the context accumulated so far (files read, command output, earlier attempts) and makes a new attempt, so token costs grow faster than the turn count.

2. No human in the loop. An interactive session has you watching. You'll stop Claude Code if it's clearly going in circles. An unattended session has no observer, so it keeps going until a hard limit (turn count, timeout, silence detection) stops it.

3. Inefficient prompts. A vague goal like "improve code quality" has no clear stopping condition, so Claude Code keeps iterating indefinitely. Tight, specific prompts cost less because they have a defined end state.
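
To see why iteration count matters so much, here is a rough sketch of the arithmetic. It assumes every turn re-sends a context that grows by a fixed amount; the function name and the figures (5,000 tokens on the first turn, 2,000 more per later turn) are illustrative assumptions, not measurements of Claude Code.

```shell
#!/bin/bash
# Rough model of how re-sent context inflates input tokens across turns.
# All numbers are illustrative assumptions, not measurements.

# cumulative_input_tokens TURNS BASE GROWTH
#   TURNS  - number of agent turns
#   BASE   - tokens of context sent on the first turn
#   GROWTH - extra tokens of context added by each later turn
cumulative_input_tokens() {
  local turns=$1 base=$2 growth=$3
  local total=0 i
  for ((i = 0; i < turns; i++)); do
    total=$((total + base + i * growth))
  done
  echo "$total"
}

for n in 4 8 16; do
  echo "$n turns: $(cumulative_input_tokens "$n" 5000 2000) input tokens"
done
```

Under these assumptions the totals are 32,000, 96,000, and 320,000 input tokens: doubling the turns from 8 to 16 more than triples the input volume, which is why a stuck job gets expensive so quickly.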

Method 1: Prompt Engineering for Cost Control

The cheapest cost control is a well-written prompt. Tight prompts complete faster and cost less.

Vague vs. Specific Prompts

Vague: "Refactor the authentication module to be more modern."

  • Cost: £15–30 (open-ended, unclear stopping condition)

Specific: "In src/auth.ts, replace the callback-based login function with async/await. Keep the function signature the same. Run tests after changes. Stop when all tests pass."

  • Cost: £3–8 (clear scope, defined success condition)

The specific prompt costs roughly a quarter as much because Claude Code knows when it's done. There's no second-guessing, no "maybe I should refactor this too", no endless iteration.

Rules for Budget-Conscious Prompts

Include a stopping condition: "Run the test suite. Fix any failing tests. Stop once all tests pass."

Define scope explicitly: "Modify only src/api/handlers.ts. Do not touch other files."

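Specify the number of turns: "Attempt this task using a maximum of 8 conversation turns." The model treats this as guidance rather than a hard limit, so pair it with the --max-turns flag when the cap must hold.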

Describe what done looks like: "Success is a passing test suite and a commit message describing the changes."
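
The four rules can be folded into a reusable template. The sketch below is one possible way to do it; build_prompt and its arguments are hypothetical helpers for illustration, not part of Claude Code or OpenHelm, and "npm test" stands in for whatever test command your project uses.

```shell
#!/bin/bash
# build_prompt FILE TASK TEST_CMD
# Assembles a scoped prompt: one file, one task, an explicit stopping condition.
# The function name and wording are illustrative, not a Claude Code or OpenHelm API.
build_prompt() {
  local file=$1 task=$2 test_cmd=$3
  cat <<EOF
In $file, $task.
Modify only $file. Do not touch other files.
Run "$test_cmd" after your changes and fix any failing tests.
Stop once all tests pass.
Success is a passing test suite and a commit message describing the changes.
EOF
}

build_prompt "src/auth.ts" \
  "replace the callback-based login function with async/await, keeping the function signature the same" \
  "npm test"
```

The printed text can be passed to claude -p as the prompt. Keeping the template in one place makes it harder to forget the stopping condition on a late-night job.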

Method 2: Hard Limits with Flags

Claude Code has built-in flags that enforce hard limits regardless of prompt complexity.

--max-turns

Limits the number of conversation turns (back-and-forths) Claude Code can make:

cd /app && claude -p "Upgrade all npm packages" --max-turns 8

This stops Claude Code after 8 turns, even if it's mid-task. Reasonable defaults:

  • Simple tasks (add a test): 3–5 turns
  • Medium tasks (refactor a function): 8–12 turns
  • Complex tasks (upgrade dependencies): 15–20 turns
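
If you launch jobs from a script, the defaults above can live in one small helper. This is a minimal sketch: pick_max_turns and the size labels (simple, medium, complex) are hypothetical names, and the values are the upper end of each range listed above. It prints the command instead of running it so you can inspect the result first.

```shell
#!/bin/bash
# pick_max_turns SIZE
# Maps a task size label to a --max-turns value (upper end of each suggested range).
pick_max_turns() {
  case "$1" in
    simple)  echo 5 ;;
    medium)  echo 12 ;;
    complex) echo 20 ;;
    *) echo "unknown task size: $1" >&2; return 1 ;;
  esac
}

turns=$(pick_max_turns medium) || exit 1
echo "claude -p \"Refactor the parse function\" --max-turns $turns"
```

Rejecting unknown labels, rather than falling back to a large default, means a typo in a job definition fails fast instead of running uncapped.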

Unix timeout Command

Terminates the process after a specified time, whether or not the task has finished:

cd /app && timeout 1800 claude -p "Your goal"

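This stops the job after 30 minutes (1800 seconds). It's blunt: timeout sends SIGTERM at the deadline and the job ends wherever it happens to be, with no chance to finish the current step. In exchange, it puts a firm upper bound on how long a job can keep spending.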
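
A wrapper can also tell you afterwards whether the deadline was the reason a job stopped. The sketch below assumes GNU coreutils timeout, whose --kill-after option sends SIGKILL if the process is still alive after the grace period. run_with_deadline is a hypothetical helper name, and sleep is used as a stand-in command so the example runs without Claude Code installed.

```shell
#!/bin/bash
# run_with_deadline SECONDS COMMAND [ARGS...]
# Runs COMMAND under GNU timeout: SIGTERM at the deadline, SIGKILL 10 seconds
# later if the process is still alive. Reports whether the deadline was hit.
run_with_deadline() {
  local limit=$1
  shift
  timeout --kill-after=10 "$limit" "$@"
  local status=$?
  # GNU timeout exits 124 on timeout, or 137 (128 + SIGKILL) if it had to force-kill.
  if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
    echo "Job hit the ${limit}s deadline and was stopped" >&2
  fi
  return "$status"
}

# Stand-in command so the sketch runs anywhere; swap in the claude invocation.
run_with_deadline 1 sleep 3
echo "exit status: $?"
```

For a real job, the call might look like run_with_deadline 1800 claude -p "Your goal". Checking the exit status lets a scheduler distinguish "finished" from "cut off", which matters when deciding whether to trust the resulting changes.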

Method 3: Silence Detection

If a Claude Code session produces no output for a set time, something's wrong. It's usually hung or stuck in an infinite loop. Silence detection stops the job automatically.

OpenHelm includes built-in silence detection (10 minutes of no output = stop). If you're using cron or GitHub Actions, you can implement a wrapper:

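#!/bin/bash
# Stop a Claude Code job that has produced no new output for too long.
#
# Assumption: in print mode (-p) the default text output may only be written
# when the run finishes, which would look like silence. A streaming format
# (--output-format stream-json --verbose) keeps the log growing while the job
# works. Check the CLI reference for your version before relying on this.
cd /app || exit 1
claude -p "Your goal" --output-format stream-json --verbose > output.log 2>&1 &
PID=$!

last_size=0
idle_count=0
max_idle_checks=6  # 6 x 10 seconds = 60 seconds of silence

while kill -0 "$PID" 2>/dev/null; do
  current_size=$(wc -c < output.log)

  if [ "$current_size" -eq "$last_size" ]; then
    idle_count=$((idle_count + 1))
    if [ "$idle_count" -ge "$max_idle_checks" ]; then
      kill "$PID"
      echo "Killed due to silence timeout"
      break
    fi
  else
    # New output arrived: reset the idle counter
    idle_count=0
    last_size=$current_size
  fi

  sleep 10
done

# Reap the background job so it does not linger as a zombie
wait "$PID" 2>/dev/null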

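Every 10 seconds, the script checks whether the output file has grown. If nothing has been written for 60 seconds, it kills the process. Raise the 60-second threshold for jobs with long quiet steps, such as slow builds or test suites.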

Method 4: Monitoring and Alerts

Even with controls in place, monitoring actual spend is important. The Anthropic Console shows API usage:

  1. Log into console.anthropic.com
  2. Navigate to "Usage"
  3. Check token counts and estimated costs for the dates your jobs ran

Review schedule:

  • Weekly check: See if your spending matches expectations
  • Daily check (if running overnight jobs): Spot anomalies before they appear on a monthly bill
  • Per-session check: After running a new type of job, check how much it cost to calibrate your expectations
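
The daily check can be partly automated with a simple ledger. The sketch below assumes a hypothetical CSV file with one line per job (date, job name, cost in pounds) that you fill in from the Console or from your own job logs. The file format, function names, sample dates, and the £3.00 budget are all illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical ledger format, one line per job: YYYY-MM-DD,job-name,cost
# (cost in pounds, copied from the Console or your own job logs).

# daily_total LEDGER DATE -> prints the summed cost for that date
daily_total() {
  awk -F, -v day="$2" '$1 == day { sum += $3 } END { printf "%.2f\n", sum }' "$1"
}

# over_budget LEDGER DATE BUDGET -> succeeds (exit 0) if the day's spend exceeds BUDGET
over_budget() {
  local total
  total=$(daily_total "$1" "$2")
  awk -v t="$total" -v b="$3" 'BEGIN { exit !(t > b) }'
}

# Sample data for illustration only
ledger=$(mktemp)
printf '%s\n' \
  "2025-01-14,upgrade-deps,1.42" \
  "2025-01-14,refactor-auth,2.10" \
  "2025-01-15,nightly-audit,3.40" > "$ledger"

echo "2025-01-14 total: £$(daily_total "$ledger" 2025-01-14)"
if over_budget "$ledger" 2025-01-14 3.00; then
  echo "ALERT: 2025-01-14 spend is over the £3.00 daily budget"
fi
rm -f "$ledger"
```

Run from cron each morning, a check like this turns the daily review into a message you only see when something is off. The comparison is done in awk because bash arithmetic does not handle decimals.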

Real-World Cost Examples

Here's what actual spending looks like for typical Claude Code tasks on the Anthropic API:

Task                         | Scope           | Turns | Tokens | Cost
Add unit tests               | Single function | 4     | 8,200  | £0.41
Fix failing tests            | 3 test failures | 6     | 12,500 | £0.63
Upgrade npm packages         | Full monorepo   | 9     | 28,300 | £1.42
Refactor auth module         | 800-line module | 12    | 42,000 | £2.10
Overnight cost control audit | Full codebase   | 15    | 68,000 | £3.40

These are real numbers from actual developer workflows. Overnight jobs typically cost £2–8. If you're seeing £30+ for an overnight job, something's wrong—likely an unscoped goal or an infinite loop.
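
The table also implies an effective price per token, which is handy for converting a token count from a job log into an expected cost. A short sketch of that calculation, using only the token and cost figures from the table above:

```shell
#!/bin/bash
# Tokens and cost (in pounds) copied from the table above: task,tokens,cost
rows="Add unit tests,8200,0.41
Fix failing tests,12500,0.63
Upgrade npm packages,28300,1.42
Refactor auth module,42000,2.10
Overnight cost control audit,68000,3.40"

# Print the effective price per 1,000 tokens for each row.
echo "$rows" | awk -F, '{ printf "%-30s £%.3f per 1k tokens\n", $1, $3 / $2 * 1000 }'
```

Every row works out to roughly £0.05 per 1,000 tokens. At that blended rate, a £30 job corresponds to about 600,000 tokens, nearly nine times the largest job in the table, which gives a sense of how far off the rails a job has to go to produce that bill.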

FAQ

Q: Can I set a monthly budget cap on Anthropic API?

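A: Partly. The Anthropic Console lets you set a monthly spend limit for your organisation, and prepaid credits bound spend if auto-reload is off. Both apply to the whole account rather than to a single job, so one runaway job can still burn through the month's allowance overnight. You still need per-job limits and active monitoring.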

Q: Does Claude Code ever intentionally overspend?

A: No. It just tries to solve the problem. A vague problem is infinitely explorable, which is why specific prompts are so important.

Q: What if a job hits a cost limit before finishing?

A: Use --max-turns rather than hoping for a cost cap. It's more predictable. Once the turn limit hits, Claude Code stops and exits, though the task may be left partly done, so review the changes before relying on them.

Q: Is it cheaper to run Claude Code during off-peak hours?

A: No. Anthropic pricing doesn't vary by time of day. You pay per token at rates that depend on the model and on whether the tokens are input or output, whenever the job runs.
