How is LLM API cost calculated?

Most providers bill per token, with separate rates for input (your prompt) and output (the model’s response). Cost per request = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). Multiply by your monthly request volume for a monthly estimate.
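The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function names are hypothetical, and the workload (2,000 input tokens, 500 output tokens, 10,000 requests a month) is an assumed example priced at the GPT-4.1 nano list rates shown in the table on this page.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Dollar cost of one request; prices are quoted per 1M tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


def monthly_cost(per_request, requests_per_month):
    """Scale a per-request cost to a monthly estimate."""
    return per_request * requests_per_month


# Assumed example workload, priced at $0.10 in / $0.40 out per 1M tokens.
per_request = request_cost(2_000, 500, 0.10, 0.40)
per_month = monthly_cost(per_request, 10_000)
print(f"${per_request:.5f} per request, ${per_month:.2f} per month")
# prints: $0.00040 per request, $4.00 per month
```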

Why is output usually more expensive than input?

Generating tokens is more compute-intensive than reading them, so output is priced at a multiple of input: 4× to roughly 8× for the models in the table below. On reasoning models the output figure also includes the model’s internal thinking tokens.

Are these the exact prices I will pay?

They are published list prices, but real bills vary. Prompt caching can cut the cost of repeated input by ~90%, batch APIs often give ~50% off, and enterprise agreements differ. Use the calculator for planning and confirm against each provider’s pricing page.

Free tool · no signup

LLM API Cost Calculator

Work out what an LLM API workload will cost. Enter input tokens, output tokens and monthly volume, then compare price per request and per month across current Claude, GPT and Gemini models.

Input tokens / request: 2,000

Output tokens / request: 500

Requests / month: 10,000

At this volume, GPT-4.1 nano is the cheapest at $4.00/mo, about 113× less than Claude Fable 5 ($450.00/mo).

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Per request	Per month
GPT-4.1 nano	OpenAI	$0.10	$0.40	$0.00040	$4.00
GPT-4.1 mini	OpenAI	$0.40	$1.60	$0.00160	$16.00
Gemini 2.5 Flash	Google	$0.30	$2.50	$0.00185	$18.50
Claude Haiku 4.5	Anthropic	$1.00	$5.00	$0.00450	$45.00
Gemini 2.5 Pro	Google	$1.25	$10.00	$0.00750	$75.00
GPT-4.1	OpenAI	$2.00	$8.00	$0.00800	$80.00
GPT-5.4	OpenAI	$2.50	$15.00	$0.0125	$125.00
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	$0.0135	$135.00
Claude Opus 4.8	Anthropic	$5.00	$25.00	$0.0225	$225.00
GPT-5.5	OpenAI	$5.00	$30.00	$0.0250	$250.00
Claude Fable 5	Anthropic	$10.00	$50.00	$0.0450	$450.00

Estimates use list prices per 1M tokens, last verified June 2026. Cached input, batch and volume discounts are not applied. Always confirm against the provider:

Anthropic pricing · OpenAI pricing · Google pricing
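The side-by-side comparison can be reproduced with a short script. The sketch below copies the list prices from the table above into a hypothetical `PRICES` list and assumes a workload of 2,000 input tokens, 500 output tokens and 10,000 requests a month; the function names are illustrative and nothing here reflects how the calculator itself is implemented.

```python
# (model, input $/1M tokens, output $/1M tokens), copied from the table above.
PRICES = [
    ("GPT-4.1 nano", 0.10, 0.40),
    ("GPT-4.1 mini", 0.40, 1.60),
    ("Gemini 2.5 Flash", 0.30, 2.50),
    ("Claude Haiku 4.5", 1.00, 5.00),
    ("Gemini 2.5 Pro", 1.25, 10.00),
    ("GPT-4.1", 2.00, 8.00),
    ("GPT-5.4", 2.50, 15.00),
    ("Claude Sonnet 4.6", 3.00, 15.00),
    ("Claude Opus 4.8", 5.00, 25.00),
    ("GPT-5.5", 5.00, 30.00),
    ("Claude Fable 5", 10.00, 50.00),
]


def monthly_cost(input_tokens, output_tokens, requests, in_price, out_price):
    """Monthly dollar cost at list price, with no discounts applied."""
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_request * requests


def rank_models(input_tokens, output_tokens, requests):
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    rows = [
        (name, monthly_cost(input_tokens, output_tokens, requests, in_p, out_p))
        for name, in_p, out_p in PRICES
    ]
    return sorted(rows, key=lambda row: row[1])


# Assumed example workload.
ranked = rank_models(2_000, 500, 10_000)
for name, cost in ranked:
    print(f"{name:<20} ${cost:,.2f}")

cheapest, priciest = ranked[0], ranked[-1]
ratio = priciest[1] / cheapest[1]
print(f"{priciest[0]} costs about {ratio:.1f}x as much as {cheapest[0]}")
```

Changing the three arguments to `rank_models` shows how the ranking shifts: output-heavy workloads penalise models with a high output-to-input price ratio.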

This LLM API cost calculator estimates and compares what a workload will cost across current Claude, GPT and Gemini models. Enter the input tokens, output tokens and monthly request volume, and see the price per request and per month for each model side by side.

It’s a fast way to sanity-check an AI feature’s unit economics before you build, and to spot when a cheaper model would do the job for a fraction of the price.

How LLM API pricing works

Almost every provider bills per token, with separate rates for input (your prompt) and output (the model’s response). Cost per request is (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). Output costs more than input, 4× to roughly 8× for the models in the table above, because generating text is more compute-intensive than reading it.

On reasoning models, the output figure also includes the model’s internal thinking tokens, so verbose reasoning can quietly inflate a bill.
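To see how thinking tokens change the arithmetic, the sketch below adds them to the billed output. It assumes thinking tokens are charged at the ordinary output rate, as the paragraph above describes; the 4,000-token thinking budget is a made-up figure, the $3 / $15 per 1M rates are used purely as illustrative prices, and the function name is hypothetical.

```python
def reasoning_request_cost(input_tokens, visible_output_tokens, thinking_tokens,
                           in_price, out_price):
    """Per-request cost when thinking tokens are billed as output (assumption)."""
    billed_output = visible_output_tokens + thinking_tokens
    return (input_tokens * in_price + billed_output * out_price) / 1_000_000


# Same prompt and visible answer, with and without hidden reasoning.
without_thinking = reasoning_request_cost(2_000, 500, 0, 3.00, 15.00)
with_thinking = reasoning_request_cost(2_000, 500, 4_000, 3.00, 15.00)

print(f"no thinking:   ${without_thinking:.4f}")
print(f"with thinking: ${with_thinking:.4f}")
print(f"increase:      {with_thinking / without_thinking:.1f}x")
```

Under these assumptions the visible response is unchanged, yet the request costs several times more, which is why it helps to enter the total billed output rather than only the visible reply when estimating.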

How to estimate your monthly cost

Start with a realistic average prompt and response size, not the maximum. Multiply the per-request cost by your expected monthly volume. If your prompts vary a lot, run the calculator twice, once for a typical request and once for a heavy one, to bracket the range.
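The bracketing approach can be sketched as two runs of the same formula. The request sizes, the 10,000-request volume and the $1.25 / $10 per 1M rates below are assumptions chosen for illustration, and `bracket` is a hypothetical helper rather than part of the calculator.

```python
def monthly_cost(input_tokens, output_tokens, requests, in_price, out_price):
    """Monthly dollar cost at list price."""
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_request * requests


def bracket(typical, heavy, requests, in_price, out_price):
    """Return (low, high) monthly estimates.

    typical and heavy are (input_tokens, output_tokens) pairs; the low figure
    assumes every request is typical, the high figure assumes every one is heavy.
    """
    low = monthly_cost(*typical, requests, in_price, out_price)
    high = monthly_cost(*heavy, requests, in_price, out_price)
    return low, high


# Assumed sizes: a typical request and a heavy one four times larger.
low, high = bracket((2_000, 500), (8_000, 2_000), 10_000, 1.25, 10.00)
print(f"Expect roughly ${low:,.2f} to ${high:,.2f} per month")
```

The real bill should land between the two figures, closer to the low end if heavy requests are rare.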

Tips to reduce API costs

Prompt caching can cut the cost of repeated context (system prompts, long documents) by around 90%. Batch APIs typically offer roughly 50% off for non-urgent work. Choosing a smaller model for simple tasks (classification, extraction, short replies) is often the single biggest saving. Always confirm current rates on each provider’s pricing page, since prices change regularly.
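A rough way to see what these discounts do to a single request is sketched below. The 90% and 50% figures are the approximate values quoted above, not any provider's exact terms. The sketch assumes the two discounts stack, which may not hold everywhere, and it ignores details such as cache-write surcharges or minimum cacheable sizes. The function name and parameters are hypothetical.

```python
def discounted_request_cost(input_tokens, output_tokens, in_price, out_price,
                            cached_tokens=0, cache_discount=0.90,
                            batch=False, batch_discount=0.50):
    """Per-request cost with approximate caching and batch discounts (assumed rates)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")

    fresh_tokens = input_tokens - cached_tokens
    # Cached input is billed at a fraction of the normal input price.
    billable_input = fresh_tokens + cached_tokens * (1 - cache_discount)
    total = (billable_input * in_price + output_tokens * out_price) / 1_000_000

    if batch:
        # Assumption: the batch discount applies to the whole request.
        total *= 1 - batch_discount
    return total


# Illustrative rates of $3 / $15 per 1M tokens; 1,500 of 2,000 input tokens cached.
list_price = discounted_request_cost(2_000, 500, 3.00, 15.00)
cached = discounted_request_cost(2_000, 500, 3.00, 15.00, cached_tokens=1_500)
cached_batch = discounted_request_cost(2_000, 500, 3.00, 15.00,
                                       cached_tokens=1_500, batch=True)

print(f"list price:     ${list_price:.6f}")
print(f"with caching:   ${cached:.6f}")
print(f"caching+batch:  ${cached_batch:.6f}")
```

Because caching only touches the input side, its effect is modest when output dominates the bill; the batch discount, under the stacking assumption here, applies to both sides.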


Frequently asked questions