This LLM API cost calculator estimates and compares what a workload will cost across current Claude, GPT and Gemini models. Enter the input tokens, output tokens and monthly request volume, and see the price per request and per month for each model side by side.
It’s a fast way to sanity-check an AI feature’s unit economics before you build, and to spot when a cheaper model would do the job for a fraction of the price.
How LLM API pricing works
Almost every provider bills per token, with separate rates for input (your prompt) and output (the model’s response), usually quoted in dollars per million tokens. Cost per request is (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). Output is usually three to five times more expensive than input because generating text token by token is more compute-intensive than reading a prompt in a single pass.
On reasoning models, the output figure also includes the model’s internal thinking tokens, so verbose reasoning can quietly inflate a bill.
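As a rough illustration, the per-request formula can be sketched in a few lines of Python. The function name, the reasoning_tokens parameter and the prices are placeholders chosen for this example, not any provider's real rates; the sketch assumes thinking tokens are billed at the output rate, as described above.

```python
def cost_per_request(input_tokens, output_tokens, input_price, output_price,
                     reasoning_tokens=0):
    """Return the cost of one request in dollars.

    Prices are in dollars per million tokens. reasoning_tokens is a
    hypothetical parameter for thinking tokens billed at the output rate.
    """
    billed_output = output_tokens + reasoning_tokens
    input_cost = input_tokens / 1_000_000 * input_price
    output_cost = billed_output / 1_000_000 * output_price
    return input_cost + output_cost


# Placeholder prices in dollars per million tokens (assumed, not real rates).
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00

plain = cost_per_request(2_000, 500, INPUT_PRICE, OUTPUT_PRICE)
with_thinking = cost_per_request(2_000, 500, INPUT_PRICE, OUTPUT_PRICE,
                                 reasoning_tokens=1_500)

print(f"${plain:.4f} per request")          # $0.0135 per request
print(f"${with_thinking:.4f} per request")  # $0.0360 per request
```

With these assumed numbers, 1,500 thinking tokens more than double the cost of a request whose visible response is only 500 tokens.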
How to estimate your monthly cost
Start with a realistic average prompt and response size, not the maximum. Multiply the per-request cost by your expected monthly volume. If your prompts vary a lot, run the calculator twice — once for a typical request and once for a heavy one — to bracket the range.
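The bracketing approach can be sketched as follows. The token counts, request volume and prices are invented for illustration, and monthly_cost is a hypothetical helper rather than part of the calculator itself.

```python
def monthly_cost(input_tokens, output_tokens, requests_per_month,
                 input_price, output_price):
    """Return the monthly cost in dollars; prices are per million tokens."""
    per_request = (input_tokens / 1_000_000 * input_price
                   + output_tokens / 1_000_000 * output_price)
    return per_request * requests_per_month


# Assumed workload and placeholder prices, for illustration only.
REQUESTS_PER_MONTH = 100_000
INPUT_PRICE = 3.00    # dollars per million input tokens
OUTPUT_PRICE = 15.00  # dollars per million output tokens

# One run for a typical request, one for a heavy request.
typical = monthly_cost(1_500, 400, REQUESTS_PER_MONTH, INPUT_PRICE, OUTPUT_PRICE)
heavy = monthly_cost(8_000, 1_500, REQUESTS_PER_MONTH, INPUT_PRICE, OUTPUT_PRICE)

print(f"Typical: ${typical:,.2f} per month")  # Typical: $1,050.00 per month
print(f"Heavy:   ${heavy:,.2f} per month")    # Heavy:   $4,650.00 per month
```

The real bill should land somewhere between the two figures, closer to the typical one if heavy requests are rare.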
Tips to reduce API costs
Prompt caching can cut the cost of repeated context (system prompts, long documents) by around 90% on the cached portion of the input. Batch APIs typically offer roughly 50% off for non-urgent work. Choosing a smaller model for simple tasks (classification, extraction, short replies) is often the single biggest saving. Always confirm current rates on each provider’s pricing page, since prices change regularly.
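To see how these discounts change a single request, here is a minimal sketch. It assumes the caching discount applies only to the cached input tokens and the batch discount applies to the whole request; it ignores cache-write surcharges, and whether the two discounts can be combined depends on the provider. The function name, parameters and prices are all placeholders.

```python
def discounted_cost(input_tokens, output_tokens, input_price, output_price,
                    cached_tokens=0, cache_discount=0.90, batch_discount=0.0):
    """Return the cost of one request in dollars after assumed discounts.

    Prices are in dollars per million tokens. cache_discount and
    batch_discount are fractions (0.90 means 90% off).
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    # Cached tokens are billed at a reduced share of the input price.
    effective_input = fresh_tokens + cached_tokens * (1 - cache_discount)
    input_cost = effective_input / 1_000_000 * input_price
    output_cost = output_tokens / 1_000_000 * output_price
    return (input_cost + output_cost) * (1 - batch_discount)


# Placeholder prices in dollars per million tokens (assumed, not real rates).
baseline = discounted_cost(10_000, 500, 3.00, 15.00)
cached = discounted_cost(10_000, 500, 3.00, 15.00, cached_tokens=8_000)
batched = discounted_cost(10_000, 500, 3.00, 15.00, batch_discount=0.50)

print(f"Baseline: ${baseline:.4f}")  # Baseline: $0.0375
print(f"Cached:   ${cached:.4f}")    # Cached:   $0.0159
print(f"Batched:  ${batched:.4f}")   # Batched:  $0.0187
```

Swapping in a cheaper model's prices for input_price and output_price shows the effect of the third tip in the same way.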