Why LLM Costs Are Hard to Predict
Unlike traditional SaaS pricing (flat monthly fee), LLM APIs charge per token, a unit of text roughly equal to ¾ of a word. The challenge is that your bill is not a single number you can look up: it is the product of four variables that combine differently for every use case:
- Input token count — your system prompt + user message + conversation history
- Output token count — the model's response, which you can't fully control
- Model tier — flagship models cost 10–50× more than mini variants
- Request volume — the number of API calls per day/month
A chat application that sends a 2,000-token system prompt on every request is burning budget before the user types a single word. Getting this right starts with a clean formula.
Your system prompt is usually the biggest hidden cost. A 2,000-token system prompt on 100,000 daily requests adds about 6 billion input tokens per month to your bill (2,000 × 100,000 × 30 days): roughly $900 on GPT-4o mini, or $15,000 on GPT-4o. Measure it.
The Cost Formula
Every LLM provider uses the same underlying structure. Once you know this, any pricing page is readable in seconds:
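cost_per_request = (input_tokens × input_price_per_1M / 1,000,000)
                 + (output_tokens × output_price_per_1M / 1,000,000)
monthly_cost = cost_per_request × requests_per_month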
That's the whole model. Every calculator, every spreadsheet, every cost estimate is just this equation applied at scale. The only things that change between providers are the two price variables.
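The formula translates directly into a few lines of Python. The sketch below is illustrative only: the function names `request_cost` and `monthly_cost` are made up for this example, and the sample numbers (1,800 input tokens, 400 output tokens, 15,000 requests per month at $0.15 / $0.60 per 1M tokens) are placeholder inputs.

```python
def request_cost(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Cost in USD of a single API call, given prices per 1M tokens."""
    input_cost = input_tokens * input_price_per_1m / 1_000_000
    output_cost = output_tokens * output_price_per_1m / 1_000_000
    return input_cost + output_cost


def monthly_cost(requests_per_month, **per_request):
    """Scale the per-request cost by monthly request volume."""
    return requests_per_month * request_cost(**per_request)


# Placeholder workload: 1,800 input + 400 output tokens per request.
per_call = request_cost(1_800, 400, 0.15, 0.60)
total = monthly_cost(
    15_000,
    input_tokens=1_800,
    output_tokens=400,
    input_price_per_1m=0.15,
    output_price_per_1m=0.60,
)
print(f"${per_call:.6f} per request, ${total:.2f} per month")
```

Keeping the per-request and monthly steps separate makes it easy to swap in a different provider's two prices without touching the rest of the estimate.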
2026 LLM Pricing Table
All prices are USD, pay-as-you-go, per 1 million tokens, as of June 2026. These do not include discounts from Batch API, prompt caching, or committed-use agreements.
| Model | Provider | Input / 1M tok | Output / 1M tok | Best for |
|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50 | $10.00 | General flagship |
| GPT-4o mini (best value) | OpenAI | $0.15 | $0.60 | High-volume, simple tasks |
| o1 | OpenAI | $15.00 | $60.00 | Complex reasoning |
| Claude 3.5 Sonnet | Anthropic | $3.00 | $15.00 | Code, analysis, long context |
| Claude 3.5 Haiku | Anthropic | $0.80 | $4.00 | Fast, cost-efficient tasks |
| Claude 3 Opus | Anthropic | $15.00 | $75.00 | Highest capability tasks |
| Gemini 1.5 Pro | Google | $1.25 | $5.00 | Long context (1M tokens) |
| Gemini 1.5 Flash (cheapest) | Google | $0.075 | $0.30 | Classification, extraction |
Prices above reflect public pay-as-you-go API rates. Batch API (OpenAI, Anthropic) typically halves both input and output costs. Prompt caching (Anthropic, Google) can reduce repeated-context costs by 75–90%. These are modeled separately in the full calculator.
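One way to put the table to work is to load it into a dictionary and rank every model for a given workload. This is a minimal sketch: the prices are copied from the table above, while the names `PRICES`, `monthly_cost`, and `rank_models` and the sample workload (1,000 input tokens, 500 output tokens, 100,000 requests per month) are assumptions made for illustration.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "Gemini 1.5 Flash": (0.075, 0.30),
}


def monthly_cost(model, input_tokens, output_tokens, requests):
    """Monthly USD cost for one model at list price (no batch or caching)."""
    in_price, out_price = PRICES[model]
    per_request = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return requests * per_request


def rank_models(input_tokens, output_tokens, requests):
    """Return (model, monthly cost) pairs sorted from cheapest to most expensive."""
    costs = {m: monthly_cost(m, input_tokens, output_tokens, requests) for m in PRICES}
    return sorted(costs.items(), key=lambda item: item[1])


# Hypothetical workload: 1,000 input + 500 output tokens, 100,000 requests/month.
for model, cost in rank_models(1_000, 500, 100_000):
    print(f"{model:<20} ${cost:>10,.2f}")
```

The ranking reflects price only; it says nothing about whether a cheaper model meets your quality bar.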
Three Real-World Examples
Abstract pricing tables don't help you budget. Here are three worked examples covering common production scenarios.
Example 1: Customer Support Chatbot
A mid-sized SaaS product with 500 daily support conversations. Each turn uses a 1,500-token system prompt, 300-token user message, and receives a 400-token response.
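Customer Support Bot (GPT-4o mini), assuming a 30-day month and one request per conversation: 15,000 requests × 1,800 input tokens = 27M input tokens ($4.05), plus 15,000 × 400 = 6M output tokens ($3.60), for about $7.65/month.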
The same workload on GPT-4o (flagship): $67.50 + $60 = $127.50/month. That's roughly a 17× cost difference for the same task. For a simple support bot, GPT-4o mini is the obvious choice unless you need the flagship's reasoning quality.
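The support-bot comparison can be reproduced in a few lines. This sketch assumes a 30-day month and one API request per conversation, neither of which the scenario states explicitly; the constant and function names are illustrative.

```python
# Assumptions (not stated in the scenario): 30-day month, one request per conversation.
REQUESTS_PER_MONTH = 500 * 30
INPUT_TOKENS = 1_500 + 300   # system prompt + user message
OUTPUT_TOKENS = 400


def monthly_lines(input_price_per_1m, output_price_per_1m):
    """Return (input cost, output cost) in USD per month for this workload."""
    input_cost = REQUESTS_PER_MONTH * INPUT_TOKENS * input_price_per_1m / 1_000_000
    output_cost = REQUESTS_PER_MONTH * OUTPUT_TOKENS * output_price_per_1m / 1_000_000
    return input_cost, output_cost


mini = monthly_lines(0.15, 0.60)        # GPT-4o mini list prices
flagship = monthly_lines(2.50, 10.00)   # GPT-4o list prices
ratio = sum(flagship) / sum(mini)

print(f"GPT-4o mini: ${mini[0]:.2f} input + ${mini[1]:.2f} output")
print(f"GPT-4o:      ${flagship[0]:.2f} input + ${flagship[1]:.2f} output")
print(f"Flagship costs {ratio:.1f}x as much")
```

Because both GPT-4o prices are the same multiple of the GPT-4o mini prices, the ratio stays the same regardless of the token mix.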
Example 2: Code Review Pipeline
An internal tool that reviews pull requests. Large context (5,000-token diffs), detailed output (2,000 tokens), 200 PRs/day.
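Code Review Tool (Claude 3.5 Sonnet), assuming a 30-day month: 6,000 PRs × 5,000 input tokens = 30M input tokens ($90), plus 6,000 × 2,000 = 12M output tokens ($180), for $270/month.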
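Anthropic's prompt caching can reduce input cost by up to 90% for repeated system prompts or static context blocks. Only the repeated portion is discounted, and the diffs themselves change with every PR, so ~$9 is the best case for the $90 input line (a saving of up to $81/month) rather than a typical result. The larger the share of each request that is static instructions or shared context, the closer you get to that figure.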
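How much caching saves depends on what fraction of each request is actually static. The sketch below models that with a hypothetical `cached_fraction` parameter and the 90% discount quoted above; it deliberately ignores any extra charge for writing to the cache, so treat the output as an optimistic estimate.

```python
def cached_input_cost(requests, input_tokens, price_per_1m,
                      cached_fraction, cache_discount=0.90):
    """Monthly input cost in USD when part of every request is served from cache.

    cached_fraction: share of input tokens that is static and cacheable (0 to 1).
    cache_discount: discount applied to cached tokens (0.90 = 90% off).
    Simplification: cache-write surcharges and cache misses are not modeled.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    total_tokens = requests * input_tokens
    cached_tokens = total_tokens * cached_fraction
    fresh_tokens = total_tokens - cached_tokens
    cached_price = price_per_1m * (1.0 - cache_discount)
    return (fresh_tokens * price_per_1m + cached_tokens * cached_price) / 1_000_000


# Code review workload: 6,000 PRs/month, 5,000 input tokens each, $3.00 per 1M.
for fraction in (0.0, 0.3, 1.0):
    cost = cached_input_cost(6_000, 5_000, 3.00, fraction)
    print(f"{fraction:.0%} of input cached -> ${cost:.2f}/month")
```

Measuring the static share of your own prompts is the only way to know where on that range you land.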
Example 3: Document Analysis at Scale
A legal-tech startup analyzing contracts. 10,000 documents/month, 8,000 tokens each, with structured 1,500-token JSON output.
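Contract Analysis (Gemini 1.5 Pro): 10,000 documents × 8,000 input tokens = 80M input tokens ($100), plus 10,000 × 1,500 = 15M output tokens ($75), for $175/month.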
The same on GPT-4o: $200 + $150 = $350/month. Gemini 1.5 Pro wins here — its 1M context window means it can process entire contract bundles in a single request rather than chunking, which eliminates retrieval complexity and reduces total request count.
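For document pipelines it is often more useful to think in cost per document than cost per month, because volume tends to change while document size stays stable. A minimal sketch, with `cost_per_document` as an illustrative helper name and list prices taken from the pricing table:

```python
def cost_per_document(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """USD cost to analyze one document in a single request."""
    return (input_tokens * input_price_per_1m
            + output_tokens * output_price_per_1m) / 1_000_000


DOCS_PER_MONTH = 10_000

gemini = cost_per_document(8_000, 1_500, 1.25, 5.00)    # Gemini 1.5 Pro
gpt4o = cost_per_document(8_000, 1_500, 2.50, 10.00)    # GPT-4o

for name, unit in (("Gemini 1.5 Pro", gemini), ("GPT-4o", gpt4o)):
    print(f"{name}: ${unit:.4f} per document, ${unit * DOCS_PER_MONTH:,.2f} per month")
```

A per-document figure also makes it straightforward to price the service per contract with a known margin.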
5 Ways to Cut Your LLM Bill
1. Use the cheapest model that passes your quality bar
This is the highest-leverage decision. Run an A/B evaluation between GPT-4o mini and GPT-4o on your actual prompts and tasks. For classification, extraction, summarization, and simple Q&A, mini/flash models are often indistinguishable from flagship models at 10× lower cost.
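Once you have evaluation results, the selection rule itself is simple: keep the candidates that clear your bar and take the cheapest. The sketch below assumes you already have a pass rate per model from your own evaluation; the `Candidate` class, the candidate names, and every number in the example are hypothetical placeholders, not benchmark results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_request: float  # USD, from the cost formula
    pass_rate: float         # fraction of your eval cases judged acceptable


def cheapest_passing(candidates, quality_bar):
    """Return the cheapest candidate whose pass rate meets the bar, or None."""
    passing = [c for c in candidates if c.pass_rate >= quality_bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_request)


# Placeholder values for illustration only; substitute your own measurements.
candidates = [
    Candidate("mini-tier", cost_per_request=0.0005, pass_rate=0.93),
    Candidate("flagship-tier", cost_per_request=0.0085, pass_rate=0.97),
]

choice = cheapest_passing(candidates, quality_bar=0.90)
print(choice.name if choice else "no model meets the bar")
```

Returning `None` when nothing passes keeps the decision explicit: either lower the bar, improve the prompt, or add a stronger candidate.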
2. Shrink your system prompt
Every token in your system prompt is billed on every request. Audit it ruthlessly. Remove examples if your model already does the task correctly. Use concise instruction phrasing. If you cut 500 tokens from a 100,000-request/month pipeline, you save 50M input tokens: $7.50/month on GPT-4o mini, $125/month on GPT-4o.
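The saving from a shorter system prompt is the removed token count multiplied by request volume and input price. A small sketch, where `trim_savings` is an illustrative helper name and the inputs mirror the scenario above:

```python
def trim_savings(tokens_removed, requests_per_month, input_price_per_1m):
    """Monthly USD saved by removing tokens that were sent on every request."""
    return tokens_removed * requests_per_month * input_price_per_1m / 1_000_000


# 500 tokens trimmed from a 100,000-request/month pipeline.
for model, price in (("GPT-4o mini", 0.15), ("GPT-4o", 2.50)):
    saved = trim_savings(500, 100_000, price)
    print(f"{model}: ${saved:.2f}/month saved")
```

Running this against your real request volume shows quickly whether a prompt audit is worth an afternoon of engineering time.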
3. Enable prompt caching for static context
Both Anthropic and Google offer prompt caching that dramatically reduces the cost of repeatedly sending the same context (RAG documents, system prompts, few-shot examples). Cache hits are typically billed at 10–25% of the normal input price.
4. Use Batch API for async workloads
OpenAI and Anthropic offer a Batch API that processes requests asynchronously (results within 24h) at 50% of standard pricing. If you have document processing, overnight analytics, or any workload that doesn't need real-time response, batch processing is free money.
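Most workloads are a mix of real-time and deferrable requests, so the practical question is how much of the bill can move to batch. The sketch below assumes the 50% discount quoted above applies to the whole batched share; `cost_with_batch` and `async_fraction` are names invented for this example.

```python
def cost_with_batch(standard_monthly_cost, async_fraction, batch_discount=0.50):
    """Monthly USD cost after moving a share of the workload to a batch API.

    async_fraction: share of spend that can tolerate delayed results (0 to 1).
    batch_discount: discount on batched requests (0.50 = half price).
    """
    if not 0.0 <= async_fraction <= 1.0:
        raise ValueError("async_fraction must be between 0 and 1")
    batched = standard_monthly_cost * async_fraction * (1.0 - batch_discount)
    realtime = standard_monthly_cost * (1.0 - async_fraction)
    return batched + realtime


# A $350/month workload with different shares moved to batch.
for share in (0.0, 0.5, 1.0):
    print(f"{share:.0%} batched -> ${cost_with_batch(350.0, share):.2f}/month")
```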
5. Cap output tokens explicitly
Without an explicit cap, a model can keep generating until it emits a stop token or hits its maximum output length. Set max_tokens to the maximum you'd realistically use. A response that drifts from 500 to 2,000 tokens quadruples your output cost on that request. Measure your P95 output length in production and set a cap slightly above it, so that only rare outliers are truncated.
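Deriving the cap from logged output lengths might look like the sketch below. It uses a nearest-rank percentile and adds 10% headroom; both choices, the helper names, and the sample data are assumptions for illustration, and `observed` stands in for whatever your logging pipeline records.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def suggest_max_tokens(output_lengths, pct=95, headroom_pct=10):
    """Suggest a max_tokens cap: the chosen percentile plus some headroom."""
    base = percentile(output_lengths, pct)
    return base + math.ceil(base * headroom_pct / 100)


# Placeholder data: 20 logged response lengths from 100 to 2,000 tokens.
observed = list(range(100, 2_100, 100))
cap = suggest_max_tokens(observed)
print(f"P95 = {percentile(observed, 95)} tokens, suggested max_tokens = {cap}")
```

Recomputing the cap periodically helps, since output lengths tend to shift whenever the prompt or model changes.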
Multi-turn conversational apps often pass the full conversation history on every request. Each request's input then grows linearly with the number of turns, so the total input tokens billed over a 20-turn conversation grow quadratically. Implement a context truncation strategy (sliding window, summarization, or selective retrieval) before hitting production at scale.
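The effect of resending history, and of a sliding window, can be simulated with token counts alone. This is a simplified sketch: every message is assumed to be the same size, the system prompt is left out, and `sliding_window` and `total_input_tokens` are illustrative names rather than any library's API.

```python
def sliding_window(message_token_counts, budget):
    """Keep the most recent messages that fit within the token budget (oldest first)."""
    kept = []
    used = 0
    for count in reversed(message_token_counts):
        if used + count > budget:
            break
        kept.append(count)
        used += count
    return list(reversed(kept))


def total_input_tokens(turns, tokens_per_message, budget=None):
    """Total input tokens billed over a conversation that resends its history."""
    history = []
    total = 0
    for _ in range(turns):
        history.append(tokens_per_message)       # new user message
        sent = history if budget is None else sliding_window(history, budget)
        total += sum(sent)                       # billed input for this request
        history.append(tokens_per_message)       # assistant reply joins the history
    return total


full = total_input_tokens(20, 200)
windowed = total_input_tokens(20, 200, budget=1_000)
print(f"Full history: {full:,} input tokens; 1,000-token window: {windowed:,}")
```

Doubling the number of turns roughly quadruples the full-history total, while the windowed total grows only linearly once the window is full. A window does discard older context, so summarization or retrieval may be a better fit when early turns matter.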
Frequently Asked Questions
How much does GPT-4o cost per 1 million tokens?
GPT-4o costs $2.50 per million input tokens and $10.00 per million output tokens (pay-as-you-go, June 2026). Cached input tokens and Batch API can reduce input cost by up to 50%.
How much does Claude 3.5 Sonnet cost?
Claude 3.5 Sonnet costs $3.00 per million input tokens and $15.00 per million output tokens via the Anthropic API (June 2026). It is notably strong on coding and long-context analysis.
Is Gemini cheaper than GPT-4o?
Gemini 1.5 Flash is dramatically cheaper ($0.075/$0.30 per million) but is a lighter model. Gemini 1.5 Pro at $1.25/$5.00 is roughly half the input price of GPT-4o and offers a 1-million-token context window — a meaningful advantage for long-document tasks.
What is the cheapest LLM API for production in 2026?
For quality-sensitive workloads: GPT-4o mini ($0.15/$0.60) or Claude 3.5 Haiku ($0.80/$4.00) are the best capability-per-dollar options. For simple classification/extraction: Gemini 1.5 Flash ($0.075/$0.30) is the cheapest at scale. For raw cost with no SLA: open-source models self-hosted on spot instances can undercut all of these at sufficient volume.
Last updated: June 2, 2026. Pricing data is reviewed monthly; email us if you spot a discrepancy.