How do I calculate my monthly LLM API cost?

Monthly cost = (avg input tokens × input price/1M + avg output tokens × output price/1M) × number of requests per month. Use our free LLM cost calculator at apicalculators.com/#llm to compute this instantly.

LLM API Cost Calculator Guide 2026: GPT-4o, Claude, Gemini Pricing

Running LLM APIs in production without a cost model is how companies end up with a five-figure API bill on a Monday morning. This guide gives you the exact formula, a June 2026 pricing table covering OpenAI, Anthropic, and Google, and three worked real-world examples, so you can budget before you build.

Why LLM Costs Are Hard to Predict

Unlike traditional SaaS pricing (flat monthly fee), LLM APIs charge per token, a unit of text roughly equal to ¾ of a word. The bill is the product of four variables, so a change in any one of them is multiplied by the others, and the mix is different for every use case:

Input token count — your system prompt + user message + conversation history
Output token count — the model's response, which you can't fully control
Model tier — flagship models cost 10–50× more than mini variants
Request volume — the number of API calls per day/month

A chat application that sends a 2,000-token system prompt on every request is burning budget before the user types a single word. Getting this right starts with a clean formula.

✦ Quick tip

Your system prompt is usually the biggest hidden cost. A 2,000-token system prompt on 100,000 daily requests adds about 6 billion input tokens per month to your bill: roughly $900 on GPT-4o mini, or $15,000 on GPT-4o. Measure it.

The Cost Formula

Every LLM provider uses the same underlying structure. Once you know this, any pricing page is readable in seconds:

Per-request cost

cost = (input_tokens × input_price_per_1M / 1,000,000)
+ (output_tokens × output_price_per_1M / 1,000,000)

Monthly cost

monthly = cost_per_request × requests_per_day × 30

That's the whole model. Every calculator, every spreadsheet, every cost estimate is just this equation applied at scale. The only things that change between providers are the two price variables.
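The two formulas above translate directly into code. The sketch below is one way to write them; the function names are illustrative, and the workload in the usage lines (1,000 input tokens, 500 output tokens, 1,000 requests a day at GPT-4o's $2.50 / $10.00 rates) is an assumed example, not a measurement.

```python
def cost_per_request(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Cost in USD of one request; both prices are quoted per 1 million tokens."""
    input_cost = input_tokens * input_price_per_1m / 1_000_000
    output_cost = output_tokens * output_price_per_1m / 1_000_000
    return input_cost + output_cost


def monthly_cost(per_request_usd, requests_per_day, days_per_month=30):
    """Monthly cost in USD, using the same 30-day month as the formula above."""
    return per_request_usd * requests_per_day * days_per_month


# Assumed workload: 1,000 input + 500 output tokens, 1,000 requests per day
per_request = cost_per_request(1_000, 500, 2.50, 10.00)
print(f"${per_request:.4f} per request")                     # $0.0075 per request
print(f"${monthly_cost(per_request, 1_000):.2f} per month")  # $225.00 per month
```

Keeping the per-request and monthly steps separate makes it easy to swap in a different price pair or a different day count without touching the rest.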

2026 LLM Pricing Table

All prices are USD, pay-as-you-go, per 1 million tokens, as of June 2026. These do not include discounts from Batch API, prompt caching, or committed-use agreements.

Model	Provider	Input / 1M tok	Output / 1M tok	Best for
GPT-4o	OpenAI	$2.50	$10.00	General flagship
GPT-4o mini (best value)	OpenAI	$0.15	$0.60	High-volume, simple tasks
o1	OpenAI	$15.00	$60.00	Complex reasoning
Claude 3.5 Sonnet	Anthropic	$3.00	$15.00	Code, analysis, long context
Claude 3.5 Haiku	Anthropic	$0.80	$4.00	Fast, cost-efficient tasks
Claude 3 Opus	Anthropic	$15.00	$75.00	Highest capability tasks
Gemini 1.5 Pro	Google	$1.25	$5.00	Long context (1M tokens)
Gemini 1.5 Flash (cheapest)	Google	$0.075	$0.30	Classification, extraction

ℹ Data note

Prices above reflect public pay-as-you-go API rates. Batch API (OpenAI, Anthropic) typically halves both input and output costs. Prompt caching (Anthropic, Google) can reduce repeated-context costs by 75–90%. These are modeled separately in the full calculator.
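To compare models for a given workload, the table can be loaded into a dictionary and sorted by monthly cost. This is a minimal sketch: the prices are copied from the table above, while `rank_models` and the sample workload (1,000 input tokens, 500 output tokens, 100,000 requests a month) are assumptions made up for illustration.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "GPT-4o mini": (0.15, 0.60),
    "o1": (15.00, 60.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
    "Claude 3 Opus": (15.00, 75.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "Gemini 1.5 Flash": (0.075, 0.30),
}


def rank_models(input_tokens, output_tokens, requests_per_month, prices=PRICES):
    """Return (model, monthly_usd) pairs sorted from cheapest to most expensive."""
    totals = []
    for model, (input_price, output_price) in prices.items():
        per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
        totals.append((model, per_request * requests_per_month))
    return sorted(totals, key=lambda pair: pair[1])


# Assumed workload: 1,000 input + 500 output tokens, 100,000 requests per month
for model, usd in rank_models(1_000, 500, 100_000):
    print(f"{model:<18} ${usd:>9,.2f}")
```

The ranking only reflects list price. It says nothing about whether a cheaper model meets your quality bar, so treat it as a shortlist rather than a decision.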

🧮 Calculate your exact monthly cost

Plug in your model, token counts, and request volume — get a live estimate in seconds. No signup, runs in your browser.

Open LLM Cost Calculator →

Three Real-World Examples

Abstract pricing tables don't help you budget. Here are three worked examples covering common production scenarios.

Example 1: Customer Support Chatbot

A mid-sized SaaS product with 500 daily support conversations. Each conversation is modeled as a single request with a 1,500-token system prompt and a 300-token user message, returning a 400-token response.

Customer Support Bot (GPT-4o mini)

Input tokens / request: 1,800
Output tokens / request: 400
Requests / month: 15,000
Input cost (1.8K × $0.15/1M × 15K): $4.05
Output cost (400 × $0.60/1M × 15K): $3.60
Monthly total: $7.65

The same workload on GPT-4o (flagship): $67.50 + $60 = $127.50/month. That's roughly a 17× cost difference for the same task. For a simple support bot, GPT-4o mini is the obvious choice unless you need the flagship's reasoning quality.

Example 2: Code Review Pipeline

An internal tool that reviews pull requests. Large context (5,000-token diffs), detailed output (2,000 tokens), 200 PRs/day.

Code Review Tool (Claude 3.5 Sonnet)

Input tokens / request: 5,000
Output tokens / request: 2,000
Requests / month: 6,000
Input cost (5K × $3.00/1M × 6K): $90.00
Output cost (2K × $15.00/1M × 6K): $180.00
Monthly total: $270.00

✦ Cost optimization

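Anthropic's prompt caching can reduce input cost by up to 90% for repeated system prompts or static context blocks. The saving applies only to the cached portion: the diff changes with every pull request, so only the static review instructions benefit. If the whole $90 input line were cacheable it would fall to about $9; the real saving scales with the share of each request that is static.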

Example 3: Document Analysis at Scale

A legal-tech startup analyzing contracts. 10,000 documents/month, 8,000 tokens each, with structured 1,500-token JSON output.

Contract Analysis (Gemini 1.5 Pro)

Input tokens / request: 8,000
Output tokens / request: 1,500
Requests / month: 10,000
Input cost (8K × $1.25/1M × 10K): $100.00
Output cost (1.5K × $5.00/1M × 10K): $75.00
Monthly total: $175.00

The same on GPT-4o: $200 + $150 = $350/month. Gemini 1.5 Pro wins here — its 1M context window means it can process entire contract bundles in a single request rather than chunking, which eliminates retrieval complexity and reduces total request count.
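The three breakdowns above can be reproduced in a few lines, which is a useful sanity check before adapting them to your own workload. In this sketch the token counts, request volumes, and prices come from the examples; the helper name `monthly_breakdown` is an assumption.

```python
def monthly_breakdown(input_tokens, output_tokens, requests_per_month, input_price, output_price):
    """Return (input_usd, output_usd, total_usd) for one month; prices are per 1M tokens."""
    input_usd = input_tokens * requests_per_month * input_price / 1_000_000
    output_usd = output_tokens * requests_per_month * output_price / 1_000_000
    return input_usd, output_usd, input_usd + output_usd


# (input tokens, output tokens, requests/month, input price, output price)
EXAMPLES = {
    "Support bot (GPT-4o mini)": (1_800, 400, 15_000, 0.15, 0.60),
    "Code review (Claude 3.5 Sonnet)": (5_000, 2_000, 6_000, 3.00, 15.00),
    "Contract analysis (Gemini 1.5 Pro)": (8_000, 1_500, 10_000, 1.25, 5.00),
}

for name, workload in EXAMPLES.items():
    input_usd, output_usd, total = monthly_breakdown(*workload)
    print(f"{name}: input ${input_usd:.2f} + output ${output_usd:.2f} = ${total:.2f}")
```

Swapping the last two numbers of any tuple for another model's prices gives the comparison figure for that model on the same workload.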

5 Ways to Cut Your LLM Bill

1. Use the cheapest model that passes your quality bar

This is the highest-leverage decision. Run an A/B evaluation between GPT-4o mini and GPT-4o on your actual prompts and tasks. For classification, extraction, summarization, and simple Q&A, mini/flash models are often indistinguishable from flagship models at 10× lower cost.

2. Shrink your system prompt

Every token in your system prompt is billed on every request. Audit it ruthlessly. Remove examples if your model already does the task correctly. Use concise instruction phrasing. If you cut 500 tokens from a 100,000-request/month pipeline, you save 50M input tokens: $7.50/month on GPT-4o mini, $125/month on GPT-4o.

3. Enable prompt caching for static context

Both Anthropic and Google offer prompt caching that dramatically reduces the cost of repeatedly sending the same context (RAG documents, system prompts, few-shot examples). Cache hits are typically billed at 10–25% of the normal input price.
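How much caching saves depends on how much of each request is static. The sketch below is a simplified model under stated assumptions: cache hits are billed at 10% of the input price (the low end of the range above), and cache-write surcharges and cache expiry are ignored. The function name and the 1,500 / 3,500 token split are hypothetical.

```python
def input_cost_with_caching(static_tokens, dynamic_tokens, requests, input_price,
                            cached_price_ratio=0.10, hit_rate=1.0):
    """Estimate monthly input cost in USD when a static prefix is served from cache.

    Simplified model: ignores cache-write surcharges and cache expiry.
    input_price is USD per 1M tokens; hit_rate is the share of requests that hit the cache.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    per_token = input_price / 1_000_000
    hits = requests * hit_rate
    misses = requests - hits
    # Static tokens are discounted on hits and billed in full on misses
    static_cost = static_tokens * per_token * (hits * cached_price_ratio + misses)
    # Dynamic tokens (user message, diff, document) are always billed in full
    dynamic_cost = dynamic_tokens * per_token * requests
    return static_cost + dynamic_cost


# 5,000 input tokens per request, 6,000 requests, $3.00 per 1M input tokens
print(f"${input_cost_with_caching(5_000, 0, 6_000, 3.00):.2f}")      # all static: $9.00
print(f"${input_cost_with_caching(1_500, 3_500, 6_000, 3.00):.2f}")  # 1,500 static: $65.70
```

The gap between the two results is the point: the headline discount applies only to the part of the prompt that repeats.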

4. Use Batch API for async workloads

OpenAI and Anthropic offer a Batch API that processes requests asynchronously (results within 24h) at 50% of standard pricing. If you have document processing, overnight analytics, or any workload that doesn't need real-time response, batch processing is free money.
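Most pipelines can only move part of their traffic to batch, so a blended estimate is more useful than a flat 50%. This is a rough sketch assuming the 50% batch discount quoted above; `blended_monthly_cost` and the 60% batch share are illustrative, not provider terminology.

```python
def blended_monthly_cost(standard_monthly_usd, batch_share, batch_discount=0.50):
    """Monthly cost in USD when `batch_share` of the workload runs at the batch discount."""
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    realtime = standard_monthly_usd * (1 - batch_share)
    batched = standard_monthly_usd * batch_share * (1 - batch_discount)
    return realtime + batched


# A $350/month workload with 60% of requests moved to batch (assumed share)
print(f"${blended_monthly_cost(350.0, 0.60):.2f}")  # $245.00
```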

5. Cap output tokens explicitly

By default a model keeps generating until it decides the response is complete or reaches its maximum output length. Set max_tokens to the maximum you'd realistically use. A response that drifts from 500 to 2,000 tokens quadruples your output cost on that request. Measure your P95 output length in production and set a cap slightly above it.
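One way to turn "P95 plus a little" into a number is a nearest-rank percentile over logged output lengths. The sketch below assumes you already record output token counts per response; the function name and the 10% headroom are arbitrary choices, not a recommendation from any provider.

```python
import math


def p95_output_cap(output_lengths, headroom_pct=10):
    """Suggest a max_tokens value: nearest-rank P95 of observed lengths plus headroom."""
    if not output_lengths:
        raise ValueError("need at least one observed output length")
    ordered = sorted(output_lengths)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integers to avoid float drift
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return math.ceil(p95 * (100 + headroom_pct) / 100)


# Hypothetical log of output lengths: 1, 2, ..., 100 tokens
print(p95_output_cap(list(range(1, 101))))  # 105
```

A cap that is too tight truncates legitimate responses, so it is worth re-checking the logged distribution whenever prompts change.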

⚠ Common mistake

Multi-turn conversational apps often pass the full conversation history on every request. The input for each request then grows linearly with the turn number, so the total input billed over a 20-turn conversation grows quadratically. Implement a context truncation strategy (sliding window, summarization, or selective retrieval) before hitting production at scale.
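The effect is easy to see by summing the input billed across a whole conversation. This sketch compares resending the full history with a sliding window; the token sizes (1,500 system, 300 user, 400 reply) and the 4-turn window are assumed values for illustration.

```python
def cumulative_input_tokens(turns, system_tokens, user_tokens, reply_tokens, window=None):
    """Total input tokens billed over one conversation.

    Each request sends the system prompt, the history of earlier turns
    (user message plus reply), and the current user message.
    With `window` set, only the most recent `window` earlier turns are resent.
    """
    total = 0
    for turn in range(1, turns + 1):
        earlier_turns = turn - 1
        if window is not None:
            earlier_turns = min(earlier_turns, window)
        history = earlier_turns * (user_tokens + reply_tokens)
        total += system_tokens + history + user_tokens
    return total


full = cumulative_input_tokens(20, 1_500, 300, 400)
windowed = cumulative_input_tokens(20, 1_500, 300, 400, window=4)
print(full, windowed)  # 169000 85000
```

With these assumed sizes, a 4-turn window roughly halves the input billed over 20 turns, and the gap widens as conversations get longer.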

Frequently Asked Questions

How much does GPT-4o cost per 1 million tokens?

GPT-4o costs $2.50 per million input tokens and $10.00 per million output tokens (pay-as-you-go, June 2026). Cached input tokens and Batch API can reduce input cost by up to 50%.

How much does Claude 3.5 Sonnet cost?

Claude 3.5 Sonnet costs $3.00 per million input tokens and $15.00 per million output tokens via the Anthropic API (June 2026). It is notably strong on coding and long-context analysis.

Is Gemini cheaper than GPT-4o?

Gemini 1.5 Flash is dramatically cheaper ($0.075/$0.30 per million) but is a lighter model. Gemini 1.5 Pro at $1.25/$5.00 is roughly half the input price of GPT-4o and offers a 1-million-token context window — a meaningful advantage for long-document tasks.

What is the cheapest LLM API for production in 2026?

For quality-sensitive workloads: GPT-4o mini ($0.15/$0.60) or Claude 3.5 Haiku ($0.80/$4.00) are the best capability-per-dollar options. For simple classification/extraction: Gemini 1.5 Flash ($0.075/$0.30) is the cheapest at scale. For raw cost with no SLA: open-source models self-hosted on spot instances can undercut all of these at sufficient volume.

APICalculators Team

We build free, privacy-first cost calculators for developers and AI engineers. Pricing data is sourced directly from official provider documentation and verified monthly. We've analyzed thousands of API billing structures so you don't have to.

Last updated: June 2, 2026. Pricing data is reviewed monthly — email us if you spot a discrepancy.
