How AI pricing works, why different models cost differently, and how to make smart selection decisions — the finance leader's guide to AI economics.
AI models don't read words like you do. They break text into tokens — small pieces that the model processes one at a time. Think of it like a cash register that counts items, not bags.
Common words = 1 token. Long or rare words split into pieces.
Both what you send (input) and what AI generates (output) cost tokens.
"$4,200.50" = 5+ tokens. Financial data costs more than narrative text.
Markdown can use around 60% fewer tokens than HTML for the same content, because every HTML tag and attribute is counted as tokens too.
| Content | Approximate tokens | Analogy |
|---|---|---|
| A short question ("Assess this merchant") | ~5 tokens | A single sentence on a Post-it |
| A paragraph of merchant data (10 lines) | ~150 tokens | Half a page of notes |
| Our engineered prompt template | ~400 tokens | A one-page memo |
| A full risk assessment output (8 sections) | ~800 tokens | A two-page report |
| Total per assessment (input + output) | ~1,350 tokens | A three-page document |
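To make the arithmetic behind the table explicit, here is a minimal sketch that splits the per-assessment total into input and output. It assumes the total is built from the merchant data, the prompt template, and the assessment output (150 + 400 + 800 = 1,350); the variable names are illustrative only.

```python
# Approximate token counts taken from the table above.
# Assumption: one assessment = merchant data + prompt template (input)
# plus one full risk assessment (output).
INPUT_TOKENS = {
    "merchant_data": 150,
    "prompt_template": 400,
}
OUTPUT_TOKENS = {
    "risk_assessment": 800,
}

input_total = sum(INPUT_TOKENS.values())
output_total = sum(OUTPUT_TOKENS.values())
total = input_total + output_total

print(f"input={input_total} output={output_total} total={total}")
```

Keeping input and output separate matters because the two are billed at different rates.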
"MC-8842, Kopi Corner, $15,600, 4.1%, SGD" uses ~25 tokens, while ordinary English text of the same length uses only ~12. Numbers and special characters are "expensive" because tokenizers are trained mostly on natural-language text, so codes, digits, and punctuation split into many small tokens instead of compressing into whole-word tokens.
AI pricing is simple: you pay per token, both in and out. Think of it like a taxi meter: the meter runs while you talk (input) and while the AI responds (output). Output tokens typically cost 3–5× more than input tokens because the model generates its response one token at a time, while it can read the whole input in parallel. In the table below, only Llama 3.3 70B charges the same rate for both.
Prices vary dramatically, from a few cents to tens of dollars per million tokens. Here's the landscape of models available on Amazon Bedrock:
| Model | Provider | Input / 1M | Output / 1M | Best for |
|---|---|---|---|---|
| Nova Micro | Amazon | $0.035 | $0.14 | Classification, routing |
| Nova Lite | Amazon | $0.06 | $0.24 | Drafts, summaries |
| Llama 4 Maverick 17B | Meta | $0.24 | $0.97 | Multimodal, cost-effective |
| DeepSeek V3.2 | DeepSeek | $0.27 | $1.10 | Coding, general tasks |
| Mistral Large 3 | Mistral AI | $0.50 | $1.50 | Multilingual, structured |
| Llama 3.3 70B | Meta | $0.72 | $0.72 | Open-weight balanced |
| Nova Pro | Amazon | $0.80 | $3.20 | Reports, analysis |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Quality + speed balance |
| Nova Premier | Amazon | $2.50 | $10.00 | Complex multimodal |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Complex reasoning |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | Deepest multi-step tasks |
Pricing as of May 2026 (on-demand, US regions). Check aws.amazon.com/bedrock/pricing for current rates. Additional models available: Qwen3, Kimi K2, NVIDIA Nemotron, Writer Palmyra, and more.
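The per-token pricing rule can be sketched as a small function. This is an illustration, not an official calculator: it copies a few rows from the table above, and it assumes an assessment of 550 input tokens and 800 output tokens (the estimates from the token table). `request_cost` is a hypothetical helper name.

```python
# (input USD per 1M tokens, output USD per 1M tokens), copied from the pricing table.
PRICES = {
    "Nova Micro": (0.035, 0.14),
    "Nova Pro": (0.80, 3.20),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the on-demand cost in USD of a single request."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumed workload: one merchant assessment (550 tokens in, 800 tokens out).
for model in PRICES:
    cost = request_cost(model, input_tokens=550, output_tokens=800)
    print(f"{model:<18} ${cost:.6f} per assessment")
```

Under these assumptions, one assessment costs about 1.4 cents on Claude Sonnet 4.6 and about 0.013 cents on Nova Micro, a difference of roughly 100 times.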
Model names change every few months. Instead of memorizing names, think in tiers — match your task complexity to the right level of capability. It's like hiring: you don't need a senior consultant for data entry.
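| Tier | Typical tasks | Strength | Price | Analogy |
|---|---|---|---|---|
| ⚡ Fast | Simple tasks | Pattern matching | $0.04–$1/M tokens | Junior analyst |
| 🎯 Balanced | Moderate reasoning | Quality + speed | $1–$5/M tokens | Senior analyst |
| 🧠 Deep | Complex analysis | Multi-step logic | $3–$75/M tokens | Expert consultant |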
| Finance task | Tier | Why |
|---|---|---|
| Document classification (invoice vs receipt) | ⚡ Fast | Simple pattern matching — speed matters most |
| Invoice data extraction (fields → JSON) | ⚡ Fast | Structured extraction, no deep reasoning needed |
| Customer complaint response drafts | 🎯 Balanced | Needs empathy and nuance, not deep analysis |
| Monthly settlement reconciliation | 🎯 Balanced | Structured comparison, moderate complexity |
| Merchant risk assessment narrative | 🧠 Deep | Multi-factor reasoning, data citation, recommendations |
| Regulatory impact assessment | 🧠 Deep | Cross-referencing documents, nuanced interpretation |
| Bulk monthly assessments (200+ merchants) | ⚡ Fast | Cost-effective at scale — cheapest model that meets quality bar |
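To see how tier choice interacts with volume, the sketch below estimates monthly cost for the same assessment workload on one model per tier. The tier-to-model mapping is an assumption made for illustration (Nova Micro, Claude Haiku 4.5, and Claude Sonnet 4.6 as stand-ins), as are the 550 input and 800 output tokens per assessment and the 100,000-request volume.

```python
# One representative model per tier: an illustrative assumption, not an official mapping.
# Each entry is (model name, input USD per 1M tokens, output USD per 1M tokens).
TIER_MODELS = {
    "Fast": ("Nova Micro", 0.035, 0.14),
    "Balanced": ("Claude Haiku 4.5", 1.00, 5.00),
    "Deep": ("Claude Sonnet 4.6", 3.00, 15.00),
}


def monthly_cost(tier: str, requests: int,
                 input_tokens: int = 550, output_tokens: int = 800) -> float:
    """Return the monthly on-demand cost in USD for `requests` assessments."""
    _, input_price, output_price = TIER_MODELS[tier]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return requests * per_request


for tier, (model, _, _) in TIER_MODELS.items():
    small = monthly_cost(tier, 200)       # the 200-merchant bulk run from the table
    large = monthly_cost(tier, 100_000)   # a hypothetical high-volume workload
    print(f"{tier:<9} {model:<18} 200/mo: ${small:.4f}   100,000/mo: ${large:,.2f}")
```

With these numbers, 200 assessments cost under $3 per month even on the Deep tier, so quality can drive the choice at that volume. At 100,000 requests the gap widens to about $13 versus about $1,365, which is where tier selection starts to matter financially.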
Each tier on Amazon Bedrock contains multiple models from different providers. Cost depends on both the task and the volume: per-token prices in the pricing table differ by a factor of more than 100 between the cheapest and the most capable models, so the right model choice has a large effect on monthly spend at scale.
Once you've picked the right model tier, these strategies reduce cost further. Ordered by impact:
| Strategy | Typical savings | How it works |
|---|---|---|
| Model selection | The biggest lever | Use Nova Micro for classification, Sonnet for complex analysis. Model choice matters more than any other strategy. |
| Prompt caching | Up to 90% on cached input tokens | Cache your prompt template: pay full price once, then 10% for every reuse. Well suited to repeated tasks. |
| Batch inference | 50% | Submit requests in bulk rather than in real time. Ideal for monthly portfolio assessments. |
| Intelligent prompt routing | Up to 30% | Bedrock automatically routes simple tasks to cheaper models and complex ones to more powerful models. |
| Prompt optimization | 10–40% | Remove redundant instructions, use shorter examples, and constrain output length. Markdown instead of HTML can save around 60% on formatting tokens. |
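The caching and batch figures apply to different parts of the bill, which a short calculation makes visible. This sketch rests on several assumptions: Claude Sonnet 4.6 prices from the pricing table, a 400-token cached template billed at 10% of the input price on reuse, any cache-write surcharge ignored, and a batch discount of 50% applied to the whole request. The function and parameter names are hypothetical.

```python
INPUT_PRICE, OUTPUT_PRICE = 3.00, 15.00  # Claude Sonnet 4.6, USD per 1M tokens

TEMPLATE_TOKENS = 400  # reusable prompt template (the cacheable part)
DATA_TOKENS = 150      # merchant-specific data (changes on every request)
OUTPUT_TOKENS = 800    # generated assessment


def assessment_cost(cached_read_rate: float = 1.0, batch_discount: float = 0.0) -> float:
    """Cost in USD of one assessment.

    cached_read_rate: fraction of the input price paid for template tokens
                      (1.0 = no caching, 0.10 = cached reuse as described above).
    batch_discount:   fraction taken off the whole request (0.50 for batch).
    """
    input_cost = (TEMPLATE_TOKENS * cached_read_rate + DATA_TOKENS) * INPUT_PRICE
    output_cost = OUTPUT_TOKENS * OUTPUT_PRICE
    return (input_cost + output_cost) * (1 - batch_discount) / 1_000_000


baseline = assessment_cost()
with_cache = assessment_cost(cached_read_rate=0.10)
with_batch = assessment_cost(batch_discount=0.50)

for label, cost in [("baseline", baseline), ("prompt caching", with_cache), ("batch", with_batch)]:
    saving = 1 - cost / baseline
    print(f"{label:<15} ${cost:.6f}  saving: {saving:.1%}")
```

Under these assumptions caching saves only about 8% on this workload, because most of the cost comes from the 800 output tokens, which caching does not touch. Caching pays off most when the cached input is large relative to the output.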
The context window is the maximum text the model can process at once — your prompt + the AI's response must fit within it.
| Model | Context window | Equivalent | Practical meaning |
|---|---|---|---|
| Nova Micro | 128K tokens | ~100 pages | A short book |
| Nova Pro | 300K tokens | ~230 pages | A long report |
| Claude Sonnet 4 | 200K tokens | ~150 pages | A full policy manual |
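Because the prompt and the response share the same window, a quick check before sending a long document can avoid a failed request. The sketch below uses the window sizes from the table above; `models_that_fit` is a hypothetical helper, and the 180,000-token document is an invented example.

```python
# Context window sizes in tokens, copied from the table above.
CONTEXT_WINDOWS = {
    "Nova Micro": 128_000,
    "Nova Pro": 300_000,
    "Claude Sonnet 4": 200_000,
}


def models_that_fit(prompt_tokens: int, max_output_tokens: int) -> list[str]:
    """Return the models whose window can hold the prompt plus the reserved output."""
    needed = prompt_tokens + max_output_tokens
    return [model for model, window in CONTEXT_WINDOWS.items() if needed <= window]


# Hypothetical example: a 180,000-token policy document, reserving 4,000 tokens for the answer.
print(models_that_fit(prompt_tokens=180_000, max_output_tokens=4_000))
```

In this example the request needs 184,000 tokens, so it fits Nova Pro and Claude Sonnet 4 but not Nova Micro.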
| Tool | Who picks the model | What you control |
|---|---|---|
| Claude (Cowork) | Anthropic (by plan tier) | Your prompt quality |
| Kiro | Auto-selected by task | Your prompt quality |
| Cursor | You choose per conversation | Model + prompt quality |
| Bedrock Playground | You choose explicitly | Model + prompt + parameters |
| Concept from this page | Where you'll apply it |
|---|---|
| Token estimation | Understanding why prompt length matters for cost and quality |
| Model tiers | Day 1 Demo: Model Arena — compare 3 models on the same task |
| Cost optimization | Making the business case for AI adoption in your team |
| Context windows | Managing long conversations — knowing when to start fresh |
| Decision framework | Day 2: Planning your first agent's cost profile |