💰 Tokens, Cost & Model Selection

How AI pricing works, why different models cost differently, and how to make smart selection decisions — the finance leader's guide to AI economics.

What Are Tokens?

AI models don't read words like you do. They break text into tokens — small pieces that the model processes one at a time. Think of it like a cash register that counts items, not bags.

📝

1 token ≈ ¾ word

Common words = 1 token. Long or rare words split into pieces.

💵

You pay per token

Both what you send (input) and what AI generates (output) cost tokens.

🔢

Numbers are expensive

"$4,200.50" = 5+ tokens. Financial data costs more than narrative text.

📊

Format matters

Markdown uses 60% fewer tokens than HTML for the same content.

Token Estimation for Finance Work

Content | Approximate tokens | Analogy
A short question ("Assess this merchant") | ~5 tokens | A single sentence on a Post-it
A paragraph of merchant data (10 lines) | ~150 tokens | Half a page of notes
Our engineered prompt template | ~400 tokens | A one-page memo
A full risk assessment output (8 sections) | ~800 tokens | A two-page report
Total per assessment (input + output) | ~1,350 tokens | A three-page document
⚠️ Key insight for finance: A CSV row like MC-8842, Kopi Corner, $15,600, 4.1%, SGD uses ~25 tokens — while the same length of English text uses only ~12 tokens. Numbers and special characters are "expensive" because the tokenizer never learned to compress them efficiently.
💡 Why this matters for your team: When you send a spreadsheet to AI, you're paying for every comma, dollar sign, and decimal point. Summarizing data in narrative form ("Revenue grew 271% from $4,200 to $15,600") is cheaper than pasting raw tables — and often produces better AI output too.
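A minimal sketch of the "1 token ≈ ¾ word" rule of thumb, for quick back-of-the-envelope estimates. The function name `estimate_tokens` is hypothetical, and the heuristic only counts whitespace-separated words, so it will undercount number-heavy text such as CSV rows. For billing-grade numbers, rely on the token counts your provider reports.

```python
import math


def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (1 token is about 3/4 of a word)."""
    words = len(text.split())
    return math.ceil(words / words_per_token)


question = "Assess this merchant"
narrative = "Revenue grew 271% from $4,200 to $15,600"

print(estimate_tokens(question))   # 3 words  -> 4 tokens
print(estimate_tokens(narrative))  # 7 words  -> 10 tokens
```

The estimate for the short question lands close to the ~5 tokens quoted in the table; treat any result as a planning figure, not a measurement.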

How AI Pricing Works

AI pricing is simple: you pay per token, both in and out. Think of it like a taxi meter: the meter runs while you talk (input) AND while the AI responds (output). On most models, output tokens cost 3–5× more than input tokens because generating text is computationally harder than reading it.

THE PRICING FORMULA
Cost = (Input tokens × Input price) + (Output tokens × Output price)

Example: Merchant Risk Assessment on Claude Sonnet 4
Input: 550 tokens × $3.00/million = $0.00165
Output: 800 tokens × $15.00/million = $0.01200
Total per assessment: $0.01365

That's 1.4 cents per assessment. An analyst takes 30 minutes ($25).
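The pricing formula can be written as a few lines of Python. This is a sketch for checking the arithmetic; the function name `request_cost` is an illustrative choice, and prices are passed in dollars per million tokens.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost in dollars for one request; prices are dollars per 1M tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# The merchant risk assessment example: 550 input and 800 output tokens
cost = request_cost(550, 800, 3.00, 15.00)
print(f"${cost:.5f}")  # $0.01365
```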

The Price Spectrum

Prices vary dramatically — from fractions of a cent to dollars per million tokens. Here's the landscape of models available on Amazon Bedrock:

Model | Provider | Input / 1M | Output / 1M | Best for
Nova Micro | Amazon | $0.035 | $0.14 | Classification, routing
Nova Lite | Amazon | $0.06 | $0.24 | Drafts, summaries
Llama 4 Maverick 17B | Meta | $0.24 | $0.97 | Multimodal, cost-effective
DeepSeek V3.2 | DeepSeek | $0.27 | $1.10 | Coding, general tasks
Mistral Large 3 | Mistral AI | $0.50 | $1.50 | Multilingual, structured
Llama 3.3 70B | Meta | $0.72 | $0.72 | Open-weight balanced
Nova Pro | Amazon | $0.80 | $3.20 | Reports, analysis
Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Quality + speed balance
Nova Premier | Amazon | $2.50 | $10.00 | Complex multimodal
Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Complex reasoning
Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | Deepest multi-step tasks

Pricing as of May 2026 (on-demand, US regions). Check aws.amazon.com/bedrock/pricing for current rates. Additional models available: Qwen3, Kimi K2, NVIDIA Nemotron, Writer Palmyra, and more.

✅ The key insight: The same task can cost 1 cent or 1 dollar depending on which model you choose. Picking the right model for each task is the single biggest cost lever — far more impactful than optimizing prompt length.
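To see how wide the spread is for one workload, the sketch below prices the same 550-token input and 800-token output on a handful of models. The prices are copied from the table above and will go stale; the `PRICES` dictionary and `cost_by_model` function are illustrative names, not part of any SDK.

```python
# (input $/1M tokens, output $/1M tokens), copied from the price table
PRICES = {
    "Nova Micro": (0.035, 0.14),
    "Nova Pro": (0.80, 3.20),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def cost_by_model(input_tokens, output_tokens, prices=PRICES):
    """Return {model: dollars per request}, cheapest first."""
    costs = {
        model: (input_tokens * p_in + output_tokens * p_out) / 1_000_000
        for model, (p_in, p_out) in prices.items()
    }
    return dict(sorted(costs.items(), key=lambda item: item[1]))


costs = cost_by_model(550, 800)
for model, dollars in costs.items():
    print(f"{model:<20} ${dollars:.5f}")

spread = costs["Claude Opus 4.7"] / costs["Nova Micro"]
print(f"Most expensive vs. cheapest: about {spread:.0f}x")
```

For this workload the gap between the cheapest and the most expensive listed model is more than two orders of magnitude, which is why model choice dominates the other levers.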

Data Privacy — Why Bedrock Is Different

💡 With Amazon Bedrock: Your data stays in your AWS account — it is not used to train the models. You control the region, encryption, and access. All API calls are logged and auditable via CloudTrail. This is different from using ChatGPT or Claude.ai directly — Bedrock provides enterprise-grade data isolation.

The 3 Model Tiers

Model names change every few months. Instead of memorizing names, think in tiers — match your task complexity to the right level of capability. It's like hiring: you don't need a senior consultant for data entry.

⚡

Fast & Cheap

Simple tasks and pattern matching. $0.04–$1/M tokens. Hiring analogy: junior analyst.

🎯

Balanced

Moderate reasoning with a balance of quality and speed. $1–$5/M tokens. Hiring analogy: senior analyst.

🧠

Deep Reasoning

Complex analysis and multi-step logic. $3–$25/M tokens. Hiring analogy: expert consultant.

Which Tier for Which Finance Task?

Finance task | Tier | Why
Document classification (invoice vs receipt) | ⚡ Fast | Simple pattern matching; speed matters most
Invoice data extraction (fields → JSON) | ⚡ Fast | Structured extraction, no deep reasoning needed
Customer complaint response drafts | 🎯 Balanced | Needs empathy and nuance, not deep analysis
Monthly settlement reconciliation | 🎯 Balanced | Structured comparison, moderate complexity
Merchant risk assessment narrative | 🧠 Deep | Multi-factor reasoning, data citation, recommendations
Regulatory impact assessment | 🧠 Deep | Cross-referencing documents, nuanced interpretation
Bulk monthly assessments (200+ merchants) | ⚡ Fast | Cost-effective at scale; cheapest model that meets quality bar
✅ The golden rule: Start with the cheapest tier that might work. Test it. If quality isn't good enough, move up one tier. Don't start with Deep Reasoning for a task that Fast can handle — you'll pay dollars for something that costs pennies.
💡 Why models perform differently: More parameters give a model more capacity to store patterns from its training data, but also make it slower and more expensive to run. A 70B-parameter model can capture more patterns than a 7B model. Some models use mixture-of-experts (MoE), where only a fraction of the parameters activate per token, which makes them faster and cheaper to run than a dense model of the same total size.
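The golden rule (start cheap, test, move up one tier) can be sketched as a small loop. Everything here is illustrative: `run_task` stands in for whatever call sends your prompt to a model in a given tier, `meets_quality_bar` stands in for your own review or automated check, and the canned outputs are made up for the demo.

```python
from typing import Callable, Optional, Sequence

TIERS = ("fast", "balanced", "deep")  # ordered cheapest first


def cheapest_passing_tier(
    run_task: Callable[[str], str],
    meets_quality_bar: Callable[[str], bool],
    tiers: Sequence[str] = TIERS,
) -> Optional[str]:
    """Try tiers from cheapest to most expensive; return the first that passes."""
    for tier in tiers:
        output = run_task(tier)
        if meets_quality_bar(output):
            return tier
    return None  # no tier met the bar; revisit the prompt or the check


# Stand-ins for illustration only: canned outputs and a toy quality check.
def fake_run(tier: str) -> str:
    canned = {
        "fast": "Low risk.",
        "balanced": "Low risk: chargeback rate 0.4%, volume stable.",
        "deep": "Low risk: chargeback rate 0.4%, volume stable, no disputes.",
    }
    return canned[tier]


def cites_data(output: str) -> bool:
    return any(ch.isdigit() for ch in output)


print(cheapest_passing_tier(fake_run, cites_data))  # balanced
```

In practice the quality check is the hard part; a small set of reviewed examples per task is usually enough to decide whether a tier is acceptable.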

Cost vs. Capability Spectrum

Models on Amazon Bedrock grouped by tier.

[Chart: cost per 1M tokens ($0 to $5.00) plotted against intelligence (Low, Med, High).]

Explore the 3 tiers: Fast & Cheap (under $1), Balanced ($1–$2), and Deep Reasoning ($2–$5). Each tier has multiple models from different providers.

5 models in Fast tier, 3 models in Balanced tier, 3 models in Deep tier.
Providers: Amazon, Anthropic, Meta, Mistral AI, DeepSeek. Data: Artificial Analysis, May 2026.

Model Selection Simulator

The simulator estimates monthly cost per tier for a chosen task and volume. The right model choice can save your team thousands per month.

1. What's the task?

🛡️ Risk Assessment
🧾 Invoice Extraction
💬 Complaint Response
📋 Credit Narrative
📂 Doc Classification

2. How many per month?

Volume: 200

Tier | Label | Cost per month
⚡ Fast & Cheap | Best value | $0.02
🎯 Balanced | Recommended | $0.54
🧠 Deep Reasoning | Premium | $2.64

💸 vs. manual processing: An analyst costs $5,000/month for this task (99.9% saved).
Put it this way: 200 risk assessments with Claude Sonnet 4 cost $2.64, less than a single cup of coffee. The same work would take an analyst 100 hours.
💡 Recommendation: For merchant risk assessments, use Deep Reasoning (Claude Sonnet 4). The task requires structured reasoning, data citation, and actionable recommendations. Lightweight models produce surface-level output that wouldn't pass compliance review.
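The monthly comparison can be reproduced with a short calculation. Two assumptions are inferred rather than stated on this page: the $2.64 figure matches 400 input tokens and 800 output tokens per assessment at $3.00/$15.00 per million, and the $5,000 manual cost matches 200 assessments at 30 minutes each and $50 per hour (the $25 per 30 minutes quoted earlier). The function name `monthly_cost` is illustrative.

```python
def monthly_cost(volume, input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Monthly dollars for `volume` requests; prices are dollars per 1M tokens."""
    per_request = (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000
    return volume * per_request


# Assumed: 400 input + 800 output tokens per assessment at $3.00 / $15.00 per 1M
ai = monthly_cost(200, 400, 800, 3.00, 15.00)

# Assumed: 200 assessments x 0.5 hours x $50/hour
manual = 200 * 0.5 * 50.0

savings = 1 - ai / manual
print(f"AI: ${ai:.2f}  manual: ${manual:,.0f}  saved: {savings:.1%}")
# AI: $2.64  manual: $5,000  saved: 99.9%
```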

5 Cost Optimization Levers

Once you've picked the right model tier, these strategies reduce cost further. Ordered by impact:

🎚️

1. Right-size your model

The biggest lever. Use Nova Micro for classification, Sonnet for complex analysis. Model choice matters more than anything else.

💾

2. Prompt Caching

Up to 90% savings. Cache your template — pay full price once, 10% for every reuse. Perfect for repeated tasks.

📦

3. Batch Processing

50% savings. Submit requests in bulk (not real-time). Ideal for monthly portfolio assessments.

🔀

4. Intelligent Routing

Up to 30% savings. Bedrock auto-routes simple tasks to cheaper models, complex ones to powerful models.

✂️

5. Optimize Prompts

10–40% savings. Remove redundant instructions, use shorter examples, constrain output length. Markdown instead of HTML saves 60% on formatting tokens.
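A rough sketch of how levers 2 and 3 change the per-request cost, using the figures quoted above: cached input tokens billed at 10% of the input price, and a 50% discount for batch requests. It ignores any extra charge for the first cache write and does not assume the two discounts can be combined; check current pricing before relying on either. The function name and parameters are illustrative.

```python
def cost_per_request(fresh_input, cached_input, output_tokens,
                     input_price_per_m, output_price_per_m,
                     cached_price_factor=0.10, batch_discount=0.0):
    """Dollars per request with optional cached input tokens and a batch discount."""
    input_cost = (fresh_input + cached_input * cached_price_factor) * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) * (1 - batch_discount) / 1_000_000


# Risk assessment at $3.00 / $15.00 per 1M: 400-token template + 150 tokens of data
baseline = cost_per_request(550, 0, 800, 3.00, 15.00)
cached = cost_per_request(150, 400, 800, 3.00, 15.00)   # template served from cache
batched = cost_per_request(550, 0, 800, 3.00, 15.00, batch_discount=0.50)

print(f"baseline ${baseline:.5f}")
print(f"cached   ${cached:.5f}  ({1 - cached / baseline:.1%} saved)")
print(f"batched  ${batched:.5f}  ({1 - batched / baseline:.1%} saved)")
```

Because output tokens dominate the bill in this example, caching a 400-token template trims the total by only a few percent. The "up to 90%" figure applies to the cached input tokens, so caching pays off most when prompts are long relative to outputs.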

Context Windows: How Much Can the Model "See"?

The context window is the maximum text the model can process at once — your prompt + the AI's response must fit within it.

Model | Context window | Equivalent | Practical meaning
Nova Micro | 128K tokens | ~100 pages | A short book
Nova Pro | 300K tokens | ~230 pages | A long report
Claude Sonnet 4 | 200K tokens | ~150 pages | A full policy manual
💡 For finance: A typical merchant data file + prompt template + policy document fits easily within any model's context window. You'd only hit limits with very large documents (100+ page regulatory filings). When you do, use retrieval-augmented generation (RAG) to feed the model only the relevant sections.
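A quick way to check whether a request fits is to add the prompt tokens and the expected output tokens and compare the sum with the window. The sketch below uses the window sizes from the table above; the 20,000-token policy document and the 127,500-token filing are hypothetical sizes chosen for illustration.

```python
# Context windows in tokens, copied from the table
CONTEXT_WINDOWS = {
    "Nova Micro": 128_000,
    "Nova Pro": 300_000,
    "Claude Sonnet 4": 200_000,
}


def fits_in_context(model, prompt_tokens, max_output_tokens, windows=CONTEXT_WINDOWS):
    """True if prompt plus response fit inside the model's context window."""
    return prompt_tokens + max_output_tokens <= windows[model]


# 150-token data file + 400-token template + hypothetical 20,000-token policy document
typical_prompt = 150 + 400 + 20_000
# Hypothetical large regulatory filing
large_prompt = 127_500

for model in CONTEXT_WINDOWS:
    print(model,
          fits_in_context(model, typical_prompt, 800),
          fits_in_context(model, large_prompt, 800))
```

The typical prompt fits every listed model, while the large filing overflows the smallest window once the 800-token response is counted.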

What You Control in Each Tool

Tool | Who picks the model | What you control
Claude (Cowork) | Anthropic (by plan tier) | Your prompt quality
Kiro | Auto-selected by task | Your prompt quality
Cursor | You choose per conversation | Model + prompt quality
Bedrock Playground | You choose explicitly | Model + prompt + parameters
✅ Key takeaway: In most AI tools, you don't choose the model — the tool does. Focus on writing great prompts and designing good workflows. The prompt engineering skills you learn today work regardless of which model or tool you use. When you DO have model choice (Cursor, Bedrock), use the tier framework.

Workshop Connection

Concept from this page | Where you'll apply it
Token estimation | Understanding why prompt length matters for cost and quality
Model tiers | Day 1 Demo: Model Arena, comparing 3 models on the same task
Cost optimization | Making the business case for AI adoption in your team
Context windows | Managing long conversations and knowing when to start fresh
Decision framework | Day 2: Planning your first agent's cost profile