Tokenization, Pricing & Model Selection — AnyCompany Leader Workshop

What Are Tokens?

AI models don't read words like you do. They break text into tokens — small pieces that the model processes one at a time. Think of it like a cash register that counts items, not bags.

📝 1 token ≈ ¾ word: Common words = 1 token. Long or rare words split into pieces.

💵 You pay per token: Both what you send (input) and what AI generates (output) cost tokens.

🔢 Numbers are expensive: "$4,200.50" = 5+ tokens. Financial data costs more than narrative text.

📊 Format matters: Markdown uses 60% fewer tokens than HTML for the same content.
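As a rough illustration of the ¾-word rule of thumb, the sketch below estimates tokens from a plain word count. The function name and the rounding choice are assumptions made for this example. A real tokenizer will give different numbers, especially for figures and punctuation, which this heuristic undercounts.

```python
import math

# Rule of thumb from this page: 1 token ≈ ¾ of a word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on whitespace-separated word count.

    Hypothetical helper for budgeting only; it does not call a real tokenizer.
    """
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


question = "Assess this merchant"
print(estimate_tokens(question))  # 3 words / 0.75 = 4 tokens
```

The estimate of 4 is close to the ~5 tokens quoted for this question in the table that follows. Treat it as a budgeting aid, not a billing figure.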

Token Estimation for Finance Work

Content	Approximate tokens	Analogy
A short question ("Assess this merchant")	~5 tokens	A single sentence on a Post-it
A paragraph of merchant data (10 lines)	~150 tokens	Half a page of notes
Our engineered prompt template	~400 tokens	A one-page memo
A full risk assessment output (8 sections)	~800 tokens	A two-page report
Total per assessment (input + output)	~1,350 tokens	A three-page document

⚠️ Key insight for finance: A CSV row like "MC-8842, Kopi Corner, $15,600, 4.1%, SGD" uses ~25 tokens, while English text of the same length uses only ~12 tokens. Numbers and special characters are "expensive" because the tokenizer has few multi-character tokens for them, so it splits them into many small pieces.

💡 Why this matters for your team: When you send a spreadsheet to AI, you're paying for every comma, dollar sign, and decimal point. Summarizing data in narrative form ("Revenue grew 271% from $4,200 to $15,600") is cheaper than pasting raw tables — and often produces better AI output too.

How AI Pricing Works

AI pricing is simple: you pay per token, both in and out. Think of it like a taxi meter: the meter runs while you talk (input) AND while the AI responds (output). For most models, output tokens cost 3–5× more than input tokens because generating text is computationally harder than reading it.

THE PRICING FORMULA

Cost = (Input tokens × Input price) + (Output tokens × Output price)

Example: Merchant Risk Assessment on Claude Sonnet 4
Input: 550 tokens × $3.00/million = $0.00165
Output: 800 tokens × $15.00/million = $0.01200
Total per assessment: $0.01365

That's 1.4 cents per assessment. An analyst takes 30 minutes ($25).
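The pricing formula above can be sketched in a few lines of Python. The function and parameter names are made up for this example; the token counts and prices are the ones from the merchant risk assessment example.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars for one request, with prices quoted per million tokens."""
    input_cost = input_tokens * input_price_per_m
    output_cost = output_tokens * output_price_per_m
    return (input_cost + output_cost) / 1_000_000


# Merchant risk assessment: 550 tokens in, 800 tokens out, $3.00 / $15.00 per million.
cost = request_cost(550, 800, 3.00, 15.00)
print(f"${cost:.5f} per assessment")  # $0.01365 per assessment
```

Note that the 800 output tokens account for most of the cost, even though the input is nearly as long.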

The Price Spectrum

Prices vary dramatically, from a few cents to tens of dollars per million tokens. Here's the landscape of models available on Amazon Bedrock:

Model	Provider	Input / 1M	Output / 1M	Best for
Nova Micro	Amazon	$0.035	$0.14	Classification, routing
Nova Lite	Amazon	$0.06	$0.24	Drafts, summaries
Llama 4 Maverick 17B	Meta	$0.24	$0.97	Multimodal, cost-effective
DeepSeek V3.2	DeepSeek	$0.27	$1.10	Coding, general tasks
Mistral Large 3	Mistral AI	$0.50	$1.50	Multilingual, structured
Llama 3.3 70B	Meta	$0.72	$0.72	Open-weight balanced
Nova Pro	Amazon	$0.80	$3.20	Reports, analysis
Claude Haiku 4.5	Anthropic	$1.00	$5.00	Quality + speed balance
Nova Premier	Amazon	$2.50	$10.00	Complex multimodal
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	Complex reasoning
Claude Opus 4.7	Anthropic	$5.00	$25.00	Deepest multi-step tasks

Pricing as of May 2026 (on-demand, US regions). Check aws.amazon.com/bedrock/pricing for current rates. Additional models available: Qwen3, Kimi K2, NVIDIA Nemotron, Writer Palmyra, and more.

✅ The key insight: The same task can cost 1 cent or 1 dollar depending on which model you choose. Picking the right model for each task is the single biggest cost lever — far more impactful than optimizing prompt length.
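To make the spread concrete, here is a minimal sketch that prices the same task on a few models from the table. It assumes the 550-input / 800-output token profile from the risk assessment example and copies the prices from the table, which may be out of date by the time you read this.

```python
# (input $/1M tokens, output $/1M tokens), copied from the price table on this page.
PRICES = {
    "Nova Micro": (0.035, 0.14),
    "Nova Pro": (0.80, 3.20),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}

INPUT_TOKENS = 550   # assumed task profile: merchant data + prompt template
OUTPUT_TOKENS = 800  # assumed task profile: full risk assessment output


def cost_per_thousand(input_price: float, output_price: float) -> float:
    """Dollar cost of 1,000 requests with the assumed token profile."""
    per_request = (INPUT_TOKENS * input_price + OUTPUT_TOKENS * output_price) / 1_000_000
    return per_request * 1_000


costs = {model: cost_per_thousand(*prices) for model, prices in PRICES.items()}
cheapest = min(costs, key=costs.get)

for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model:<20} ${cost:>6.2f} per 1,000 assessments")
```

With these figures, 1,000 assessments cost about $0.13 on Nova Micro and $22.75 on Claude Opus 4.7. Whether the cheaper model is good enough for the task is a separate question that only testing can answer.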

Data Privacy — Why Bedrock Is Different

💡 With Amazon Bedrock: Your data stays in your AWS account — it is not used to train the models. You control the region, encryption, and access. All API calls are logged and auditable via CloudTrail. This is different from using ChatGPT or Claude.ai directly — Bedrock provides enterprise-grade data isolation.

The 3 Model Tiers

Model names change every few months. Instead of memorizing names, think in tiers — match your task complexity to the right level of capability. It's like hiring: you don't need a senior consultant for data entry.

⚡ Fast & Cheap: Simple tasks · Pattern matching · $0.04–$1/M tokens · Junior analyst

🎯 Balanced: Moderate reasoning · Quality + speed · $1–$5/M tokens · Senior analyst

🧠 Deep Reasoning: Complex analysis · Multi-step logic · $3–$75/M tokens · Expert consultant

Which Tier for Which Finance Task?

Finance task	Tier	Why
Document classification (invoice vs receipt)	⚡ Fast	Simple pattern matching — speed matters most
Invoice data extraction (fields → JSON)	⚡ Fast	Structured extraction, no deep reasoning needed
Customer complaint response drafts	🎯 Balanced	Needs empathy and nuance, not deep analysis
Monthly settlement reconciliation	🎯 Balanced	Structured comparison, moderate complexity
Merchant risk assessment narrative	🧠 Deep	Multi-factor reasoning, data citation, recommendations
Regulatory impact assessment	🧠 Deep	Cross-referencing documents, nuanced interpretation
Bulk monthly assessments (200+ merchants)	⚡ Fast	Cost-effective at scale — cheapest model that meets quality bar

✅ The golden rule: Start with the cheapest tier that might work. Test it. If quality isn't good enough, move up one tier. Don't start with Deep Reasoning for a task that Fast can handle — you'll pay dollars for something that costs pennies.
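The golden rule amounts to a simple escalation loop, sketched below. The tier names come from this page; the `passes_quality_bar` callback is a hypothetical stand-in for whatever review your team uses (a sample of outputs checked by an analyst, for example).

```python
from typing import Callable, Optional

# Tiers ordered from cheapest to most expensive.
TIERS = ["fast", "balanced", "deep"]


def pick_tier(passes_quality_bar: Callable[[str], bool]) -> Optional[str]:
    """Return the cheapest tier whose output passes the quality check.

    Returns None if no tier is good enough, which suggests the task or the
    prompt needs rework rather than a bigger model.
    """
    for tier in TIERS:
        if passes_quality_bar(tier):
            return tier
    return None


# Hypothetical test result: the fast tier fell short, the others passed.
review_results = {"fast": False, "balanced": True, "deep": True}
print(pick_tier(lambda tier: review_results[tier]))  # balanced
```

The loop stops at the first tier that passes, so the Deep Reasoning tier is only tried when the cheaper options have actually failed.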

💡 Why models perform differently: More parameters = more capacity to store "knowledge", but also slower and more expensive to run. A 70B-parameter model can capture more patterns than a 7B model. Some models use mixture-of-experts (MoE), where only a fraction of the parameters activate per token, which makes them faster than a dense model of the same total size.

Cost vs. Capability Spectrum

[Interactive chart: models on Amazon Bedrock grouped by tier, Fast & Cheap (under $1), Balanced ($1–$2), and Deep Reasoning ($2–$5). The Fast tier has 5 models, the Balanced tier 3, and the Deep tier 3, from Amazon, Anthropic, Meta, Mistral AI, and DeepSeek. Data: Artificial Analysis · May 2026]

Model Selection Simulator

Pick a task, adjust the volume, and watch how cost changes across tiers. The right model choice can save your team thousands per month.

1. What's the task?

🛡️ Risk Assessment · 🧾 Invoice Extraction · 💬 Complaint Response · 📋 Credit Narrative · 📂 Doc Classification

2. How many per month?

Volume: 200

Tier	Label	Cost per month
⚡ Fast & Cheap	Best value	$0.02
🎯 Balanced	Recommended	$0.54
🧠 Deep Reasoning	Premium	$2.64

💸 vs. manual processing: An analyst costs $5,000/month for this task (99.9% saved).

☕ Put it this way: 200 risk assessments with Claude Sonnet 4 cost less than a single cup of coffee ($2.64). The same work would take an analyst 100 hours.
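The comparison with manual processing is simple arithmetic, sketched below. The monthly AI costs are the simulator's figures for 200 risk assessments, and the analyst figures (30 minutes and $25 per assessment) come from the pricing example earlier on this page. The variable names are made up for this example.

```python
VOLUME = 200                         # assessments per month
ANALYST_MINUTES_PER_ASSESSMENT = 30
ANALYST_COST_PER_ASSESSMENT = 25.00  # dollars

# Monthly AI cost per tier, as shown by the simulator for this volume.
ai_monthly_cost = {
    "Fast & Cheap": 0.02,
    "Balanced": 0.54,
    "Deep Reasoning": 2.64,
}

analyst_monthly_cost = VOLUME * ANALYST_COST_PER_ASSESSMENT
analyst_hours = VOLUME * ANALYST_MINUTES_PER_ASSESSMENT / 60

# Share of the manual cost that each tier would save, in percent.
savings_pct = {
    tier: (1 - cost / analyst_monthly_cost) * 100
    for tier, cost in ai_monthly_cost.items()
}

print(f"Analyst: ${analyst_monthly_cost:,.0f} and {analyst_hours:.0f} hours per month")
for tier, pct in savings_pct.items():
    print(f"{tier}: {pct:.1f}% saved")
```

This counts model usage only. It leaves out the analyst time still needed to review the AI output, so the real saving will be lower than the headline percentage.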

💡 Recommendation: For merchant risk assessments, use Deep Reasoning (Claude Sonnet 4). The task requires structured reasoning, data citation, and actionable recommendations. Lightweight models produce surface-level output that wouldn't pass compliance review.

5 Cost Optimization Levers

Once you've picked the right model tier, these strategies reduce cost further. Ordered by impact:

🎚️ 1. Right-size your model: The biggest lever. Use Nova Micro for classification, Sonnet for complex analysis. Model choice matters more than anything else.

💾 2. Prompt Caching: Up to 90% savings. Cache your template: pay full price once, 10% for every reuse. Perfect for repeated tasks.

📦 3. Batch Processing: 50% savings. Submit requests in bulk (not real-time). Ideal for monthly portfolio assessments.

🔀 4. Intelligent Routing: Up to 30% savings. Bedrock auto-routes simple tasks to cheaper models, complex ones to powerful models.

✂️ 5. Optimize Prompts: 10–40% savings. Remove redundant instructions, use shorter examples, constrain output length. Markdown instead of HTML saves 60% on formatting tokens.
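Here is a minimal sketch of how the caching and batch levers change the arithmetic. It uses a deliberately simplified model: the cached template is billed at full price once and at 10% on every reuse, and batch jobs get a flat 50% discount. Real billing may add cache-write charges, cache expiry, or minimum cacheable prompt sizes, none of which are modeled here. The function names are made up for this example; the token counts and the $3.00 input price come from the risk assessment example.

```python
def on_demand_input_cost(n_requests: int, tokens_per_request: int,
                         price_per_m: float) -> float:
    """Input cost in dollars with no caching."""
    return n_requests * tokens_per_request * price_per_m / 1_000_000


def cached_input_cost(n_requests: int, template_tokens: int, variable_tokens: int,
                      price_per_m: float, reuse_rate: float = 0.10) -> float:
    """Input cost when the shared template is cached (simplified model)."""
    if n_requests <= 0:
        return 0.0
    # Template: full price on the first request, reuse_rate on the rest.
    template_billed = template_tokens + template_tokens * reuse_rate * (n_requests - 1)
    # Merchant-specific data is different every time, so it is never cached.
    variable_billed = variable_tokens * n_requests
    return (template_billed + variable_billed) * price_per_m / 1_000_000


def batch_cost(on_demand_cost: float, discount: float = 0.50) -> float:
    """Cost after a flat batch discount (assumed 50%, as stated above)."""
    return on_demand_cost * (1 - discount)


# 200 assessments: 400-token template + 150 tokens of merchant data, $3.00/M input.
baseline = on_demand_input_cost(200, 550, 3.00)
cached = cached_input_cost(200, 400, 150, 3.00)
batched_total = batch_cost(200 * 0.01365)  # full per-assessment cost from the pricing example

print(f"Input cost without caching: ${baseline:.2f}")
print(f"Input cost with caching:    ${cached:.2f}")
print(f"Batch total (in + out):     ${batched_total:.2f}")
```

In this example caching cuts the input cost by about 65%, not 90%, because only the template is cacheable and the merchant data is not. Output tokens, the larger share of the bill, are unaffected by caching.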

Context Windows: How Much Can the Model "See"?

The context window is the maximum text the model can process at once — your prompt + the AI's response must fit within it.

Model	Context window	Equivalent	Practical meaning
Nova Micro	128K tokens	~100 pages	A short book
Nova Pro	300K tokens	~230 pages	A long report
Claude Sonnet 4	200K tokens	~150 pages	A full policy manual

💡 For finance: A typical merchant data file + prompt template + policy document fits easily within any model's context window. You'd only hit limits with very large documents (100+ page regulatory filings). When you do, use retrieval-augmented generation (RAG) to feed the model only the relevant sections.
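A quick way to check fit is to compare the prompt size plus the expected response against the window, as in this sketch. The window sizes are taken from the table above; the function name and the 250,000-token example document are assumptions for illustration.

```python
# Context window sizes in tokens, from the table on this page.
CONTEXT_WINDOWS = {
    "Nova Micro": 128_000,
    "Nova Pro": 300_000,
    "Claude Sonnet 4": 200_000,
}


def fits_in_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the expected response fits in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]


# A typical assessment: 550 tokens in, 800 tokens out.
print(all(fits_in_context(model, 550, 800) for model in CONTEXT_WINDOWS))  # True

# A hypothetical 250,000-token regulatory filing with a 2,000-token summary.
for model in CONTEXT_WINDOWS:
    print(model, fits_in_context(model, 250_000, 2_000))
```

In the second case only Nova Pro has room for the whole filing. For the other two models you would either split the document or use RAG to send only the relevant sections.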

What You Control in Each Tool

Tool	Who picks the model	What you control
Claude (Cowork)	Anthropic (by plan tier)	Your prompt quality
Kiro	Auto-selected by task	Your prompt quality
Cursor	You choose per conversation	Model + prompt quality
Bedrock Playground	You choose explicitly	Model + prompt + parameters

✅ Key takeaway: In most AI tools, you don't choose the model — the tool does. Focus on writing great prompts and designing good workflows. The prompt engineering skills you learn today work regardless of which model or tool you use. When you DO have model choice (Cursor, Bedrock), use the tier framework.

Workshop Connection

Concept from this page	Where you'll apply it
Token estimation	Understanding why prompt length matters for cost and quality
Model tiers	Day 1 Demo: Model Arena — compare 3 models on the same task
Cost optimization	Making the business case for AI adoption in your team
Context windows	Managing long conversations — knowing when to start fresh
Decision framework	Day 2: Planning your first agent's cost profile
