📘 Where AI Wins (and Where It Doesn't) for Finance
Not every finance task needs Generative AI. Pick the wrong use case and you'll waste budget, erode trust, and slow your team. This module gives you a decision framework, finance-specific use cases by function, and a priority matrix you can take to your team this week.
Day 1: Foundation · Decision Framework · 32 Finance Use Cases · By Function
The Question Every Finance Leader Should Answer First
Your team has 200 ideas for "where we could use AI." Most of them won't survive a 90-day pilot. The ones that succeed share three traits: they tackle unstructured work, they have a human in the loop for the final call, and they replace repeated cognitive effort — not one-off creative work.
⏱️
The cost of picking wrong
A pilot that runs for 6 months on the wrong use case doesn't just cost the licensing — it costs credibility. Once a finance team has seen "AI didn't work here," getting buy-in for the next attempt is twice as hard.
🎯
The shape of a winning use case
High volume, repeated cognitive work, judgement-required (not pure math), tolerates a human review step, has bounded scope. Variance commentary fits. Calculating tax liability does not.
🚦
Lead with use cases, not technology
"We need to use AI" is the wrong starting point. "We spend 6 hours a week writing month-end variance commentary" is the right one. The use case picks the technology, never the other way around.
📈 What good looks like for finance: ExxonMobil saved 30,000 hours on a major capital project through AI document review. McKinsey estimates supply-chain cost reduction potential at $290B–$550B across industries. AWS reports engineers spend ~60% of their time searching for data, and most of that is recoverable. These aren't AI numbers; they're the payoff from picking the right use case.
The "Lead with Use Cases" Test
Before any new AI initiative, run it through these four questions. If you can't answer all four, the use case isn't ready.
| # | Question | What a good answer looks like |
|---|---|---|
| 1 | What is the specific business problem? | "Our AP team takes 5–10 minutes per invoice on extraction. We process 8,000/month. We want to cut that by 60%." |
| 2 | Who benefits, and how do we measure success? | "AP analysts get hours back. Success = avg time per invoice ≤ 90 sec by month 3, with ≤ 2% error rate." |
| 3 | Is the data available and reliable? | "Yes: we have 12 months of historical invoices and PO data in our ERP. Quality is good." |
| 4 | What's the ethical / safety / regulatory angle? | "Approval threshold is SGD 50K; anything above is human-only. Audit trail required for SOX. No PII in prompts." |
💡 Reality check for finance leaders: Question 4 is where most finance pilots stall in audit review. Answer it before building, not after. The Day 1 Governance & Trust module (Module 7) covers this in depth.
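Questions 1 and 2 already contain enough numbers to size the opportunity. Below is a minimal back-of-envelope sketch. It uses only the figures from the sample answers: 8,000 invoices a month, 5–10 minutes each today, and a 90-second target. The function names are illustrative.

```python
"""Back-of-envelope sizing for the AP invoice-extraction example.

Figures come from the sample answers; reporting both ends of the
5-10 minute range is an illustrative choice.
"""

INVOICES_PER_MONTH = 8_000
BASELINE_MINUTES = (5, 10)   # current handling time per invoice (range)
TARGET_SECONDS = 90          # success threshold by month 3


def monthly_hours(minutes_per_invoice: float) -> float:
    """Hours of analyst effort per month at a given per-invoice time."""
    return INVOICES_PER_MONTH * minutes_per_invoice / 60


def reduction_pct(baseline_minutes: float, target_minutes: float) -> float:
    """Percentage cut in per-invoice time from baseline to target."""
    return 100 * (1 - target_minutes / baseline_minutes)


target_minutes = TARGET_SECONDS / 60  # 1.5 minutes
for baseline in BASELINE_MINUTES:
    before = monthly_hours(baseline)
    after = monthly_hours(target_minutes)
    print(f"baseline {baseline} min: {before:,.0f} h/month -> {after:,.0f} h/month "
          f"({reduction_pct(baseline, target_minutes):.0f}% cut)")
```

On these figures, the 90-second target implies a 70–85% cut in handling time. That is steeper than the 60% named in question 1. Settle on one target before the pilot starts so that success is judged against a single number.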
GenAI vs Traditional ML vs Rules vs Manual
The most expensive mistake in AI adoption is using the wrong technique. GenAI is not a hammer for every nail. Here's the decision framework finance teams actually need.
| Criteria | ✨ Generative AI | 📊 Traditional ML | 📋 Rules / Scripts | 👤 Manual |
|---|---|---|---|---|
| Best for | Narrative, summarization, classification with judgement, Q&A over documents, drafting | Numeric prediction, anomaly detection, pattern recognition from structured data | Deterministic calculations, lookups, threshold checks, standard procedures | Irreversible, novel, or high-stakes decisions |
| Input data | Unstructured: documents, contracts, policies, regulations | Structured: tabular, time-series, labelled training data | Structured: lookup tables, formulas, decision trees | Whatever the human consumes |
| Output | Generated text, summaries, classifications with reasoning, drafts | Predictions, scores, classifications (no reasoning text) | Exact deterministic results, pass/fail flags | The judgement itself |
| Finance example | "Draft variance commentary for the May P&L vs forecast" | "Score this transaction 0–1 for fraud risk based on historical patterns" | "If invoice > SGD 50K, route to Head of Finance" | "Should we acquire this competitor?" |
| Accuracy posture | Good draft, human reviews | High precision (95%+ on the trained task) | Exact & auditable | The human is the audit trail |
| Explainability | Can cite sources (RAG); reasoning is probabilistic | Feature importance, partial | Fully transparent: every step traceable | Whatever the human writes down |
| Audit defensibility | Medium: needs human sign-off + source attribution | Medium: model card + drift monitoring required | High: every rule is in code | High: the human signs |
Decision shortcuts
✨
Use GenAI when…
Large volumes of unstructured documents to read or summarize
Repetitive narrative work (commentary, disclosures, briefs)
Classification that needs context (this is RED because…)
Q&A over a body of policies, contracts, or regulations
Multi-step tasks involving judgement under ambiguity
📊
Use Traditional ML when…
Predicting future events from labelled historical data
Detecting anomalies in time-series (unusual spend, fraud patterns)
Need high precision with measurable confidence scores
The output is a number, score, or class — not text
Have ≥10K labelled examples to train on
📋
Use Rules / Scripts when…
Same input always produces the same output (currency conversion, GST)
Threshold-based routing or escalation
Regulatory compliance checks with binary pass/fail
Volume too low to justify model spend
Audit trail must be 100% deterministic
👤
Keep Manual when…
The decision is irreversible (M&A, write-offs, executive sign-off)
No precedent or training data exists
Stakes are too high for any failure mode
Personal accountability is the point (CFO certification)
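The shortcuts above can be read as an ordered checklist. The minimal sketch below assumes hypothetical yes/no fields for each use case. It checks the most conservative options first: Manual, then Rules. That ordering is an assumption of this sketch, and in practice many cases end up hybrid rather than using a single technique.

```python
from dataclasses import dataclass

# From the "Have >=10K labelled examples" shortcut.
ML_MIN_LABELLED_EXAMPLES = 10_000


@dataclass
class Candidate:
    """Illustrative yes/no facts about a use case; field names are hypothetical."""
    name: str
    irreversible_or_accountable: bool  # M&A, write-offs, CFO certification
    has_precedent: bool                # any prior examples or data to learn from
    deterministic: bool                # same input always gives the same output
    numeric_output: bool               # a number, score, or class rather than text
    labelled_examples: int             # labelled history available for training
    unstructured_input: bool           # documents, contracts, policies, free text


def recommend(c: Candidate) -> str:
    """Apply the decision shortcuts, most conservative option first."""
    if c.irreversible_or_accountable or not c.has_precedent:
        return "Manual"
    if c.deterministic:
        return "Rules / Scripts"
    if c.numeric_output and c.labelled_examples >= ML_MIN_LABELLED_EXAMPLES:
        return "Traditional ML"
    if c.unstructured_input and not c.numeric_output:
        return "Generative AI"
    # Nothing fits cleanly: send it back through the four-question test.
    return "Not ready: rerun the use-case test"


examples = [
    Candidate("May variance commentary", False, True, False, False, 0, True),
    Candidate("Route invoices > SGD 50K", False, True, True, False, 0, False),
    Candidate("Fraud risk score", False, True, False, True, 250_000, False),  # hypothetical count
    Candidate("Acquire a competitor", True, True, False, False, 0, True),
]
for c in examples:
    print(f"{c.name}: {recommend(c)}")
```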
🤝 The hybrid is usually the answer. The strongest finance solutions combine techniques. ML detects an anomalous transaction → GenAI explains it in plain English. Rules enforce the SGD 50K threshold → GenAI drafts the escalation memo. Think of GenAI as the communication layer on top of precise computational systems.
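The sketch below shows the second hybrid in the callout. A deterministic rule owns the SGD 50K routing decision, and a generative step only drafts the memo. `draft_escalation_memo` is a hypothetical stand-in for whatever model call your platform provides. Here it returns a fixed template so the example runs on its own.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Threshold from the sample policy: anything above SGD 50K is human-only.
APPROVAL_THRESHOLD_SGD = Decimal("50000")


@dataclass
class Invoice:
    invoice_id: str
    vendor: str
    amount_sgd: Decimal


def route(invoice: Invoice) -> str:
    """Deterministic rule: the model never decides who approves."""
    if invoice.amount_sgd > APPROVAL_THRESHOLD_SGD:
        return "Head of Finance"
    return "AP standard approval"


def draft_escalation_memo(invoice: Invoice, approver: str) -> str:
    """Hypothetical stand-in for a GenAI drafting call.

    A real build would send these facts to a model and return its draft;
    a fixed template keeps the sketch self-contained.
    """
    return (
        f"DRAFT for {approver}: invoice {invoice.invoice_id} from {invoice.vendor} "
        f"is SGD {invoice.amount_sgd:,.2f}, above the SGD "
        f"{APPROVAL_THRESHOLD_SGD:,.0f} approval threshold. Please review and sign."
    )


def process(invoice: Invoice) -> tuple[str, Optional[str]]:
    approver = route(invoice)  # rules decide
    memo = None
    if approver == "Head of Finance":
        memo = draft_escalation_memo(invoice, approver)  # GenAI communicates
    return approver, memo


print(process(Invoice("INV-001", "Vendor A", Decimal("72500"))))
print(process(Invoice("INV-002", "Vendor B", Decimal("50000"))))
```

An invoice of exactly SGD 50,000 stays in standard approval, because the sample policy only sends amounts above the threshold to a human. Whether the boundary is inclusive is exactly the kind of detail a rule should pin down explicitly.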
Use Cases by Function
Tailored to this cohort: for each function, the highest-impact GenAI use cases, drawn from real finance org transformations and matched to your team's daily reality.
Procurement
Persona: Cluster Procurement Heads, Regional Category Heads, Category Managers (SG/MY/PH). The largest cohort in this workshop: 7 of 28 participants. Daily reality: vendor onboarding, contract review, supplier scorecards, RFP evaluation, spend analysis.
📑 Contract clause review
Read a 40-page MSA and flag deviations from your standard template: payment terms, liability caps, IP clauses, indemnities. Generates a redline summary with severity ratings. A human signs off on the final version.
GenAI + RAG · Quick Win
🏷️ Supplier risk scorecards
Synthesize public-source signals (news, sanctions lists, financial filings) with your internal data into a GREEN/AMBER/RED rating with reasoning. Quarterly refresh, with exception alerts in between.
Hybrid (ML + GenAI) · High Impact
📊 RFP / RFQ response evaluation
Score 12 vendor proposals against your evaluation criteria. Extract pricing, SLAs, and certifications. Highlight gaps and outliers. Reduces a 2-day evaluation to 2 hours of human review.
GenAI + RAG · Quick Win
💬 Negotiation prep briefs
"Brief me for tomorrow's call with Vendor X": combines historical spend, current contract terms, recent service issues, and market benchmarks. Replaces 3 hours of pre-meeting research.
Controllership
Intercompany reconciliation
When intercompany balances don't tie, GenAI reads both sides' GL postings and descriptions and proposes the most likely reason: timing difference, FX, missed accrual, or classification mismatch. Speeds up dispute resolution.
Hybrid (Rules + GenAI) · Quick Win
📝 Period-close narrative
Auto-draft the month-end commentary, e.g. "Revenue +4.2% MoM driven by SG market expansion; opex flat; one-off SGD 800K legal accrual reversed in May." Pulls from your close pack data; a human reviews and signs.
GenAI + Data Plugin · High Impact
🧾 Journal entry review assistant
Check journal descriptions for completeness, flag entries with vague memos ("adjustment per K"), and suggest an audit-trail-grade rewrite. SOX-friendly. Catches what a tired reviewer misses at hour 9 of close.
GenAI · Quick Win
📚 Statutory accounts disclosure draft
Draft notes-to-accounts disclosures from underlying movement schedules, in IFRS / local-GAAP language. Speeds up the most tedious part of statutory filing season.
FP&A
Variance commentary
Auto-draft the "why did revenue miss by 6%?" commentary. Combines actuals-vs-forecast deltas, driver-tree decomposition, and qualitative context from Slack/email threads. The flagship FP&A use case.
GenAI + Data Plugin · Flagship
🎲 Scenario narrative drafting
"Given +10% wage inflation in PH, model the impact and draft the board summary." The model runs in Excel/Python, then GenAI writes the executive narrative. A hybrid use case showing the deterministic + generative pattern.
Hybrid (Script + GenAI) · High Impact
📈 Board pack first draft
Convert your monthly KPI dashboard into a 3-page CFO deck: exec summary, key drivers, risks & opportunities, ask of the board. Saves 4–6 hours of layout and first-pass writing per cycle.
GenAI · Quick Win
🤝 Business-partner Q&A prep
Before your business review with the BU lead, ask: "What 5 questions will they push back on?" GenAI mines historical reviews and current variance and gives you the likely challenges with backing data.
GenAI · Daily Use
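The sketch below shows the scenario card's split between script and model. It assumes a simple percentage uplift on a hypothetical payroll base; a real model would live in your existing Excel or Python workbook. The sketch stops at building the prompt, because the model call itself depends on your platform.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def wage_impact(base_payroll: Decimal, inflation_pct: Decimal) -> Decimal:
    """Deterministic step: annual cost increase, rounded to the cent."""
    return (base_payroll * inflation_pct / 100).quantize(CENT, rounding=ROUND_HALF_UP)


def board_summary_prompt(market: str, base_payroll: Decimal,
                         inflation_pct: Decimal, impact: Decimal) -> str:
    """Generative step input: hand the model finished numbers to write around.

    The instruction not to recalculate is the point of the pattern:
    the script owns the arithmetic, the model owns the prose.
    """
    return (
        "Draft a three-sentence board summary. Use these figures exactly "
        "and do not recalculate them.\n"
        f"Market: {market}\n"
        f"Base annual payroll (SGD): {base_payroll:,.2f}\n"
        f"Wage inflation scenario: +{inflation_pct}%\n"
        f"Annual cost impact (SGD): {impact:,.2f}"
    )


base = Decimal("12000000")  # hypothetical payroll base, for illustration only
for pct in (Decimal("5"), Decimal("10")):
    impact = wage_impact(base, pct)
    print(board_summary_prompt("PH", base, pct, impact))
    print()
```

The design choice is that the model never receives a calculation to perform. It gets only finished numbers and an instruction not to change them.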
Audit
Persona: Heads and Senior Managers across Audit Innovation & Analytics, Technology Audit, Insure Audit, and the AnyCompany Integrity Unit. 5 participants. Expect them to pressure-test hallucination examples; they care most about defensibility, controls, and audit trail.
🔍 Control testing memo drafting
Given a control description and sample evidence, draft the test memo: design effectiveness, operating effectiveness, exceptions, conclusion. A human signs. Speeds up routine SOX cycle work.
GenAI + RAG · Quick Win
⚠️ Anomaly explanation
An ML model flags an unusual transaction; GenAI explains in plain English why it's unusual (vs the population), what controls should have caught it, and what evidence to request. The "translator" pattern.
Hybrid (ML + GenAI) · High Impact
📋 Sample selection narrative
Document the rationale for your sample selection (risk-based, attribute-based, monetary-unit). Explains the methodology to reviewers. Audit-defensible by design.
GenAI · Quick Win
🚨 Whistleblower / integrity case triage
First-pass triage of integrity reports: classify by category, summarize the allegation, suggest evidence to gather, route to the right investigator. A human always reviews, because this is a high-stakes domain.
GenAI · Sensitive (Human Always Reviews)
Tax
Persona: Head of Tax, Senior Tax Manager, Assistant Manager Tax Management. 3 participants. The single most "RAG-shaped" persona: daily work is reading regulations and writing positions. The tax bench is small, but every member benefits.
📜 Regulation tracking & impact summary
"Singapore IRAS just released a new GST circular. What changes for our merchant settlement flow?" GenAI reads the circular against your current tax positions and drafts the impact memo. A cross-jurisdictional version applies SEA-wide.
GenAI + RAG · Flagship
🌏 Treaty & transfer pricing research
"What's our withholding rate for a SG → VN service payment under the DTA?" Q&A grounded in your library of treaties, OECD guidelines, and historical positions. Cites the source clause every time.
GenAI + RAG · Flagship
📝 Tax position memo drafting
Convert your conclusion and supporting analysis into a structured tax memo (background, position, alternatives considered, citation, conclusion). The tax-firm format leaders recognise. Defensible and review-ready.
GenAI + RAG · Quick Win
❓ Internal tax helpdesk
Internal teams ask "is this expense GST-claimable?" or "do I need a withholding cert?", and GenAI answers from your tax policy library and recent rulings, with citations. Triages routine queries; escalates novel ones.
GenAI + RAG · Daily Use
Corporate Reporting
Persona: Head of Corporate Reporting, Group Reporting Managers (incl. Regional). 3 participants. Daily reality: disclosure drafting, MD&A, analyst-facing materials, reporting cycles for board / auditors / regulators.
📰 MD&A first draft
From your KPI book and forecast variance, draft the Management Discussion & Analysis section in your house style. Addresses the questions analysts ask. A human revises and signs.
GenAI + RAG · Flagship
🗂️ Disclosure note drafting
Draft IFRS / local-GAAP notes from underlying movement schedules. Maintains consistency across periods. Speeds up the most format-heavy work in any reporting cycle.
GenAI + RAG · Quick Win
🎤 Analyst Q&A prep
"What will analysts ask on Tuesday's call?" Draws on past transcripts, this quarter's actuals, and competitor performance to generate likely questions and your strongest factual answer for each.
GenAI · Daily Use
📑 Cross-period consistency check
Compare this quarter's draft against the last 4 quarters' filed versions, and flag tone shifts, numerical inconsistencies, and vanished disclosures. The "auditor's first question" check.
GenAI + RAG · Quick Win
Treasury
Persona: Senior Manager, Finance & Treasury. 1 participant. A smaller bench but a high-leverage role: cash forecasting, FX exposure, banking relationships, debt servicing.
💵 Cash forecast commentary
Draft the weekly cash narrative: opening position, inflows/outflows by category, forecast vs actual variance, exception items requiring attention. Pairs with your existing cash model.
Hybrid (Script + GenAI) · Quick Win
💱 FX exposure briefings
"What's our SGD/IDR exposure this month, and what are the hedge implications if IDR moves ±5%?" Combines exposure data and market commentary into an executive brief.
Hybrid (ML + GenAI) · High Impact
🏦 Bank covenant compliance check
Read the loan agreement → check current ratios and metrics against covenants → flag headroom and breach risk → draft the compliance certificate language for the CFO to sign.
GenAI + RAG · Quick Win
💼 Bank pitch deck prep
When you're pulling together a refinancing or new-facility pitch (current capital structure, financial highlights, ratings rationale), GenAI structures the first draft from your standard data sources.
GenAI · Project
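The covenant card works best when the ratio check itself is deterministic. GenAI then handles the reading (pulling limits out of the loan agreement) and the certificate wording. The sketch below covers only the middle step. The covenants and ratios are hypothetical, and the 10% "at risk" band is an assumed policy.

```python
from dataclasses import dataclass

AT_RISK_HEADROOM_PCT = 10.0  # assumed amber band; set it from your own policy


@dataclass
class Covenant:
    name: str
    limit: float
    kind: str  # "max": metric must stay at or below limit; "min": at or above


def headroom_pct(actual: float, cov: Covenant) -> float:
    """Headroom as % of the limit. Negative means the covenant is breached."""
    if cov.kind == "max":
        return 100 * (cov.limit - actual) / cov.limit
    if cov.kind == "min":
        return 100 * (actual - cov.limit) / cov.limit
    raise ValueError(f"unknown covenant kind: {cov.kind!r}")


def status(actual: float, cov: Covenant) -> str:
    h = headroom_pct(actual, cov)
    if h < 0:
        return "BREACH"
    if h < AT_RISK_HEADROOM_PCT:
        return "AT RISK"
    return "OK"


# Hypothetical covenants and current ratios, for illustration only.
checks = [
    (Covenant("Net debt / EBITDA", 3.5, "max"), 3.1),
    (Covenant("Interest cover", 4.0, "min"), 4.2),
]
for cov, actual in checks:
    print(f"{cov.name}: actual {actual:.2f}x vs limit {cov.limit:.2f}x, "
          f"headroom {headroom_pct(actual, cov):.1f}% -> {status(actual, cov)}")
```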
Finance Data Solutions
Persona: Assistant Manager, Finance Data Solutions. 1 participant. The bridge persona between finance and IT: translates business questions into data work and back.
🗂️ Ad-hoc query translator
A finance user asks "show me top 10 vendors by spend in Q2"; GenAI converts the question to SQL against your data warehouse, runs it, and narrates the result. Reduces the queue of low-priority data requests.
GenAI + Data Plugin · Flagship
📊 Dashboard narrative generation
Auto-generate the "what does this dashboard say" commentary that executives actually read. Updates as numbers refresh. Replaces the email with the screenshot.
Hybrid (Script + GenAI) · Quick Win
🧰 Data quality issue triage
When a metric jumps, GenAI cross-checks ETL logs, source-system changes, and recent dimension updates to suggest the most likely root cause. Speeds up the "why is this number wrong?" investigation.
Hybrid (Logs + GenAI) · Quick Win
📚 Data dictionary & lineage Q&A
"Where does 'net revenue' come from in our P&L?" Q&A over your data dictionary, lineage docs, and ELT definitions. Onboards new analysts in days, not weeks.
GenAI + RAG · Onboarding
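The ad-hoc query translator is only safe if the generated SQL cannot change data. The minimal guardrail sketch below uses Python's built-in sqlite3 as a stand-in for your warehouse. The table, rows and "generated" query are hypothetical. In production, the primary control would be a read-only database account rather than string checks.

```python
import sqlite3

ALLOWED_PREFIXES = ("select", "with")


def looks_like_single_read(sql: str) -> bool:
    """Cheap pre-check: one statement, starting with SELECT or WITH.

    Deliberately strict: a semicolon anywhere (even inside a string literal)
    causes rejection. This is not a SQL parser.
    """
    body = sql.strip().rstrip(";").strip()
    return ";" not in body and body.lower().startswith(ALLOWED_PREFIXES)


def run_generated_query(conn: sqlite3.Connection, sql: str) -> list:
    if not looks_like_single_read(sql):
        raise ValueError("Rejected: only a single read-only query is allowed")
    # Second line of defence: SQLite refuses writes while query_only is on,
    # which also covers data-modifying statements hidden behind WITH.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(sql).fetchall()


# Hypothetical warehouse table, built in memory for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spend (vendor TEXT, quarter TEXT, amount_sgd INTEGER)")
conn.executemany(
    "INSERT INTO spend VALUES (?, ?, ?)",
    [("Vendor A", "Q2", 120000), ("Vendor B", "Q2", 80000),
     ("Vendor A", "Q2", 30000), ("Vendor C", "Q1", 50000)],
)
conn.commit()

# Pretend this SQL came back from the model for "top 10 vendors by spend in Q2".
generated = """
SELECT vendor, SUM(amount_sgd) AS total
FROM spend WHERE quarter = 'Q2'
GROUP BY vendor ORDER BY total DESC LIMIT 10
"""
print(run_generated_query(conn, generated))
```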
📌 Pattern across all 8 functions: Almost every flagship use case involves RAG (grounding to your documents) or hybrid (deterministic + GenAI). Pure GenAI alone is rare in finance — your data and your rules are too valuable to leave on the table. Module 9 (RAG) and the Day 2 agent build cover both patterns.
What GenAI Doesn't Do Well — Yet — for Finance
Knowing the failure modes is more useful than knowing the wins. Here's where finance leaders should not reach for GenAI, and what to do instead.
| Don't use GenAI for… | Why it fails | What to use instead |
|---|---|---|
| Calculating tax liability | Requires exact arithmetic plus interpretation of statute. GenAI may hallucinate a section number or miscompute. Tax authorities won't accept "the model said so." | Tax engine (rules) for the calculation; GenAI for the explanatory memo. |
| Reconciling balances to the cent | Floating-point summation, currency conversion, rounding rules: exactness is the point. GenAI's "almost right" is wrong. | Recon engine (rules); GenAI to explain the resulting break. |
| Approval decisions above threshold | Material exposure means accountability stays with a named human. Audit and regulators expect it. | Hybrid: AI drafts the recommendation + reasoning; human signs. |
| Predicting fraud probability scores | Better solved by a trained classifier on labelled fraud history. GenAI doesn't know your fraud signature. | Traditional ML for the score; GenAI to explain why the score is high. |
| Anything novel without precedent | GenAI works from patterns it has seen. New regulation? New product? New jurisdiction? It will improvise, and improvisation in finance is risk. | Human first, AI second. Once you have 10–20 examples, revisit. |
| Real-time financial data feeds | The model's training data is static; it doesn't know today's rate, today's balance, or today's posting. Without grounding, it makes up plausible numbers. | Connect to live data via plugins/MCP. Always cite the data timestamp. |
| Pure summarization of numbers | "Summarize this P&L": a chart does this better. GenAI text adds nothing if the numbers themselves are the message. | Visualization. Use GenAI for the narrative around the numbers. |
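The reconciliation row can be made concrete. In Python, binary floats drift, while decimal arithmetic with an explicit rounding rule ties to the cent. The sketch below uses the standard-library `decimal` module. `ROUND_HALF_UP` is an assumption; use whatever rounding rule your policy or the relevant regulation specifies. The amounts and the FX rate are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def total(amounts: list[str]) -> Decimal:
    """Sum amounts exactly. Pass strings, not floats, so no binary rounding sneaks in."""
    return sum((Decimal(a) for a in amounts), Decimal("0"))


def convert(amount: str, rate: str) -> Decimal:
    """Currency conversion rounded to the cent (half-up is an assumed policy)."""
    return (Decimal(amount) * Decimal(rate)).quantize(CENT, rounding=ROUND_HALF_UP)


def recon_break(side_a: list[str], side_b: list[str]) -> Decimal:
    """Difference between two sides; zero means the balances tie to the cent."""
    return total(side_a) - total(side_b)


# Binary floats are "almost right": this sum is not exactly 0.6.
float_sum = 0.1 + 0.2 + 0.3
print(float_sum == 0.6)

print(total(["0.1", "0.2", "0.3"]) == Decimal("0.6"))
print(recon_break(["1000.10", "250.20"], ["1250.30"]))
print(convert("1000", "0.7431"))
```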
The six common failure modes: name them within your team
🎭
1. Confident hallucination
The model invents a regulation, a section number, a historical figure. Sounds right. Catch with: source attribution, RAG, cross-check.
📅
2. Stale knowledge
The model's training cutoff was months ago, so it doesn't know about the new IRAS circular issued last week. Fix with: RAG over your current document library.
🔢
3. Arithmetic drift
Long multiplications, percentages, currency conversions — the model gets close but not exact. Always verify numbers; route them through a script.
💼
4. Tone & style drift
Without explicit style guidance, output sounds generic. Audit committee disclosures don't read like a marketing email. Fix: persona prompts + style examples.
🪞
5. Sycophancy
Models often agree with the framing of the question. "Is this control adequate?" gets a more positive answer than "Audit this control for adequacy and weakness." Frame for challenge, not agreement.
🔄
6. Inconsistency between runs
Same prompt, different answers. Acceptable for drafting; problematic for regulated outputs. Fix: low temperature, RAG grounding, structured-output schemas.
🛡️ The verification posture for finance. Every GenAI output that reaches a human or a system needs three things: (1) a citation to the source data, (2) a confidence statement when uncertainty is material, (3) a path back to the input the model received. If you can't supply those three things, the output isn't ready for finance use. Module 7 (Governance & Trust) builds this in detail.
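The three requirements in the verification posture can be checked mechanically before an output reaches a reviewer. The minimal sketch below is for illustration only. The `ReviewedOutput` record, its field names, and the use of a SHA-256 fingerprint as the "path back to the input" are all assumptions; none of them is a format defined in this module.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def fingerprint(model_input: str) -> str:
    """SHA-256 of the exact input sent to the model, for the audit trail."""
    return hashlib.sha256(model_input.encode("utf-8")).hexdigest()


@dataclass
class ReviewedOutput:
    """Hypothetical record a workflow might store for every GenAI output."""
    text: str
    citations: list[str]                 # (1) pointers to the source data
    uncertainty_is_material: bool
    confidence_statement: Optional[str]  # (2) required when uncertainty is material
    input_fingerprint: str               # (3) path back to the input received


def readiness_gaps(out: ReviewedOutput, model_input: str) -> list[str]:
    """Return the missing checks; an empty list means ready for finance use."""
    gaps = []
    if not out.citations:
        gaps.append("no citation to source data")
    if out.uncertainty_is_material and not out.confidence_statement:
        gaps.append("material uncertainty but no confidence statement")
    if out.input_fingerprint != fingerprint(model_input):
        gaps.append("cannot trace output back to the input received")
    return gaps


# Hypothetical prompt, draft, and source reference.
prompt = "Explain the May opex variance using close pack v3."
draft = ReviewedOutput(
    text="Opex flat month on month.",
    citations=["close_pack_v3.xlsx!Opex"],
    uncertainty_is_material=False,
    confidence_statement=None,
    input_fingerprint=fingerprint(prompt),
)
print(readiness_gaps(draft, prompt))
```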
Priority Matrix — Where to Start
You've got 32 use cases (across 8 functions) in the Use Cases by Function section. You can't pursue all of them at once. The Value × Effort matrix below is the standard way leaders pick the first three.
Do first — flagship wins
Evaluate — high value, needs investment
Learning — try them to build team confidence
Avoid — wrong tool for the job
How to score your own use case
For each candidate, score Value (1–5) and Effort (1–5). Map onto the quadrants. Pick 1–3 from Do First for your first 90 days; 1 from Evaluate as a structured pilot; ignore Avoid.
| Score this | 1 (low) | 5 (high) |
|---|---|---|
| Value: hours saved/week | < 2 hours | > 20 hours |
| Value: error reduction | Marginal | Material risk reduction |
| Value: strategic fit | Tangential | Critical to org priority |
| Effort: data readiness | Clean & available | Needs extraction & cleansing |
| Effort: process change | Drop-in tool | New SOP + training + change mgmt |
| Effort: governance | Low risk, no PII | Sensitive data, audit-grade trail required |
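The scoring rule is simple enough to encode. This sketch rests on three assumptions that the module does not state. First, the three Value rows and the three Effort rows are each averaged. Second, 3 is the split point on the 1–5 scale. Third, "Learning" is the low-value/low-effort corner and "Avoid" is the low-value/high-effort corner. The scores are hypothetical.

```python
from statistics import mean

MIDPOINT = 3.0  # assumed split on the 1-5 scale; a score of exactly 3 counts as "low"


def quadrant(value_scores: list[int], effort_scores: list[int]) -> tuple[str, float, float]:
    """Average the three Value and three Effort scores, then place the use case."""
    value, effort = mean(value_scores), mean(effort_scores)
    high_value, high_effort = value > MIDPOINT, effort > MIDPOINT
    if high_value and not high_effort:
        label = "Do first"
    elif high_value:
        label = "Evaluate"
    elif not high_effort:
        label = "Learning"
    else:
        label = "Avoid"
    return label, value, effort


# Hypothetical scores: (hours saved, error reduction, strategic fit),
# then (data readiness, process change, governance).
candidates = {
    "Variance commentary": ([5, 4, 5], [2, 2, 3]),
    "Supplier risk scorecards": ([4, 4, 4], [4, 3, 4]),
    "Business-partner Q&A prep": ([2, 2, 3], [1, 2, 2]),
    "LLM-computed tax liability": ([3, 2, 2], [4, 4, 5]),
}
for name, (v, e) in candidates.items():
    label, value, effort = quadrant(v, e)
    print(f"{name}: value {value:.1f}, effort {effort:.1f} -> {label}")
```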
🎯 The 90-day rule for finance: Pick one "Do First" use case. Run it for 90 days with a small team. Measure hard outcomes (time, error rate, satisfaction). Only after the first one ships do you start the second. Parallel pilots without a win first is the fastest way to lose credibility.
Implementation Phases — Realistic Timeline
Adoption isn't a switch. It's a maturity curve. Here's what good progress looks like for a finance team going from zero to running multiple agents — phased to manage risk and build confidence.
Phase 1
🌱 Quick Wins
Months 1–3
Replace repetitive narrative work with prompt templates. No agents, no automation, no IT involvement. Each individual saves 2–4 hours/week.
Variance commentary template (FP&A)
Period-close narrative template (Controllership)
Tax memo first-draft template (Tax)
Contract clause review template (Procurement)
Outcome: one prompt template per person, used daily
Phase 2
🔁 Operational Use
Months 3–6
Convert templates to Skills in Claude Cowork. Add Project Instructions for governance. Connect to one source of truth (e.g., your data warehouse via Plugins). Team-wide adoption.
Saved Skills replace ad-hoc prompting
Project Instructions enforce house rules (SGD default, no PII, escalation thresholds)
One agent runs daily on a single use case (e.g., invoice processing)
Outcome: 1 saved Skill per use case, 1 agent in production at L1–L2
Phase 3
🚀 Transformation
Months 6–12
Multiple agents in parallel. Scheduled Tasks running overnight. Human review by exception only. ML + GenAI hybrids on flagship workflows. Governance baked in from day one.
3–5 agents in production, each owning a workflow
Scheduled overnight runs reduce daily backlog
ML + GenAI hybrid on the highest-volume workflow (e.g., audit anomaly triage)
Outcome: team scope expanded; AI handles routine work; humans focus on judgement
What you should be doing differently in 12 months
Today (baseline)
12 months from now
Each analyst writes their own variance commentary from scratch
FP&A team uses a shared Skill; output is consistent and reviewed by exception
Tax answers come from individual research + Word memos
Tax helpdesk Skill answers 60% of routine queries with citations; tax bench focuses on novel positions
Procurement reviews contracts manually one at a time
Contract review Skill flags deviations in seconds; analysts work the exceptions
Audit's anomaly investigation starts at "look at the data"
ML flags it; GenAI explains it; auditor starts at "is the explanation reasonable?"
Period-close narrative drafted manually each cycle
Auto-drafted from close pack; reviewer adjusts and signs
What good adoption looks like — leading indicators
📊
Usage signals
People log into Cowork at least 3 days/week. Skills are activated daily. Project Instructions are updated when policy changes. Activity is the leading indicator of value.
⏱️
Outcome signals
Time-to-draft on flagship outputs (variance commentary, MD&A, tax memos) drops 40%+ within 6 months. Error rate stays flat or improves. Reviewer effort shifts from drafting to challenging.
🛡️
Governance signals
Audit trail captures inputs, outputs, and reviewer. Policy violations are flagged, not silently fixed. A quarterly review of active Skills catches stale ones. This is the difference between AI adoption and AI risk.
🎓 Where the rest of Day 1 + Day 2 fit. The remaining Day 1 modules give you the tools (M3–M6: how LLMs work + costs, M8–M9: prompt engineering + RAG). Module 7 (Governance) gives you the safety. Day 2 gives you the execution (build your first agent in Cowork). By the end of Day 2 you'll have a Phase 1 quick win running and a Phase 2 plan on paper.