Why your team's AI assistant hits a wall — and the four ideas that turn it into something that can actually do the work.
Your team uses Claude or AnyCompany GPT every day to draft variance commentary, summarise contracts, and classify expense categories. That's a chatbot. When the work involves checking a system, deciding what to do, and taking action across multiple steps, chatbots fall over. That gap is what agents are built to close.
What chatbots do well: reading text, drafting text, transforming text. These are single-turn, single-pass, single-output tasks where the human reads the answer and decides what to do next.
The moment the task needs fresh data ("what's the current FX rate?"), system access ("look up this PO"), or action ("flag this invoice for approval") — the chatbot can only describe what should happen, not make it happen.
An agent is an LLM plus tools, plus a reasoning loop. It can check live systems, decide which step is next, take that step, observe the result, and decide what to do next — all without re-prompting the human at each step.
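That loop fits in a few lines of Python. This is a minimal sketch under stated assumptions: a scripted `decide` function stands in for the LLM, `get_payment_status` is a hypothetical tool with a canned reply, and `ToolCall`, `FinalAnswer`, and `run_agent` are illustrative names, not any framework's API.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolCall:
    """The model asks for a tool to be run (hypothetical step type)."""
    name: str
    args: dict

@dataclass
class FinalAnswer:
    """The model decides the goal is met and answers."""
    text: str

def run_agent(goal: str,
              decide: Callable[[list], Any],
              tools: dict[str, Callable[..., Any]],
              max_steps: int = 10) -> str:
    """Decide, act, observe; repeat until the model returns a final answer."""
    history: list = [("goal", goal)]
    for _ in range(max_steps):
        step = decide(history)                  # the LLM chooses the next step
        if isinstance(step, FinalAnswer):
            return step.text                    # goal met: leave the loop
        result = tools[step.name](**step.args)  # take the step
        history.append(("observation", step.name, result))  # record what happened
    raise RuntimeError("step budget exhausted without a final answer")

def scripted_decide(history: list) -> Any:
    """Stand-in for a real LLM: check the payment first, then answer."""
    observations = [h for h in history if h[0] == "observation"]
    if not observations:
        return ToolCall("get_payment_status", {"vendor_id": "4521"})
    return FinalAnswer(f"Vendor 4521 payment status: {observations[-1][2]}")

# Hypothetical stub; a real tool would query the payment ledger or bank rail.
tools = {"get_payment_status": lambda vendor_id: "queued at bank rail"}
print(run_agent("Confirm refund status for vendor 4521", scripted_decide, tools))
# prints "Vendor 4521 payment status: queued at bank rail"
```

The `max_steps` budget is a design choice worth keeping in any real version: without it, a model that never decides it is done would loop indefinitely.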
Imagine a finance ops analyst asks an LLM: "Vendor #4521 says we processed an invoice for SGD 23,800 last Tuesday but the refund hasn't landed. The vendor wants confirmation today. What should I tell them?"
A standalone LLM can write a polite reply, but it cannot check the payment ledger, query the bank rail, verify the chargeback queue, or trigger a status update.
The temptation is to write a longer, more detailed prompt. That fixes nothing. The chatbot still has no eyes on your systems, no hands on your tools, and no working memory between calls. The wall is structural, not a prompt-quality problem.
Agents didn't appear from a single breakthrough. They emerged from four innovations that, combined, broke through the chatbot wall.
The LLM can call your APIs and read the result.
Break a goal into steps, then run them.
Read PDFs, photos, screenshots — not just text.
The plumbing that runs the loop for you.
By default an LLM only produces text. Tool use changes the contract: the model is told "these functions exist — get_invoice(id), check_po(id), post_journal(entry) — call any of them when you need to". The LLM responds with a structured tool call; the framework executes it and feeds the result back. The loop continues until the goal is met.
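The framework side of that contract can be sketched as a registry plus a dispatcher. This is a hypothetical illustration: the tool names come from the example above (with `id` renamed to `invoice_id` / `po_id` to avoid shadowing Python's built-in), the in-memory data is invented, and the `{"name": ..., "arguments": {...}}` call shape and `"role": "tool"` result message are assumptions. Each model provider defines its own schema.

```python
import json
from typing import Any, Callable

# Hypothetical in-memory stand-ins; real versions would query the ERP and ledger.
INVOICES = {"INV-4521": {"amount": 23800.00, "currency": "SGD", "status": "processed"}}
PURCHASE_ORDERS = {"PO-77": {"approved": True}}

def get_invoice(invoice_id: str) -> dict:
    return INVOICES.get(invoice_id, {"error": f"invoice {invoice_id} not found"})

def check_po(po_id: str) -> dict:
    return PURCHASE_ORDERS.get(po_id, {"error": f"PO {po_id} not found"})

def post_journal(entry: dict) -> dict:
    return {"posted": True, "entry": entry}  # a real version would write to the GL

# The registry is the "these functions exist" list the model is told about.
TOOLS: dict[str, Callable[..., Any]] = {
    "get_invoice": get_invoice,
    "check_po": check_po,
    "post_journal": post_journal,
}

def execute_tool_call(raw: str) -> dict:
    """Run one structured tool call emitted by the model; return a result message."""
    call = json.loads(raw)  # assumed shape: {"name": ..., "arguments": {...}}
    fn = TOOLS.get(call["name"])
    if fn is None:
        result = {"error": f"unknown tool {call['name']}"}
    else:
        result = fn(**call["arguments"])
    # The framework appends this message to the conversation so the model sees it.
    return {"role": "tool", "name": call["name"], "content": json.dumps(result)}

msg = execute_tool_call('{"name": "get_invoice", "arguments": {"invoice_id": "INV-4521"}}')
print(msg["content"])
```

Note that an unknown tool name comes back as an error message rather than an exception, so the model can see its mistake and try a different call on the next turn.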
For finance: your existing systems (ERP queries, bank rail status, the chargeback ledger, the spend cube) become tools the agent can call. The LLM does the reasoning ("which invoices need follow-up?"); your code does the action (the actual database query, the actual post). You keep control of the action layer.
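One way to keep that control is to enforce an approval policy in code, outside the model. The sketch below is hypothetical: the read-only versus needs-approval split, the `approve` callback, and the journal entry fields are all illustrative assumptions, not a prescribed design.

```python
from typing import Any, Callable

# Hypothetical policy: which tool names run freely and which need a human sign-off.
READ_ONLY = {"get_invoice", "check_po"}
NEEDS_APPROVAL = {"post_journal"}

def guarded_call(name: str, args: dict,
                 tools: dict[str, Callable[..., Any]],
                 approve: Callable[[str, dict], bool]) -> dict:
    """Run a model-requested tool call only if the policy in code allows it."""
    if name in READ_ONLY:
        return {"status": "ok", "result": tools[name](**args)}
    if name in NEEDS_APPROVAL:
        if approve(name, args):                 # e.g. a reviewer clicks "approve"
            return {"status": "ok", "result": tools[name](**args)}
        return {"status": "rejected", "reason": "approver declined"}
    # Anything not on either list is refused, whatever the model asked for.
    return {"status": "blocked", "reason": f"{name} is not an allowed tool"}

# Hypothetical stub tools for the demo.
tools = {
    "get_invoice": lambda invoice_id: {"invoice_id": invoice_id, "status": "processed"},
    "check_po": lambda po_id: {"po_id": po_id, "approved": True},
    "post_journal": lambda entry: {"posted": True, "entry": entry},
}
entry = {"debit": "AP", "credit": "Cash", "amount": 23800.00}
print(guarded_call("post_journal", {"entry": entry}, tools, approve=lambda n, a: False))
# prints {'status': 'rejected', 'reason': 'approver declined'}
```

Because the policy lives in your code rather than in the prompt, the model cannot talk its way past it: a write only happens when the approver says yes.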
AI for the enterprise has moved through four distinct phases. Each one keeps everything from the previous, then adds a capability.
The starting point. The model takes a prompt and returns a completion based on patterns it learned during training. Fast, cheap, and powerful for transformations like summarisation, classification, and drafting. But entirely isolated — no eyes on your systems, no hands.
Each phase pushes more of the work onto the system and less onto the human. That's the only axis that matters: how much of the loop is the system running, and how much do you still have to do yourself?
| Capability | 📝 LLM | 💬 Assistant | 🤖 Agent | 🌐 Agentic System |
|---|---|---|---|---|
| Reads your prompt | ✅ | ✅ | ✅ | ✅ |
| Remembers conversation | — | ✅ | ✅ | ✅ |
| Searches your documents | — | ✅ | ✅ | ✅ |
| Calls your systems | — | — | ✅ | ✅ |
| Plans multi-step work | — | — | ✅ | ✅ |
| Coordinates with other agents | — | — | — | ✅ |
Every limitation of the LLM-only world maps to something finance teams already do by hand. Agents turn each one into a candidate for automation — with humans staying in the approval loop.
| Scenario | LLM only | Agent |
|---|---|---|
| "Where's our refund for INV-4521?" | Drafts a polite holding reply | Checks payment ledger + bank rail status, calculates revised settlement window, posts confirmation back to vendor portal |
| "Variance vs forecast — Q3 EBIT" | Summarises numbers you paste | Pulls actuals from the cube, joins to last forecast, writes the commentary, flags the three drivers worth a meeting |
| "Is this expense GST-claimable?" | Quotes general principles | Reads the receipt, cross-checks against your tax policy library, drafts the position memo with citations |
| "Approve invoice batch for AP run" | Tells you what to look for | Validates 200 invoices against POs, surfaces the 7 exceptions for human approval, queues the rest for payment |
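The deterministic core of the AP-run scenario, matching invoices to POs and splitting the batch, might look like the sketch below. It is hypothetical throughout: a simple two-way match (PO exists, vendor agrees, amount within a 1% tolerance) stands in for whatever matching rules your AP team actually uses, and the class and field names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Invoice:
    invoice_id: str
    po_id: str
    vendor_id: str
    amount: float

@dataclass
class PurchaseOrder:
    po_id: str
    vendor_id: str
    amount: float

def triage_batch(invoices: list[Invoice], pos: dict[str, PurchaseOrder],
                 tolerance: float = 0.01) -> tuple[list[Invoice], list[tuple[Invoice, str]]]:
    """Split a batch into invoices queued for payment and exceptions for a human."""
    queued: list[Invoice] = []
    exceptions: list[tuple[Invoice, str]] = []
    for inv in invoices:
        po = pos.get(inv.po_id)
        if po is None:
            exceptions.append((inv, "no matching PO"))
        elif po.vendor_id != inv.vendor_id:
            exceptions.append((inv, "vendor differs from PO"))
        elif abs(inv.amount - po.amount) > tolerance * po.amount:
            exceptions.append((inv, "amount outside tolerance"))
        else:
            queued.append(inv)
    return queued, exceptions

# Hypothetical sample batch.
pos = {"PO-1": PurchaseOrder("PO-1", "V-4521", 23800.00)}
batch = [
    Invoice("INV-1", "PO-1", "V-4521", 23800.00),   # matches: queued
    Invoice("INV-2", "PO-9", "V-4521", 500.00),     # unknown PO: exception
    Invoice("INV-3", "PO-1", "V-4521", 24500.00),   # about 2.9% over PO: exception
]
queued, exceptions = triage_batch(batch, pos)
for inv, reason in exceptions:
    print(inv.invoice_id, reason)
```

In an agent, a function like this would be one tool: the code applies the rules, and the LLM's job is to explain each exception to the human approver. The sketch does not do three-way matching against goods receipts, nor does it track how much of a PO earlier invoices have already consumed.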
Each candidate below is mapped to one of the eight functions in the room. Don't worry about which to pick yet: Day 2 ends with you choosing one for your team. These are the candidates worth knowing about.