Day 2 Afternoon: Agent Design & AI Judge Competition

🤖 Agent Design Canvas

Design an AI agent for a real process from your team. Submit for AI Judge scoring. The canvas becomes your brief to the tech team.

⏱ 45 minutes · Claude Cowork · No coding · Pairs

Why This Exercise Matters

This morning you learned the four workflow patterns and built your first simple agent. Now you'll design a bigger, more strategic agent for a real process from your team, the kind of automation that saves hours per week.

The Agent Design Canvas is a one-page strategic document, not code and not a technical spec. It captures what the agent does, how it works, what it must never do, and what success looks like. This is the document you hand to your tech team and say: "Build this."

What You'll Do | Time
1️⃣ Pick a workflow from your team | 5 min
2️⃣ Choose the right workflow pattern | 3 min
3️⃣ Fill the Agent Design Canvas (with Claude's help) | 25 min
4️⃣ Submit for AI Judge scoring + leaderboard | 5 min
5️⃣ Translate to Claude Cowork (your next step) | 7 min

🎯 Deliverable: A completed Agent Design Canvas (markdown) scored by AI Judge. Top designs get recognized. Your canvas becomes the brief you share with your tech team next week.

🔍 How AI Judge Scores Your Canvas (Full Transparency)

Before you start, here's exactly what the AI evaluates. There are no hidden criteria, so design for these 6 dimensions:

Dimension | What the AI Looks For | Score
Problem Clarity | Is the mission one clear sentence? Can someone outside your team understand what this agent does? Is the trigger event specific (not "when needed")? | / 5
Pattern Fit | Does the chosen pattern (chaining/parallel/routing/orchestration) actually match the workflow? Are the steps logical and complete? Would a different pattern work better? | / 5
Guardrails & Safety | Are there clear "must NOT" rules? Are escalation thresholds specific (numbers, not vague)? Is there a human-in-the-loop for high-stakes decisions? | / 5
Implementation Readiness | Could a tech team start building from this? Are data sources named? Are skills/steps specific enough to implement? Is the autonomy level realistic? | / 5
Business Impact | Is time savings quantified (hours/week, not "saves time")? Are success metrics measurable? Is there a realistic first milestone? | / 5
AnyCompany Relevance | Is this grounded in AnyCompany's actual operations? Does it reference real data (SGD, merchants, PayLater, markets)? Would this actually help the CFO Office? | / 5

💡 Scoring Guide

  • 25–30: 🥇 Production-Ready (your tech team could start building this week)
  • 19–24: 🥈 Strong Design (solid foundation; address the feedback and it's ready)
  • 13–18: 🥉 Good Start (right direction, needs more specificity)
  • Below 13: 🔄 Needs Rework (add more detail, check the AI feedback)
Pro tip: The same principles from Day 1 apply: be specific (not vague), ground everything in data (not assumptions), include guardrails (not just the happy path), and quantify impact (not just "saves time").
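
For the tech team, the scoring guide above can be read as a simple lookup from total score to tier. The sketch below is only an illustration of those bands; the function name `score_tier` is hypothetical and is not part of the AI Judge itself.

```python
def score_tier(total: int) -> str:
    """Map a total AI Judge score (6 dimensions, 5 points each) to its tier."""
    if not 0 <= total <= 30:
        raise ValueError("total must be between 0 and 30")
    if total >= 25:
        return "Production-Ready"
    if total >= 19:
        return "Strong Design"
    if total >= 13:
        return "Good Start"
    return "Needs Rework"


if __name__ == "__main__":
    # Example: a canvas scoring 4 on five dimensions and 3 on one.
    print(score_tier(4 * 5 + 3))
```

A canvas that sits just under a band boundary (for example 18 or 24) usually needs only one dimension raised by a point to move up a tier.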
1 Pick Your Workflow

Step 1: Choose a Real Process

Click a workflow below, or pick your own. Choose something your team does at least weekly that takes at least 30 minutes each time.

📄 Invoice Processing
Extract → validate against PO → flag mismatches → route for approval
Recommended: 🔗 Chaining
🔍 Fraud Monitoring
Analyze from 3 angles simultaneously → combine → generate case file
Recommended: ⚡ Parallelization
💬 Customer Complaints
Classify type → route to correct team → draft response
Recommended: 🔀 Routing
💳 Credit Applications
Assess → auto-decide if small → human review if large
Recommended: 🎯 Orchestration
📊 Monthly Reporting
Pull data → generate narrative → create presentation
Recommended: 🔗 Chaining
📋 Regulatory Updates
Scan circular → assess impact → notify compliance
Recommended: 🔗 Chaining
2 Choose Pattern

Step 2: Confirm the Workflow Pattern

The recommended pattern is pre-selected. Change it if you think a different one fits better:

🔗 Chaining
A → B → C → D
⚡ Parallelization
3 views → combine
🔀 Routing
Classify → right path
🎯 Orchestration
If/then + human gates
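
If it helps when briefing the tech team, the four patterns above can be sketched as small Python functions. This is a minimal illustration under stated assumptions: every step is a plain function standing in for an AI or data step, and the names `chain`, `parallelize`, `route`, and `orchestrate` are hypothetical, not a Claude Cowork API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Step = Callable[[str], str]


def chain(steps: List[Step], data: str) -> str:
    """Chaining: A -> B -> C, each step consumes the previous output."""
    for step in steps:
        data = step(data)
    return data


def parallelize(views: List[Step], combine: Callable[[List[str]], str], data: str) -> str:
    """Parallelization: run independent views on the same input, then combine."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda view: view(data), views))  # order is preserved
    return combine(results)


def route(classify: Callable[[str], str], handlers: Dict[str, Step], data: str) -> str:
    """Routing: classify the input, then send it down the matching path."""
    return handlers[classify(data)](data)


def orchestrate(amount: float, auto_limit: float) -> str:
    """Orchestration: if/then logic with a human gate for large cases."""
    if amount <= auto_limit:
        return "auto-decide"
    return "human review"


if __name__ == "__main__":
    print(chain([str.strip, str.upper], "  invoice 42  "))
    print(parallelize([lambda d: "velocity:" + d, lambda d: "geo:" + d], " | ".join, "txn"))
    print(route(lambda d: "billing" if "refund" in d else "general",
                {"billing": lambda d: "Billing team: " + d,
                 "general": lambda d: "Support team: " + d},
                "refund request"))
    print(orchestrate(amount=1_000, auto_limit=5_000))
```

The limit of 5,000 in the last call is an invented example value; your canvas should state the real threshold.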
3 Fill the Canvas

Step 3: Design Your Agent

Open Claude Cowork and use the hints below to fill each section of the canvas. Don't copy-paste a prompt; write it yourself using the hints. The AI Judge rewards specificity and original thinking.

🎯 The Canvas Template: Fill Each Section

Ask Claude to help you complete this canvas. Give it your workflow choice and pattern, then work through each section together:

Section 1: Agent Mission

Hints

  • One sentence: "This agent [does what] for [whom] by [how]"
  • Include the trigger: what event starts it? (new file arrives, weekly schedule, manual request)
  • Name it something descriptive: "Invoice Validator", not "Agent 1"

Section 2: Workflow Steps

Hints

  • List 3–5 steps. Each step = one clear action with one clear output
  • For each step: what goes IN, what comes OUT, what could go wrong?
  • Label which pattern each step uses (if combining patterns)
  • Think: which steps need AI judgment vs which are just data lookup?

Section 3: Data Requirements

Hints

  • Inputs: What data does the agent need? Be specific: "vendor_invoices.pdf", not "some files"
  • Outputs: What does it produce? Report, notification, decision, dashboard?
  • Knowledge: What reference documents should it always have access to? (policies, thresholds, templates)

Section 4: Guardrails & Escalation

Hints (this is where leaders add the most value)

  • Must NOT: What should the agent NEVER do? (auto-approve above $X, share PII, skip checks)
  • Escalate when: Use specific thresholds, such as "$25K+", "confidence below 80%", or "3+ risk flags"
  • Autonomy level: Start at L1 (suggest only) or L2 (act on routine, ask on exceptions)?
  • Think: what would make you uncomfortable if the agent did it without asking?
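
To show the tech team what "specific thresholds" look like once implemented, here is a minimal sketch of an escalation check. It reuses the example thresholds from the hints above ($25K+, confidence below 80%, 3+ risk flags); the `Case` fields and the function name are hypothetical, and your own canvas should supply the real limits.

```python
from dataclasses import dataclass
from typing import List

# Example thresholds taken from the hints; replace with your own.
AMOUNT_LIMIT = 25_000
MIN_CONFIDENCE = 0.80
MAX_RISK_FLAGS = 3


@dataclass
class Case:
    amount: float        # transaction or invoice amount
    confidence: float    # agent's confidence, from 0.0 to 1.0
    risk_flags: int      # number of risk flags raised


def escalation_reasons(case: Case) -> List[str]:
    """Return every reason this case must go to a human; empty means none."""
    reasons = []
    if case.amount >= AMOUNT_LIMIT:
        reasons.append("amount at or above 25K")
    if case.confidence < MIN_CONFIDENCE:
        reasons.append("confidence below 80%")
    if case.risk_flags >= MAX_RISK_FLAGS:
        reasons.append("3 or more risk flags")
    return reasons


if __name__ == "__main__":
    print(escalation_reasons(Case(amount=30_000, confidence=0.92, risk_flags=1)))
```

Returning the list of reasons, rather than a plain yes/no, gives the human reviewer the context for why the agent stopped.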

Section 5: Business Impact

Hints

  • Current state: How many hours/week does this take today? How many people?
  • With agent: Estimate time savings (be realistic: a 60-80% reduction, not 100%)
  • Success metrics: 2-3 measurable KPIs (processing time, error rate, throughput)
  • First milestone: What's the smallest version that proves value in 2 weeks?
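
The time-savings estimate above is simple arithmetic, and a short worked example can make it concrete. The 12 hours/week figure below is an invented placeholder, not workshop data; the 60-80% range comes from the hint.

```python
def weekly_hours_saved(current_hours_per_week: float, reduction: float) -> float:
    """Hours saved per week for a given fractional reduction (0.0 to 1.0)."""
    if not 0 <= reduction <= 1:
        raise ValueError("reduction must be between 0 and 1")
    return round(current_hours_per_week * reduction, 1)


if __name__ == "__main__":
    current_hours = 12.0  # hypothetical: hours/week the process takes today
    low = weekly_hours_saved(current_hours, 0.60)
    high = weekly_hours_saved(current_hours, 0.80)
    print(f"Estimated savings: {low} to {high} hours/week")
```

Quoting the result as a range, rather than a single number, keeps the estimate honest and matches the "be realistic" hint.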
✅ Checkpoint: Your canvas should be 300-500 words. Every section filled. Specific numbers, not vague language. If you can't quantify something, say "estimate: X"; that's better than leaving it blank.
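
The checkpoint can also be expressed as a quick self-check before submitting. The sketch below assumes the canvas is held as a mapping from section name to text; the function name and that representation are illustrative assumptions, not how the submission form works.

```python
from typing import Dict, List

# The five canvas sections from the template above.
REQUIRED_SECTIONS = [
    "Agent Mission",
    "Workflow Steps",
    "Data Requirements",
    "Guardrails & Escalation",
    "Business Impact",
]


def check_canvas(sections: Dict[str, str]) -> List[str]:
    """Return a list of checkpoint problems; an empty list means ready to submit."""
    problems = []
    for name in REQUIRED_SECTIONS:
        if not sections.get(name, "").strip():
            problems.append(f"Section is empty: {name}")
    word_count = sum(len(text.split()) for text in sections.values())
    if not 300 <= word_count <= 500:
        problems.append(f"Word count {word_count} is outside 300-500")
    return problems


if __name__ == "__main__":
    draft = {name: "placeholder " * 70 for name in REQUIRED_SECTIONS}
    draft["Business Impact"] = ""
    for problem in check_canvas(draft):
        print(problem)
```

This only checks completeness and length; whether the numbers are specific enough is still a judgment call for you and the AI Judge.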
4 Submit & Score

Step 4: Submit for AI Judge Scoring

Paste your completed canvas below. The AI Judge (Claude on Amazon Bedrock) evaluates it against the 6 dimensions shown above and returns a score with specific feedback.

📤 Submit Your Canvas

Resubmitting with the same name replaces your previous entry, so iterate and improve!

💡 How the AI Judge works (behind the scenes)

Your canvas is sent to Amazon Bedrock (Claude Sonnet). The model receives a structured rubric with the 6 scoring dimensions and evaluates your canvas against each one. It returns:

  • A score (1-5) per dimension with brief justification
  • Top strengths: what you did well
  • Specific improvements: what would raise your score

This is the same LLM-as-Judge technique from Day 1: using AI to evaluate AI outputs. You can resubmit as many times as you want. Iterate based on the feedback.
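
For readers curious about the mechanics, the LLM-as-Judge flow described above can be sketched as two steps: build a rubric prompt, then total the per-dimension scores from the model's reply. This is not the workshop's actual implementation. The prompt wording and the JSON reply shape are assumptions made for illustration, and the call to the model is deliberately left out.

```python
import json

# The 6 scoring dimensions listed earlier on this page.
DIMENSIONS = [
    "Problem Clarity",
    "Pattern Fit",
    "Guardrails & Safety",
    "Implementation Readiness",
    "Business Impact",
    "AnyCompany Relevance",
]


def build_judge_prompt(canvas: str) -> str:
    """Assemble a rubric prompt asking for a 1-5 score per dimension."""
    rubric = "\n".join(f"- {name}: score 1-5 with a brief justification" for name in DIMENSIONS)
    return (
        "Evaluate the Agent Design Canvas below against this rubric:\n"
        f"{rubric}\n"
        "Also list the top strengths and specific improvements.\n\n"
        f"CANVAS:\n{canvas}"
    )


def total_score(reply_json: str) -> int:
    """Sum the scores from a reply shaped like {"scores": {dimension: int}} (assumed format)."""
    scores = json.loads(reply_json)["scores"]
    for name in DIMENSIONS:
        if not 1 <= scores[name] <= 5:
            raise ValueError(f"score out of range for {name}")
    return sum(scores[name] for name in DIMENSIONS)


if __name__ == "__main__":
    print(build_judge_prompt("Agent Mission: ..."))
    example_reply = json.dumps({"scores": {name: 4 for name in DIMENSIONS}})
    print(total_score(example_reply))
```

Validating each score against the 1-5 range before summing guards against a malformed model reply skewing the leaderboard.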

5 Translate to Cowork

Step 5: From Canvas to Claude Cowork

Your canvas is the design. Here's how each section translates to something you can build in Claude Cowork today:

🏠 Guardrails → Project Instructions

Your "Must NOT" rules and escalation thresholds become the Custom Instructions in your Cowork project. These apply to every conversation.

Canvas section → Cowork: Project Settings → Custom Instructions

📋 Workflow Steps → Knowledge Files

Each workflow step becomes a Knowledge file uploaded to the project. Include the step instructions, expected input/output format, and quality criteria.

Canvas section → Cowork: Project → Add Knowledge

📊 Data Requirements → Uploaded Files

Reference documents (policies, thresholds, templates) get uploaded as Knowledge files. Live data gets pasted into conversations.

Canvas section → Cowork: Project → Add Knowledge + paste data in chat

⏰ Trigger → Scheduled Task

If your trigger is time-based ("every Monday"), set up a Scheduled Task in Cowork. If event-based ("when file arrives"), you'll need your tech team's help later.

Canvas section → Cowork: Scheduled Tasks (time-based) or manual trigger

🎯 Your homework: Set this up after the workshop

  • Create a new Claude Cowork project named after your agent
  • Paste your guardrails into Custom Instructions
  • Upload your reference documents as Knowledge files
  • Write a "kickoff prompt" that runs your workflow steps in sequence
  • Test with one real example from last week

That's your first working agent: steering + skills, no code required. Add a Scheduled Task when you're ready for automation.

🏆 Leaderboard

All submissions ranked by score. Click "Refresh" to see new entries.
