Day 2 Afternoon: Agent Design & AI Judge Competition
Design an AI agent for a real process from your team. Submit for AI Judge scoring. The canvas becomes your brief to the tech team.
⏱ 45 minutes · Claude Cowork · No coding · Pairs
This morning you learned the 4 workflow patterns and built your first simple agent. Now you'll design a bigger, more strategic agent for a real process from your team: the kind of automation that saves hours per week.
The Agent Design Canvas is a one-page strategic document, not code and not a technical spec. It captures what the agent does, how it works, what it must never do, and what success looks like. This is the document you hand to your tech team and say: "Build this."
| Step | What You'll Do | Time |
|---|---|---|
| 1️⃣ | Pick a workflow from your team | 5 min |
| 2️⃣ | Choose the right workflow pattern | 3 min |
| 3️⃣ | Fill the Agent Design Canvas (with Claude's help) | 25 min |
| 4️⃣ | Submit for AI Judge scoring + leaderboard | 5 min |
| 5️⃣ | Translate to Claude Cowork (your next step) | 7 min |
Before you start, here's exactly what the AI evaluates. There are no hidden criteria, so design for these 6 dimensions:
| Dimension | What the AI Looks For | Score |
|---|---|---|
| Problem Clarity | Is the mission one clear sentence? Can someone outside your team understand what this agent does? Is the trigger event specific (not "when needed")? | / 5 |
| Pattern Fit | Does the chosen pattern (chaining/parallel/routing/orchestration) actually match the workflow? Are the steps logical and complete? Would a different pattern work better? | / 5 |
| Guardrails & Safety | Are there clear "must NOT" rules? Are escalation thresholds specific (numbers, not vague)? Is there a human-in-the-loop for high-stakes decisions? | / 5 |
| Implementation Readiness | Could a tech team start building from this? Are data sources named? Are skills/steps specific enough to implement? Is the autonomy level realistic? | / 5 |
| Business Impact | Is time savings quantified (hours/week, not "saves time")? Are success metrics measurable? Is there a realistic first milestone? | / 5 |
| AnyCompany Relevance | Is this grounded in AnyCompany's actual operations? Does it reference real data (SGD, merchants, PayLater, markets)? Would this actually help the CFO Office? | / 5 |
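For readers who want to see the rubric as data, here is a minimal sketch. The dimension names come from the table above; the function name `total_score`, the 0 to 5 range, and the idea that the six scores are simply summed to a total out of 30 are assumptions, since this page does not say how the judge combines them.

```python
# Hypothetical sketch: only the dimension names come from the rubric table.
DIMENSIONS = [
    "Problem Clarity",
    "Pattern Fit",
    "Guardrails & Safety",
    "Implementation Readiness",
    "Business Impact",
    "AnyCompany Relevance",
]
MAX_PER_DIMENSION = 5  # each dimension is scored "/ 5"


def total_score(scores: dict[str, int]) -> tuple[int, int]:
    """Return (points earned, points possible) after checking every dimension."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_DIMENSION}")
    earned = sum(scores[name] for name in DIMENSIONS)
    # Assumption: the overall score is a plain sum of the six dimensions.
    return earned, MAX_PER_DIMENSION * len(DIMENSIONS)
```

Under these assumptions, a canvas that scores 4 on every dimension would total 24 out of 30.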
Click a workflow below, or pick your own. Choose something your team does at least weekly that takes at least 30 minutes each time.
The recommended pattern is pre-selected. Change it if you think a different one fits better:
Open Claude Cowork and use the hints below to fill each section of the canvas. Don't copy-paste a prompt; write it yourself using the hints. The AI Judge rewards specificity and original thinking.
Ask Claude to help you complete this canvas. Give it your workflow choice and pattern, then work through each section together:
Paste your completed canvas below. The AI Judge (Claude on Amazon Bedrock) evaluates it against the 6 dimensions shown above and returns a score with specific feedback.
Resubmitting with the same name replaces your previous entry, so iterate and improve!
Your canvas is sent to Amazon Bedrock (Claude Sonnet). The model receives a structured rubric with the 6 scoring dimensions and evaluates your canvas against each one. It returns:
This is the same LLM-as-Judge technique from Day 1: using AI to evaluate AI outputs. You can resubmit as many times as you want. Iterate based on the feedback.
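You don't need to write any code for this exercise, but the LLM-as-Judge idea can be sketched in a few lines. The example below only builds a rubric prompt and parses a reply; it does not call Amazon Bedrock. The prompt wording, the JSON reply shape, and the function names `build_judge_prompt` and `parse_judge_reply` are all hypothetical, not the workshop's actual implementation.

```python
import json

# Dimension names are taken from the rubric table; everything else is assumed.
DIMENSIONS = [
    "Problem Clarity",
    "Pattern Fit",
    "Guardrails & Safety",
    "Implementation Readiness",
    "Business Impact",
    "AnyCompany Relevance",
]


def build_judge_prompt(canvas: str) -> str:
    """Combine a structured rubric with the submitted canvas text."""
    rubric = "\n".join(f"- {name}: score 0-5" for name in DIMENSIONS)
    return (
        "You are judging an Agent Design Canvas.\n"
        "Score each dimension and give one sentence of feedback.\n"
        f"{rubric}\n"
        # Assumed reply format, so the scores can be parsed reliably.
        'Reply with JSON only: {"scores": {"<dimension>": '
        '{"score": <int>, "feedback": "<text>"}}}\n'
        "Canvas:\n"
        f"{canvas}"
    )


def parse_judge_reply(reply: str) -> dict[str, int]:
    """Extract one integer score per dimension from the judge's JSON reply."""
    data = json.loads(reply)
    scores = {}
    for name in DIMENSIONS:
        score = int(data["scores"][name]["score"])
        if not 0 <= score <= 5:
            raise ValueError(f"{name}: score {score} is outside 0-5")
        scores[name] = score
    return scores
```

Asking the model for a fixed JSON shape is one common way to keep judge output machine-readable. The model call itself would sit between these two functions.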
Your canvas is the design. Here's how each section translates to something you can build in Claude Cowork today:
Your "Must NOT" rules and escalation thresholds become the Custom Instructions in your Cowork project. These apply to every conversation.
Canvas section → Cowork: Project Settings → Custom Instructions
Each workflow step becomes a Knowledge file uploaded to the project. Include the step instructions, expected input/output format, and quality criteria.
Canvas section → Cowork: Project → Add Knowledge
Reference documents (policies, thresholds, templates) get uploaded as Knowledge files. Live data gets pasted into conversations.
Canvas section → Cowork: Project → Add Knowledge + paste data in chat
If your trigger is time-based ("every Monday"), set up a Scheduled Task in Cowork. If event-based ("when file arrives"), you'll need your tech team's help later.
Canvas section → Cowork: Scheduled Tasks (time-based) or manual trigger
That's your first working agent: steering + skills, no code required. Add a Scheduled Task when you're ready for automation.
All submissions ranked by score. Click "Refresh" to see new entries.
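The leaderboard rules described on this page (a resubmission under the same name replaces the earlier entry, and entries are ranked by score) can be sketched as follows. The function names and the alphabetical tie-break are assumptions for illustration; the page does not say how ties are ordered.

```python
def submit(board: dict[str, int], name: str, score: int) -> None:
    """Record a submission; the same name overwrites the previous entry."""
    board[name] = score


def ranked(board: dict[str, int]) -> list[tuple[str, int]]:
    """Return entries sorted by score, highest first."""
    # Assumption: ties are broken alphabetically by name.
    return sorted(board.items(), key=lambda entry: (-entry[1], entry[0]))
```

For example, if "Team A" submits 21, "Team B" submits 26, and "Team A" then resubmits 27, the board holds two entries with "Team A" ranked first.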