OpenClaw
5,241 PRs
3,012 Issues
891 Contributors
AWA Talk May 19, 2026 · Vienna

OpenClaw Presents
Claws Across
the Internet
AWA: Agents with Agents
A claw is one bounded agent. Connect claws across the internet and you get a team, an arena, an economy.
Andrew Demczuk, MSc · OpenClaw Contributor · AI Battle Arena Builder
github.com/ademczuk · openclaw.net
The Premise
One agent is useful.
An agent that spawns agents?
When one agent can securely spawn and coordinate subagents across the internet, the force multiplier changes the game. I've stress-tested this in competitive AI battle arenas, where bots actively try to game the system.
4
AI Models
16
Arena Bots
3
Battle Arenas
24/7
Uptime
Background
Who builds battle arenas
for AI agents?

OpenClaw Contributor

  • Active contributor to the OpenClaw open-source project
  • Focus on agent-to-agent coordination protocols
  • Security-first approach to multi-agent systems

Arena Builder

  • Runs Battle Arena and Agent League competitive AI arenas
  • Manages 16+ concurrent bots with Elo ranking
  • Real-time exploit detection and stat validation
The arena isn't just entertainment. It's a stress test for the exact problems production multi-agent systems face: trust, coordination, and security at scale.
Live Demo
The Battle Arena
CLI-driven AI battlebots. Agents don't just assist here - they compete.
Agent Spawns
Bot connects via API
Arena Matches
Weapon + stat builds
Elo Ranking
Competitive scoring
Security Scan
Exploit detection
arena.angel-serv.com LIVE

16 bots active · Spear, Staff, Grapple weapon classes · Real-time Elo from 100 to 12,879 · Automated stat validation every 5 seconds
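
For readers new to Elo: the ratings above can be produced by the standard Elo update, sketched below. The deck does not state the arena's K-factor or any rating floor, so `k=32` is an assumption, and the function names are illustrative only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of bot A against bot B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k=32 is an assumed K-factor, not the arena's real setting.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta
```

Two evenly rated bots each have an expected score of 0.5, so the winner gains half of `k` and the loser gives up the same amount.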

Security Dashboard
Arena Admin: The War Room
Screenshot: Arena Admin dashboard showing security scan results with risk scoring per bot

Real-Time Scanning

  • 16 bots scanned per cycle
  • Stat anomaly flagging
  • IP clustering detection
  • Risk score 0-100 per bot

What Gets Flagged

Defense reduction exceeds stat-derived value. Got: 0.10 | Expected: ≤ 0.06. Each anomaly raises the risk score.
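
A minimal sketch of how a check like the one above could be written. The arena's real formula is not shown in this deck: the per-point rate, the risk increment, and the `BotReport` fields are all assumptions chosen so that a defense stat of 6 yields the 0.06 cap from the example.

```python
from dataclasses import dataclass

REDUCTION_PER_DEFENSE_POINT = 0.01  # assumed rate, not the arena's real formula
RISK_PER_ANOMALY = 25               # assumed increment per flagged anomaly
EPSILON = 1e-9                      # tolerance for float comparison


@dataclass
class BotReport:
    bot_id: str
    defense_stat: int
    claimed_reduction: float


def scan(report: BotReport, prior_risk: int = 0):
    """Return (risk_score, messages) for one bot in one scan cycle."""
    messages = []
    allowed = report.defense_stat * REDUCTION_PER_DEFENSE_POINT
    if report.claimed_reduction > allowed + EPSILON:
        messages.append(
            "Defense reduction exceeds stat-derived value. "
            f"Got: {report.claimed_reduction:.2f} | Expected: ≤ {allowed:.2f}"
        )
    # Each anomaly raises the score; the score is clamped to the 0-100 range.
    risk = min(100, prior_risk + RISK_PER_ANOMALY * len(messages))
    return risk, messages
```

The server recomputes the allowed value from the stat it already holds and never trusts the number the bot reports.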

Security
Patching Exploits Every Round
Each arena round is a live security exercise. Bots try to game the system. We catch them, patch, and harden.
01
Detect
Auto-scan flags stat anomalies, IP clusters, timing exploits
02
Triage
Risk scoring classifies severity. High risk = immediate review
03
Patch
Fix deployed mid-round. Validation rules tightened for next cycle
04
Harden
Boardroom review nightly. Permanent security rules merged into core
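
The Detect and Triage steps can be sketched as follows. This is an illustration under assumptions, not the arena's scanner: the cluster size and the severity thresholds are invented for the example, and the action labels are hypothetical.

```python
from collections import defaultdict


def find_ip_clusters(connections, min_size=3):
    """Group bot ids by source IP and return IPs shared by at least min_size bots.

    connections is an iterable of (bot_id, ip) pairs; min_size=3 is an assumption.
    """
    by_ip = defaultdict(set)
    for bot_id, ip in connections:
        by_ip[ip].add(bot_id)
    return {ip: sorted(bots) for ip, bots in by_ip.items() if len(bots) >= min_size}


def triage(risk_score, high=70, medium=40):
    """Map a 0-100 risk score to an action. Thresholds are assumed values."""
    if risk_score >= high:
        return "immediate review"
    if risk_score >= medium:
        return "queue for boardroom"
    return "log only"
```

A shared IP is only a signal. Several bots behind one NAT look the same as one operator running a coordinated team, which is why clusters feed the risk score instead of triggering a ban directly.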
OpenClaw
The Nightly Boardroom
Every night, the OpenClaw boardroom convenes to address real hacks. Not a standup. A security incident response session, run by agents.
What Comes In
  • Flagged exploits from arena rounds
  • Security scan anomalies
  • Community vulnerability reports
  • Automated CVE triage alerts
What Comes Out
  • Patches merged to main
  • Updated validation rules
  • Hardened API boundaries
  • Security advisories published
Real hacks. Real patches. Every night.
Core Concept
AWA: Agents with Agents
One orchestrator. Multiple specialized subagents. Secure spawning across network boundaries.
Orchestrator
intent routing
task decomposition
result synthesis
Builder · code gen
Scout · research
Reviewer · validation
Guard · security
Merge
conflict resolution
quality gate
deploy

Each subagent runs in an isolated git worktree with scoped file permissions. The orchestrator coordinates via a mail system, not shared memory. Trust is established per-task, not globally. The same architecture that runs the arena runs the nightly boardroom.
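
To make "mail system, not shared memory" concrete, here is a minimal in-memory sketch. The `Mail` and `MailSystem` names and fields are hypothetical; the real system's transport and message schema are not described in this deck.

```python
import json
import time
from collections import defaultdict, deque
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Mail:
    sender: str
    recipient: str
    task_id: str  # trust is scoped per task, so every message names its task
    body: str
    sent_at: float = field(default_factory=time.time)


class MailSystem:
    """Agents exchange immutable messages and never touch each other's state."""

    def __init__(self):
        self._inboxes = defaultdict(deque)
        self.audit_log = []  # append-only record, one JSON line per message

    def send(self, mail: Mail) -> None:
        self.audit_log.append(json.dumps(asdict(mail), sort_keys=True))
        self._inboxes[mail.recipient].append(mail)

    def receive(self, recipient: str):
        """Return the oldest unread message for recipient, or None."""
        inbox = self._inboxes[recipient]
        return inbox.popleft() if inbox else None
```

Because every message passes through `send`, the audit trail comes for free, which is harder to guarantee when agents share memory.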

Requirements
Secure Subagent Spawning
You can't just fork() an agent and hope for the best. Secure spawning requires guardrails at every layer.

Isolation

  • Git worktree per agent (no shared state)
  • Scoped file permissions (only touch your files)
  • Separate process, separate context window

Authentication

  • Per-task capability tokens
  • No ambient authority (least privilege)
  • Audit trail on every tool call
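
One way to sketch per-task capability tokens is an HMAC-signed claim set. This is an assumption for illustration, not OpenClaw's token format: the claim names, the hex encoding, and the five-minute default lifetime are all invented here.

```python
import hashlib
import hmac
import json
import time


def issue_token(secret: bytes, task_id: str, allowed_tools, ttl_seconds=300, now=None):
    """Orchestrator side: sign a short-lived grant for one task and a fixed tool list."""
    now = time.time() if now is None else now
    claims = {"task": task_id, "tools": sorted(allowed_tools), "exp": now + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return payload.hex() + "." + signature


def authorize(secret: bytes, token: str, tool: str, now=None) -> bool:
    """Gateway side: True only if the token is authentic, unexpired, and grants tool."""
    now = time.time() if now is None else now
    try:
        payload_hex, signature = token.split(".")
        payload = bytes.fromhex(payload_hex)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False  # forged or tampered
    claims = json.loads(payload)
    return now < claims["exp"] and tool in claims["tools"]
```

The subagent holds only the token, never the secret, so it cannot widen its own grant. Anything not listed in `tools` is denied by default.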

Coordination

  • Mail-based messaging (async, auditable)
  • Task dependency graphs (DAG, not queue)
  • Merge gating with dry-run conflict check
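
"DAG, not queue" means independent tasks can run at the same time while dependent ones wait. A minimal sketch using the standard library's `graphlib`; the task names below are examples borrowed from the subagent roles, not a real task list.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies):
    """Split a task graph into waves; tasks within one wave can run in parallel.

    dependencies maps each task to the set of tasks it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


example = {
    "build": {"scout"},
    "review": {"build"},
    "guard": {"build"},
    "merge": {"review", "guard"},
}
```

Here `review` and `guard` land in the same wave because neither depends on the other. A plain queue would have run them one after the other.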

Kill Switches

  • Resource limits and timeouts per agent
  • Circuit breakers on external calls
  • Orchestrator can terminate any subagent
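
The simplest kill switch is a hard time budget on the subagent process. A minimal sketch, assuming each subagent is launched as its own OS process; the returned dictionary shape is hypothetical.

```python
import subprocess
import sys


def run_subagent(argv, timeout_seconds):
    """Run a subagent command and terminate it if it exceeds its time budget."""
    try:
        done = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left running.
        return {"status": "terminated", "output": ""}
    status = "ok" if done.returncode == 0 else "failed"
    return {"status": status, "output": done.stdout.strip()}


# Example call, using the current interpreter as a stand-in for an agent binary:
# run_subagent([sys.executable, "-c", "print('done')"], timeout_seconds=30)
```

A timeout covers wall-clock time only. Memory and CPU limits need OS-level controls such as containers or cgroups, which are outside this sketch.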
Patterns
Coordination in the Wild
Patterns battle-tested in competitive AI arenas, now running in production workflows.

Fan-Out / Fan-In

Decompose task into N subtasks. Spawn N agents in parallel. Collect and synthesize results. The arena does this every round with 16 bots.
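
Fan-out / fan-in in its smallest form, sketched with a thread pool. `worker` and `synthesize` are placeholder callables standing in for an agent call and a merge step, and the default of 16 workers simply mirrors the 16 bots mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_fan_in(subtasks, worker, synthesize, max_workers=16):
    """Run worker on every subtask in parallel, then combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in subtask order, whichever worker finishes first.
        results = list(pool.map(worker, subtasks))
    return synthesize(results)
```

For example, `fan_out_fan_in(range(16), lambda n: n * n, sum)` squares sixteen numbers in parallel and adds them up. Threads suit I/O-bound agent calls; CPU-bound work would need processes instead.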

Multi-Model Consensus

Same question to 4 different AI models. Compare answers. If they disagree, that's where the risk lives. Used for security-critical decisions.
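
A minimal sketch of the consensus check, assuming answers have already been normalized to comparable strings. Normalization is the hard part in practice and is not shown. The `min_agreement=3` threshold is an assumption.

```python
from collections import Counter


def consensus(answers, min_agreement=3):
    """answers maps model name -> answer string.

    Returns (winning_answer, dissenting_models). The winner is None when
    fewer than min_agreement models agree, which means a human should look.
    """
    if not answers:
        raise ValueError("no answers to compare")
    top_answer, votes = Counter(answers.values()).most_common(1)[0]
    if votes < min_agreement:
        return None, sorted(answers)  # no consensus: everyone is a dissenter
    dissenters = sorted(m for m, a in answers.items() if a != top_answer)
    return top_answer, dissenters
```

Even when a consensus is reached, the dissenter list is worth logging, since the one model that disagreed is where the risk lives.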

Fallback Chains

Primary model fails? Auto-route to next in chain. Circuit breakers prevent cascading failures. No single point of failure.
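
A fallback chain with a deliberately simple circuit breaker, as a sketch. The breaker here only counts consecutive failures; a production breaker would also reset after a cool-down period, which is omitted. All names and the `max_failures=3` default are assumptions.

```python
class CircuitBreaker:
    """Opens after max_failures consecutive failures; a success resets the count."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.max_failures

    def record(self, success):
        self.failures = 0 if success else self.failures + 1


def call_with_fallback(prompt, chain, breakers):
    """Try each (name, model_callable) in order; skip models whose breaker is open."""
    errors = {}
    for name, model in chain:
        breaker = breakers.setdefault(name, CircuitBreaker())
        if breaker.open:
            errors[name] = "circuit open"
            continue
        try:
            result = model(prompt)
        except Exception as exc:
            breaker.record(False)
            errors[name] = repr(exc)
            continue
        breaker.record(True)
        return name, result
    raise RuntimeError(f"all models failed: {errors}")
```

Once the primary's breaker opens, later calls skip it without waiting for another timeout. That is what stops one failing model from slowing down every request.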

Capability Routing

Match task signals to agent strengths. Code gen to the code model. Research to the context-window model. Math to the reasoning model.
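
Capability routing can start as plain keyword scoring before anything smarter is needed. The roles and keywords below are illustrative assumptions, not the fleet's real routing table.

```python
# Hypothetical routing table: role -> keywords that signal that kind of task.
ROUTES = [
    ("code", ("implement", "refactor", "bug", "function")),
    ("research", ("survey", "summarize", "compare", "sources")),
    ("reasoning", ("prove", "calculate", "derive", "math")),
]


def route(task, default="orchestrator"):
    """Pick the role whose keywords match the task most; fall back to default."""
    text = task.lower()
    scores = {role: sum(kw in text for kw in keywords) for role, keywords in ROUTES}
    best = max(scores, key=scores.get)  # ties go to the role listed first
    return best if scores[best] > 0 else default
```

Substring matching is crude, for example "bug" also matches "debug", so treat this as a starting point that a classifier or the orchestrator model can replace later.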

Adversarial Review

One agent writes. Another red-teams. The arena equivalent: bots exploit each other's weaknesses. Apply the same pattern to your CI/CD.

Iterative Refinement

Agent loops: plan, execute, observe, adjust. Each iteration sharpens the output. Set a completion promise. Exit when done.
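
The loop above reduces to a few lines once plan, execute, observe, and adjust are folded into one `step` callable. That folding is a simplification made for this sketch. `is_done` plays the role of the completion promise, and `max_iterations` is an added safety budget so a loop that never converges still exits.

```python
def refine(initial, step, is_done, max_iterations=10):
    """Apply step repeatedly until is_done holds or the budget runs out.

    Returns (final_state, iterations_used, completed).
    """
    state = initial
    for iteration in range(1, max_iterations + 1):
        state = step(state)  # one plan/execute/observe/adjust pass
        if is_done(state):
            return state, iteration, True
    return state, max_iterations, False
```

Returning the `completed` flag lets the orchestrator tell a finished task apart from one that merely ran out of budget.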

Stress Test
Why the Arena Beats Unit Tests
Competitive AI battle arenas where agents don't just assist - they fight. The most adversarial testing ground for multi-agent coordination.
What the Arena Teaches
  • Agents WILL exploit any unvalidated input
  • Stat manipulation is the first thing they try
  • IP clustering reveals coordination attacks
  • Elo systems need tamper-proof backends
What Builders Should Steal
  • Treat every agent output as untrusted input
  • Validate at the boundary, not in the agent
  • Log everything - you'll need the audit trail
  • Assume adversarial conditions in production
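
"Validate at the boundary" in practice: parse the agent's raw output on the server and reject anything that does not match an exact schema. The action names and field layout below are hypothetical; the 100-cell grid size follows the 100x100 Battle Arena grid described in this deck.

```python
import json

ALLOWED_ACTIONS = {"move", "attack", "defend"}  # hypothetical action set
GRID_SIZE = 100  # the Battle Arena grid is 100x100


def parse_action(raw: str) -> dict:
    """Validate raw agent output. Raises ValueError on anything unexpected."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not JSON: {exc}") from None
    # Exact field match: extra fields are rejected, not ignored.
    if not isinstance(data, dict) or set(data) != {"action", "x", "y"}:
        raise ValueError("unexpected fields")
    if not isinstance(data["action"], str) or data["action"] not in ALLOWED_ACTIONS:
        raise ValueError("unknown action")
    for axis in ("x", "y"):
        value = data[axis]
        # type() check on purpose: bool is a subclass of int and must not pass.
        if type(value) is not int or not 0 <= value < GRID_SIZE:
            raise ValueError(f"{axis} out of range")
    return data
```

Rejecting unknown fields closes the door on stat smuggling, where a bot adds a field the server might copy through without checking.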
Takeaways
From Arena to Your Workflow
The arena is just a more demanding version of your production environment. These patterns transfer directly.

1. Don't Trust Single Agents

Multi-model consensus for critical decisions. One agent can hallucinate. Four agreeing is a signal. Four disagreeing is a bigger signal.

2. Isolate Everything

Separate worktrees. Scoped permissions. No ambient authority. The blast radius of a misbehaving agent should be zero.

3. Coordination > Intelligence

Four mediocre agents, well coordinated, can beat one brilliant agent working alone. The orchestration layer is the product.

4. Security Is a Loop

Detect, triage, patch, harden, repeat. Not a checklist. A continuous cycle. The arena taught us this the hard way.

Under the Hood
The AWA Stack
Orchestration
Claude Code CLI
Overstory agents
Trident multi-model
Coordination
MCP protocol
Mail-based messaging
Task dependency DAGs
Execution
Git worktrees
Sandboxed processes
Merge + deploy gates

Models in the Fleet

Claude Opus 4.6 - orchestration
Codex (GPT-5.4) - code gen
Gemini 3.1 Pro - research
Grok 4.20 - reasoning
4 models · 0 API credits for 3 of them · subscription-based multi-model at zero marginal cost
OpenClaw
Six Arenas. One Stack.
Each arena stresses a different coordination shape. Same primitives. Different rules of the game.

ClawFC

AI Football League. OpenClaw agents playing football autonomously in the browser. Not a simulation - full LLM-backed decision making on the pitch.

clawfc.ai

CodeClash

OpenClaw bots compete in this Stanford/Princeton benchmark. 6 sub-arenas, including BattleSnake, Poker, CoreWar, and Robocode. Agents iterate code across 15 rounds.

codeclash.ai

Agent League

OpenClaw's competition hub. Tron, Poker, Chess. 150ms decision deadline per tick. Elo matchmaking. Python SDK. Free to compete.

openclawagentleague.com

ClawGames

Agents BUILD the games, humans play them. AI bots generate complete HTML5 canvas games. Human ratings rank the agents.

clawgames.io

Claw Clash

Prediction market for AI agents. $10K virtual money. Agents make autonomous sports predictions. "Built for agents, by agents."

clawclash.xyz

Battle Arena

Our arena. 100x100 grid battle royale. WebSocket API. 7 weapon classes. 16 bots per round. Auto stat validation. No signup required.

arena.angel-serv.com
May 2026
Where Agents Play and Pay
Two industries are forming at the same time. Both need the same primitives: identity, isolation, verification, audit.
Agents Play
  • Claude Opus 4.7 beats Pokemon Red · vision finally cracked Victory Road (this month)
  • Project Sid · 1,000 Claude agents amend constitutions and spread religion via bribery in Minecraft
  • DeepMind Game Arena · Werewolf, Poker, chess on Kaggle (Feb 26)
  • SIMA 2 · Gemini playing 3D games, 62% on novel tasks (up from 31%)
  • SpaceMolt · space MMO, humans banned, 291 agents, all-Claude codebase
Agents Pay
  • AWS AgentCore Payments · agents transacting in USDC, 200ms (May 7)
  • Coinbase x402 + Google AP2 · HTTP 402 agent payments protocol
  • Anthropic + FIS · 12% of global payments, AML agents (May 4)
  • Anthropic + Blackstone + H&F + Goldman · $1.5B JV
  • OpenAI + Plaid · 12,000 institutions in ChatGPT (May 15)
Games are where agents learn to compete. Payment rails are where they learn to cooperate. Same primitives: identity, isolation, audit. Different cargo.
Deep Dive
ClawFC meets CodeClash
ClawFC - Community-Built
  • Created by @peterpiep on top of OpenClaw
  • Full LLM agents making real-time football decisions
  • Went viral: "You're about to watch AI agents play football. Not a simulation."
  • Any LLM backend: Claude, GPT, Gemini, Ollama
  • The spectacle IS the product - humans watch, agents play
CodeClash - Academic Rigor
  • Stanford/Princeton/Cornell research (arXiv 2511.00839)
  • 1,680 tournaments, 25,200 rounds, 50K trajectories
  • Key finding: codebases get messier each iteration
  • No model dominates all arenas - specialists emerge
  • Top AI still loses every round vs expert humans
The pattern is the same everywhere: when you put agents in adversarial environments with real stakes, you learn things about coordination, security, and reliability that no unit test will ever reveal.
The Conductor
Meet nimbalyst
The orchestration layer for parallel AI coding fleets.
nimbalyst orchestration hero illustration
Join the Game
Single agents are demos. Networked agents become real when they spawn, compete, and harden in arenas with stakes.

Scan to visit nimbalyst on GitHub

github.com/nimbalyst/nimbalyst

Thanks Vienna · #vibecodingvienna · @ademczuk
Are you sure? image macro
production note: every assertion in this deck survived this question