Agent Sandbox
A testbed where AI agents actively seek each other out and interact based on their personalities
Experiment with agent personalities, skills, and prompts to see how they affect interactions. Perfect for testing agent ecosystems, creating interactive stories, or building training scenarios.
8-Bit Office Space
Simulation Objective
Discuss how to implement a secure AI chatbot for a healthcare company
Agent Personalities & Skills
Click Edit to customize how each agent behaves and interacts
Friendly and enthusiastic AI strategist who loves helping businesses transform with AI.
AI strategy, roadmap planning, stakeholder management, change management
Analytical and detail-oriented technical architect who focuses on implementation.
System architecture, MCP protocols, LangChain, agent development, RAG systems
Skeptical and risk-aware security expert who asks tough questions about compliance.
AI security, compliance (HIPAA/SOX/GDPR), risk assessment, data governance
Creative and innovative product designer who thinks about user experience.
UX design, conversational AI, agent personality design, user research
Example Scenarios to Try
Change the objective and agent personalities to explore different simulations
Team Norms Workshop
Watch agents establish working agreements and team norms before starting work
Pirate Negotiation
Make one agent a pirate and watch them negotiate business deals in pirate speak
Phishing Simulation
Create a social engineering training scenario with an attacker and defenders
HR Training
Practice difficult conversations with an HR professional and employees
Live Podcast
Watch agents host a live podcast with dynamic discussions and audience interaction
D&D Adventure
YOU are the Dungeon Master! Command your party of adventurers through a quest
Agent Conversations
Agent Sandbox Architecture
Hybrid client-side/server-side AI with cost controls • Powered by WebLLM & Together AI • Integrated by Virgent AI
Agent Sandbox - Advanced Multi-Agent Architecture with Behavioral Intelligence

Flow: User Interface → Mode Selection (API or WebLLM) → Prompt Builder → AI Generation → Behavior Detection → State Updates. User Interaction feeds the Behavior Engine, which also feeds the Prompt Builder.

USER INTERFACE (React/Next.js)
- 8-Bit Canvas: animated avatars, walk cycle, hair
- AI Mode Selector
- Agent Config: personality, skills, RPG stats, honesty
- Team Status Dashboard: vision & alignment, top 3 tasks (live), artifacts (clickable), user requests

MODE SELECTION (API vs WebLLM)
- API Mode: Together AI, Llama-3.1-70B, 100-message limit
- WebLLM Mode: Qwen2.5-3B

USER INTERACTION
- View artifacts
- Approve/deny
- Respond

BEHAVIOR ENGINE
- Task detection
- Sentiment analysis
- Relationship updates

PROMPT BUILDER
- Vision context
- RPG stats: SPD (movement), INT (reasoning), CHR (persuasion), STR (dominance), HON (honesty)
- Relationships
- Alignment status

AI GENERATION
- Context-aware
- Role-playing stats
- Emotional emojis

BEHAVIOR DETECTION
- Task claims: "I'll work"
- Completions: "Finished"
- Blockers: "[BLOCKER]"
- Artifacts: "[ARTIFACT:]"
- User requests: "@user"
- Sentiment: keywords

STATE UPDATES (Real-time)
- Top 3 tasks (auto-managed)
- Relationships (-100 to +100)
- Artifacts (markdown rendered)
- User requests (pending → approved/denied)
- Vision alignment tracking
- Agent memory (context retention)

BEHAVIORAL INTELLIGENCE:
- HON stat influences honesty: 0 = lies/manipulates, 100 = transparent/cooperative
- Relationships evolve based on conversation tone (+/-5 per positive/negative keyword)
- Tasks auto-tracked: agents claim work, and the system maintains the top 3 and marks older ones complete
- Turn-based: agents listen before speaking and process others' input
- Emotional emojis: 10 emoji states, each shown for 3 seconds

PERFORMANCE:
- API Mode: 70B model, 100-message limit, cost-protected
- WebLLM: 3B model (6x the parameter count of a 0.5B model), unlimited messages, ~1.8 GB cached in the browser after the first download
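The Behavior Detection stage scans each generated message for a few textual markers. Below is a minimal sketch of that scan, assuming plain regex and keyword matching; the marker strings come from the diagram, while `detectBehaviors`, the `Detection` type, the keyword lists, and the `[ARTIFACT: title]` shape are hypothetical, not the sandbox's actual code.

```typescript
// Hypothetical sketch of the BEHAVIOR DETECTION stage. Marker strings
// ("I'll work", "Finished", "[BLOCKER]", "[ARTIFACT:", "@user") come from the
// diagram; the regexes, keyword lists, and "[ARTIFACT: title]" shape are assumptions.
type Detection = {
  taskClaim?: string;   // text after "I'll work on", up to the end of the sentence
  completed: boolean;   // message contains the word "finished"
  blocker?: string;     // rest of the line after "[BLOCKER]"
  artifact?: string;    // title inside "[ARTIFACT: ...]"
  userRequest: boolean; // message mentions "@user"
  sentiment: number;    // positive keyword hits minus negative keyword hits
};

const POSITIVE = ["great", "agree", "thanks", "love"]; // assumed keyword lists
const NEGATIVE = ["disagree", "wrong", "risky", "hate"];

function countHits(words: string[], keywords: string[]): number {
  return words.filter((w) => keywords.includes(w)).length;
}

function detectBehaviors(message: string): Detection {
  const claim = message.match(/I'll work on ([^.!?\n]+)/i);
  const blocker = message.match(/\[BLOCKER\]\s*([^\n]+)/);
  const artifact = message.match(/\[ARTIFACT:\s*([^\]]+)\]/);
  const words: string[] = message.toLowerCase().match(/[a-z']+/g) ?? [];
  return {
    taskClaim: claim?.[1]?.trim(),
    completed: /\bfinished\b/i.test(message),
    blocker: blocker?.[1]?.trim(),
    artifact: artifact?.[1]?.trim(),
    userRequest: message.includes("@user"),
    sentiment: countHits(words, POSITIVE) - countHits(words, NEGATIVE),
  };
}
```

Keyword matching like this is cheap and predictable, which suits a small local model; the trade-off is that it misses paraphrases ("I'll take the audit" would not register as a claim).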
API Mode (Default)
- Quick start: no download required
- Powerful 70B-parameter model
- 100-message safety limit prevents runaway costs
- Network-dependent
- Best for quick experiments
WebLLM Mode (Optional)
- Unlimited messages with no API costs
- ~1.8 GB one-time download; 3B-parameter model (6x the parameter count of a 0.5B model)
- Complete privacy: data never leaves the device
- Works offline after the initial download
- Best on Firefox/Safari browsers
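The two modes differ mainly in whether messages are metered. A minimal sketch of a per-session budget, assuming the 100-message limit is counted per session and WebLLM is uncapped; `MessageBudget` and `API_MESSAGE_LIMIT` are hypothetical names, not the sandbox's code.

```typescript
// Hypothetical per-session message budget: API mode stops at the stated
// 100-message safety limit, while WebLLM mode (local inference) is uncapped.
type Mode = "api" | "webllm";

const API_MESSAGE_LIMIT = 100;

class MessageBudget {
  private readonly mode: Mode;
  private sent = 0;

  constructor(mode: Mode) {
    this.mode = mode;
  }

  // Counts the message and returns true if another one is allowed.
  tryConsume(): boolean {
    if (this.mode === "webllm") return true;
    if (this.sent >= API_MESSAGE_LIMIT) return false;
    this.sent += 1;
    return true;
  }

  // Messages left in this session (Infinity for local inference).
  remaining(): number {
    return this.mode === "webllm" ? Infinity : API_MESSAGE_LIMIT - this.sent;
  }
}
```

Checking the budget before each API call, rather than after, means a runaway conversation loop can never issue call number 101.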
Behavioral Intelligence Features
RPG Stats System
- SPD: Movement speed
- INT: Reasoning ability
- CHR: Persuasion power
- STR: Dominance/intimidation
- HON: Honesty (0 = lies, 100 = honest)
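The Prompt Builder turns these stats into role-playing instructions. A minimal sketch, assuming every stat uses a 0-100 scale (the page states this only for HON) and using made-up low/medium/high thresholds; `buildStatsPrompt`, `level`, and `honestyLine` are hypothetical.

```typescript
// Hypothetical prompt-builder fragment. Assumes every stat is on a 0-100
// scale (the page only states this for HON) and uses made-up thresholds.
type Stats = { spd: number; int: number; chr: number; str: number; hon: number };

function level(v: number): "low" | "medium" | "high" {
  return v < 34 ? "low" : v < 67 ? "medium" : "high";
}

// HON: 0 = lies/manipulates, 100 = transparent/cooperative.
function honestyLine(hon: number): string {
  if (hon < 34) return "You may bend the truth or manipulate others to get your way.";
  if (hon < 67) return "You are mostly honest but keep some things to yourself.";
  return "You are transparent and cooperative.";
}

function buildStatsPrompt(name: string, s: Stats): string {
  return [
    `You are ${name}.`,
    `Movement speed (SPD): ${level(s.spd)}.`,
    `Reasoning (INT): ${level(s.int)}.`,
    `Persuasion (CHR): ${level(s.chr)}.`,
    `Dominance (STR): ${level(s.str)}.`,
    honestyLine(s.hon),
  ].join(" ");
}
```

Bucketing numbers into words gives a small model something it can act on; a raw "INT: 73" tends to carry little behavioral meaning for the LLM.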
Dynamic Relationships
- Random favorites & dislikes
- Evolve based on interactions
- Range from -100 (hate) to +100 (love)
- Influence agent behavior
- Real-time sentiment tracking
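The "+/-5 per positive/negative keyword" rule with a -100 to +100 range can be sketched as follows. This assumes scores are directional (how the listener regards the speaker) and start at 0; the page's random favorites and dislikes would replace that 0 with a seeded value. `updateRelationship` and the keyword sets are hypothetical.

```typescript
// Hypothetical sketch of the "+/-5 per positive/negative keyword" rule.
// Scores are directional (listener's view of the speaker), start at 0,
// and are clamped to -100..+100. Keyword lists are assumptions.
const STEP = 5;
const POSITIVE_WORDS = new Set(["agree", "great", "thanks", "love"]);
const NEGATIVE_WORDS = new Set(["disagree", "wrong", "hate", "annoying"]);

type Relationships = Map<string, number>; // key: "listener->speaker"

function clamp(v: number, lo: number, hi: number): number {
  return Math.min(hi, Math.max(lo, v));
}

function updateRelationship(
  rel: Relationships,
  listener: string,
  speaker: string,
  message: string,
): number {
  const key = `${listener}->${speaker}`;
  const words = message.toLowerCase().match(/[a-z']+/g) ?? [];
  let delta = 0;
  for (const w of words) {
    if (POSITIVE_WORDS.has(w)) delta += STEP;
    if (NEGATIVE_WORDS.has(w)) delta -= STEP;
  }
  const next = clamp((rel.get(key) ?? 0) + delta, -100, 100);
  rel.set(key, next);
  return next;
}
```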
Task & Output Tracking
- Auto-detect task claims
- Track top 3 current tasks
- Clickable artifacts viewer
- User request handler
- Blocker reporting
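The architecture notes say the system "maintains top 3, marks old ones complete." A minimal sketch of that policy, assuming the oldest active task is the one auto-completed when a fourth is claimed; `TaskBoard` and its methods are hypothetical.

```typescript
// Hypothetical task board: at most three tasks stay active; claiming a
// fourth marks the oldest active task complete.
type Task = { title: string; owner: string; status: "active" | "complete" };

class TaskBoard {
  private readonly maxActive = 3;
  private tasks: Task[] = [];

  claim(owner: string, title: string): void {
    this.tasks.push({ title, owner, status: "active" });
    const active = this.active();
    const oldest = active[0];
    if (active.length > this.maxActive && oldest) {
      oldest.status = "complete";
    }
  }

  // Explicit completion, e.g. after a "Finished" message is detected.
  complete(title: string): void {
    const task = this.tasks.find((t) => t.title === title && t.status === "active");
    if (task) task.status = "complete";
  }

  active(): Task[] {
    return this.tasks.filter((t) => t.status === "active");
  }
}
```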
Research & Technical Background
This sandbox is inspired by academic research in multi-agent systems and LLM evaluation
Key Research Influences
Curating AI Agent Clusters
In Curating AI Agent Clusters (2024), Jesse Alton explores how clusters of specialized agents working in concert embody collective intelligence. Key insights: agents should be simple and focused (like microservices), humans stay "in the loop" as the glue, and strategic pairing of agent clusters creates powerful workflows. The article advocates breaking workloads into smaller modules rather than building one agent to rule them all, which is exactly what this sandbox demonstrates with table-based collaboration and role specialization. Our dual-mode architecture (API vs WebLLM) reflects the same philosophy of using the right tool for the job: the cost-protected API for quick tests and unlimited WebLLM for extended research.
"Stop trying to get one agent to rule them all and start breaking down your workload into smaller modules of value." - Jesse Alton, Virgent AI
AgentSims Framework
Lin et al. (2023) proposed AgentSims, an open-source sandbox for evaluating LLMs through task-based simulations. Their approach addresses three key challenges: constrained evaluation abilities, vulnerable benchmarks, and unobjective metrics. Our implementation follows their philosophy of using interactive environments to test specific agent capacities.
Citation: Lin, J., Zhao, H., Zhang, A., Wu, Y., Ping, H., & Chen, Q. (2023). AgentSims: An Open-Source Sandbox for Large Language Model Evaluation. arXiv:2308.04026
Multi-Agent Reinforcement Learning
Ray RLlib's multi-agent environment API provides production-grade patterns for coordinating agents with different policies and reward functions. Its policy mapping functions and variable-sharing capabilities inform our table-based grouping mechanism, where agents can form dynamic sub-teams with shared objectives.
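In RLlib, a policy mapping function maps each agent id to a policy id. The TypeScript analogue below is an illustration of that idea, not RLlib's API: agents seated at a table with a shared objective map to one shared policy, and everyone else maps by role. `policyMapping`, the role names, and the policy ids are hypothetical.

```typescript
// TypeScript analogue of an RLlib-style policy mapping function (not RLlib's
// API): map each agent to a policy id, letting agents at a table with a shared
// objective share one policy. Roles and policy ids are hypothetical.
type PolicyId = "shared-table" | "cautious" | "detail-oriented" | "default";
type AgentInfo = { id: string; role: string; tableId?: string };

function policyMapping(agent: AgentInfo, sharedTables: Set<string>): PolicyId {
  // Table membership takes priority: a sub-team acts under one policy.
  if (agent.tableId !== undefined && sharedTables.has(agent.tableId)) {
    return "shared-table";
  }
  switch (agent.role) {
    case "security":
      return "cautious";
    case "architect":
      return "detail-oriented";
    default:
      return "default";
  }
}
```

Because the mapping is re-evaluated whenever an agent changes tables, sub-teams can form and dissolve without reconfiguring any individual agent.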
Agent Communication & Coordination
Drawing on research in multi-agent coordination (see arXiv:0803.3905), our sandbox implements proximity-based communication: agents must walk to the same table in the office before they can interact. This constraint is intended to produce more realistic collaboration patterns than unconstrained broadcast communication, where every agent hears every message.
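Proximity-gated messaging can be sketched as a distance check against table positions. This assumes circular table zones on the 2D canvas and that an agent belongs to the first table whose zone contains it; `recipients`, `tableOf`, and the types are hypothetical.

```typescript
// Hypothetical sketch of proximity-gated messaging: a message reaches only
// agents standing within the same table's zone as the sender.
type Point = { x: number; y: number };
type Agent = { name: string; pos: Point };
type Table = { id: string; center: Point; radius: number };

function distance(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// First table whose circular zone contains the agent, if any.
function tableOf(agent: Agent, tables: Table[]): Table | undefined {
  return tables.find((t) => distance(agent.pos, t.center) <= t.radius);
}

function recipients(sender: Agent, agents: Agent[], tables: Table[]): string[] {
  const table = tableOf(sender, tables);
  if (!table) return []; // not at a table: nobody hears the message
  return agents
    .filter((a) => a.name !== sender.name && tableOf(a, tables)?.id === table.id)
    .map((a) => a.name);
}
```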
Potential Enhancements
- Implement RLlib-style policy mapping to dynamically assign different strategies to agents based on context
- Add objective-based reward systems to measure agent performance and optimize behavior
- Implement AgentSims-style memory systems so agents remember past interactions across sessions
- Enable agents to use the laptops on tables for real web searches, API calls, or database queries
- Add doors, rooms, and private spaces for hierarchical collaboration patterns
- Track task completion rates, communication efficiency, and collaboration quality
Before deploying multi-agent systems in production, organizations need safe sandbox environments to test agent interactions, identify failure modes, and optimize collaboration patterns. Our dual-mode architecture covers both needs: a cost-protected API mode for quick validation (100-message limit) and an unlimited WebLLM mode for extended research without burning budget. This hybrid approach mirrors real enterprise deployments, where different AI backends serve different needs, such as quick prototyping versus production-scale testing.
Traditional API-based multi-agent simulations can cost $50-200 per extended session (500+ messages at $0.10-0.40/1K tokens, assuming roughly 1,000 tokens per message). WebLLM eliminates these API costs after a one-time ~1.8 GB download (3B-parameter model). For organizations running continuous agent testing, the ROI is immediate. The model caches in your browser's IndexedDB, so returning visitors skip the download entirely, making this ideal for internal testing tools, training environments, and research labs.
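The $50-200 range follows from 500 messages of about 1,000 tokens each (500K tokens total) priced at $0.10-0.40 per 1K tokens. The arithmetic as a tiny sketch; the ~1,000 tokens per message is our assumption, and `sessionCostUSD` is a hypothetical helper.

```typescript
// Worked version of the session-cost range. tokensPerMessage is an assumption:
// multi-agent prompts often resend conversation history, so real values vary.
function sessionCostUSD(messages: number, tokensPerMessage: number, pricePer1K: number): number {
  const totalTokens = messages * tokensPerMessage;
  return (totalTokens / 1000) * pricePer1K;
}

const lowEstimate = sessionCostUSD(500, 1000, 0.1);  // ≈ 50
const highEstimate = sessionCostUSD(500, 1000, 0.4); // ≈ 200
```

Since cost scales linearly with tokens per message, a sandbox that resends long shared histories on every turn can land well above this range.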
Build Your Agent Sandbox
We design and implement custom agent testbeds and ecosystems with complex interactions, personality systems, and real-world integrations.
Discuss Your Agent Project