HUD is built as a thin orchestration layer on top of MCP, providing structure for agent evaluation, environment management, and telemetry.
System Overview
Core Components
1. Agents (hud.agents)
Agents make decisions and call tools:
from hud.agents import ClaudeAgent, OperatorAgent
# Built-in agents
agent = ClaudeAgent.create() # Uses Anthropic's Claude
agent = OperatorAgent.create() # Uses OpenAI Operator
# All agents inherit from MCPAgent
class MCPAgent:
async def initialize(task: str | Task | None)
async def run(prompt_or_task: str | Task, max_steps: int = 10) -> Trace
async def call_tools(tool_calls: list[MCPToolCall]) -> list[MCPToolResult]
Agents can auto-create MCP clients from task.mcp_config - no manual client setup needed
2. Tasks (hud.Task)
Tasks define what agents should accomplish:
from hud.datasets import Task
task = Task(
prompt="Complete 3 items in the TODO list",
mcp_config={
"hud": {
"url": "https://mcp.hud.ai/v3/mcp",
"headers": {
"Authorization": f"Bearer {api_key}",
"Mcp-Image": "hudpython/hud-browser:latest"
}
}
},
# Tool names and arguments map directly to MCP server tools
setup_tool={
"name": "setup", # Calls def setup(name, num_items) on the environment
"arguments": {
"name": "todo_seed",
"num_items": 5
}
},
evaluate_tool={
"name": "evaluate", # Calls def evaluate(name, expected_count) on the environment
"arguments": {
"name": "todo_completed",
"expected_count": 3
}
}
)
The name and arguments in setup/evaluate tools correspond exactly to the tool names and parameters exposed by the MCP server
3. MCP Clients (hud.clients)
Clients handle the MCP protocol:
from hud.clients import MCPClient
# Auto-created by agents, or manual:
client = MCPClient(mcp_config)
await client.initialize()
tools = await client.list_tools()
4. Environments
Environments are MCP servers exposing tools:
from hud.server import MCPServer
server = MCPServer("my-env")
@server.tool()
def move(direction: str):
# Environment logic
return result
5. Telemetry (hud.trace)
Real-time observability:
import hud
async with hud.async_trace("my-evaluation"):
result = await agent.run(task)
# View at hud.ai/traces/{trace_id}
Execution Flow
Task Definition
Create a Task with prompt and MCP configuration
Agent Initialization
Agent creates MCP client (if needed) and connects to environment
Setup Phase
Execute setup_tool to initialize environment state
Execution Loop
Agent receives observations, makes decisions, calls tools
Evaluation
Execute evaluate_tool to score performance
Telemetry
All interactions streamed to HUD backend for analysis
Key Design Principles
- Protocol-First: Everything speaks MCP
- Composable: Mix and match agents, environments, evaluations
- Observable: Built-in telemetry for every interaction
- Testable: Reproducible evaluations with Docker
- Extensible: Easy to add new agents or environments
The MCPServer class wraps FastMCP with lifecycle management, making it easy to build Docker-based environments
Next Steps