HUD Documentation — Evaluations and RL Environments.

HUD is built as a thin orchestration layer on top of MCP, providing structure for agent evaluation, environment management, and telemetry.

System Overview

Core Components

1. Agents (`hud.agents`)

Agents make decisions and call tools:

from hud.agents import ClaudeAgent, OperatorAgent

# Built-in agents
agent = ClaudeAgent.create()      # Uses Anthropic's Claude
agent = OperatorAgent.create()    # Uses OpenAI Operator

# All agents inherit from MCPAgent
class MCPAgent:
    async def initialize(task: str | Task | None)
    async def run(prompt_or_task: str | Task, max_steps: int = 10) -> Trace
    async def call_tools(tool_calls: list[MCPToolCall]) -> list[MCPToolResult]

Agents can auto-create MCP clients from task.mcp_config - no manual client setup needed

2. Tasks (`hud.Task`)

Tasks define what agents should accomplish:

from hud.datasets import Task

task = Task(
    prompt="Complete 3 items in the TODO list",
    mcp_config={
        "hud": {
            "url": "https://mcp.hud.ai/v3/mcp",
            "headers": {
                "Authorization": f"Bearer {api_key}",
                "Mcp-Image": "hudpython/hud-browser:latest"
            }
        }
    },
    # Tool names and arguments map directly to MCP server tools
    setup_tool={
        "name": "setup",  # Calls def setup(name, num_items) on the environment
        "arguments": {
            "name": "todo_seed",
            "num_items": 5
        }
    },
    evaluate_tool={
        "name": "evaluate",  # Calls def evaluate(name, expected_count) on the environment
        "arguments": {
            "name": "todo_completed",
            "expected_count": 3
        }
    }
)

The name and arguments in setup/evaluate tools correspond exactly to the tool names and parameters exposed by the MCP server

3. MCP Clients (`hud.clients`)

Clients handle the MCP protocol:

from hud.clients import MCPClient

# Auto-created by agents, or manual:
client = MCPClient(mcp_config)
await client.initialize()
tools = await client.list_tools()

4. Environments

Environments are MCP servers exposing tools:

from hud.server import MCPServer

server = MCPServer("my-env")

@server.tool()
def move(direction: str):
    # Environment logic
    return result

5. Telemetry (`hud.trace`)

Real-time observability:

import hud

async with hud.async_trace("my-evaluation"):
    result = await agent.run(task)
    # View at hud.ai/traces/{trace_id}

Execution Flow

Task Definition

Create a Task with prompt and MCP configuration

Agent Initialization

Agent creates MCP client (if needed) and connects to environment

Setup Phase

Execute setup_tool to initialize environment state

Execution Loop

Agent receives observations, makes decisions, calls tools

Evaluation

Execute evaluate_tool to score performance

Telemetry

All interactions streamed to HUD backend for analysis

Key Design Principles

Protocol-First: Everything speaks MCP
Composable: Mix and match agents, environments, evaluations
Observable: Built-in telemetry for every interaction
Testable: Reproducible evaluations with Docker
Extensible: Easy to add new agents or environments

The MCPServer class wraps FastMCP with lifecycle management, making it easy to build Docker-based environments

Get Started

Core Concepts

SDK Reference

Environments

HUD Gateway

Beta Features

Agents

CLI Reference

Community

Architecture

System Overview

Core Components

1. Agents (`hud.agents`)

2. Tasks (`hud.Task`)

3. MCP Clients (`hud.clients`)

4. Environments

5. Telemetry (`hud.trace`)

Execution Flow

Key Design Principles

Next Steps

Evaluate Agents

Build Environments

Get Started

Core Concepts

SDK Reference

Environments

HUD Gateway

Beta Features

Agents

CLI Reference

Community

​System Overview

​Core Components

​1. Agents (hud.agents)

​2. Tasks (hud.Task)

​3. MCP Clients (hud.clients)

​4. Environments

​5. Telemetry (hud.trace)

​Execution Flow

​Key Design Principles

​Next Steps

Evaluate Agents

Build Environments

System Overview

Core Components

1. Agents (`hud.agents`)

2. Tasks (`hud.Task`)

3. MCP Clients (`hud.clients`)

4. Environments

5. Telemetry (`hud.trace`)

Execution Flow

Key Design Principles

Next Steps