HUD Documentation — Evaluations and RL Environments.

v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.

Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Use Task.from_v4() for quick migration or @env.scenario() for new code.

Good News: Your Code Still Works

Environment inherits from MCPServer, so the API and behavior are the same. Change the import and the class name:

# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")

@mcp.tool()
def my_tool(): ...

mcp.run()

# After
from hud import Environment
env = Environment("my-env")

@env.tool()
def my_tool(): ...

env.run()

That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scenarios, connectors, and integrations on top.

Migration Path 1: Quick Conversion with Task.from_v4()

The fastest way to migrate existing v4 code—no changes to task definitions needed:

# BEFORE (deprecated, to be removed in v0.6.0)
from hud.datasets import LegacyTask

legacy_task = LegacyTask(
    prompt="Navigate to google.com",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {}}
)

# AFTER - One-line conversion
from hud.eval import Task

task = Task.from_v4(legacy_task)  # Converts LegacyTask → Task
# Also works with: Task.from_v4(dict), Task.from_v4(json_string)

# Works the same with agents (run inside an async function)
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()
result = await agent.run(task)

Task.from_v4() automatically:

Runs setup_tool at the start of evaluation
Runs evaluate_tool at the end to compute reward
Preserves all existing behavior
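
The ordering in the list above can be pictured with a small stand-alone sketch. It does not import hud and is not how Task.from_v4() is implemented; run_legacy_order, fake_call_tool, and fake_agent are hypothetical stand-ins, and mapping a truthy evaluation result to a reward of 1.0 is an assumption made only for illustration.

```python
import asyncio


async def run_legacy_order(task, call_tool, agent):
    """Run setup, then the agent, then evaluation, in that order (hypothetical helper)."""
    setup = task.get("setup_tool")
    if setup:
        await call_tool(setup["name"], **setup["arguments"])
    answer = await agent(task["prompt"])
    evaluate = task["evaluate_tool"]
    result = await call_tool(evaluate["name"], **evaluate["arguments"])
    # Assumption: a truthy evaluation result counts as full reward.
    return answer, (1.0 if result else 0.0)


calls = []  # records the order in which steps happen


async def fake_call_tool(name, **arguments):
    # Stand-in for a real tool call; always reports success.
    calls.append(name)
    return True


async def fake_agent(prompt):
    # Stand-in for a real agent; returns a fixed answer.
    calls.append("agent")
    return "https://google.com"


legacy = {
    "prompt": "Navigate to google.com",
    "setup_tool": {"name": "navigate", "arguments": {"url": "https://google.com"}},
    "evaluate_tool": {"name": "check_url", "arguments": {}},
}

answer, reward = asyncio.run(run_legacy_order(legacy, fake_call_tool, fake_agent))
print(calls, reward)
```

The recorded call order shows setup first, the agent in the middle, and evaluation last, which is the behavior the conversion is described as preserving.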

Migration Path 2: Full Scenario Migration (Recommended)

For new code or when refactoring, migrate setup_tool and evaluate_tool to @env.scenario(). The rule is simple:

setup_tool code → before the first yield
evaluate_tool code → after the first yield

# BEFORE (deprecated, to be removed in v0.6.0)
task = LegacyTask(
    prompt="What's the current URL?",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {"expected": "google.com"}}
)

# AFTER
from hud import Environment

env = Environment("browser").connect_hub("hud-evals/browser")

@env.scenario("navigate-google")
async def navigate_google():
    # ===== SETUP SECTION (replaces setup_tool) =====
    await env.call_tool("navigate", url="https://google.com")
    
    # ===== PROMPT (first yield) =====
    answer = yield "What's the current URL?"
    
    # ===== EVALUATE SECTION (replaces evaluate_tool) =====
    result = await env.call_tool("check_url", expected="google.com")
    
    # ===== REWARD (second yield) =====
    yield 1.0 if result else 0.0

# Create task from scenario
task = env("navigate-google")
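
To see why "before the first yield" and "after the first yield" map onto setup and evaluation, it can help to drive a two-yield async generator by hand. The sketch below uses only the standard library; drive, scenario, and echo_agent are hypothetical names, and this illustrates the generic Python mechanism rather than HUD's actual runner.

```python
import asyncio


async def scenario():
    # Setup section: runs when the generator is first advanced.
    state = {"url": "https://google.com"}

    # First yield: hand the prompt out and receive the answer back.
    answer = yield "What's the current URL?"

    # Evaluate section: runs only after the answer has been sent in.
    correct = "google.com" in (answer or "") and "google.com" in state["url"]

    # Second yield: hand the reward out.
    yield 1.0 if correct else 0.0


async def drive(gen, agent):
    """Advance a two-yield scenario: setup, prompt, answer, reward."""
    prompt = await gen.__anext__()    # runs setup up to the first yield
    answer = await agent(prompt)      # the agent works between the two yields
    reward = await gen.asend(answer)  # resumes the scenario up to the second yield
    await gen.aclose()                # finish the generator cleanly
    return prompt, reward


async def echo_agent(prompt):
    # Stand-in agent that returns a fixed answer.
    return "The current URL is https://google.com"


prompt, reward = asyncio.run(drive(scenario(), echo_agent))
print(prompt, reward)
```

The value passed to asend() becomes the result of the first yield expression, which is how the answer variable in a scenario can receive the agent's output.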

Multiple setup_tool Calls

If you have multiple setup tools, just call them in sequence:

# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
    {"name": "go_to_page", "arguments": {"page": "settings"}}
]

# AFTER
@env.scenario("settings-test")
async def settings_test():
    # Multiple setup steps - just call them in order
    await env.call_tool("navigate", url="...")
    await env.call_tool("login", user="...")
    await env.call_tool("go_to_page", page="settings")
    
    answer = yield "Verify the settings page loaded correctly"
    
    result = await env.call_tool("check_settings")
    yield 1.0 if result else 0.0
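
When many legacy tasks need converting, the setup lines can be generated mechanically. The helper below is a hypothetical migration aid, not part of hud: it accepts a setup_tool value in either form shown in this guide (a single dict or a list of dicts) and prints the matching env.call_tool(...) lines to paste before the first yield. It assumes every argument key is a valid Python identifier.

```python
def setup_calls(setup_tool):
    """Normalize a dict, a list of dicts, or None into (name, arguments) pairs."""
    if setup_tool is None:
        return []
    specs = [setup_tool] if isinstance(setup_tool, dict) else list(setup_tool)
    return [(spec["name"], spec.get("arguments", {})) for spec in specs]


def as_call_lines(setup_tool):
    """Render each setup call as an `await env.call_tool(...)` source line."""
    lines = []
    for name, arguments in setup_calls(setup_tool):
        parts = [repr(name)]
        parts.extend(f"{key}={value!r}" for key, value in arguments.items())
        lines.append(f"await env.call_tool({', '.join(parts)})")
    return lines


legacy_setup = [
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
    {"name": "go_to_page", "arguments": {"page": "settings"}},
]

for line in as_call_lines(legacy_setup):
    print(line)
```

The generated lines keep the original order of the list, so the setup steps run in the same sequence as before.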

Using with Built-in Agents

Built-in agents (ClaudeAgent, OpenAIAgent, etc.) work with both patterns:

from hud.agents import ClaudeAgent
from hud.eval import Task

agent = ClaudeAgent.create()

# Works with Task from scenario
result = await agent.run(env("navigate-google"))

# Works with Task.from_v4() conversion
result = await agent.run(Task.from_v4(legacy_task))

Optional: Bring Your Own Agent

v5 gives you the hud.eval() context manager for maximum flexibility:

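import hud
from openai import AsyncOpenAI

client = AsyncOpenAI()  # assumption: any async OpenAI-compatible client works here

async with hud.eval(env("checkout", product="laptop")) as ctx: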
    # Use OpenAI, Anthropic, your own agent—whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools()
    )
    
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)

print(ctx.reward)

The old ClaudeAgent and OperatorAgent still work—even with the new hud.eval() system. But now you’re not locked into a specific agent spec. Pair with the Gateway to use any model through one API.

Quick Reference

v4 (deprecated, to be removed in v0.6.0)	v5
`LegacyTask(...)`	`Task.from_v4(...)` (quick) or `env("scenario", ...)` (recommended)
`setup_tool`	Code before first yield in `@env.scenario()`
`evaluate_tool`	Code after first yield in `@env.scenario()`
`MCPServer`	`Environment` (drop-in replacement)
`agent.run(task)`	Still works, or use `hud.eval()` for BYOA
