v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Use Task.from_v4() for quick migration or @env.scenario() for new code.

Good News: Your Code Still Works

Environment inherits from MCPServer, so the API and behavior are the same. Change the import and the class name:
# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")

@mcp.tool()
def my_tool(): ...

mcp.run()
# After
from hud import Environment
env = Environment("my-env")

@env.tool()
def my_tool(): ...

env.run()
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scenarios, connectors, and integrations on top.
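To see why existing tool registrations keep working after the swap, here is a toy sketch of the inheritance relationship. ToyServer and ToyEnvironment are hypothetical stand-ins written for this illustration; they are not hud's MCPServer or Environment, and the real classes do far more.

```python
class ToyServer:
    """Toy stand-in for an MCP-style server (NOT hud's MCPServer)."""

    def __init__(self, name):
        self.name = name
        self.tools = {}

    def tool(self):
        # Decorator factory that registers a function under its own name
        def register(fn):
            self.tools[fn.__name__] = fn
            return fn
        return register


class ToyEnvironment(ToyServer):
    """Toy subclass: inherits tool() unchanged and adds a scenario registry."""

    def __init__(self, name):
        super().__init__(name)
        self.scenarios = {}

    def scenario(self, name):
        def register(fn):
            self.scenarios[name] = fn
            return fn
        return register


env = ToyEnvironment("my-env")


@env.tool()  # same decorator call as on the base class
def my_tool():
    return "ok"


@env.scenario("demo")  # extra capability that only the subclass has
async def demo():
    yield "prompt"
    yield 1.0
```

Because the subclass leaves tool() untouched, code written against the base class registers tools exactly as before, and the new features are purely additive.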

Migration Path 1: Quick Conversion with Task.from_v4()

The fastest way to migrate existing v4 code—no changes to task definitions needed:
# BEFORE (deprecated in v0.5.0, removal planned for v0.6.0)
from hud.datasets import LegacyTask

legacy_task = LegacyTask(
    prompt="Navigate to google.com",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {}}
)

# AFTER - One-line conversion
from hud.eval import Task

task = Task.from_v4(legacy_task)  # Converts LegacyTask → Task
# Also works with: Task.from_v4(dict), Task.from_v4(json_string)

# Works the same with agents (run inside an async function)
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()
result = await agent.run(task)
Task.from_v4() automatically:
  • Runs setup_tool at the start of evaluation
  • Runs evaluate_tool at the end to compute reward
  • Preserves all existing behavior

Migration Path 2: Full Migration with @env.scenario()

For new code or when refactoring, migrate setup_tool and evaluate_tool to @env.scenario(). The rule is simple:
  • setup_tool code → before the first yield
  • evaluate_tool code → after the first yield
# BEFORE (deprecated in v0.5.0, removal planned for v0.6.0)
task = LegacyTask(
    prompt="What's the current URL?",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {"expected": "google.com"}}
)

# AFTER
from hud import Environment

env = Environment("browser").connect_hub("hud-evals/browser")

@env.scenario("navigate-google")
async def navigate_google():
    # ===== SETUP SECTION (replaces setup_tool) =====
    await env.call_tool("navigate", url="https://google.com")
    
    # ===== PROMPT (first yield) =====
    answer = yield "What's the current URL?"
    
    # ===== EVALUATE SECTION (replaces evaluate_tool) =====
    result = await env.call_tool("check_url", expected="google.com")
    
    # ===== REWARD (second yield) =====
    yield 1.0 if result else 0.0

# Create task from scenario
task = env("navigate-google")
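The two-yield pattern relies on standard Python async generator mechanics: the first yield hands the prompt out and receives the answer back, and the second yield hands out the reward. The sketch below shows that flow in plain Python with no hud dependency. The run_scenario driver is hypothetical and is not how hud runs scenarios internally; the tool calls are replaced by local stand-ins.

```python
import asyncio


async def navigate_google():
    # Setup section: stand-in for env.call_tool("navigate", ...)
    current_url = "https://google.com"

    # First yield: hand the prompt to the driver, receive the agent's answer
    answer = yield "What's the current URL?"

    # Evaluate section: stand-in for env.call_tool("check_url", ...)
    correct = answer.strip() == current_url

    # Second yield: the reward
    yield 1.0 if correct else 0.0


async def run_scenario(scenario, answer_fn):
    """Hypothetical driver, for illustration only."""
    gen = scenario()
    prompt = await gen.__anext__()    # runs setup, stops at the first yield
    answer = answer_fn(prompt)        # an agent would produce this
    reward = await gen.asend(answer)  # resumes after the first yield
    await gen.aclose()                # finish the generator cleanly
    return prompt, reward


prompt, reward = asyncio.run(
    run_scenario(navigate_google, lambda p: "https://google.com")
)
```

Everything before the first yield runs before the agent sees the prompt, and everything after it runs once the answer is available, which is why setup_tool and evaluate_tool map onto those two regions.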

Multiple setup_tool Calls

If you have multiple setup tools, just call them in sequence:
# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
    {"name": "go_to_page", "arguments": {"page": "settings"}}
]

# AFTER
@env.scenario("settings-test")
async def settings_test():
    # Multiple setup steps - just call them in order
    await env.call_tool("navigate", url="...")
    await env.call_tool("login", user="...")
    await env.call_tool("go_to_page", page="settings")
    
    answer = yield "Verify the settings page loaded correctly"
    
    result = await env.call_tool("check_settings")
    yield 1.0 if result else 0.0
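If a legacy setup_tool list is long, one option is to keep it as data and loop over it rather than writing each call by hand. The sketch below assumes the keyword-argument calling style used above; call_tool here is a hypothetical local stand-in that only records calls, and the URL and user values are placeholders.

```python
import asyncio

calls = []  # records the order in which tools were invoked


async def call_tool(name, **arguments):
    """Hypothetical stand-in for env.call_tool; it only logs the call."""
    calls.append((name, arguments))
    return True


# Legacy-style setup_tool list kept as plain data (placeholder values)
SETUP_STEPS = [
    {"name": "navigate", "arguments": {"url": "https://example.com"}},
    {"name": "login", "arguments": {"user": "demo"}},
    {"name": "go_to_page", "arguments": {"page": "settings"}},
]


async def run_setup(steps):
    # Await each call before starting the next so ordering is preserved
    for step in steps:
        await call_tool(step["name"], **step.get("arguments", {}))


asyncio.run(run_setup(SETUP_STEPS))
```

Inside a scenario, the same loop would sit before the first yield, with env.call_tool in place of the stand-in.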

Using with Built-in Agents

Built-in agents (ClaudeAgent, OpenAIAgent, etc.) work with both patterns:
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create()

# Works with Task from scenario
result = await agent.run(env("navigate-google"))

# Works with Task.from_v4() conversion
result = await agent.run(Task.from_v4(legacy_task))

Optional: Bring Your Own Agent

v5 gives you the hud.eval() context manager for maximum flexibility:
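import hud
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

# Assumes a "checkout" scenario is registered on env; run inside an async function
async with hud.eval(env("checkout", product="laptop")) as ctx:
    # Use OpenAI, Anthropic, or your own agent: whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools()
    )
    
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)

print(ctx.reward)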
The old ClaudeAgent and OperatorAgent still work—even with the new hud.eval() system. But now you’re not locked into a specific agent spec. Pair with the Gateway to use any model through one API.
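The "handle tool calls, run your agent loop" step usually follows one shape: call the model, execute any tools it requests, feed the results back, and stop when it returns a plain answer. Here is a minimal, offline sketch of that loop. fake_model, TOOLS, and agent_loop are hypothetical names for this illustration; the message dictionaries only loosely mirror chat-style formats and are not hud's or any provider's exact schema.

```python
import json

# Hypothetical tool registry standing in for the environment's tools
TOOLS = {
    "add_to_cart": lambda product: f"{product} added",
    "checkout": lambda: "order placed",
}


def fake_model(messages):
    """Scripted stand-in for an LLM call: requests two tools, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if len(tool_results) == 0:
        args = json.dumps({"product": "laptop"})
        return {"tool_calls": [{"id": "1", "name": "add_to_cart", "arguments": args}]}
    if len(tool_results) == 1:
        return {"tool_calls": [{"id": "2", "name": "checkout", "arguments": "{}"}]}
    return {"content": "Checked out: " + tool_results[-1]["content"]}


def agent_loop(prompt, model, tools, max_steps=10):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = model(messages)
        calls = reply.get("tool_calls")
        if not calls:
            # No tool requests left: this is the final answer
            return reply["content"], messages
        messages.append({"role": "assistant", "tool_calls": calls})
        for call in calls:
            result = tools[call["name"]](**json.loads(call["arguments"]))
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": str(result)}
            )
    raise RuntimeError("agent did not finish within max_steps")


final_answer, transcript = agent_loop("Buy a laptop", fake_model, TOOLS)
```

In the hud.eval() example, the final answer from a loop like this is what would be passed to ctx.submit().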

Quick Reference

| v4 (deprecated in v0.5.0) | v5 |
| --- | --- |
| LegacyTask(...) | Task.from_v4(...) (quick) or env("scenario", ...) (recommended) |
| setup_tool | Code before first yield in @env.scenario() |
| evaluate_tool | Code after first yield in @env.scenario() |
| MCPServer | Environment (drop-in replacement) |
| agent.run(task) | Still works, or use hud.eval() to bring your own agent |