v4 separated environments (Docker containers) from evaluation logic (Task objects). v5 unifies everything in the Environment class—tools, setup, and scoring live together.
Deprecation Notice: LegacyTask, setup_tool, and evaluate_tool are deprecated in v0.5.0 and will be removed in v0.6.0 (no earlier than March 1st, 2026). Use Task.from_v4() for quick migration or @env.scenario() for new code.
Good News: Your Code Still Works
Environment inherits from MCPServer. Same API, same behavior. Just change the import:
# Before
from hud.server import MCPServer
mcp = MCPServer("my-env")
@mcp.tool()
def my_tool(): ...
mcp.run()
# After
from hud import Environment
env = Environment("my-env")
@env.tool()
def my_tool(): ...
env.run()
That’s it. Your Dockerfile, your tools, your run() call—all unchanged. Environment adds scenarios, connectors, and integrations on top.
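A one-line import change is enough because a subclass inherits every method of its parent, so decorators like tool() keep working. The sketch below models that with toy stand-in classes named MCPServer and Environment. They are hypothetical and deliberately tiny, not hud's real implementation. They only register tools and scenarios in dictionaries.

```python
from typing import Any, Callable

Fn = Callable[..., Any]


class MCPServer:
    """Toy stand-in (hypothetical): registers tools by function name."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tools: dict[str, Fn] = {}

    def tool(self) -> Callable[[Fn], Fn]:
        def register(fn: Fn) -> Fn:
            self.tools[fn.__name__] = fn
            return fn  # the decorated function stays callable as before
        return register


class Environment(MCPServer):
    """Toy subclass (hypothetical): inherits tool() unchanged, adds scenario()."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.scenarios: dict[str, Fn] = {}

    def scenario(self, name: str) -> Callable[[Fn], Fn]:
        def register(fn: Fn) -> Fn:
            self.scenarios[name] = fn
            return fn
        return register


# Existing v4-style tool code, unchanged apart from the class name.
env = Environment("my-env")


@env.tool()
def my_tool() -> str:
    return "ok"
```

Anything written against the parent's interface keeps working, and the subclass only adds new capabilities on top.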
Migration Path 1: Quick Conversion with Task.from_v4()
The fastest way to migrate existing v4 code—no changes to task definitions needed:
# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
from hud.datasets import LegacyTask
legacy_task = LegacyTask(
    prompt="Navigate to google.com",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {}}
)
# AFTER - One-line conversion
from hud.eval import Task
from hud.agents import ClaudeAgent
task = Task.from_v4(legacy_task)  # Converts LegacyTask → Task
# Also works with: Task.from_v4(dict), Task.from_v4(json_string)
# Works the same with agents (run inside an async function or an async REPL)
agent = ClaudeAgent.create()
result = await agent.run(task)
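The comment above notes that Task.from_v4() also accepts a plain dict or a JSON string. The helper below illustrates roughly what that kind of normalization involves. normalize_v4 is hypothetical and not part of hud, and this page does not document how hud handles these inputs internally. The helper accepts either form and turns setup_tool and evaluate_tool into lists, because setup_tool appears on this page both as a single {"name", "arguments"} dict and as a list of them.

```python
import json
from typing import Any, Union


def _as_list(spec: Any) -> list[dict[str, Any]]:
    """Accept None, a single {"name", "arguments"} dict, or a list of them."""
    if spec is None:
        return []
    if isinstance(spec, dict):
        return [spec]
    return list(spec)


def normalize_v4(raw: Union[str, dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: coerce a v4 task given as a dict or JSON string."""
    data = json.loads(raw) if isinstance(raw, str) else dict(raw)
    return {
        "prompt": data["prompt"],
        "setup": _as_list(data.get("setup_tool")),
        "evaluate": _as_list(data.get("evaluate_tool")),
    }
```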
Task.from_v4() automatically:
- Runs setup_tool at the start of evaluation
- Runs evaluate_tool at the end to compute the reward
- Preserves all existing behavior
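One way to picture those guarantees is as a generator built from the legacy fields: setup calls, then the prompt, then evaluation, then the reward. The sketch below is hypothetical. scenario_from_v4 and fake_call_tool are illustrative names, not hud APIs. It also assumes the reward is 1.0 only when every evaluation call returns a truthy result, and hud's actual reward handling for converted tasks may differ.

```python
import asyncio
from typing import Any, AsyncGenerator, Awaitable, Callable

CallTool = Callable[..., Awaitable[Any]]

# A legacy task with setup_tool/evaluate_tool already normalized into lists.
SPEC = {
    "prompt": "What's the current URL?",
    "setup": [{"name": "navigate", "arguments": {"url": "https://google.com"}}],
    "evaluate": [{"name": "check_url", "arguments": {"expected": "google.com"}}],
}


def scenario_from_v4(spec: dict[str, Any], call_tool: CallTool) -> Callable[[], AsyncGenerator[Any, Any]]:
    """Hypothetical: wrap a legacy spec as a setup -> prompt -> evaluate -> reward flow."""

    async def scenario() -> AsyncGenerator[Any, Any]:
        for step in spec["setup"]:  # old setup_tool, in order
            await call_tool(step["name"], **step["arguments"])
        yield spec["prompt"]  # hand the prompt to the agent
        results = [await call_tool(s["name"], **s["arguments"]) for s in spec["evaluate"]]
        yield 1.0 if all(results) else 0.0  # assumption: all checks must pass

    return scenario


calls: list[str] = []


async def fake_call_tool(name: str, **arguments: Any) -> bool:
    calls.append(name)  # record the order tools were called in
    return True


async def demo() -> tuple[str, float]:
    gen = scenario_from_v4(SPEC, fake_call_tool)()
    prompt = await gen.__anext__()  # runs setup, pauses at the prompt
    reward = await gen.asend("google.com")  # resumes, runs evaluation, yields reward
    return prompt, reward


prompt, reward = asyncio.run(demo())
```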
Migration Path 2: Full Scenario Migration (Recommended)
For new code or when refactoring, migrate setup_tool and evaluate_tool to @env.scenario().
The rule is simple:
- setup_tool code → before the first yield
- evaluate_tool code → after the first yield
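The split works because of how a runner drives such a generator. The hypothetical run_scenario below is not hud's runner. It advances to the first yield to get the prompt and hands that prompt to an agent. It then sends the agent's answer back in with asend(), which resumes the evaluation code and produces the reward from the second yield. Treating the sent value as the agent's final answer is an assumption about how hud fills in answer.

```python
import asyncio
from typing import Any, AsyncGenerator, Awaitable, Callable

log: list[str] = []


async def toy_scenario() -> AsyncGenerator[Any, Any]:
    log.append("setup")  # code before the first yield = old setup_tool
    answer = yield "What's 2 + 2?"  # first yield hands the prompt to the agent
    log.append("evaluate")  # code after the first yield = old evaluate_tool
    yield 1.0 if answer.strip() == "4" else 0.0  # second yield is the reward


async def run_scenario(
    scenario: Callable[[], AsyncGenerator[Any, Any]],
    agent: Callable[[str], Awaitable[str]],
) -> float:
    """Hypothetical driver for a two-yield scenario (not hud's real runner)."""
    gen = scenario()
    prompt = await gen.__anext__()  # runs setup, pauses at the prompt
    answer = await agent(prompt)
    reward = await gen.asend(answer)  # resumes with the answer, runs evaluation
    await gen.aclose()  # nothing should run after the reward
    return float(reward)


async def toy_agent(prompt: str) -> str:
    log.append("agent")
    return "4"


reward = asyncio.run(run_scenario(toy_scenario, toy_agent))
```

The log shows the ordering the rule promises: setup finishes before the agent sees the prompt, and evaluation runs only after the answer arrives.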
# BEFORE (deprecated in v0.5.0, removed in v0.6.0)
task = LegacyTask(
    prompt="What's the current URL?",
    mcp_config={"hud": {...}},
    setup_tool={"name": "navigate", "arguments": {"url": "https://google.com"}},
    evaluate_tool={"name": "check_url", "arguments": {"expected": "google.com"}}
)
# AFTER
from hud import Environment
env = Environment("browser").connect_hub("hud-evals/browser")
@env.scenario("navigate-google")
async def navigate_google():
    # ===== SETUP SECTION (replaces setup_tool) =====
    await env.call_tool("navigate", url="https://google.com")
    # ===== PROMPT (first yield) =====
    answer = yield "What's the current URL?"
    # ===== EVALUATE SECTION (replaces evaluate_tool) =====
    result = await env.call_tool("check_url", expected="google.com")
    # ===== REWARD (second yield) =====
    yield 1.0 if result else 0.0
# Create task from scenario
task = env("navigate-google")
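Calling env("navigate-google") looks up a scenario by name and returns a Task. The BYOA example on this page passes keyword arguments the same way: env("checkout", product="laptop"). The ToyEnv and ToyTask classes below are hypothetical and only model that lookup. In this sketch the call validates the name and records the arguments without running anything, and execution is left to whatever later runs the task.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyTask:
    """Hypothetical stand-in for a Task: which scenario, with which arguments."""
    scenario: str
    args: dict[str, Any] = field(default_factory=dict)


class ToyEnv:
    """Hypothetical: models only the scenario registry and env("name", ...) call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._scenarios: dict[str, Callable[..., Any]] = {}

    def scenario(self, scenario_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def register(fn: Callable[..., Any]) -> Callable[..., Any]:
            self._scenarios[scenario_name] = fn
            return fn
        return register

    def __call__(self, scenario_name: str, **args: Any) -> ToyTask:
        if scenario_name not in self._scenarios:
            raise KeyError(f"unknown scenario: {scenario_name}")
        # Nothing runs yet: the task only records what to run later.
        return ToyTask(scenario_name, args)


env = ToyEnv("browser")


@env.scenario("checkout")
async def checkout(product: str):
    yield f"Buy a {product}"
    yield 1.0


task = env("checkout", product="laptop")
```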
If you have multiple setup tools, just call them in sequence:
# BEFORE
setup_tool=[
    {"name": "navigate", "arguments": {"url": "..."}},
    {"name": "login", "arguments": {"user": "..."}},
    {"name": "go_to_page", "arguments": {"page": "settings"}}
]
# AFTER
@env.scenario("settings-test")
async def settings_test():
    # Multiple setup steps - just call them in order
    await env.call_tool("navigate", url="...")
    await env.call_tool("login", user="...")
    await env.call_tool("go_to_page", page="settings")
    answer = yield "Verify the settings page loaded correctly"
    result = await env.call_tool("check_settings")
    yield 1.0 if result else 0.0
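Order matters in this translation because later steps usually depend on earlier ones: logging in needs the page that navigate opened. The sketch below replays a v4-style setup_tool list with sequential awaits. fake_call_tool, its delays, and the placeholder URL and user are hypothetical. The delays make earlier steps slower, so a concurrent approach such as asyncio.gather would finish them out of order. Awaiting each call in turn keeps the list order.

```python
import asyncio
from typing import Any

# Hypothetical fake tool caller: earlier steps are deliberately slower,
# so running them concurrently would finish them out of order.
DELAYS = {"navigate": 0.03, "login": 0.02, "go_to_page": 0.01}
finished: list[str] = []


async def fake_call_tool(name: str, **arguments: Any) -> None:
    await asyncio.sleep(DELAYS[name])
    finished.append(name)


async def run_setup_steps(steps: list[dict[str, Any]]) -> None:
    """Replay a v4 setup_tool list one step at a time, in list order."""
    for step in steps:
        # Each call completes before the next one starts.
        await fake_call_tool(step["name"], **step.get("arguments", {}))


legacy_setup = [
    {"name": "navigate", "arguments": {"url": "https://example.com"}},  # placeholder URL
    {"name": "login", "arguments": {"user": "demo"}},  # placeholder user
    {"name": "go_to_page", "arguments": {"page": "settings"}},
]

asyncio.run(run_setup_steps(legacy_setup))
```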
Using with Built-in Agents
Built-in agents (ClaudeAgent, OpenAIAgent, etc.) work with both patterns:
from hud.agents import ClaudeAgent
agent = ClaudeAgent.create()
# Works with Task from scenario
result = await agent.run(env("navigate-google"))
# Works with Task.from_v4() conversion
result = await agent.run(Task.from_v4(legacy_task))
Optional: Bring Your Own Agent
v5 gives you the hud.eval() context manager for maximum flexibility:
import hud
from openai import AsyncOpenAI

client = AsyncOpenAI()

async with hud.eval(env("checkout", product="laptop")) as ctx:
    # Use OpenAI, Anthropic, or your own agent: whatever you want
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.as_openai_chat_tools()
    )
    # Handle tool calls, run your agent loop...
    await ctx.submit(response.choices[0].message.content)
print(ctx.reward)
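The "Handle tool calls, run your agent loop..." step is left to you. A common approach is to call the model, execute any tool calls it requests, append the results, and repeat until it replies with plain text. That text is what you would then pass to ctx.submit(). The loop below is a hypothetical sketch. It uses OpenAI-style chat message dicts with a scripted fake model and tool so it runs offline. In practice, model would wrap client.chat.completions.create, and call_tool would reach the environment's tools. One option is env.call_tool, assuming that fits your setup.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable

Message = dict[str, Any]


async def agent_loop(
    prompt: str,
    model: Callable[[list[Message]], Awaitable[Message]],
    call_tool: Callable[..., Awaitable[Any]],
    max_steps: int = 10,
) -> str:
    """Run model -> tools -> model until the model answers without tool calls."""
    messages: list[Message] = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = await model(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return reply["content"]  # final answer: this is what you would submit
        for tc in tool_calls:
            args = json.loads(tc["function"]["arguments"])
            result = await call_tool(tc["function"]["name"], **args)
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": str(result)})
    raise RuntimeError("agent did not finish within max_steps")


# Scripted fake model and tool (hypothetical) so the loop runs without an API key.
async def fake_model(messages: list[Message]) -> Message:
    if messages[-1]["role"] == "user":
        return {"role": "assistant", "content": None, "tool_calls": [
            {"id": "call_1", "type": "function",
             "function": {"name": "get_url", "arguments": "{}"}}]}
    return {"role": "assistant", "content": f"The URL is {messages[-1]['content']}"}


async def fake_call_tool(name: str, **kwargs: Any) -> str:
    return "https://google.com" if name == "get_url" else ""


final = asyncio.run(agent_loop("What's the current URL?", fake_model, fake_call_tool))
```

The max_steps cap keeps a misbehaving model from looping forever.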
The old ClaudeAgent and OperatorAgent still work—even with the new hud.eval() system. But now you’re not locked into a specific agent spec. Pair with the Gateway to use any model through one API.
Quick Reference
| v4 (deprecated) | v5 |
|---|---|
| LegacyTask(...) | Task.from_v4(...) (quick) or env("scenario", ...) (recommended) |
| setup_tool | Code before the first yield in @env.scenario() |
| evaluate_tool | Code after the first yield in @env.scenario() |
| MCPServer | Environment (drop-in replacement) |
| agent.run(task) | Still works, or use hud.eval() for BYOA |