HUD Documentation — Evaluations and RL Environments.

Version 0.4.73 - Latest stable release

I want to evaluate agents

Test Claude, Operator, or custom agents on benchmarks like SheetBench and OSWorld

I want to build environments

Wrap any software in dockerized MCP for scalable and generalizable agent evaluation

What is HUD?

HUD connects AI agents to software environments using the Model Context Protocol (MCP). Whether you’re evaluating existing agents or building new environments, HUD provides the infrastructure.

Why HUD?

🔌 MCP-native: Any agent can connect to any environment
📡 Live telemetry: Debug every tool call at hud.ai
⚡ HUD Gateway: Unified inference API for all LLMs
🚀 Production-ready: From local Docker to cloud scale
🎯 Built-in benchmarks: OSWorld-Verified, SheetBench-50, and more
🔧 CLI tools: Create, develop, and run with hud init, hud dev, hud run, hud eval

3-minute quickstart

Run your first agent evaluation with zero setup

HUD Gateway

Unified inference API for OpenAI, Anthropic, Gemini, and open-source models

Add to Cursor/Claude

Give your AI assistant full knowledge of HUD docs

Quick Example

import asyncio, os, hud
from hud.datasets import Task
from hud.agents import ClaudeAgent

async def main():
    # Define evaluation task with remote MCP
    task = Task(
        prompt="Win a game of 2048 by reaching the 128 tile",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.ai/v3/mcp",
                "headers": {
                    "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                    "Mcp-Image": "hudevals/hud-text-2048:0.1.3"
                }
            }
        },
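        # Set up a board with board_size=4 before the agent starts
        setup_tool={"name": "setup", "arguments": {"name": "board", "arguments": {"board_size": 4}}},
        # Evaluate with max_number against a target of 64 (the prompt above asks for 128)
        evaluate_tool={"name": "evaluate", "arguments": {"name": "max_number", "arguments": {"target": 64}}}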
    )
    
    # Run agent (auto-creates MCP client)
    agent = ClaudeAgent.create()
    result = await agent.run(task)
    print(f"Score: {result.reward}")

asyncio.run(main())
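
The example above builds its `mcp_config` and tool-call dictionaries inline. As a rough sketch only, the same shapes could be produced by two small helpers. `build_mcp_config` and `tool_call` are hypothetical names used for illustration and are not part of the HUD SDK; the URL, header names, and nesting are copied from the example above. The one behavioral addition is a check that fails early when `HUD_API_KEY` is unset, since `os.getenv` would otherwise return `None` and the header would read `Bearer None`.

```python
import os
from typing import Any, Dict, Optional


def build_mcp_config(
    image: str,
    api_key: Optional[str] = None,
    url: str = "https://mcp.hud.ai/v3/mcp",
) -> Dict[str, Any]:
    """Hypothetical helper: return a dict shaped like mcp_config in the example above."""
    key = api_key or os.getenv("HUD_API_KEY")
    if not key:
        # Fail early instead of sending "Bearer None" to the server.
        raise RuntimeError("HUD_API_KEY is not set")
    return {
        "hud": {
            "url": url,
            "headers": {
                "Authorization": f"Bearer {key}",
                "Mcp-Image": image,
            },
        }
    }


def tool_call(tool: str, name: str, **arguments: Any) -> Dict[str, Any]:
    """Hypothetical helper: build the nested shape used by setup_tool and evaluate_tool."""
    return {"name": tool, "arguments": {"name": name, "arguments": arguments}}


if __name__ == "__main__":
    # Reproduce the dictionaries from the example above (using a placeholder key).
    config = build_mcp_config("hudevals/hud-text-2048:0.1.3", api_key="placeholder")
    setup = tool_call("setup", "board", board_size=4)
    evaluate = tool_call("evaluate", "max_number", target=64)
    print(config["hud"]["headers"]["Mcp-Image"])
    print(setup)
    print(evaluate)
```

Under these assumptions, the results could be passed to `Task` as `mcp_config=config`, `setup_tool=setup`, and `evaluate_tool=evaluate`, in place of the inline literals.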

Community

GitHub

Star the repo and contribute

Discord

Join our community

Are you an enterprise building agents?

📅 Hop on a call or 📧 founders@hud.ai
