Version 0.4.73 - Latest stable release

I want to evaluate agents

Test Claude, Operator, or custom agents on benchmarks like SheetBench and OSWorld

I want to build environments

Wrap any software in a Dockerized MCP server for scalable, generalizable agent evaluation

What is HUD?

HUD connects AI agents to software environments using the Model Context Protocol (MCP). Whether you’re evaluating existing agents or building new environments, HUD provides the infrastructure.

Why HUD?

  • 🔌 MCP-native: Any agent can connect to any environment
  • 📡 Live telemetry: Debug every tool call at hud.ai
  • ⚡ HUD Gateway: Unified inference API for all LLMs
  • 🚀 Production-ready: From local Docker to cloud scale
  • 🎯 Built-in benchmarks: OSWorld-Verified, SheetBench-50, and more
  • 🔧 CLI tools: Create, develop, and run with hud init, hud dev, hud run, hud eval

3-minute quickstart

Run your first agent evaluation with zero setup

HUD Gateway

Unified inference API for OpenAI, Anthropic, Gemini, and open-source models

Add to Cursor/Claude

Give your AI assistant full knowledge of HUD docs

Quick Example

import asyncio, os, hud
from hud.datasets import Task
from hud.agents import ClaudeAgent

async def main():
    # Define evaluation task with remote MCP
    task = Task(
        prompt="Win a game of 2048 by reaching the 128 tile",
        mcp_config={
            "hud": {
                "url": "https://mcp.hud.ai/v3/mcp",
                "headers": {
                    "Authorization": f"Bearer {os.getenv('HUD_API_KEY')}",
                    "Mcp-Image": "hudevals/hud-text-2048:0.1.3"
                }
            }
        },
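        # Tool call that prepares the environment: a 4x4 board
        setup_tool={"name": "setup", "arguments": {"name": "board", "arguments": {"board_size": 4}}},
        # Tool call that scores the run: the max_number evaluator with a target of 64
        evaluate_tool={"name": "evaluate", "arguments": {"name": "max_number", "arguments": {"target": 64}}}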
    )
    
    # Run agent (auto-creates MCP client)
    agent = ClaudeAgent.create()
    result = await agent.run(task)
    print(f"Score: {result.reward}")

asyncio.run(main())
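
The dictionaries passed to Task in the example above follow two repeating shapes: an mcp_config entry with a URL and headers, and a nested tool call of the form {"name": ..., "arguments": {"name": ..., "arguments": {...}}}. The sketch below shows one way to build both with plain Python. The helper names build_mcp_config and tool_call are hypothetical and not part of the HUD SDK; the URL, header names, and image tag are copied from the example. It also fails early when no API key is available, since os.getenv would otherwise produce the header "Bearer None".

```python
import os
from typing import Any, Dict, Optional


def build_mcp_config(image: str, api_key: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: return an mcp_config dict shaped like the example's."""
    # Fall back to the HUD_API_KEY environment variable, as the example does.
    key = api_key or os.getenv("HUD_API_KEY")
    if not key:
        # Without this check the header would silently become "Bearer None".
        raise ValueError("HUD_API_KEY is not set")
    return {
        "hud": {
            "url": "https://mcp.hud.ai/v3/mcp",
            "headers": {
                "Authorization": f"Bearer {key}",
                "Mcp-Image": image,
            },
        }
    }


def tool_call(tool: str, name: str, **arguments: Any) -> Dict[str, Any]:
    """Hypothetical helper: build the nested payload used by setup_tool and evaluate_tool."""
    return {"name": tool, "arguments": {"name": name, "arguments": arguments}}


if __name__ == "__main__":
    # "example-key" is a placeholder, not a real credential.
    config = build_mcp_config("hudevals/hud-text-2048:0.1.3", api_key="example-key")
    setup = tool_call("setup", "board", board_size=4)
    evaluate = tool_call("evaluate", "max_number", target=64)
    print(config["hud"]["headers"]["Mcp-Image"])
    print(setup)
    print(evaluate)
```

The resulting values can be passed as mcp_config=, setup_tool=, and evaluate_tool= when constructing Task, exactly as the literal dictionaries are in the example.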

Community

GitHub

Star the repo and contribute

Discord

Join our community

Are you an enterprise building agents?

📅 Hop on a call or 📧 founders@hud.ai