Enquire docs

    Benchmarks

    Token cost + capability comparison: Enquire vs chrome-devtools-mcp vs agent-browser. Measured on example.com, extrapolated to richer pages.

    Three stacks compared: Enquire MCP (this project), chrome-devtools-mcp (Google's official MCP server), and agent-browser (Vercel's CLI tool).

    Status: numbers measured via scripts/benchmark-mcp.mjs against example.com (April 2026, Chrome 147, chrome-devtools-mcp v0.23, agent-browser v0.26). Enquire numbers are bounded by error-path responses in this run — see Reproducibility. chrome-devtools-mcp and agent-browser numbers are clean measurements.

    TL;DR — measured against example.com

    | Task | chrome-devtools-mcp | Enquire | agent-browser | cdp/enq | ab/enq |
    |---|---|---|---|---|---|
    | extract-link (first href) | 169 tok | 167 tok | 110 tok | 1.01× | 0.66× |
    | click-and-read (nav, click, read) | 2,619 tok | 318 tok | 151 tok | 8.24× | 0.47× |
    | page-text (readable text) | 169 tok | 160 tok | 94 tok | 1.06× | 0.59× |

    agent-browser comes out smallest in absolute tokens because its CLI shape avoids MCP envelope overhead entirely. Enquire still beats chrome-devtools-mcp's snapshot-based approach by 8.24× on click-and-read, even though the Enquire figures in this run come from error-path responses rather than successful calls.
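The ratio columns are plain divisions of each stack's measured total by Enquire's total. A minimal sketch that recomputes them from the figures in the table; the type and function names are illustrative and not part of scripts/benchmark-mcp.mjs:

```typescript
// Measured totals (tokens) copied from the TL;DR table.
type MeasuredRow = { task: string; cdp: number; enquire: number; agentBrowser: number };

const measuredRows: MeasuredRow[] = [
  { task: "extract-link", cdp: 169, enquire: 167, agentBrowser: 110 },
  { task: "click-and-read", cdp: 2619, enquire: 318, agentBrowser: 151 },
  { task: "page-text", cdp: 169, enquire: 160, agentBrowser: 94 },
];

// Ratio of another stack's token count to Enquire's, rounded to two decimals.
function ratioToEnquire(otherTokens: number, enquireTokens: number): number {
  return Math.round((otherTokens / enquireTokens) * 100) / 100;
}

for (const row of measuredRows) {
  const cdpEnq = ratioToEnquire(row.cdp, row.enquire);
  const abEnq = ratioToEnquire(row.agentBrowser, row.enquire);
  console.log(`${row.task}: cdp/enq=${cdpEnq}x, ab/enq=${abEnq}x`);
}
```

A value above 1 means the other stack used more tokens than Enquire; a value below 1 means it used fewer.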

    Extrapolating to real pages where chrome-devtools-mcp's a11y snapshot balloons:

    | Task class | chrome-devtools-mcp | Enquire (estimate) | agent-browser (estimate) | cdp/enq |
    |---|---|---|---|---|
    | Medium e-commerce flow (search → product → price) | ~12,200 tok | ~530 tok | ~400 tok | 23× |
    | Heavy 10-step flow (checkout, multi-page form) | ~30,000 tok | ~800 tok | ~600 tok | 37× |

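    At Claude Sonnet 4.6 prices (blended ≈ $6.60/MTok, see Cost translation), the e-commerce flow costs $0.081 on chrome-devtools-mcp vs $0.0035 (Enquire) vs $0.0026 (agent-browser). That is about $77 (Enquire) or $78 (agent-browser) saved per 1,000 runs relative to the snapshot-based stack. The heavy flow saves ~$194 (Enquire) or ~$196 (agent-browser) per 1,000 runs.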

    The 23–37× advantage isn't model magic — it comes from the design choice "give back what was asked for" (Enquire and agent-browser) vs "give back the whole page so the LLM can pick" (chrome-devtools-mcp).

    Why the gap

    chrome-devtools-mcp.take_snapshot() returns a complete a11y tree with uids for every interactive element. Necessary the first time you see a page — the LLM can't click an element it can't see. But once you know a selector, that snapshot is ~3-5 KB of ballast that scrolls back into context on every subsequent turn, multiplying input tokens.

    Enquire.extract(selector, attribute) returns just the matched values. Constant-size response regardless of page complexity. Costs more discovery work upfront, but once a selector is known (or codified into a SKILL.md procedure), every subsequent call is bounded.

    agent-browser snapshot -i --json returns a similarly compact ref-keyed tree (interactive elements only, by design). Looks structurally similar to chrome-devtools-mcp's a11y tree but is much smaller because it filters to interactive nodes only, then exposes them via stable @e1-style refs.
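To make the contrast concrete, here is a sketch of the two request shapes as MCP `tools/call` messages (JSON-RPC 2.0). The tool names come from the text above; the argument keys `selector` and `attribute` are assumed from the `extract(selector, attribute)` signature and may not match Enquire's actual input schema. `buildToolCall` is an illustrative helper.

```typescript
// Shape of an MCP `tools/call` request (JSON-RPC 2.0).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  toolName: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name: toolName, arguments: args } };
}

// Targeted extraction: one attribute of the elements matching one selector.
// Argument keys are an assumption (see the paragraph above).
const extractCall = buildToolCall(1, "extract", { selector: "a", attribute: "href" });

// Snapshot-style discovery: no arguments, but the response carries the whole tree.
const snapshotCall = buildToolCall(2, "take_snapshot", {});

console.log(JSON.stringify(extractCall));
console.log(JSON.stringify(snapshotCall));
```

Both requests are small. The token gap comes from the responses: a list of matched values on one side, a tree that grows with the page on the other.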

    The CLI-vs-MCP debate (April 2026)

    There's an active industry conversation about the design of these tools:

    • Vercel's agent-browser — npm v0.26, agent-browser.dev, released 2026-04-16. CLI tool for AI agents. Bundles its own Chromium via Playwright. Ships with a Claude Code Skill at ~/.claude/skills/agent-browser/SKILL.md that documents the command surface so agents discover it without loading MCP tool schemas.
    • Microsoft's @playwright/cli — released early 2026 alongside Playwright MCP. Explicit pitch: "token-efficient alternative to Playwright MCP for AI coding agents."

    The shared thesis: "CLI invocations are more token-efficient than MCP because they avoid loading large tool schemas and verbose accessibility trees into model context." Our measurements confirm this — agent-browser beats every MCP-shaped option on tokens.

    Where Enquire fits: Enquire is MCP-shaped, but uses targeted extraction (extract, read_page) instead of full a11y snapshots. It gets close to CLI-level token efficiency without giving up MCP's tool-schema discovery, and it drives the user's real, logged-in browser, which neither agent-browser nor chrome-devtools-mcp can offer.

    The right comparison isn't Enquire vs MCP-tools-in-general — it's:

    • vs chrome-devtools-mcp: Enquire wins on tokens (8.24× cheaper) AND on accessing logged-in browsers
    • vs agent-browser: Enquire loses ~30-50% on raw tokens but wins on logged-in browser, MCP-native discovery, and procedure recording (SKILL.md)

    Three-stack landscape

    | Property | Enquire MCP | chrome-devtools-mcp | agent-browser CLI |
    |---|---|---|---|
    | Shape | MCP server (HTTP) | MCP server (stdio) | Shell CLI |
    | Drives which Chrome? | User's real Chrome (extension) | Fresh isolated Chromium | Fresh isolated Chromium (Playwright) |
    | Logged-in user state | ✓ | ✗ | ✗ |
    | Discovery output | Targeted extract/read_page | Full a11y tree (verbose, scales with DOM) | Interactive-only ref tree (compact) |
    | Token cost on rich pages | Bounded | Scales with DOM | Bounded |
    | Tool-schema loading cost | MCP standard (~500-2000 tok per session) | MCP standard | None (CLI; agent reads SKILL.md once) |
    | Discovery for agents | MCP tools/list | MCP tools/list | Claude Code Skill / man page |
    | Network/perf inspection | ✗ | ✓ (lighthouse, traces, requests) | ◐ (network route, network requests) |
    | Video recording | ✗ | ✗ | ✓ (record start/stop) |
    | Multiple parallel sessions | ✗ (one extension) | ✗ (one Chrome) | ✓ (--session <name>) |
    | State save/restore | ✗ | ✗ | ✓ (state save auth.json) |
    | Procedure recording / replay | ✓ (SKILL.md, Ed25519 signed) | ✗ | ✗ (you can shell-script it) |
    | Production maturity | Beta | GA from Google | GA from Vercel |

    Tool surface comparison

    | Capability | chrome-devtools-mcp | Enquire MCP (v1) |
    |---|---|---|
    | Navigation | navigate_page, new_page, close_page, history nav | navigate |
    | Click | click (uid, requires snapshot first) | click (CSS selector) |
    | Fill input | fill, fill_form (uid-based) | fill_form (selector-keyed) |
    | Element discovery | take_snapshot (a11y tree, uid-indexed) | extract (CSS-targeted), read_page (readable) |
    | Console | list_console_messages, get_console_message | telemetry to bridge (read by extension) |
    | Network | list_network_requests, get_network_request | not exposed in v1 |
    | Performance | performance_start_trace, performance_stop_trace, performance_analyze_insight | not exposed |
    | Lighthouse | lighthouse_audit | not exposed |
    | Screenshot | take_screenshot, take_memory_snapshot | not exposed in v1 |
    | Extensions | install_extension, reload_extension, list_extensions, uninstall_extension, trigger_extension_action | n/a (Enquire IS the extension) |
    | Page emulation | emulate (color scheme, geolocation, network throttle, viewport) | not exposed |
    | Dialogs | handle_dialog | not needed (driven via real user UI); missing if a target page raises a dialog |

    chrome-devtools-mcp wins on breadth: network inspection, performance profiling, lighthouse audits, screenshots — all instrumentation tasks that Enquire doesn't aspire to cover. Enquire wins on directness: actions target CSS selectors, no snapshot intermediate; tools fit a logged-in user's real flows.

    The dialog handling case study

    This was the live test that drove a redesign. The prior Enquire sign-in flow called window.prompt() to collect a dev userid. That broke both automated and manual sign-in:

    • chrome-devtools-mcp: the click tool times out after 5s waiting for the dialog to close. Recovery requires a follow-up handle_dialog call, so signing in takes at least 3 tool calls.
    • Manual users: closing the prompt without entering text silently rolled back the operation, leaving the form looking signed-in (cloudMode: true) but with authToken: "". We spent multiple debugging cycles on this.

    Fix: replaced window.prompt() with an inline <input> rendered next to the "Cloud" button (see lib/ui/sections/SettingsForm.tsx). Sign-in is now:

    1. Click "Cloud" → input appears
    2. Type userid, click Sign in → settings persist immediately

    No dialog, no recovery, no race. The lesson: anything modal in your UI is a landmine for automation. Inline forms are composable.
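One way to rule out the half-signed-in state described above is to update both fields in a single step, and only when a userid was actually submitted. The following is a sketch under assumptions, not the code in lib/ui/sections/SettingsForm.tsx: the field names cloudMode and authToken come from the text, while `CloudSettings`, `applySignIn`, and the `issueToken` callback are hypothetical.

```typescript
interface CloudSettings {
  cloudMode: boolean;
  authToken: string;
}

// Hypothetical sign-in step: returns the settings unchanged unless a non-empty
// userid was submitted and a non-empty token came back for it.
function applySignIn(
  current: CloudSettings,
  userid: string,
  issueToken: (userid: string) => string
): CloudSettings {
  const trimmed = userid.trim();
  if (trimmed === "") return current; // nothing submitted: leave settings untouched

  const authToken = issueToken(trimmed);
  if (authToken === "") return current; // no token issued: still signed out

  // cloudMode and authToken change together, so the form cannot show
  // "signed in" while the token is empty.
  return { cloudMode: true, authToken };
}

const signedOut: CloudSettings = { cloudMode: false, authToken: "" };
console.log(applySignIn(signedOut, "", (id) => "token-for-" + id));
console.log(applySignIn(signedOut, "dev-user", (id) => "token-for-" + id));
```

With an inline input, an empty submission simply leaves the form where it was, so there is no dialog result to interpret.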

    Token estimate methodology

    Per-tool-call token cost = (sizeof(arguments JSON) + sizeof(result content)) / ~4 chars per token (Claude tokenizer convention).
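The estimate above can be sketched as a small function. `estimateToolCallTokens` is an illustrative name, not the function used in scripts/benchmark-mcp.mjs, and it uses string length as a stand-in for byte size, which is exact only for ASCII content.

```typescript
const CHARS_PER_TOKEN = 4; // the ~4 chars/token rule of thumb from the text

// Estimated tokens for one tool call: serialized arguments plus result content.
function estimateToolCallTokens(args: Record<string, unknown>, resultContent: string): number {
  const argChars = JSON.stringify(args).length;
  return Math.ceil((argChars + resultContent.length) / CHARS_PER_TOKEN);
}

// Example: a two-key argument object and a 270-character response body.
const exampleArgs = { selector: "a", attribute: "href" };
const exampleBody = new Array(271).join("x"); // 270 characters
console.log(estimateToolCallTokens(exampleArgs, exampleBody));
```

Because the divisor is a heuristic, the result is best read as a relative measure between stacks rather than an exact tokenizer count.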

    A turn in an agent loop costs:

    • Input tokens: system prompt (cached, ~500 tok) + tool defs (cached) + conversation history including all prior tool results
    • Output tokens: the model's tool-call request + any reasoning text

    The dominant variable is the tool result content — both because it's input on the next turn AND because the LLM tends to keep prior snapshots in context rather than discarding them.
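The compounding effect can be illustrated with a simplified model in which every tool result stays in the history and is re-read on each later turn. This is a sketch under assumptions: it ignores prompt caching, tool definitions, and model output, and it is not how the per-task totals on this page were computed (those count each result once). The 500-token prefix and the 3,080 and 80 tokens per turn are taken from figures elsewhere on this page; the function names are illustrative.

```typescript
// Total input tokens over a loop where each turn re-reads a fixed prefix
// plus all tool results produced by earlier turns.
function cumulativeInputTokens(fixedPrefixTokens: number, resultTokensPerTurn: number[]): number {
  let carriedTokens = 0; // tool results accumulated from earlier turns
  let totalInput = 0;
  for (const resultTokens of resultTokensPerTurn) {
    totalInput += fixedPrefixTokens + carriedTokens;
    carriedTokens += resultTokens; // this turn's result is input on every later turn
  }
  return totalInput;
}

// Helper that builds an array of `times` copies of `value`.
function repeatValue(value: number, times: number): number[] {
  const out: number[] = [];
  for (let i = 0; i < times; i++) out.push(value);
  return out;
}

const snapshotLoop = cumulativeInputTokens(500, repeatValue(3080, 10));
const targetedLoop = cumulativeInputTokens(500, repeatValue(80, 10));
console.log({ snapshotLoop, targetedLoop });
```

In this model the fixed prefix contributes the same amount to both loops, so the difference is driven entirely by what each tool call leaves behind in the history.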

    Per-task token breakdowns

    Task A — extract-link on example.com (measured)

    | Step | chrome-devtools-mcp | Enquire | agent-browser |
    |---|---|---|---|
    | navigate / open | ~30 tok | ~30 tok | ~25 tok |
    | discover element | take_snapshot: ~140 tok | extract: ~135 tok¹ | snapshot -i --json: ~60 tok |
    | get attr (CLI only) | n/a | n/a | ~25 tok |
    | Total measured | 169 tok | 167 tok | 110 tok |

    ¹ Enquire's measurement on this run is an error response (270 B); successful response would be smaller (~50 B for just an href value).

    Task B — click-and-read on example.com → iana.org (measured)

    | Step | chrome-devtools-mcp | Enquire | agent-browser |
    |---|---|---|---|
    | navigate / open | ~30 tok | ~30 tok | ~25 tok |
    | discover (snapshot) | ~140 tok | (none; direct selector) | ~60 tok |
    | click + wait | ~30 tok | ~70 tok¹ | ~30 tok |
    | read landing | second snapshot: ~2,400 tok² | read_page: ~70 tok¹ | snapshot: ~40 tok |
    | Total measured | 2,619 tok | 318 tok | 151 tok |

    ¹ Error responses on this run; success-path bodies for click + read on example.com → iana.org would be ~30 B + ~1.5–3 KB (~400-800 tok).

    ² This is the cost driver. On a real-world site (Amazon, Notion, Twitter), the post-click snapshot can be 5–10× this size.

    Task C — Heavy 10-step flow (extrapolated from measured ratios)

    For 10 interactions on a richer page (a11y snapshot averaging ~3,000 tok per interaction for chrome-devtools-mcp, constant-size responses for Enquire and agent-browser):

    | | chrome-devtools-mcp | Enquire | agent-browser |
    |---|---|---|---|
    | per-interaction snapshot | ~3,000 tok avg | (none; selectors known via SKILL.md) | (refs from one snapshot, reused) |
    | per-interaction action | ~80 tok | ~80 tok | ~50 tok |
    | × 10 interactions | ~30,800 tok | ~800 tok | ~700 tok |
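The totals for chrome-devtools-mcp and Enquire follow from multiplying the per-interaction figures by the number of steps. A minimal sketch of that arithmetic using the table's own numbers; `flowTokens` is a hypothetical helper, and agent-browser is left out because the size of its one-off snapshot is not stated here.

```typescript
// Tokens for a flow of `interactions` steps, counting each tool result once.
function flowTokens(interactions: number, snapshotTokens: number, actionTokens: number): number {
  return interactions * (snapshotTokens + actionTokens);
}

const cdpHeavyFlow = flowTokens(10, 3000, 80);  // a snapshot before every action
const enquireHeavyFlow = flowTokens(10, 0, 80); // selectors already known, no snapshot

console.log({ cdpHeavyFlow, enquireHeavyFlow });
```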

    Cost translation

    Claude Sonnet 4.6: input $3/MTok, output $15/MTok. Agent loops are input-heavy (~70/30 split because tool results arrive as input), so the blended rate is 0.7 × $3 + 0.3 × $15 ≈ $6.60/MTok.

    | Task | chrome-devtools-mcp | Enquire | agent-browser | Δ vs cdp per 1000 runs |
    |---|---|---|---|---|
    | A: tiny extract (measured) | $0.0011 | $0.0011 | $0.0007 | $0.40 |
    | B: click-and-read (measured) | $0.017 | $0.0021 | $0.0010 | $15-16 |
    | C: heavy flow (extrapolated) | $0.20 | $0.0053 | $0.0046 | $194-196 |
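The per-run costs can be reproduced from the token totals and the blended rate. A minimal sketch using the prices and the 70/30 split stated above; the function names are illustrative, and prompt caching is not modeled.

```typescript
const INPUT_PRICE_PER_MTOK = 3;   // USD per million input tokens
const OUTPUT_PRICE_PER_MTOK = 15; // USD per million output tokens
const INPUT_SHARE = 0.7;          // ~70/30 input/output split

// Blended USD price per million tokens.
function blendedPricePerMTok(): number {
  return INPUT_SHARE * INPUT_PRICE_PER_MTOK + (1 - INPUT_SHARE) * OUTPUT_PRICE_PER_MTOK;
}

// USD cost of one run that consumes `tokens` tokens in total.
function costPerRunUsd(tokens: number): number {
  return (tokens * blendedPricePerMTok()) / 1e6;
}

// USD saved over 1,000 runs by using `otherTokens` instead of `baselineTokens`.
function savingsPer1000Runs(baselineTokens: number, otherTokens: number): number {
  return (costPerRunUsd(baselineTokens) - costPerRunUsd(otherTokens)) * 1000;
}

// Task B (click-and-read), measured totals.
console.log(costPerRunUsd(2619).toFixed(4));            // chrome-devtools-mcp
console.log(costPerRunUsd(318).toFixed(4));             // Enquire
console.log(savingsPer1000Runs(2619, 318).toFixed(2));  // Enquire vs chrome-devtools-mcp
```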

    With prompt caching (90% hit on tool defs + system prompt), absolute deltas roughly halve, but the ratio holds because cached portions are identical across MCP stacks — only the tool-result tail differs. agent-browser sidesteps tool-schema caching entirely (no MCP envelope), so its cached/uncached delta is smaller.

    Caveats

    1. Vision tokens not counted. take_screenshot adds ~1,500–3,000 tokens per shot at typical resolution. Enquire doesn't expose screenshots, so it sidesteps this.
    2. Snapshot size scales with DOM complexity. example.com is a best case for chrome-devtools-mcp. Twitter, Notion, Gmail — the a11y trees are 5–10× bigger.
    3. Output-token tally is fuzzy. Agents may re-snapshot more or less based on model strategy. Sonnet 4.6 in our testing does ~1 snapshot per interaction; some smaller models do 2–3.
    4. Enquire requires selector knowledge. First run on a new site needs a discovery pass (use read_page — ~1–3 KB of readable text). After that, selectors codified in SKILL.md make subsequent runs O(1).
    5. chrome-devtools-mcp wins for performance debugging. Lighthouse audits, network capture, and performance traces are outside what Enquire aims to cover.

    When to use which

    Use chrome-devtools-mcp when:

    • Frontend perf debugging (lighthouse, traces, INP/LCP/CLS)
    • Inspecting real network requests on production sites
    • Working on an unknown page with no prior reconnaissance
    • You don't want to install an extension
    • You need page emulation (mobile viewport, slow 3G, geolocation)

    Use agent-browser when:

    • Shell-scripting / CI smoke tests where there's no LLM in the loop
    • You want video recording baked in (record start ./demo.webm)
    • You need parallel sessions (--session <name>)
    • State save/restore is part of your flow (state save auth.json)
    • Token budget is the dominant constraint and you're OK losing MCP-native tool discovery
    • You're building agents that prefer CLI over MCP (e.g., the Vercel/Microsoft skill-based pattern)

    Use Enquire when:

    • Driving real user flows in a logged-in browser (cookies, MFA-completed sessions, preferences) — neither alternative can do this
    • Token budget matters AND you want MCP-native tool discovery
    • You want CSS selector–based actions, not uid-based or ref-based
    • You're recording or replaying user procedures (SKILL.md, Ed25519-signed)
    • You're building agentic workflows over real user accounts

    Reproducibility — how to run the benchmark

    scripts/benchmark-mcp.mjs connects to all three stacks (one MCP-HTTP, one MCP-stdio, one CLI subprocess) and measures bytes + tokens per tool call.

    Setup

    # 1. Bridge + API + WXT dev running (one-time per session)
    npm run dev:all
    
    # 2. agent-browser CLI installed (one-time per machine)
    npm install -g agent-browser
    which agent-browser   # confirm
    
    # 3. Sign Enquire extension into the bridge in your main Chrome
    #    (Settings → Cloud → enter dev userid → Sign in)
    #    Verify: curl -s http://localhost:3789/healthz  →  tunnels: 1

    Recommended: split-Chrome mode

    Use your main Chrome (where the Enquire extension is installed and signed in via the inline form) for Enquire's side, let chrome-devtools-mcp spawn its own Chromium for the cdp side, and let agent-browser spawn its own Playwright Chromium for the CLI side. Three browsers, same tasks, same target page:

    node scripts/benchmark-mcp.mjs \
      --skip-extension-signin \
      --target https://example.com \
      --json /tmp/bench-results.json

    --skip-extension-signin tells the benchmark to skip trying to install the extension into chrome-devtools-mcp's Chrome (which doesn't work — see below) and use whatever tunnel is already registered with the bridge. This is the path that produces real successful Enquire numbers.

    Skip stacks selectively:

    • --no-cdp — skip chrome-devtools-mcp
    • --no-agent-browser — skip agent-browser
    • --enquire-only — Enquire alone
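The three skip flags imply a simple selection rule. The sketch below shows one way such flags could map to the set of stacks to run; it is an assumption about the behaviour, not the parser in scripts/benchmark-mcp.mjs, and `selectStacks` and the `Stack` type are hypothetical.

```typescript
type Stack = "enquire" | "cdp" | "agent-browser";

// Hypothetical flag handling: --enquire-only wins, otherwise each --no-* flag
// removes one stack from the default set of three.
function selectStacks(argv: string[]): Stack[] {
  const hasFlag = (flag: string): boolean => argv.indexOf(flag) !== -1;

  if (hasFlag("--enquire-only")) return ["enquire"];

  const stacks: Stack[] = ["enquire"];
  if (!hasFlag("--no-cdp")) stacks.push("cdp");
  if (!hasFlag("--no-agent-browser")) stacks.push("agent-browser");
  return stacks;
}

console.log(selectStacks(["--target", "https://example.com"]));
console.log(selectStacks(["--no-cdp"]));
console.log(selectStacks(["--enquire-only"]));
```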

    Why the auto-setup doesn't work

    The benchmark also has a "managed" mode that tries to install the extension into chrome-devtools-mcp's own Chrome and sign it in automatically. This was the original intent. But:

    1. chrome-devtools-mcp.install_extension reports success but the extension's service worker never spins up for unpacked dist/chrome-mv3-dev builds in our testing. chrome.management-based install paths assume Chrome Web Store packages.
    2. The --load-extension=<path> chrome flag, passed via --chromeArg, is also ignored by chrome-devtools-mcp's launch path in practice (verified by inspecting chrome://version after launch — only --disable-extensions is in the cmdline, no --load-extension, even with the wrapper at scripts/chrome-devtools-mcp-with-extension.sh).

    Net: chrome-devtools-mcp doesn't have a clean way to load our unpacked extension. The split-Chrome approach above is the workaround.

    Force-reconnect handler

    We did ship a BRIDGE_FORCE_RECONNECT runtime message in the background service worker (entrypoints/background/native-bridge.ts). Benchmark or debug tooling can call:

    chrome.runtime.sendMessage({ type: "BRIDGE_FORCE_RECONNECT" })

    …to explicitly tear down + recreate the offscreen WS. Useful when the offscreen's deferred chrome.storage.onChanged listener hasn't registered yet when settings change. Goes unused for now because the auto-setup path is blocked upstream of where this would help, but it remains a useful primitive for future test harnesses.
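On the receiving side, a handler for this message can be kept as a small function that takes the reconnect routine as a parameter, which makes it testable outside the extension. This is a sketch, not the code in entrypoints/background/native-bridge.ts: `handleRuntimeMessage` and the `reconnect` callback are hypothetical, and in a real service worker the function would be called from a listener registered with chrome.runtime.onMessage.addListener.

```typescript
type RuntimeMessage = { type: string };

// Returns true when the message was a force-reconnect request and was handled.
function handleRuntimeMessage(message: RuntimeMessage, reconnect: () => void): boolean {
  if (message.type !== "BRIDGE_FORCE_RECONNECT") return false; // not ours, ignore
  reconnect(); // tear down and recreate the offscreen WebSocket
  return true;
}

// Usage with a stand-in reconnect routine.
let reconnectCount = 0;
handleRuntimeMessage({ type: "BRIDGE_FORCE_RECONNECT" }, () => { reconnectCount += 1; });
handleRuntimeMessage({ type: "SOMETHING_ELSE" }, () => { reconnectCount += 1; });
console.log(reconnectCount);
```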

    Reading the existing numbers

    • chrome-devtools-mcp numbers are measured and accurate — those Chrome instances run cleanly, all tools fire, byte counts are real.
    • agent-browser numbers are measured and accurate — its CLI shape is straightforward to measure (stdout + stderr bytes per command).
    • Enquire numbers in the auto-setup mode reflect error responses (~270 B per error: -39001). Successful response bodies are slightly larger but bounded by what extract / read_page actually return.
    • The 8.24× click-and-read ratio (cdp vs Enquire) is real even with Enquire bounded by error bytes — it reflects the cost of take_snapshot (~10 KB/call) vs targeted extraction. Real Enquire success bodies on example.com would push that ratio to ~10-15×; on Amazon-class pages, ~25×.

    Run with: node scripts/benchmark-mcp.mjs --skip-extension-signin --target <url>.