A team's browsing agent worked beautifully on the customer they built it for. It broke catastrophically the next month, when that customer's website was redesigned. The team had built selectors directly into the agent's prompt. The HTML changed; the prompt didn't.
Browsing agents are useful for the cases where structured tools don't exist. They're brittle by nature. The discipline is converting browse-once-and-figure-it-out into structured tool calls wherever possible.
The 'turn it into a tool' rule
Whenever a browsing agent does something repeatedly, build a structured tool for it:
- Filling out a specific form → write a tool that takes the form's inputs and submits it.
- Reading data from a specific site → write a tool that scrapes (with care) and returns structured data.
- Navigating a known flow → write a tool that runs the flow.
The browsing agent figures it out the first time. The team builds the tool. From then on, the structured tool runs.
This converts brittle one-off behaviour into reliable repeated behaviour. The website might still change; when it does, the team updates the tool, not every prompt.
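A minimal sketch of the conversion in Python, for a hypothetical "read pricing data from a specific site" case. The URL, the CSS selectors, and the `pricing_lookup` name are all illustrative; the point is that the page's structure lives in one versioned function rather than in a prompt:

```python
import requests
from bs4 import BeautifulSoup
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    monthly_price: str


def pricing_lookup(url: str = "https://example.com/pricing") -> list[Plan]:
    """Structured replacement for 'browse the pricing page and figure it out'."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    plans = []
    # The selectors are the tool's contract with the page. When the site
    # redesigns, this loop breaks loudly, in one place, under version control.
    for card in soup.select(".plan-card"):
        name = card.select_one(".plan-name")
        price = card.select_one(".plan-price")
        if name is None or price is None:
            raise ValueError("pricing page layout changed; update pricing_lookup")
        plans.append(Plan(name.get_text(strip=True), price.get_text(strip=True)))
    return plans
```

The failure mode is the point: a tool raises an error at a known location, while a prompt with embedded selectors drifts silently. That's how the team in the opening anecdote got burned.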
Sandboxing
Browsing agents touch the open internet. Risks include:
- Prompt injection from page content.
- Unintended actions taken in third-party services.
- Data exfiltration if the agent has read access to sensitive data.
- Accidental compliance violations (the agent ends up in places the team didn't authorise).
The sandboxing pattern (sketched in code after the list):
- Browsing happens in an ephemeral environment.
- The agent can't access the team's infrastructure during the browse.
- Outputs are filtered before reaching the agent's main context.
- Action limits enforced (max-clicks, max-form-submits, max-page-fetches).
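A minimal sketch of the action-limit and output-filter layers, assuming plain HTTP as the browse primitive. The budget numbers and the filter are illustrative; the ephemeral-environment and no-infrastructure-access properties come from where this runs (a throwaway container with a locked-down egress policy), not from the code itself:

```python
import re
from dataclasses import dataclass, field

import requests


class BudgetExceeded(Exception):
    """Raised when a browse session exhausts one of its action budgets."""


def filter_for_context(html: str) -> str:
    """Crude output filter: drop script/style blocks before page content
    reaches the agent's main context. A real filter would also strip
    instruction-like text (see 'What we won't ship')."""
    return re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)


@dataclass
class BrowseSession:
    """Ephemeral browsing context with hard caps on actions. It holds no
    credentials and no handle to the team's infrastructure."""
    max_fetches: int = 30
    max_form_submits: int = 3
    _counts: dict = field(default_factory=lambda: {"fetch": 0, "submit": 0})

    def _spend(self, action: str, limit: int) -> None:
        self._counts[action] += 1
        if self._counts[action] > limit:
            raise BudgetExceeded(f"{action} budget of {limit} exhausted")

    def fetch(self, url: str) -> str:
        self._spend("fetch", self.max_fetches)
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        return filter_for_context(resp.text)

    def submit_form(self, url: str, fields: dict) -> str:
        self._spend("submit", self.max_form_submits)
        resp = requests.post(url, data=fields, timeout=10)
        resp.raise_for_status()
        return filter_for_context(resp.text)
```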
Rate limits
The agent visiting a site at agent speed looks like a bot to the site. Rate limits (sketched in code after the list):
- Per-site, per-second request limits.
- Polite headers (user-agent identifying as automation).
- Honour robots.txt.
- Backoff on errors.
Without these, the agent gets blocked on first encounter. With them, it operates within site-owner expectations.
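A minimal sketch of a polite fetcher, using Python's standard `urllib.robotparser` alongside `requests`. The bot name, interval, and retry counts are illustrative:

```python
import time
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

import requests

USER_AGENT = "example-research-bot/1.0 (+https://example.com/bot)"  # identify as automation
MIN_INTERVAL = 1.0  # seconds between requests to the same host
MAX_RETRIES = 4

_last_hit: dict[str, float] = {}           # per-host timestamp of last request
_robots: dict[str, RobotFileParser] = {}   # per-host cached robots.txt


def _allowed(url: str) -> bool:
    """Honour robots.txt, caching the parsed file per host."""
    host = urlparse(url).netloc
    if host not in _robots:
        rp = RobotFileParser()
        rp.set_url(f"https://{host}/robots.txt")
        try:
            rp.read()
        except OSError:
            pass  # unreachable robots.txt: parser fails closed (can_fetch -> False)
        _robots[host] = rp
    return _robots[host].can_fetch(USER_AGENT, url)


def polite_get(url: str) -> requests.Response:
    if not _allowed(url):
        raise PermissionError(f"robots.txt disallows {url}")
    host = urlparse(url).netloc
    for attempt in range(MAX_RETRIES):
        # Per-host pacing: never hit the same host faster than MIN_INTERVAL.
        wait = MIN_INTERVAL - (time.monotonic() - _last_hit.get(host, 0.0))
        if wait > 0:
            time.sleep(wait)
        _last_hit[host] = time.monotonic()
        resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
        if resp.status_code in (429, 500, 502, 503):
            time.sleep(2 ** attempt)  # backoff on errors
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError(f"gave up on {url} after {MAX_RETRIES} attempts")
```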
Eval discipline
Browsing-agent evals are tricky. Live websites change. Eval cases that test specific page behaviour rot.
The pattern (sketched in code after the list):
- Eval against snapshots of pages, not live pages.
- Periodically refresh snapshots.
- Eval cases that test the agent's reasoning about pages, not page-specific details.
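A minimal sketch of the snapshot pattern. `agent.extract_link` is a hypothetical stand-in for however the agent under eval consumes a page; the fetch/freeze split is the part that matters:

```python
import json
import pathlib
import time

import requests

SNAPSHOT_DIR = pathlib.Path("evals/snapshots")


def refresh_snapshot(name: str, url: str) -> None:
    """Fetch a live page once and freeze it. Run on a schedule
    (say, monthly), never inside the eval itself."""
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    (SNAPSHOT_DIR / f"{name}.html").write_text(resp.text)
    (SNAPSHOT_DIR / f"{name}.meta.json").write_text(
        json.dumps({"url": url, "fetched_at": time.time()})
    )


def load_snapshot(name: str) -> str:
    """Evals read frozen pages from disk: deterministic, and immune to
    the live site being down or redesigned mid-eval."""
    return (SNAPSHOT_DIR / f"{name}.html").read_text()


def test_agent_finds_pricing_link():
    html = load_snapshot("example_homepage")
    # Hypothetical agent API: the case tests whether the agent can reason
    # its way to the pricing link, not whether a memorised selector matches.
    answer = agent.extract_link(html, goal="find the pricing page")
    assert "pricing" in answer.lower()
```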
A real browsing agent
A research agent that pulls company data from public sources:
- Initial: agent browsed Crunchbase, LinkedIn, company sites freely.
- Three months in: the team built `crunchbase_lookup`, `linkedin_company_lookup`, and `company_basic_info` tools.
- Browsing was reserved for novel sources.
- Costs dropped (tools are cheaper than browsing).
- Reliability rose (tools don't break when the page redesigns).
- Maintenance shifted from prompt-tuning to tool-maintenance.
The team's velocity stayed high because the architecture matched the workload.
What we won't ship
- Browsing agents that interact with shared accounts.
- Browsing without sandboxing in production.
- Agents that scrape sites at rates inconsistent with the site's terms.
- Agents that don't strip prompt-injection markers from page content before adding it to context (a minimal filter is sketched below).
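A minimal sketch of that last filter, as a heuristic line filter. The pattern list is illustrative and deliberately incomplete; treat it as one layer of defence alongside sandboxing and action limits, never as the whole defence:

```python
import re

# Heuristic patterns for instruction-like content in page text.
# Illustrative, not exhaustive.
INJECTION_PATTERNS = [
    r"(?i)ignore (all )?(previous|prior) instructions",
    r"(?i)you are now",
    r"(?i)system prompt",
    r"(?i)begin (system|admin) message",
]


def strip_injection_markers(page_text: str) -> str:
    """Drop instruction-like lines from page content before it reaches
    the agent's context, and label what remains as untrusted."""
    kept = [
        line
        for line in page_text.splitlines()
        if not any(re.search(p, line) for p in INJECTION_PATTERNS)
    ]
    return "UNTRUSTED PAGE CONTENT:\n" + "\n".join(kept)
```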
Close
Browsing agents are brittle by nature and useful by capability. Convert browse-once into tool-call wherever possible. Sandbox the rest. Honour rate limits. Eval against snapshots. The agent's reliability comes from the architecture, not from the model's cleverness.
Related reading
- Tool design like APIs — what a good browse-tool looks like.
- MCP servers are USB-C for AI — how integrations should land.
- Plan vs. act — surrounding architecture.
We build AI-enabled software and help businesses put AI to work. If you're shipping browsing agents, we'd love to hear about it. Get in touch.