Runtime Guardrails for AI That Browses the Web

The Problem: Static Allowlists Cannot Handle Dynamic Browsing

AI agents do not follow predictable paths. They follow links, get redirected, encounter new domains mid-session, and discover URLs that no pre-flight check could have anticipated.

Pre-Flight Checks Fail at Runtime

Most agent governance frameworks implement pre-flight validation: before the agent starts its task, a list of approved domains is compiled and loaded. The agent is told "you may visit these 50 domains." This approach fails catastrophically in practice because web browsing is inherently dynamic. A Google search for "enterprise software pricing" returns links the pre-flight list never anticipated. A redirect chain from an approved domain lands the agent on an unapproved one. An embedded iframe loads content from a third-party ad network. Every one of these runtime events bypasses the pre-flight allowlist entirely.

Redirect chains: An approved URL redirects through 2-3 intermediate domains before reaching the final destination — none of the intermediate domains were on the allowlist
Dynamic link discovery: Agents following search results or scraping link directories encounter URLs that did not exist at pre-flight time
Session drift: Multi-step browsing tasks accumulate navigation events over minutes or hours — the further into the session, the further from the original allowlist
Embedded content: Iframes, JavaScript-loaded resources, and API calls from visited pages pull content from domains the agent never explicitly navigated to

The Solution: In-Line Guardrails That Execute at Navigation Time

Runtime guardrails intercept every navigation event (not just the initial task URL, but every subsequent click, redirect, iframe load, and API call) and evaluate it against your policy rules in real time. The evaluation uses a pre-classified domain database with 102 million entries, which delivers sub-millisecond lookups with no external API call for any domain it covers. The classification is deterministic (no model inference, no hallucination risk), and the policy decision (allow, block, log, or escalate) executes before the HTTP request fires.

This architecture treats your agent's browser session like a network firewall treats packet flows: every request is inspected, classified, and either permitted or dropped based on policy rules. The difference is that instead of IP addresses and ports, the guardrail operates on URLs, IAB categories, page types, and reputation scores — giving your security team the same visibility and control over agent web traffic that they already have over employee web traffic.

How Runtime Guardrails Work

Four stages of in-line policy enforcement during an active agent browsing session

Navigation Intent Capture

The guardrail hooks into the agent's browser automation layer (Playwright, Puppeteer, Selenium, or direct CDP) and intercepts every navigation event before it executes. This includes explicit navigations (agent clicks a link), implicit navigations (JavaScript redirects), and passive loads (iframes, fetch calls). The interception point is synchronous — the browser blocks until the guardrail returns a verdict.

Real-Time Classification Lookup

The domain is extracted from the intercepted URL and queried against the local 102M domain database. The lookup returns IAB categories, web filtering categories, page-type labels, reputation scores, and popularity rankings, all in under 1 millisecond. For domains not in the local database, a real-time API fallback classifies the domain on demand with an average latency under 200ms.

Policy Rule Evaluation

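The classification result is evaluated against your policy rule set. Rules can match on any combination of IAB category tier, web filtering category, page type, reputation score, and popularity rank. The evaluation is deterministic: for a given database version and rule set, the same URL always produces the same decision. Rule priority, conflict resolution, and default actions are all configurable by your security team.

Decision Enforcement

The verdict (allow, block, log, or escalate) is applied before the HTTP request fires. An allowed request proceeds to the network; a blocked request is aborted inside the browser, so the page never loads; and every verdict is recorded in the session log.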
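The rule evaluation described above can be sketched as a small priority-ordered rule engine. This is a minimal illustration, not the product's rule syntax: the `Rule` and `evaluate` names, the field names (`page_type`, `filtering_category`, `reputation`), and the 0 to 100 reputation scale are assumptions. The first matching rule by priority wins, which gives deterministic conflict resolution, and a configurable default covers URLs that match nothing or could not be classified.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

VALID_ACTIONS = {"allow", "block", "log", "escalate"}


@dataclass(order=True)
class Rule:
    priority: int                                   # lower number = evaluated first
    name: str = field(compare=False)
    action: str = field(compare=False)
    matches: Callable[[dict], bool] = field(compare=False)

    def __post_init__(self):
        if self.action not in VALID_ACTIONS:
            raise ValueError(f"unknown action: {self.action}")


def evaluate(rules: List[Rule], classification: Optional[dict],
             default_action: str = "block") -> Tuple[str, str]:
    """Return (action, matched_rule_name).

    Rules are sorted by priority (ties keep list order), and the first match
    wins, so the same input always yields the same verdict."""
    if classification is None:
        return default_action, "default:unclassified"
    for rule in sorted(rules):
        if rule.matches(classification):
            return rule.action, rule.name
    return default_action, "default:no-match"


# Example rule set; field names and the reputation scale are hypothetical.
RULES = [
    Rule(10, "sensitive-page-type", "block",
         lambda c: c.get("page_type") in {"login", "checkout", "admin", "settings"}),
    Rule(20, "unsafe-category", "block",
         lambda c: c.get("filtering_category") in {"Adult", "Malware", "Phishing", "Gambling"}),
    Rule(30, "low-reputation", "escalate",
         lambda c: c.get("reputation", 100) < 20),
    Rule(90, "catch-all", "allow", lambda c: True),
]
```

Keeping the catch-all as an explicit low-priority rule, rather than relying on the default, makes the permissive path visible in the audit trail by name.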

Runtime Guardrail Integration Code

Middleware implementations that intercept agent navigation in real-time

Python — Playwright Runtime Interceptor

import http.client
import json
from urllib.parse import urlencode, urlparse

class RuntimeGuardrail:
    """In-line guardrail for Playwright browser sessions (sync API)."""

    BLOCKED_TYPES = {"login", "checkout", "admin", "settings"}
    BLOCKED_CATS = {"Adult", "Malware", "Phishing", "Gambling"}
    DEFAULT_ACTION = "block"  # verdict when neither the local DB nor the API answers

    def __init__(self, api_key, local_db=None, timeout=2.0):
        self.api_key = api_key
        self.local_db = local_db or {}  # dict: domain -> classification dict
        self.timeout = timeout          # seconds to wait for the API fallback
        self.session_log = []

    def intercept(self, route, request):
        """Playwright route handler: runs on every request the page makes."""
        url = request.url
        parsed = urlparse(url)

        # data:, blob: and similar URLs never reach the network
        if parsed.scheme not in ("http", "https"):
            route.continue_()
            return

        # hostname is lowercased and excludes port and credentials
        domain = parsed.hostname or ""

        # Step 1: Local DB lookup (sub-millisecond)
        classification = self.local_db.get(domain)

        # Step 2: API fallback if not in local DB (None on failure)
        if classification is None:
            classification = self._api_classify(url)

        # Step 3: Evaluate policy
        decision = self._evaluate(classification)

        # Step 4: Log every decision
        self.session_log.append({
            "url": url,
            "decision": decision["action"],
            "reason": decision["reason"],
            "category": (classification or {}).get("filtering_category", "Unknown"),
        })

        # Step 5: Allow or block before the request leaves the browser
        if decision["action"] == "block":
            route.abort("blockedbyclient")
        else:
            route.continue_()

    def _api_classify(self, url):
        """Classify via the remote API; return None on any error or timeout."""
        # urlencode escapes "&", "?" and "=" inside the URL being classified
        payload = urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com", timeout=self.timeout
        )
        try:
            conn.request(
                "POST",
                "/api/iab/iab_web_content_filtering.php",
                payload,
                headers,
            )
            res = conn.getresponse()
            if res.status != 200:
                return None
            data = json.loads(res.read().decode("utf-8"))
            return data if isinstance(data, dict) else None
        except (OSError, http.client.HTTPException, ValueError):
            return None  # network error, timeout, or malformed JSON
        finally:
            conn.close()

    def _evaluate(self, data):
        if not data:
            return {"action": self.DEFAULT_ACTION, "reason": "Unclassified"}

        page_type = data.get("page_type", "unknown")
        if page_type in self.BLOCKED_TYPES:
            return {"action": "block", "reason": f"Page type: {page_type}"}

        # Field name assumed here; match it to the API's actual response schema
        category = data.get("filtering_category", "Unknown")
        if category in self.BLOCKED_CATS:
            return {"action": "block", "reason": f"Category: {category}"}

        return {"action": "allow", "reason": "Policy passed"}

# Usage with Playwright (sync API):
# guardrail = RuntimeGuardrail(api_key="YOUR_API_KEY", local_db=domain_dict)
# page.route("**/*", guardrail.intercept)

JavaScript — CDP Request Interception

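class RuntimeNavigationGuard {
  constructor(apiKey, localDB = new Map(), { defaultAction = "block", timeoutMs = 2000 } = {}) {
    this.apiKey = apiKey;
    this.localDB = localDB;             // Map: hostname -> classification
    this.defaultAction = defaultAction; // verdict when nothing classifies the URL
    this.timeoutMs = timeoutMs;         // abort the API fallback after this long
    this.sessionLog = [];
    this.blockedPageTypes = new Set([
      "login", "checkout", "admin", "settings", "signup"
    ]);
    this.blockedCategories = new Set([
      "Adult", "Malware", "Phishing", "Gambling"
    ]);
  }

  async onNavigationRequest(url) {
    let domain = "";
    try {
      domain = new URL(url).hostname;
    } catch {
      // Malformed URL: leave domain empty so it falls through to the default
    }

    // Local DB first (sub-ms)
    let classification = this.localDB.get(domain);

    // API fallback; a timeout or HTTP error yields null
    if (!classification) {
      classification = await this.apiClassify(url).catch(() => null);
    }

    // Evaluate, applying the default action when nothing classified the URL
    const decision = classification
      ? this.evaluate(classification)
      : { action: this.defaultAction, reason: "Unclassified" };
    this.sessionLog.push({ url, ...decision, timestamp: Date.now() });

    return decision;
  }

  evaluate(data) {
    const pageType = data?.page_type || "unknown";
    if (this.blockedPageTypes.has(pageType)) {
      return { action: "block", reason: `Page type: ${pageType}` };
    }

    // Assumes filtering_taxonomy[0][0] holds "Category name: <name>"
    const filterCat = data?.filtering_taxonomy?.[0]?.[0]
      ?.replace("Category name: ", "") || "Unknown";
    if (this.blockedCategories.has(filterCat)) {
      return { action: "block", reason: `Category: ${filterCat}` };
    }

    return { action: "allow", reason: "Policy passed" };
  }

  async apiClassify(url) {
    const resp = await fetch(
      "https://www.websitecategorizationapi.com" +
      "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: url,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        }),
        signal: AbortSignal.timeout(this.timeoutMs)
      }
    );
    if (!resp.ok) {
      throw new Error(`Classification API returned HTTP ${resp.status}`);
    }
    return resp.json();
  }

  getSessionReport() {
    const total = this.sessionLog.length;
    const blocked = this.sessionLog.filter(
      e => e.action === "block"
    ).length;
    return { total, allowed: total - blocked, blocked };
  }
}

// Usage with Puppeteer request interception (implemented on top of CDP):
// const guard = new RuntimeNavigationGuard("YOUR_API_KEY", domainMap);
// await page.setRequestInterception(true);
// page.on("request", async (req) => {
//   const { action } = await guard.onNavigationRequest(req.url());
//   if (action === "block") await req.abort("blockedbyclient");
//   else await req.continue();
// });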

Pre-Classified Page-Type URLs

Why Pre-Classified URLs for 102M Domains Change Everything for AI Agents

Having pre-classified URLs for 20 page types across 102 million domains at the start of any agent task means your agents skip the discovery phase entirely. The result: orders of magnitude faster task completion.

Orders of Magnitude Faster

Without pre-classified data, an agent must crawl each domain, follow links, load pages, and analyze content to find a login or pricing page. That takes seconds to minutes per domain. With our database, the agent gets the exact URL in under 1ms — a local lookup instead of a live crawl.

From minutes per domain to microseconds

Dramatically Lower Cost

Live crawling and AI classification at runtime burn tokens, compute, and API calls. Every page an agent visits to discover site structure can cost roughly $0.01–$0.05 in LLM inference; multiplied across thousands of domains, the bill grows quickly. A one-time database purchase eliminates per-query classification costs for every domain the database covers.

One-time cost vs. per-query billing

Zero Hallucination Risk

When agents guess URLs, they hallucinate. An LLM asked to find a company's pricing page might fabricate /pricing, /plans, or /packages, paths that may not exist on that site. Our database provides verified, real URLs that were actually discovered and classified, so agents navigate to known pages instead of guessed ones.

Verified URLs, not AI guesses

1000x faster lookups

Zero per-query cost

100% verified URLs

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Database

AI Agent Domain Database 10M

$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license | Optional Updates: $1,599/year

10M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global Popularity Rankings


Popular

AI Agent Domain Database 20M

$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $2,999/year

20M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager


Maximum Coverage

AI Agent Domain Database 50M

$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $4,999/year

50M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager


Also available: Enterprise URL Database up to 102M domains from $2,499.

Why Runtime Beats Pre-Flight for Agent Browsing Governance

The fundamental assumption behind pre-flight agent governance is that you can predict every URL an agent will visit before the browsing session begins. This assumption is incorrect for any task that involves web search, link following, or dynamic content discovery. A researcher agent tasked with "find the top 10 enterprise CRM vendors and compare their pricing" will discover URLs through search results, follow links from comparison sites, and navigate to vendor pages — none of which can be enumerated in advance. The agent needs governance that operates at runtime, not at plan time.

Runtime guardrails solve this by moving the policy enforcement point from the task planner to the browser automation layer. Instead of telling the agent "you may visit these specific URLs," you tell the browser "every URL must pass policy evaluation before loading." This inversion of control means the agent retains full flexibility in its browsing behavior while the guardrail retains full authority over what actually loads.

The Architecture of an In-Line Navigation Guardrail

An in-line guardrail sits between the agent's navigation intent and the actual HTTP request. In a Playwright-based agent, this is implemented as a route handler that intercepts all requests before they fire. In a Puppeteer-based agent, it uses CDP's Fetch.requestPaused event. In a Selenium-based agent, it wraps the WebDriver's navigate method with a pre-check. Regardless of the browser automation framework, the architecture is the same: intercept, classify, evaluate, decide.
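A Selenium-style wrapper shows the intercept, classify, evaluate, decide sequence in its simplest form. This sketch assumes only that the wrapped object exposes `get(url)`, as Selenium's WebDriver does; `GuardedDriver`, `NavigationBlocked`, `RecordingDriver`, and the toy host-based policy are hypothetical names used for illustration.

```python
from typing import Callable, Optional
from urllib.parse import urlparse


class NavigationBlocked(Exception):
    """Raised when the guardrail denies a navigation."""


class GuardedDriver:
    """Wraps any object exposing get(url), e.g. a Selenium WebDriver, and runs
    classify -> evaluate -> decide before delegating the navigation."""

    def __init__(self, driver, classify: Callable[[str], Optional[dict]],
                 evaluate: Callable[[Optional[dict]], str]):
        self._driver = driver
        self._classify = classify
        self._evaluate = evaluate

    def get(self, url: str) -> None:
        classification = self._classify(url)     # classify the intercepted URL
        action = self._evaluate(classification)   # evaluate against policy
        if action == "block":                     # decide before navigating
            raise NavigationBlocked(url)
        self._driver.get(url)                     # only now does the browser load it

    def __getattr__(self, name):
        # Delegate everything else (find_element, quit, ...) to the wrapped driver.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._driver, name)


class RecordingDriver:
    """Stand-in for a real WebDriver: records URLs instead of loading them."""

    def __init__(self):
        self.visited = []
        self.title = "stub"

    def get(self, url):
        self.visited.append(url)


BLOCKED_HOSTS = {"phish.example"}  # hypothetical policy input
guarded = GuardedDriver(
    RecordingDriver(),
    classify=lambda url: {"host": urlparse(url).hostname},
    evaluate=lambda c: "block" if c and c["host"] in BLOCKED_HOSTS else "allow",
)
```

Because it wraps only `get()`, this pattern sees explicit navigations but not link clicks, JavaScript redirects, or sub-resource loads; those need request-level interception such as a Playwright route handler or CDP's Fetch domain.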

The classification step is where the 102M domain database becomes critical. Every intercepted URL is extracted, the domain is looked up in the local database, and the classification result — IAB categories, page types, reputation scores — is returned in under 1 millisecond. This sub-millisecond latency means the guardrail adds negligible overhead to the agent's browsing session. The agent does not perceive a delay because the database lookup is faster than the network latency of the actual HTTP request it was about to make.
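The domain lookup needs a little normalization before it hits the database. A minimal sketch, assuming the database is keyed by domain and exposed as a Python dict (`lookup_domain` and the sample entry are hypothetical): it lowercases the host, strips port and credentials via `urlsplit`, and falls back from subdomain to parent domain. The suffix walk is a simplification; a Public Suffix List library handles multi-label suffixes such as `co.uk` more precisely.

```python
from typing import Optional
from urllib.parse import urlsplit


def lookup_domain(url: str, db: dict) -> Optional[dict]:
    """Look up a URL's host, then each parent domain, in a local table.

    a.b.example.com -> b.example.com -> example.com (stops before the TLD)."""
    host = urlsplit(url).hostname  # lowercased; port and credentials removed
    if not host:
        return None                # about:blank, data: URLs, malformed input
    labels = host.rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in db:
            return db[candidate]
    return None


DB = {"example.com": {"iab_category": "Technology & Computing"}}  # sample entry
```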

Handling Redirect Chains and Dynamic Destinations

One of the most dangerous gaps in pre-flight governance is redirect chains. An agent navigates to an approved URL — say, a marketing page on a known SaaS vendor's domain. That page includes a JavaScript redirect to a third-party analytics platform, which in turn redirects to a content delivery network, which serves the actual page from a completely different domain. The pre-flight allowlist approved the original URL but knows nothing about the intermediate hops or the final destination.

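Runtime guardrails handle this by intercepting every navigation event in the chain, not just the initial one. Each redirect that reaches the interception hook triggers a new interception, a new classification lookup, and a new policy evaluation. If any hop in the chain lands on a blocked category or page type, the chain is terminated at that point, and the agent never reaches the final destination through a blocked intermediate domain. Which redirects reach the hook depends on the framework: Playwright's page.route documentation notes that the handler is only called for the first URL when the response is a redirect, so per-hop enforcement of server-side 3xx chains may require a lower-level hook such as the CDP Fetch domain.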
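The chain logic reduces to checking hops in order and stopping at the first block. In a live browser the hops arrive one at a time as separate interceptions; this sketch (`walk_redirect_chain` and the toy `decide` policy are hypothetical) takes them as a list only to make the behavior easy to see.

```python
from typing import Callable, List, Optional, Tuple


def walk_redirect_chain(
    hops: List[str], decide: Callable[[str], str]
) -> Tuple[List[str], Optional[str]]:
    """Evaluate redirect hops in order and stop at the first blocked one.

    Returns (hops that were allowed, the blocked hop or None)."""
    allowed: List[str] = []
    for url in hops:
        if decide(url) == "block":
            return allowed, url   # later hops are never requested
        allowed.append(url)
    return allowed, None


# Toy policy for illustration: block anything on the ad-network host.
def decide(url: str) -> str:
    return "block" if "adnet.example" in url else "allow"


chain = [
    "https://vendor.example/promo",
    "https://track.adnet.example/r?id=1",
    "https://cdn.example/landing",
]
```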

Sub-Resource Monitoring: Beyond Page Navigation

Sophisticated runtime guardrails do not stop at page-level navigation. They also monitor sub-resource requests: JavaScript files loaded from external CDNs, API calls to third-party services, image assets served from ad networks, and WebSocket connections to real-time data feeds. Each of these sub-resources represents a potential data exfiltration channel or policy violation vector. An agent visiting an approved blog post might trigger a JavaScript beacon that sends browsing data to an advertising tracker — a data leakage event that only sub-resource monitoring can detect.
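A sub-resource policy typically keys on both the resource type and the destination's category. The sketch below assumes the resource-type strings Playwright reports via `request.resource_type` (such as `script`, `xhr`, `fetch`, `image`, `websocket`); the category names, the per-type blocklist, the simplified first-party test, and the function names are hypothetical policy choices.

```python
from typing import Optional

# Per-resource-type blocklist for third-party sub-resources (hypothetical).
THIRD_PARTY_BLOCKED = {
    "script": {"Advertising", "Malware"},
    "xhr": {"Advertising", "Malware"},
    "fetch": {"Advertising", "Malware"},
    "websocket": {"Advertising", "Malware"},
    "image": {"Malware"},
}


def is_first_party(request_host: str, page_host: str) -> bool:
    # Simplified: same host, or a subdomain of the page's host.
    return request_host == page_host or request_host.endswith("." + page_host)


def subresource_action(resource_type: str, category: Optional[str],
                       request_host: str, page_host: str) -> str:
    """Return "allow" or "block" for a non-document sub-resource request."""
    if is_first_party(request_host, page_host):
        return "allow"
    blocked = THIRD_PARTY_BLOCKED.get(resource_type, set())
    return "block" if category in blocked else "allow"
```

Treating first-party requests as trusted keeps noise down but would miss a compromised first-party host; stricter deployments can run first-party requests through the same category check.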

Session-Level Audit Trails for Compliance

Every decision the runtime guardrail makes — allow, block, log, or escalate — is recorded in a session-level audit trail. This trail includes the URL, the classification result, the policy rule that matched, the decision, and a timestamp. For regulated industries (financial services, healthcare, government), this audit trail is the evidence that your agent governance program actually works. Auditors do not accept "we told the agent not to visit bad websites" as compliance evidence. They need deterministic records: this URL was evaluated, this classification was returned, this policy rule was applied, and this decision was made at this timestamp.
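One way to produce such records is an append-only JSON Lines log with one object per decision. A minimal sketch, with hypothetical names (`AuditRecord`, `write_audit`) and fields mirroring the list above: URL, classification, matched rule, decision, and a UTC timestamp.

```python
import io
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional, TextIO


@dataclass
class AuditRecord:
    url: str
    classification: Optional[dict]
    rule: str          # the policy rule that matched
    decision: str      # allow, block, log, or escalate
    timestamp: str     # ISO 8601, UTC


def write_audit(stream: TextIO, url: str, classification: Optional[dict],
                rule: str, decision: str) -> AuditRecord:
    """Append one JSON Lines record per guardrail decision and flush it."""
    record = AuditRecord(url, classification, rule, decision,
                         datetime.now(timezone.utc).isoformat())
    stream.write(json.dumps(asdict(record), sort_keys=True) + "\n")
    stream.flush()
    return record


# In production the stream would be an append-only file or log shipper;
# an in-memory buffer keeps the example self-contained.
audit_log = io.StringIO()
write_audit(audit_log, "https://example.com/account/login",
            {"page_type": "login"}, "sensitive-page-type", "block")
```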

Latency Budgets: Local Database vs. API-Only Approaches

Agent browsing sessions are latency-sensitive. An agent that takes 500 milliseconds to evaluate each URL will feel sluggish and waste compute time waiting for classification responses. API-only classification approaches introduce network round-trip latency (typically 100-300ms per request) into every navigation decision. At 50 navigation events per browsing session, that adds 5 to 15 seconds of pure classification overhead — time the agent spends waiting instead of working.

A local database lookup eliminates almost all of this overhead. The 102M domain database loaded into Redis or SQLite returns classification results in under 1 millisecond. Over a 50-event browsing session, the total classification overhead is under 50 milliseconds, invisible to the agent and negligible compared to the network latency of the actual page loads. This is why runtime guardrails built on local databases are the only architecturally sound approach for production agent deployments.
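A local lookup of this kind can be as simple as a primary-key query against SQLite. The schema below (a `domains` table with `iab_category`, `filtering_category`, and `reputation` columns), the helper names, and the sample row are assumptions for illustration, not the database's actual export format.

```python
import sqlite3
from typing import Optional


def open_domain_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local classification table keyed by domain."""
    conn = sqlite3.connect(path)
    # Hypothetical schema; adapt to the columns of the actual database export.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domains ("
        " domain TEXT PRIMARY KEY,"
        " iab_category TEXT,"
        " filtering_category TEXT,"
        " reputation INTEGER)"
    )
    return conn


def classify_local(conn: sqlite3.Connection, domain: str) -> Optional[dict]:
    """Primary-key lookup against the local file: no network round trip."""
    row = conn.execute(
        "SELECT iab_category, filtering_category, reputation"
        " FROM domains WHERE domain = ?",
        (domain.lower(),),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("iab_category", "filtering_category", "reputation"), row))


conn = open_domain_db()
conn.execute("INSERT INTO domains VALUES (?, ?, ?, ?)",
             ("example.com", "Technology & Computing", "Business", 72))
```

Timing `classify_local` with `time.perf_counter` on your own hardware is a simple way to confirm the latency budget for a given deployment.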

Graceful Degradation When Classification Is Unavailable

Production systems must handle failure gracefully. What happens when the local database does not contain the URL and the API fallback times out? A well-designed runtime guardrail has a configurable default action for unclassified URLs. Conservative deployments set the default to "block and log" — the agent cannot visit unclassified URLs, and each block is logged for manual review. Permissive deployments set the default to "allow and alert" — the agent can proceed, but the security team receives a notification. The correct default depends on your organization's risk tolerance and the specific agent's task context.
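The configurable default can be sketched as a local lookup, then an API call bounded by a hard timeout, then the mode's default action. The names (`classify_or_default`, `DEFAULT_POLICY`), the two mode labels, and the 0.5-second timeout are assumptions; the thread pool is just one way to bound a blocking call in standard-library Python.

```python
import concurrent.futures
from typing import Callable, Optional, Tuple

# Shared worker pool for API fallbacks. A call that times out keeps running
# in its worker thread; the guardrail simply stops waiting for it.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Deployment mode -> (action for unclassified URLs, alert the security team?)
DEFAULT_POLICY = {
    "conservative": ("block", False),  # block and log
    "permissive": ("allow", True),     # allow and alert
}


def classify_or_default(
    domain: str,
    local_lookup: Callable[[str], Optional[dict]],
    api_classify: Callable[[str], Optional[dict]],
    mode: str = "conservative",
    timeout_s: float = 0.5,
) -> Tuple[Optional[dict], Optional[str], bool]:
    """Return (classification, None, False) when either source answers,
    otherwise (None, default_action, alert) for the configured mode."""
    hit = local_lookup(domain)
    if hit is not None:
        return hit, None, False
    future = _POOL.submit(api_classify, domain)
    try:
        result = future.result(timeout=timeout_s)
    except Exception:  # timeout, connection error, malformed response, ...
        result = None
    if result is not None:
        return result, None, False
    action, alert = DEFAULT_POLICY[mode]
    return None, action, alert
```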


Framework-Specific Integration Patterns

For Playwright-based agents, including many custom agent frameworks, the guardrail registers as a route handler via page.route("**/*", handler). For Puppeteer-based agents, it uses the CDP Fetch domain to intercept requests. For Selenium-based agents, it wraps the WebDriver's get() method and evaluates the URL before calling the underlying navigation. For higher-level frameworks like LangChain or CrewAI, the guardrail wraps the browsing tool's execute method, intercepting URLs before they reach the browser automation layer.

In all cases, the integration follows the same principle: the guardrail must execute synchronously in the navigation path. Asynchronous logging-only approaches are useful for observability but do not provide governance — by the time the log entry is written, the agent has already loaded the page, rendered the content, and potentially extracted sensitive data. True runtime guardrails are blocking: the page does not load until the guardrail returns a verdict.
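For tool-level integrations, the blocking check can be a decorator on the browsing tool's entry point. The sketch assumes a hypothetical tool function whose first argument is the URL; it does not use any LangChain or CrewAI API, and `guarded`, `browse`, and `NavigationBlocked` are illustrative names. Because the check runs before the wrapped call, a blocked URL never reaches the browser, which is the synchronous property described above.

```python
import functools
from typing import Callable


class NavigationBlocked(Exception):
    """Raised when the guardrail denies a URL."""


def guarded(decide: Callable[[str], str]):
    """Wrap a browsing tool so every URL is checked before the tool runs."""
    def wrap(tool_fn):
        @functools.wraps(tool_fn)
        def inner(url: str, *args, **kwargs):
            if decide(url) == "block":        # synchronous: nothing loads yet
                raise NavigationBlocked(f"Blocked by policy: {url}")
            return tool_fn(url, *args, **kwargs)
        return inner
    return wrap


# Toy policy for illustration: block any admin path.
@guarded(lambda url: "block" if "/admin" in url else "allow")
def browse(url: str) -> str:
    # Stand-in for the framework tool that would drive the real browser.
    return f"loaded {url}"
```

Raising is one design choice; some agent frameworks instead return an error string to the model so it can re-plan around the blocked URL.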

