
Runtime Guardrails for AI That Browses the Web

Pre-flight checks are not enough. When an autonomous AI agent browses the web, every navigation event, redirect, and iframe load is a potential policy violation. Runtime guardrails execute in-line — at the exact moment the agent attempts to navigate — intercepting, classifying, and enforcing policy before the HTTP request leaves your network.

  • <1ms local lookup latency
  • 102M pre-classified domains
  • 20+ page type detections
  • 0 hallucination risk

The Problem: Static Allowlists Cannot Handle Dynamic Browsing

AI agents do not follow predictable paths. They follow links, get redirected, encounter new domains mid-session, and discover URLs that no pre-flight check could have anticipated.

Pre-Flight Checks Fail at Runtime

Most agent governance frameworks implement pre-flight validation: before the agent starts its task, a list of approved domains is compiled and loaded. The agent is told "you may visit these 50 domains." This approach fails catastrophically in practice because web browsing is inherently dynamic. A Google search for "enterprise software pricing" returns links the pre-flight list never anticipated. A redirect chain from an approved domain lands the agent on an unapproved one. An embedded iframe loads content from a third-party ad network. Every one of these runtime events bypasses the pre-flight allowlist entirely.

  • Redirect chains: An approved URL redirects through 2-3 intermediate domains before reaching the final destination — none of the intermediate domains were on the allowlist
  • Dynamic link discovery: Agents following search results or scraping link directories encounter URLs that did not exist at pre-flight time
  • Session drift: Multi-step browsing tasks accumulate navigation events over minutes or hours — the further into the session, the further from the original allowlist
  • Embedded content: Iframes, JavaScript-loaded resources, and API calls from visited pages pull content from domains the agent never explicitly navigated to
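
The gap described in the list above can be illustrated with a minimal sketch. The allowlist contents and all hostnames are hypothetical; the point is only that a check applied to the starting URL says nothing about the hosts contacted afterwards.

```python
from urllib.parse import urlparse

# Hypothetical pre-flight allowlist; all hostnames here are illustrative.
ALLOWLIST = {"vendor.example"}


def preflight_ok(task_url: str) -> bool:
    """Pre-flight validation inspects only the URL the task starts from."""
    return urlparse(task_url).hostname in ALLOWLIST


def unvetted_hosts(chain: list[str]) -> list[str]:
    """Hosts contacted during the session that were never on the allowlist."""
    hosts = [urlparse(u).hostname for u in chain]
    return [h for h in hosts if h not in ALLOWLIST]


# Illustrative redirect chain that starts from an approved URL.
chain = [
    "https://vendor.example/pricing",
    "https://track.example.net/r?id=1",
    "https://cdn.example.org/landing",
]

print(preflight_ok(chain[0]))  # the starting URL passes pre-flight
print(unvetted_hosts(chain))   # hosts the allowlist never covered
```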

The Solution: In-Line Guardrails That Execute at Navigation Time

Runtime guardrails intercept every navigation event — not just the initial task URL, but every subsequent click, redirect, iframe load, and API call — and evaluate it against your policy rules in real-time. The evaluation uses a pre-classified domain database with 102 million entries, delivering sub-millisecond lookup times with zero external API dependencies. The classification is deterministic (no model inference, no hallucination risk) and the policy decision (allow, block, log, or escalate) executes before the HTTP request fires.

This architecture treats your agent's browser session like a network firewall treats packet flows: every request is inspected, classified, and either permitted or dropped based on policy rules. The difference is that instead of IP addresses and ports, the guardrail operates on URLs, IAB categories, page types, and reputation scores — giving your security team the same visibility and control over agent web traffic that they already have over employee web traffic.

Runtime Navigation Interception

Every URL evaluated in-line before the agent's HTTP request fires

How Runtime Guardrails Work

Four stages of in-line policy enforcement during an active agent browsing session

Navigation Intent Capture

The guardrail hooks into the agent's browser automation layer (Playwright, Puppeteer, Selenium, or direct CDP) and intercepts every navigation event before it executes. This includes explicit navigations (agent clicks a link), implicit navigations (JavaScript redirects), and passive loads (iframes, fetch calls). The interception point is synchronous — the browser blocks until the guardrail returns a verdict.

Real-Time Classification Lookup

The domain is extracted from the intercepted URL and looked up in the local 102M domain database. The lookup returns IAB categories, web filtering categories, page-type labels, reputation scores, and popularity rankings, all in under 1 millisecond. For domains not in the local database, a real-time API fallback classifies the domain on demand with an average latency under 200ms.

Policy Rule Evaluation

The classification result is evaluated against your policy rule set. Rules can match on any combination of IAB category tier, web filtering category, page type, reputation score, and popularity rank. The evaluation is deterministic — the same URL always produces the same decision. Rule priority, conflict resolution, and default actions are all configurable by your security team.
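
One way to picture prioritized, deterministic rule evaluation is the sketch below. It is an illustration under stated assumptions, not the product's rule engine: the `Rule` structure, the rule names, and the `reputation` field are hypothetical, while `filtering_category` and `page_type` follow the field names used in the Python integration example on this page.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    priority: int                    # lower number is evaluated first
    action: str                      # "allow" | "block" | "log" | "escalate"
    name: str
    matches: Callable[[dict], bool]  # predicate over a classification dict


# Hypothetical rule set; thresholds and names are assumptions.
RULES = [
    Rule(10, "block", "blocked-category",
         lambda c: c.get("filtering_category") in {"Malware", "Phishing"}),
    Rule(20, "block", "sensitive-page-type",
         lambda c: c.get("page_type") in {"login", "checkout"}),
    Rule(30, "escalate", "low-reputation",
         lambda c: c.get("reputation", 100) < 20),
]


def evaluate(classification: dict, default_action: str = "allow") -> tuple[str, str]:
    """Return (action, rule name) for the first matching rule in priority order."""
    for rule in sorted(RULES, key=lambda r: r.priority):
        if rule.matches(classification):
            return rule.action, rule.name
    return default_action, "default"
```

Because the rules are pure predicates evaluated in a fixed order, the same classification always yields the same decision, and conflicts between rules are resolved by priority alone.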

Request Classification Pipeline

Intercept > Classify > Evaluate > Allow or Block

Runtime Guardrail Integration Code

Middleware implementations that intercept agent navigation in real-time

Python — Playwright Runtime Interceptor

    import http.client
    import json
    from urllib.parse import urlencode, urlparse


    class RuntimeGuardrail:
        """In-line guardrail for Playwright browser sessions."""

        BLOCKED_TYPES = ["login", "checkout", "admin", "settings"]
        BLOCKED_CATS = ["Adult", "Malware", "Phishing", "Gambling"]

        def __init__(self, api_key, local_db=None):
            self.api_key = api_key
            self.local_db = local_db  # dict: domain -> classification
            self.session_log = []
            self.conn = http.client.HTTPSConnection(
                "www.websitecategorizationapi.com"
            )

        def intercept(self, route, request):
            """Playwright route handler: runs on EVERY request."""
            url = request.url
            # hostname (unlike netloc) excludes any port or credentials
            domain = urlparse(url).hostname or ""

            # Step 1: Local DB lookup (sub-millisecond)
            classification = self._local_lookup(domain)

            # Step 2: API fallback if not in local DB
            if not classification:
                classification = self._api_classify(url)

            # Step 3: Evaluate policy
            decision = self._evaluate(classification, url)

            # Step 4: Log every decision
            self.session_log.append({
                "url": url,
                "decision": decision["action"],
                "reason": decision["reason"],
                "category": classification.get("category", "Unknown"),
            })

            # Step 5: Allow or block
            if decision["action"] == "block":
                route.abort("blockedbyclient")
            else:
                route.continue_()

        def _local_lookup(self, domain):
            if self.local_db and domain in self.local_db:
                return self.local_db[domain]
            return None

        def _api_classify(self, url):
            # urlencode escapes reserved characters in the URL and API key
            payload = urlencode({
                "query": url,
                "api_key": self.api_key,
                "data_type": "url",
                "expanded_categories": 1,
            })
            headers = {
                "Content-Type": "application/x-www-form-urlencoded"
            }
            self.conn.request(
                "POST",
                "/api/iab/iab_web_content_filtering.php",
                payload,
                headers,
            )
            res = self.conn.getresponse()
            return json.loads(res.read().decode("utf-8"))

        def _evaluate(self, data, url):
            page_type = data.get("page_type", "unknown")
            if page_type in self.BLOCKED_TYPES:
                return {"action": "block",
                        "reason": f"Page type: {page_type}"}
            category = data.get("filtering_category", "Unknown")
            if category in self.BLOCKED_CATS:
                return {"action": "block",
                        "reason": f"Category: {category}"}
            return {"action": "allow", "reason": "Policy passed"}


    # Usage with Playwright
    # page.route("**/*", guardrail.intercept)

JavaScript — CDP Request Interception

    class RuntimeNavigationGuard {
      constructor(apiKey, localDB = new Map()) {
        this.apiKey = apiKey;
        this.localDB = localDB;
        this.sessionLog = [];
        this.blockedPageTypes = new Set([
          "login", "checkout", "admin", "settings", "signup"
        ]);
        this.blockedCategories = new Set([
          "Adult", "Malware", "Phishing", "Gambling"
        ]);
      }

      async onNavigationRequest(url) {
        const domain = new URL(url).hostname;

        // Local DB first (sub-ms)
        let classification = this.localDB.get(domain);

        // API fallback
        if (!classification) {
          classification = await this.apiClassify(url);
        }

        // Evaluate
        const decision = this.evaluate(classification);
        this.sessionLog.push({ url, ...decision, timestamp: Date.now() });
        return decision;
      }

      evaluate(data) {
        const pageType = data?.page_type || "unknown";
        if (this.blockedPageTypes.has(pageType)) {
          return { action: "block", reason: `Page type: ${pageType}` };
        }
        const filterCat = data?.filtering_taxonomy?.[0]?.[0]
          ?.replace("Category name: ", "") || "Unknown";
        if (this.blockedCategories.has(filterCat)) {
          return { action: "block", reason: `Category: ${filterCat}` };
        }
        return { action: "allow", reason: "Policy passed" };
      }

      async apiClassify(url) {
        const resp = await fetch(
          "https://www.websitecategorizationapi.com" +
            "/api/iab/iab_web_content_filtering.php",
          {
            method: "POST",
            headers: {
              "Content-Type": "application/x-www-form-urlencoded"
            },
            body: new URLSearchParams({
              query: url,
              api_key: this.apiKey,
              data_type: "url",
              expanded_categories: "1"
            })
          }
        );
        return await resp.json();
      }

      getSessionReport() {
        const total = this.sessionLog.length;
        const blocked = this.sessionLog.filter(
          e => e.action === "block"
        ).length;
        return { total, allowed: total - blocked, blocked };
      }
    }

Agent Browsing Session Timeline

Every navigation event intercepted and classified in real-time

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Domain Database 10M
$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license  |  Optional Updates: $1,599/year

  • 10M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global Popularity Rankings

AI Agent Domain Database 20M (Popular)
$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $2,999/year

  • 20M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

AI Agent Domain Database 50M (Maximum Coverage)
$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $4,999/year

  • 50M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →

How Many Domains in Each Category?

Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same data your runtime guardrails query on every navigation event.

Database Analytics

Domain Distribution by Category in Our 102M Enterprise Database

How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications

Top 50 IAB v3 Categories

Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database


Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.

Redirect Chain Interception

Following and classifying every hop in a redirect chain

Why Runtime Beats Pre-Flight for Agent Browsing Governance

The fundamental assumption behind pre-flight agent governance is that you can predict every URL an agent will visit before the browsing session begins. This assumption is incorrect for any task that involves web search, link following, or dynamic content discovery. A researcher agent tasked with "find the top 10 enterprise CRM vendors and compare their pricing" will discover URLs through search results, follow links from comparison sites, and navigate to vendor pages — none of which can be enumerated in advance. The agent needs governance that operates at runtime, not at plan time.

Runtime guardrails solve this by moving the policy enforcement point from the task planner to the browser automation layer. Instead of telling the agent "you may visit these specific URLs," you tell the browser "every URL must pass policy evaluation before loading." This inversion of control means the agent retains full flexibility in its browsing behavior while the guardrail retains full authority over what actually loads.

The Architecture of an In-Line Navigation Guardrail

An in-line guardrail sits between the agent's navigation intent and the actual HTTP request. In a Playwright-based agent, this is implemented as a route handler that intercepts all requests before they fire. In a Puppeteer-based agent, it uses CDP's Fetch.requestPaused event. In a Selenium-based agent, it wraps the WebDriver's navigate method with a pre-check. Regardless of the browser automation framework, the architecture is the same: intercept, classify, evaluate, decide.

The classification step is where the 102M domain database becomes critical. For every intercepted URL, the domain is extracted and looked up in the local database, and the classification result (IAB categories, page types, reputation scores) is returned in under 1 millisecond. This sub-millisecond latency means the guardrail adds negligible overhead to the agent's browsing session: the database lookup is far faster than the network round trip of the HTTP request the agent was about to make.

Handling Redirect Chains and Dynamic Destinations

One of the most dangerous gaps in pre-flight governance is redirect chains. An agent navigates to an approved URL — say, a marketing page on a known SaaS vendor's domain. That page includes a JavaScript redirect to a third-party analytics platform, which in turn redirects to a content delivery network, which serves the actual page from a completely different domain. The pre-flight allowlist approved the original URL but knows nothing about the intermediate hops or the final destination.

Runtime guardrails handle this by intercepting every navigation event in the chain, not just the initial one. Each redirect triggers a new interception, a new classification lookup, and a new policy evaluation. If any hop in the chain lands on a blocked category or page type, the entire chain is terminated at that point. The agent never reaches the final destination through a blocked intermediate domain.
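
The hop-by-hop behaviour can be sketched as follows. This is a simplified model rather than a browser integration: the chain is passed in as a list, whereas a real guardrail would see each hop as a separate interception. The `LOCAL_DB` contents and hostnames are hypothetical; the blocked categories mirror the integration examples on this page.

```python
from urllib.parse import urlparse

# Hypothetical local classification table keyed by hostname.
LOCAL_DB = {
    "vendor.example": {"filtering_category": "Business"},
    "track.example.net": {"filtering_category": "Phishing"},
    "cdn.example.org": {"filtering_category": "Technology"},
}
BLOCKED_CATS = {"Adult", "Malware", "Phishing", "Gambling"}


def walk_chain(hops: list[str]) -> dict:
    """Evaluate each hop in order and stop at the first blocked one."""
    visited = []
    for url in hops:
        host = urlparse(url).hostname
        category = LOCAL_DB.get(host, {}).get("filtering_category", "Unknown")
        if category in BLOCKED_CATS:
            # Terminate here: later hops are never requested.
            return {"action": "block", "blocked_at": url, "visited": visited}
        visited.append(url)
    return {"action": "allow", "blocked_at": None, "visited": visited}


result = walk_chain([
    "https://vendor.example/pricing",
    "https://track.example.net/r?id=1",
    "https://cdn.example.org/landing",
])
```

In this run the chain stops at the second hop, so the third URL is never evaluated or loaded.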

Sub-Resource Monitoring: Beyond Page Navigation

Sophisticated runtime guardrails do not stop at page-level navigation. They also monitor sub-resource requests: JavaScript files loaded from external CDNs, API calls to third-party services, image assets served from ad networks, and WebSocket connections to real-time data feeds. Each of these sub-resources represents a potential data exfiltration channel or policy violation vector. An agent visiting an approved blog post might trigger a JavaScript beacon that sends browsing data to an advertising tracker — a data leakage event that only sub-resource monitoring can detect.
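
A minimal sketch of sub-resource screening, assuming the automation layer reports each request as a (resource type, URL) pair. The first-party test here is deliberately naive (same host or a subdomain of it); a production check would more likely compare registrable domains using the Public Suffix List. All hostnames are illustrative.

```python
from urllib.parse import urlparse


def is_third_party(page_url: str, request_url: str) -> bool:
    """Naive test: anything that is not the page host or one of its subdomains."""
    page_host = urlparse(page_url).hostname or ""
    req_host = urlparse(request_url).hostname or ""
    return not (req_host == page_host or req_host.endswith("." + page_host))


def flag_subresources(page_url: str, requests: list[tuple[str, str]]):
    """Return the (resource_type, url) pairs that leave the page's own host."""
    return [(rtype, url) for rtype, url in requests
            if is_third_party(page_url, url)]


# Illustrative requests observed while loading one approved page.
observed = [
    ("script", "https://static.blog.example.com/app.js"),
    ("image", "https://ads.tracker.example.net/pixel.gif"),
    ("websocket", "wss://feed.example.org/live"),
]
flagged = flag_subresources("https://blog.example.com/post", observed)
```

Each flagged request would then go through the same classification and policy evaluation as a page-level navigation.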

Session-Level Audit Trails for Compliance

Every decision the runtime guardrail makes — allow, block, log, or escalate — is recorded in a session-level audit trail. This trail includes the URL, the classification result, the policy rule that matched, the decision, and a timestamp. For regulated industries (financial services, healthcare, government), this audit trail is the evidence that your agent governance program actually works. Auditors do not accept "we told the agent not to visit bad websites" as compliance evidence. They need deterministic records: this URL was evaluated, this classification was returned, this policy rule was applied, and this decision was made at this timestamp.
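
A sketch of what one audit record might look like, serialized as a JSON line. The field names are assumptions chosen to match the items listed above (URL, classification, matched rule, decision, timestamp), not a documented log schema.

```python
import json
from datetime import datetime, timezone


def audit_record(url, classification, rule, decision, now=None) -> str:
    """Serialize one guardrail decision as a single JSON line."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps(
        {
            "url": url,
            "classification": classification,
            "rule": rule,
            "decision": decision,
            "timestamp": timestamp,
        },
        sort_keys=True,  # stable key order keeps records easy to diff
    )


line = audit_record(
    "https://shop.example/checkout",
    {"page_type": "checkout"},
    "sensitive-page-type",
    "block",
)
```

Appending one such line per decision to an append-only file or log stream gives auditors a record they can replay in order.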

Latency Budgets: Local Database vs. API-Only Approaches

Agent browsing sessions are latency-sensitive. An agent that takes 500 milliseconds to evaluate each URL will feel sluggish and waste compute time waiting for classification responses. API-only classification approaches introduce network round-trip latency (typically 100-300ms per request) into every navigation decision. At 50 navigation events per browsing session, that adds 5 to 15 seconds of pure classification overhead — time the agent spends waiting instead of working.

A local database lookup eliminates this overhead entirely. The 102M domain database loaded into Redis or SQLite returns classification results in under 1 millisecond. Over a 50-event browsing session, the total classification overhead is under 50 milliseconds — invisible to the agent and negligible compared to the network latency of the actual page loads. This is why runtime guardrails built on local databases are the only architecturally sound approach for production agent deployments.
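
The arithmetic above, together with a local lookup, can be sketched with Python's built-in `sqlite3` module. The table schema is hypothetical, and an in-memory database with one row stands in for the full dataset; actual lookup times depend on hardware and storage.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory table keyed on domain (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE domains (domain TEXT PRIMARY KEY, category TEXT, page_type TEXT)"
    )
    conn.executemany("INSERT INTO domains VALUES (?, ?, ?)", rows)
    return conn


def lookup(conn, domain):
    """Primary-key lookup; returns None when the domain is not classified."""
    row = conn.execute(
        "SELECT category, page_type FROM domains WHERE domain = ?", (domain,)
    ).fetchone()
    if row is None:
        return None
    return {"filtering_category": row[0], "page_type": row[1]}


def session_overhead_ms(events: int, per_lookup_ms: float) -> float:
    """Total classification overhead for one browsing session."""
    return events * per_lookup_ms


conn = build_db([("vendor.example", "Business", "pricing")])
print(lookup(conn, "vendor.example"))
print(session_overhead_ms(50, 100), session_overhead_ms(50, 300))  # API-only range
print(session_overhead_ms(50, 1))                                  # local upper bound
```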

Graceful Degradation When Classification Is Unavailable

Production systems must handle failure gracefully. What happens when the local database does not contain the URL and the API fallback times out? A well-designed runtime guardrail has a configurable default action for unclassified URLs. Conservative deployments set the default to "block and log" — the agent cannot visit unclassified URLs, and each block is logged for manual review. Permissive deployments set the default to "allow and alert" — the agent can proceed, but the security team receives a notification. The correct default depends on your organization's risk tolerance and the specific agent's task context.
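
The two default postures might be sketched like this. The mode names, the `DEFAULTS` table, and the `api_classify` callable are hypothetical; the sketch only shows where the configurable default takes effect when both the local lookup and the API fallback come back empty-handed.

```python
# Hypothetical default postures for unclassified URLs.
DEFAULTS = {
    "conservative": {"action": "block", "notify": "log"},
    "permissive": {"action": "allow", "notify": "alert"},
}


def resolve(domain, local_db, api_classify, mode="conservative"):
    """Return (classification, fallback); exactly one of the two is not None."""
    hit = local_db.get(domain)
    if hit is not None:
        return hit, None
    try:
        return api_classify(domain), None
    except OSError:
        # TimeoutError and connection errors are OSError subclasses in Python 3.
        return None, DEFAULTS[mode]
```

When `fallback` is returned, the caller applies its `action` to the request and routes the event according to `notify`, for example to a review queue or an alerting channel.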

Framework-Specific Integration Patterns

For Playwright-based agents (including many custom agent frameworks), the guardrail registers as a route handler via page.route("**/*", handler). For Puppeteer-based agents, it uses the CDP Fetch domain to intercept requests. For Selenium-based agents, it wraps the WebDriver's get() method and evaluates the URL before calling the underlying navigation. For higher-level frameworks like LangChain or CrewAI, the guardrail wraps the browsing tool's execute method, intercepting URLs before they reach the browser automation layer.
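
The Selenium-style wrapping pattern can be sketched without depending on Selenium itself. `GuardedDriver` and its `evaluate` callable are hypothetical names; the wrapper accepts any object that exposes a `get(url)` method, which is the only WebDriver behaviour assumed here.

```python
class GuardedDriver:
    """Wraps any object exposing a Selenium-style get(url) method."""

    def __init__(self, driver, evaluate):
        self._driver = driver
        self._evaluate = evaluate  # callable: url -> "allow" or "block"

    def get(self, url):
        # The policy check runs before the underlying navigation is issued.
        if self._evaluate(url) != "allow":
            raise PermissionError(f"Navigation blocked by policy: {url}")
        return self._driver.get(url)

    def __getattr__(self, name):
        # Delegate everything else (find_element, title, ...) to the driver.
        return getattr(self._driver, name)
```

A wrapper of this kind only sees explicit `get()` calls. Redirects and sub-resource loads happen inside the browser, so covering them still requires request-level interception or a filtering proxy.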

In all cases, the integration follows the same principle: the guardrail must execute synchronously in the navigation path. Asynchronous logging-only approaches are useful for observability but do not provide governance — by the time the log entry is written, the agent has already loaded the page, rendered the content, and potentially extracted sensitive data. True runtime guardrails are blocking: the page does not load until the guardrail returns a verdict.

Real-Time Defense Perimeter

Continuous monitoring of every agent navigation event

Deploy Runtime Guardrails for Your Agent Stack

Move from static allowlists to real-time, in-line navigation governance. Sub-millisecond classification, deterministic policy enforcement, complete session audit trails.
