Every AI agent that accesses the web makes HTTP requests. Middleware sits between the agent and those requests: it intercepts every outbound navigation, classifies the target domain against a 102-million-domain database, and enforces your policy rules before the request reaches the public internet. This is a pragmatic, framework-agnostic approach to agent web governance: a thin layer of code that works with any agent stack.
Most agent frameworks ship with no built-in URL filtering. The agent decides where to go, and nothing stops it.
Popular agent frameworks (LangChain, CrewAI, AutoGen, and custom OpenAI/Anthropic tool-calling agents) provide powerful abstractions for web browsing. They give agents the ability to search the web, visit URLs, extract content, and follow links. What they do not provide is a control layer between the agent's decision to visit a URL and the actual HTTP request. The agent says "navigate to example.com/admin" and the framework executes the request without question. By default there is no classification step and no policy check, and any interception point the framework offers does nothing until you attach a check to it.
Build a middleware layer that wraps your agent's HTTP client. Every outbound request passes through the middleware, which extracts the target domain, queries the 102M domain categorization database, evaluates the classification against your policy rules, and either allows the request to proceed or blocks it with a structured response. The middleware is framework-agnostic — it wraps the HTTP client, not the agent framework — so it works with LangChain, CrewAI, AutoGen, or any custom agent implementation.
The database provides the structured intelligence that makes the middleware useful: IAB v3 categories (four taxonomy tiers), web filtering categories (security-focused classifications), page-type labels (login, checkout, admin, settings, and 15+ more), OpenPageRank scores, and popularity rankings. Without this data, your middleware would be limited to basic blocklists. With it, your middleware can enforce sophisticated, context-aware policies.
How to insert categorization-powered filtering into any agent's request pipeline
The most common pattern: wrap your language's HTTP client (Python requests, aiohttp, Node.js fetch) with a class that intercepts every request. Before the underlying client fires, the wrapper queries the categorization database with the target domain. If the policy check passes, the wrapper delegates to the real HTTP client. If it fails, the wrapper returns a structured error without making the request. This pattern requires minimal code changes — swap your import statement, and every HTTP call is protected.
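The wrapper pattern can be sketched in a few lines. This is an illustration under stated assumptions, not the product's client: GuardedClient, its send callable (standing in for something like requests.get), and the classify callable (domain in, list of category names out) are all hypothetical names introduced here.

```python
from urllib.parse import urlparse


class GuardedClient:
    """Wraps any HTTP 'send' callable with a category policy check.

    Hypothetical sketch: 'send' is the real client call and 'classify'
    maps a domain to a list of category names.
    """

    def __init__(self, send, classify, blocked_categories):
        self._send = send
        self._classify = classify
        self._blocked = {c.lower() for c in blocked_categories}

    def get(self, url, **kwargs):
        domain = urlparse(url).hostname
        if not domain:
            # Nothing to classify, so refuse rather than guess.
            return {"blocked": True, "reason": "No hostname in URL", "url": url}
        for category in self._classify(domain):
            if category.lower() in self._blocked:
                # Structured error; the underlying client is never called.
                return {
                    "blocked": True,
                    "reason": f"Category: {category}",
                    "url": url,
                }
        # Policy passed: delegate to the underlying HTTP client.
        return self._send(url, **kwargs)
```

Because the wrapper only needs a callable, the same class can sit in front of a synchronous client, a test double, or a thin adapter around an async client.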
Some frameworks expose lifecycle hooks — events that fire before a tool executes. In LangChain, use a custom callback handler. In CrewAI, implement a pre-task hook. In AutoGen, register a function guard. These hooks call the middleware's classification function before the browsing tool runs, blocking the navigation at the framework level rather than the HTTP level. This pattern is cleaner but framework-specific.
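Hook APIs differ between frameworks and versions, so the sketch below shows only the framework-neutral core: a decorator that runs a policy check before a browsing tool function executes. The names guard_tool, NavigationBlocked, and is_allowed are assumptions made for illustration and are not part of LangChain, CrewAI, or AutoGen.

```python
import functools
from urllib.parse import urlparse


class NavigationBlocked(Exception):
    """Raised when policy rejects a navigation (hypothetical name)."""


def guard_tool(is_allowed):
    """Decorator factory: run is_allowed(domain) before the tool executes.

    'is_allowed' is a hypothetical callable that returns True or False
    for a hostname, e.g. backed by the categorization lookup.
    """
    def decorator(tool_fn):
        @functools.wraps(tool_fn)
        def wrapper(url, *args, **kwargs):
            domain = urlparse(url).hostname or ""
            if not is_allowed(domain):
                # Stop before the tool body runs, so no request is made.
                raise NavigationBlocked(
                    f"Navigation to {domain or url} blocked by policy"
                )
            return tool_fn(url, *args, **kwargs)
        return wrapper
    return decorator
```

In a real integration, the body of wrapper is what you would call from the framework's own pre-execution hook.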
Deploy the middleware as a separate process: a lightweight HTTP proxy that your agent routes all requests through. The proxy intercepts each request, queries the categorization database, enforces the policy, and forwards allowed requests to the target. This pattern works with agents in any language without code changes, provided the agent's HTTP client honors the HTTP_PROXY and HTTPS_PROXY environment variables. It also provides a natural point for centralized logging and monitoring across multiple agents.
Production-ready middleware for Python and JavaScript agent stacks
# Python
import http.client
import json
from urllib.parse import urlencode, urlparse


class NavigationMiddleware:
    """Middleware layer that intercepts agent HTTP
    requests and enforces category-based policies."""

    def __init__(self, api_key, policy_config):
        self.api_key = api_key
        self.policy = policy_config
        self.domain_cache = {}
        self.request_log = []
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com", timeout=10
        )

    def classify_domain(self, domain):
        if domain in self.domain_cache:
            return self.domain_cache[domain]
        # URL-encode the form body so unusual characters in
        # the domain or key cannot corrupt the request.
        payload = urlencode({
            "query": domain,
            "api_key": self.api_key,
            "data_type": "domain",
            "expanded_categories": 1,
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload, headers
        )
        data = json.loads(
            self.conn.getresponse().read().decode("utf-8")
        )
        self.domain_cache[domain] = data
        return data

    def _block(self, url, reason):
        # Log blocked requests too, so every decision is auditable.
        self.request_log.append({
            "url": url, "action": "block", "reason": reason
        })
        return False, {
            "blocked": True, "reason": reason, "url": url
        }

    def intercept(self, url, method="GET"):
        """Called before every HTTP request. Returns
        (proceed, response_or_error)."""
        domain = urlparse(url).hostname
        if not domain:
            # Relative or malformed URL: nothing to classify.
            return self._block(url, "No hostname in URL")
        classification = self.classify_domain(domain)
        page_type = classification.get("page_type", "unknown")
        # The first element of each entry is read as
        # "Category name: <name>"; keep the text after the prefix.
        categories = [
            c[0].split("Category name: ")[-1]
            for c in classification.get("iab_classification", [])
            if c
        ]
        # Evaluate against policy
        if page_type in self.policy.get("blocked_page_types", []):
            return self._block(url, f"Page type: {page_type}")
        for cat in categories:
            for blocked_cat in self.policy.get(
                "blocked_categories", []
            ):
                if blocked_cat.lower() in cat.lower():
                    return self._block(url, f"Category: {cat}")
        self.request_log.append({
            "url": url, "domain": domain,
            "page_type": page_type,
            "categories": categories,
            "action": "allow"
        })
        return True, classification


# Configure and use
policy = {
    "blocked_categories": [
        "Adult", "Malware", "Gambling", "Weapons"
    ],
    "blocked_page_types": [
        "login", "checkout", "admin", "settings"
    ]
}
mw = NavigationMiddleware("your_api_key", policy)

# Before any agent HTTP request:
proceed, result = mw.intercept(
    "https://example.com/products"
)
if proceed:
    # Execute the actual HTTP request
    print("Request allowed, proceeding...")
else:
    print(f"BLOCKED: {result['reason']}")
// JavaScript (ES module; the usage example relies on top-level await)
class AgentNavigationMiddleware {
  constructor(apiKey, blockedCategories, blockedTypes) {
    this.apiKey = apiKey;
    this.blockedCategories = blockedCategories.map(
      c => c.toLowerCase()
    );
    this.blockedTypes = new Set(blockedTypes);
    this.cache = new Map();
  }

  async classify(domain) {
    if (this.cache.has(domain)) {
      return this.cache.get(domain);
    }
    const resp = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type":
            "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: domain,
          api_key: this.apiKey,
          data_type: "domain",
          expanded_categories: "1"
        })
      }
    );
    const data = await resp.json();
    this.cache.set(domain, data);
    return data;
  }

  async guardedNavigate(url) {
    const domain = new URL(url).hostname;
    const data = await this.classify(domain);
    const pageType = data.page_type || "unknown";
    if (this.blockedTypes.has(pageType)) {
      return {
        allowed: false,
        reason: `Blocked page type: ${pageType}`
      };
    }
    const cats = (data.iab_classification || [])
      .map(c => c[0]?.replace("Category name: ", ""))
      .filter(Boolean);
    for (const cat of cats) {
      // Substring match, consistent with the Python example,
      // so "Adult" also matches longer category names.
      const name = cat.toLowerCase();
      if (this.blockedCategories.some(b => name.includes(b))) {
        return {
          allowed: false,
          reason: `Blocked category: ${cat}`
        };
      }
    }
    return { allowed: true, classification: data };
  }
}

// Wrap agent's navigation
const middleware = new AgentNavigationMiddleware(
  "your_api_key",
  ["Adult", "Malware", "Gambling"],
  ["login", "checkout", "admin", "settings"]
);
const decision = await middleware.guardedNavigate(
  "https://competitor.com/pricing"
);
if (!decision.allowed) {
  console.log(`Navigation blocked: ${decision.reason}`);
}
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499.
Middleware is a battle-tested architectural pattern. Web servers use middleware for authentication, logging, rate limiting, and CORS. API gateways use middleware for request transformation and validation. The same pattern applies perfectly to AI agent navigation control: insert a thin layer between the agent's intent to navigate and the actual HTTP request, and use that layer to enforce policies based on domain categorization data.
The middleware pattern works for agent navigation because it is framework-agnostic, low-latency, and composable. Framework-agnostic means you build the middleware once and use it across every agent in your organization, regardless of whether they run on LangChain, CrewAI, AutoGen, or a custom framework. Low-latency means the middleware adds less than one millisecond to each request when using a local database. Composable means you can stack multiple middleware layers — categorization check, rate limiting, logging, authentication — to build a complete governance pipeline.
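Composability can be illustrated with a small chain of layers. This is a sketch under assumptions: each layer is a plain function that returns None to let the request continue or a rejection dict to stop it, and the names compose, category_layer, and rate_limit_layer are hypothetical. The category layer uses a simple set of blocked hostnames in place of a real categorization lookup.

```python
from collections import Counter
from urllib.parse import urlparse


def compose(*layers):
    """Run layers in order; the first rejection short-circuits the chain."""
    def pipeline(request):
        for layer in layers:
            rejection = layer(request)
            if rejection is not None:
                return rejection
        return {"allowed": True}
    return pipeline


def category_layer(blocked_domains):
    """Stand-in for the categorization check (set of blocked hostnames)."""
    def layer(request):
        host = urlparse(request["url"]).hostname
        if host in blocked_domains:
            return {"allowed": False, "reason": f"Blocked domain: {host}"}
        return None
    return layer


def rate_limit_layer(max_per_domain):
    """Reject once a hostname has been requested more than the limit."""
    counts = Counter()

    def layer(request):
        host = urlparse(request["url"]).hostname
        counts[host] += 1
        if counts[host] > max_per_domain:
            return {
                "allowed": False,
                "reason": f"Rate limit exceeded for {host}",
            }
        return None
    return layer
```

Ordering matters: placing the category check first means blocked requests never consume rate-limit budget.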
When the agent decides to navigate to a URL, the middleware executes a four-step pipeline.
Step one: extract the domain from the target URL. The middleware parses the URL to isolate the hostname, stripping path, query parameters, and fragments.
Step two: query the categorization database. The middleware looks up the domain in the local database (Redis, SQLite, or in-memory dictionary) and receives the domain's IAB categories, web filtering categories, page type, reputation score, and popularity ranking.
Step three: evaluate against the policy. The middleware checks the classification data against the active policy rules: blocked categories, blocked page types, minimum reputation thresholds, and scope restrictions.
Step four: execute or reject. If the policy allows the request, the middleware passes it through to the underlying HTTP client. If the policy blocks it, the middleware returns a structured error to the agent without making the request.
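The four steps can be sketched against a local SQLite table. The schema (a domains table with one category, a page type, and a reputation score per domain), the sample rows, and the decision to block unclassified domains are all assumptions made for illustration; the actual database layout may differ.

```python
import sqlite3
from urllib.parse import urlparse


def build_demo_db():
    """Create an in-memory table with a hypothetical schema and rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE domains (domain TEXT PRIMARY KEY, "
        "category TEXT, page_type TEXT, reputation REAL)"
    )
    db.executemany(
        "INSERT INTO domains VALUES (?, ?, ?, ?)",
        [
            ("news.example", "News", "article", 7.2),
            ("casino.example", "Gambling", "homepage", 4.1),
            ("shady.example", "News", "article", 0.8),
        ],
    )
    return db


def evaluate(db, url, policy):
    """Return (allowed, reason) for a URL."""
    # Step one: isolate the hostname (path, query, fragment are dropped).
    domain = urlparse(url).hostname
    if not domain:
        return False, "No hostname in URL"
    # Step two: look up the domain in the local database.
    row = db.execute(
        "SELECT category, page_type, reputation "
        "FROM domains WHERE domain = ?",
        (domain,),
    ).fetchone()
    if row is None:
        # Assumption: unknown domains are denied rather than allowed.
        return False, "Domain not classified"
    category, page_type, reputation = row
    # Step three: evaluate the classification against the policy.
    if category in policy["blocked_categories"]:
        return False, f"Category: {category}"
    if page_type in policy["blocked_page_types"]:
        return False, f"Page type: {page_type}"
    if reputation < policy["min_reputation"]:
        return False, f"Reputation {reputation} below minimum"
    # Step four: the caller may now execute the real request.
    return True, "Allowed"
```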
Agents that make dozens or hundreds of requests per session will frequently revisit the same domains. A well-designed middleware includes a local cache (a simple dictionary or LRU cache) that stores classification results by domain. The first request for a domain triggers a database lookup; subsequent requests for the same domain are served from cache in under 0.01 milliseconds. For typical research agents, the cache hit rate can exceed 80% after the first few minutes of operation, in which case the database is consulted for fewer than 20% of requests. Measure the hit rate for your own workload, since it depends on how many distinct domains the agent visits.
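A bounded cache with hit-rate reporting can be built on the standard library's functools.lru_cache. In this sketch, make_cached_classifier, hit_rate, and the lookup callable are hypothetical names; the lookup is assumed to return an immutable value (such as a tuple) so that cached results cannot be modified by callers.

```python
from functools import lru_cache


def make_cached_classifier(lookup, maxsize=10_000):
    """Wrap a domain lookup in a bounded LRU cache."""
    @lru_cache(maxsize=maxsize)
    def classify(domain):
        # Only reached on a cache miss.
        return lookup(domain)
    return classify


def hit_rate(classify):
    """Fraction of calls served from cache, from lru_cache statistics."""
    info = classify.cache_info()
    total = info.hits + info.misses
    return info.hits / total if total else 0.0
```

Note that lru_cache does not expire entries by age, so a long-running agent that needs fresh classifications would need a time-based eviction strategy on top of this.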
Middleware must handle failure gracefully. If the categorization database is temporarily unavailable (a disk failure, a Redis restart, or a timeout on the fallback API), the middleware should default to "deny", blocking the request rather than allowing unclassified navigation. This fail-closed approach ensures that a database outage does not create an unfiltered window. The middleware should log the failure, alert the operations team, and provide the agent with a clear error message: "Navigation blocked: classification service temporarily unavailable. Retry in 30 seconds."
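A fail-closed lookup might look like the sketch below. The function name classify_or_deny and the lookup callable are assumptions; alerting is left to whatever handler is attached to the logger, since the alerting mechanism is not specified here.

```python
import logging

logger = logging.getLogger("navigation_middleware")

UNAVAILABLE_MESSAGE = (
    "Navigation blocked: classification service temporarily "
    "unavailable. Retry in 30 seconds."
)


def classify_or_deny(lookup, domain):
    """Return (True, classification), or (False, error) if lookup fails.

    'lookup' is a hypothetical callable that raises on database or
    network failure.
    """
    try:
        return True, lookup(domain)
    except Exception as exc:
        # Fail closed: an outage must never become an unfiltered window.
        logger.error("Classification failed for %s: %s", domain, exc)
        return False, {"blocked": True, "reason": UNAVAILABLE_MESSAGE}
```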
LangChain provides a callback system that fires before and after tool executions. To integrate navigation middleware, create a custom callback handler that intercepts the WebBrowser tool's execution. In the on_tool_start callback, extract the target URL from the tool input, run it through the middleware's classification and policy check, and raise an exception if the URL is blocked. Set raise_error = True on the handler, because by default LangChain logs exceptions raised inside callbacks and lets the tool run anyway. How the blocked navigation is reported back to the agent depends on your executor's error handling, so confirm that the agent receives a usable message and can choose an alternative URL. This approach requires no changes to the agent's prompt or tools, so the middleware is invisible to the LLM.
CrewAI organizes agent work into tasks. Before a task that involves web browsing executes, a pre-task hook can run the middleware's classification check on the task's target URLs. If any URL is blocked, the hook modifies the task to exclude the blocked URLs and logs the modification. This approach is cleaner than intercepting HTTP calls because it operates at the semantic level: the agent never even tries to navigate to a blocked URL, avoiding unnecessary retry loops. It only covers URLs known before the task starts, so links the agent discovers while browsing still need a check at the HTTP level.
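The filtering step inside such a hook is independent of the framework. The sketch below shows only that step; filter_task_urls and is_allowed are hypothetical names, and the way the hook is registered with CrewAI is deliberately not shown because it depends on the framework version.

```python
def filter_task_urls(urls, is_allowed):
    """Split a task's target URLs into (allowed, removed) lists.

    'is_allowed' is a hypothetical callable wrapping the middleware's
    classification and policy check for a single URL.
    """
    allowed, removed = [], []
    for url in urls:
        if is_allowed(url):
            allowed.append(url)
        else:
            removed.append(url)
    return allowed, removed
```

The removed list is what the hook would log as the modification before handing the allowed list back to the task.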
For organizations that cannot modify agent source code, because agents run third-party binaries or because code changes require lengthy approval cycles, the sidecar proxy pattern is a good fit. Deploy the middleware as a lightweight HTTP proxy (using mitmproxy, Squid, or a custom proxy written in Go or Python) and configure the agent's environment to route all HTTP traffic through the proxy. The proxy intercepts each request, queries the categorization database, enforces the policy, and forwards allowed requests. This pattern works with agents in any language without a single line of code change: just set the HTTP_PROXY and HTTPS_PROXY environment variables, and confirm that the agent's HTTP client honors them. For HTTPS traffic a forwarding proxy sees only the hostname in the CONNECT request, which is enough for domain-level classification; inspecting full URLs requires TLS interception.
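The part of a proxy that matters for policy is extracting the target host from the first line of each request. The sketch below handles the two forms a forwarding proxy receives: CONNECT host:port for HTTPS tunnels and an absolute URL for plain HTTP. The names target_host and proxy_decision are hypothetical, and socket handling and forwarding are omitted.

```python
from urllib.parse import urlsplit


def target_host(request_line):
    """Extract the hostname from a proxy request line, or None."""
    parts = request_line.split()
    if len(parts) != 3:
        return None
    method, target, _version = parts
    if method.upper() == "CONNECT":
        # Authority form, e.g. "example.com:443".
        return urlsplit("//" + target).hostname
    # Absolute form, e.g. "http://example.com/path".
    return urlsplit(target).hostname


def proxy_decision(request_line, is_allowed):
    """Return ("forward", host) or ("reject", host) for a request line."""
    host = target_host(request_line)
    if host is None or not is_allowed(host):
        # Unparseable targets are rejected, in keeping with fail-closed.
        return "reject", host
    return "forward", host
```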
The middleware is the ideal location for comprehensive navigation logging. Because every request passes through it, the middleware can record a complete timeline of agent web activity: every URL visited, every domain classification, every policy decision. These logs can be structured as JSON events and piped to any observability platform — Elasticsearch, Datadog, Splunk, or a simple file-based log. The structured format makes it easy to build dashboards, set up alerts for anomalous navigation patterns, and produce compliance reports showing that agent web access was governed by consistent, enforceable policies.
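One way to produce such events is to write one JSON object per line to any text stream. The field names and the log_decision helper below are assumptions chosen for illustration, not a schema required by any particular observability platform.

```python
import json
from datetime import datetime, timezone


def log_decision(stream, url, domain, categories, page_type,
                 action, reason=None):
    """Write one navigation decision as a JSON line and return the event."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "domain": domain,
        "categories": list(categories),
        "page_type": page_type,
        "action": action,   # "allow" or "block"
        "reason": reason,   # None for allowed requests
    }
    stream.write(json.dumps(event) + "\n")
    return event
```

Passing an open file, sys.stdout, or an in-memory buffer as the stream keeps the function easy to test and easy to redirect to a log shipper.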
The alternative to middleware-based filtering is prompt-based filtering: instructing the LLM to "avoid visiting login pages" or "do not access adult content." Prompt-based filtering has three serious flaws. First, it is non-deterministic: the same URL may be judged differently on consecutive calls because the LLM's judgment is probabilistic. Second, it is bypassable: adversarial prompt injections can override safety instructions. Third, it is invisible: there is no audit trail of which URLs were evaluated and which were blocked. Middleware-based filtering addresses all three problems: it is deterministic (the database returns the same classification every time), tamper-resistant (the middleware operates outside the LLM's control), and fully auditable (every decision is logged with context).
Deploy middleware powered by 102 million classified domains. Framework-agnostic, sub-millisecond overhead, production-ready. One-time purchase, perpetual license.