WebsiteCategorizationAPI

Middleware That Controls Where AI Agents Can Navigate

Every AI agent that accesses the web makes HTTP requests. Middleware sits between the agent and those requests, intercepting every outbound navigation, classifying the target domain against a 102 million domain database, and enforcing your policy rules before the request reaches the public internet. This is the most pragmatic, framework-agnostic approach to agent web governance — a thin layer of code that works with any agent stack.

  • 102M classified domains
  • 700+ IAB categories
  • 20+ page types
  • <1 ms middleware overhead (with a local database)

The Problem: Agents Navigate Without Checkpoints

Most agent frameworks ship with no built-in URL filtering. The agent decides where to go, and nothing stops it.

The Missing Middleware Layer

Popular agent frameworks (LangChain, CrewAI, AutoGen, custom OpenAI/Anthropic tool-calling agents) provide powerful abstractions for web browsing. They give agents the ability to search the web, visit URLs, extract content, and follow links. What they do not provide out of the box is a control layer between the agent's decision to visit a URL and the actual HTTP request. The agent says "navigate to example.com/admin" and the framework executes the request without question. By default there is no classification step and no policy check, and any interception point has to be wired up by you.

  • No interception hook: Most frameworks execute HTTP requests directly through the language runtime, providing no middleware insertion point
  • No classification data: Even if you could intercept the request, you would need structured data about the target domain to make an informed allow/block decision
  • No policy engine: Raw URL strings mean nothing to a policy system — you need categories, page types, and reputation scores to write meaningful rules
  • No audit capability: Without middleware, there is no centralized place to log navigation events for compliance and incident response

The Solution: A Categorization-Powered Middleware Layer

Build a middleware layer that wraps your agent's HTTP client. Every outbound request passes through the middleware, which extracts the target domain, queries the 102M domain categorization database, evaluates the classification against your policy rules, and either allows the request to proceed or blocks it with a structured response. The middleware is framework-agnostic — it wraps the HTTP client, not the agent framework — so it works with LangChain, CrewAI, AutoGen, or any custom agent implementation.

The database provides the structured intelligence that makes the middleware useful: IAB v3 categories (four taxonomy tiers), web filtering categories (security-focused classifications), page-type labels (login, checkout, admin, settings, and 15+ more), OpenPageRank scores, and popularity rankings. Without this data, your middleware would be limited to basic blocklists. With it, your middleware can enforce sophisticated, context-aware policies.

Middleware Interception Layer

Every request intercepted, classified, and evaluated before execution

Middleware Architecture: Three Integration Patterns

How to insert categorization-powered filtering into any agent's request pipeline

HTTP Client Wrapper

The most common pattern: wrap your language's HTTP client (Python requests, aiohttp, Node.js fetch) with a class that intercepts every request. Before the underlying client fires, the wrapper queries the categorization database with the target domain. If the policy check passes, the wrapper delegates to the real HTTP client. If it fails, the wrapper returns a structured error without making the request. This pattern requires minimal code changes: swap your import statement, and every HTTP call that goes through the wrapper is protected. Requests issued by libraries that bring their own HTTP client bypass it, so route those through the wrapper as well or use the sidecar proxy pattern.
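
A minimal sketch of the wrapper pattern, under stated assumptions: `classify` is a hypothetical callable that maps a domain to a dict with a `page_type` string and a `categories` list (an illustrative shape, not the API's documented response), and the underlying `fetch` function is injected so the wrapper can sit in front of `urllib.request.urlopen` or any other client.

```python
from urllib.parse import urlparse
from urllib.request import urlopen


class BlockedNavigation(Exception):
    """Raised instead of performing a request the policy forbids."""


class GuardedClient:
    """Wraps a fetch function and checks policy before every call.

    `classify` is a hypothetical callable: domain -> {"page_type": str,
    "categories": [str, ...]}. `fetch` defaults to urllib's urlopen.
    """

    def __init__(self, classify, blocked_categories, blocked_page_types,
                 fetch=urlopen):
        self.classify = classify
        self.blocked_categories = {c.lower() for c in blocked_categories}
        self.blocked_page_types = set(blocked_page_types)
        self.fetch = fetch

    def get(self, url):
        domain = urlparse(url).hostname
        if not domain:
            raise BlockedNavigation(f"No hostname in URL: {url!r}")
        record = self.classify(domain)
        page_type = record.get("page_type", "unknown")
        if page_type in self.blocked_page_types:
            raise BlockedNavigation(
                f"Page type {page_type!r} is blocked for {domain}")
        for category in record.get("categories", []):
            if category.lower() in self.blocked_categories:
                raise BlockedNavigation(
                    f"Category {category!r} is blocked for {domain}")
        # Only reached when every check passed: delegate to the real client.
        return self.fetch(url)
```

Raising an exception rather than returning an error value is a design choice made here so that a blocked request cannot be mistaken for a response body; a wrapper that returns a structured error object works equally well.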

Framework Hook Integration

Some frameworks expose lifecycle hooks, events that fire before a tool executes. In LangChain, use a custom callback handler. In CrewAI, implement a pre-task hook. In AutoGen, register a function guard. Hook names and availability vary between framework versions, so check the current documentation for the exact extension point. These hooks call the middleware's classification function before the browsing tool runs, blocking the navigation at the framework level rather than the HTTP level. This pattern is cleaner but framework-specific.
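
Where a framework offers no suitable hook, the same effect can be approximated by wrapping the browsing tool itself. The sketch below is framework-neutral and uses no framework API: `guard_tool`, `check_url`, and the example `browse` tool are hypothetical names, and the tool is assumed to take the target URL as its first argument.

```python
import functools


class NavigationBlocked(Exception):
    """Signals that a tool call was rejected before it ran."""


def guard_tool(check_url):
    """Decorator factory: run `check_url(url)` before the wrapped tool.

    `check_url` is a hypothetical policy function returning
    (allowed, reason).
    """
    def decorator(tool):
        @functools.wraps(tool)
        def wrapper(url, *args, **kwargs):
            allowed, reason = check_url(url)
            if not allowed:
                raise NavigationBlocked(f"{url}: {reason}")
            return tool(url, *args, **kwargs)
        return wrapper
    return decorator


def deny_admin(url):
    # Toy policy for illustration only; a real check would consult
    # the categorization data.
    return ("/admin" not in url, "admin pages are blocked")


@guard_tool(deny_admin)
def browse(url):
    return f"fetched {url}"
```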

Sidecar Proxy Pattern

Deploy the middleware as a separate process: a lightweight HTTP proxy that your agent routes all requests through. The proxy intercepts each request, queries the categorization database, enforces the policy, and forwards allowed requests to the target. This pattern works with any agent in any language without code changes, provided the agent's HTTP client honors proxy settings: set the HTTP_PROXY and HTTPS_PROXY environment variables. It also provides a natural point for centralized logging and monitoring across multiple agents.
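
A forward proxy receives the target either as an absolute URI (plain HTTP) or as a `CONNECT host:port` tunnel request (HTTPS), so the first job of a sidecar is to recover the hostname from the request line. The following is a sketch of that step only, not a complete proxy; `target_host`, `proxy_decision`, and `is_allowed` are hypothetical names.

```python
from urllib.parse import urlsplit


def target_host(request_line):
    """Return the hostname a proxied request is aimed at, or None.

    Handles the two request-line forms a forward proxy receives:
      CONNECT example.com:443 HTTP/1.1      (HTTPS tunnel)
      GET http://example.com/path HTTP/1.1  (plain HTTP, absolute URI)
    """
    parts = request_line.split()
    if len(parts) != 3:
        return None
    method, target, _version = parts
    if method.upper() == "CONNECT":
        # Authority form (host:port); prefix "//" so urlsplit parses it.
        host = urlsplit("//" + target).hostname
    else:
        host = urlsplit(target).hostname
    return host.lower() if host else None


def proxy_decision(request_line, is_allowed):
    """Return "forward" or "reject" for one request line.

    `is_allowed` is a hypothetical policy callable: hostname -> bool.
    Requests whose host cannot be determined are rejected.
    """
    host = target_host(request_line)
    if host is None or not is_allowed(host):
        return "reject"
    return "forward"
```

Because the hostname is visible in the `CONNECT` line, domain-level policy can be enforced on HTTPS traffic without decrypting it.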

Request Interception Pipeline

Extract domain, classify, evaluate policy, allow or block

Middleware Implementation Code

Reference middleware for Python and JavaScript agent stacks, as a starting point for production use

Python — HTTP Client Middleware with Categorization

    import http.client
    import json
    from urllib.parse import urlencode, urlparse


    class NavigationMiddleware:
        """Intercepts agent HTTP requests and enforces category-based policies."""

        API_HOST = "www.websitecategorizationapi.com"
        API_PATH = "/api/iab/iab_web_content_filtering.php"

        def __init__(self, api_key, policy_config):
            self.api_key = api_key
            self.policy = policy_config
            self.domain_cache = {}
            self.request_log = []

        def classify_domain(self, domain):
            """Return the API's classification for a domain, cached per domain."""
            if domain in self.domain_cache:
                return self.domain_cache[domain]
            # urlencode escapes the values instead of interpolating them raw
            payload = urlencode({
                "query": domain,
                "api_key": self.api_key,
                "data_type": "domain",
                "expanded_categories": 1,
            })
            headers = {"Content-Type": "application/x-www-form-urlencoded"}
            # One connection per lookup: a long-lived connection can go stale
            conn = http.client.HTTPSConnection(self.API_HOST, timeout=10)
            try:
                conn.request("POST", self.API_PATH, payload, headers)
                data = json.loads(conn.getresponse().read().decode("utf-8"))
            finally:
                conn.close()
            self.domain_cache[domain] = data
            return data

        def _block_reason(self, page_type, categories):
            """Return a reason string if the policy blocks the request, else None."""
            if page_type in self.policy.get("blocked_page_types", []):
                return f"Page type: {page_type}"
            for cat in categories:
                for blocked_cat in self.policy.get("blocked_categories", []):
                    if blocked_cat.lower() in cat.lower():
                        return f"Category: {cat}"
            return None

        def intercept(self, url, method="GET"):
            """Call before every HTTP request.

            Returns (proceed, classification_or_error).
            """
            domain = urlparse(url).hostname
            if not domain:
                return False, {"blocked": True, "reason": "No hostname in URL",
                               "url": url}
            classification = self.classify_domain(domain)
            page_type = classification.get("page_type", "unknown")
            # Each entry's first element looks like "Category name: <name>"
            categories = [
                c[0].split("Category name: ")[-1]
                for c in classification.get("iab_classification", [])
                if c
            ]
            reason = self._block_reason(page_type, categories)
            # Log blocked requests as well, so the audit trail is complete
            self.request_log.append({
                "url": url,
                "domain": domain,
                "page_type": page_type,
                "categories": categories,
                "action": "block" if reason else "allow",
            })
            if reason:
                return False, {"blocked": True, "reason": reason, "url": url}
            return True, classification


    # Configure and use
    policy = {
        "blocked_categories": ["Adult", "Malware", "Gambling", "Weapons"],
        "blocked_page_types": ["login", "checkout", "admin", "settings"],
    }
    mw = NavigationMiddleware("your_api_key", policy)

    # Before any agent HTTP request:
    proceed, result = mw.intercept("https://example.com/products")
    if proceed:
        # Execute the actual HTTP request
        print("Request allowed, proceeding...")
    else:
        print(f"BLOCKED: {result['reason']}")

JavaScript — Fetch Middleware for Agent Navigation

    class AgentNavigationMiddleware {
      constructor(apiKey, blockedCategories, blockedTypes) {
        this.apiKey = apiKey;
        this.blockedCategories = new Set(
          blockedCategories.map(c => c.toLowerCase())
        );
        this.blockedTypes = new Set(blockedTypes);
        this.cache = new Map();
      }

      // Look up a domain's classification, caching results per domain
      async classify(domain) {
        if (this.cache.has(domain)) {
          return this.cache.get(domain);
        }
        const resp = await fetch(
          "https://www.websitecategorizationapi.com" +
            "/api/iab/iab_web_content_filtering.php",
          {
            method: "POST",
            headers: {
              "Content-Type": "application/x-www-form-urlencoded"
            },
            body: new URLSearchParams({
              query: domain,
              api_key: this.apiKey,
              data_type: "domain",
              expanded_categories: "1"
            })
          }
        );
        // Surface HTTP failures instead of caching an error payload
        if (!resp.ok) {
          throw new Error(`Classification failed: HTTP ${resp.status}`);
        }
        const data = await resp.json();
        this.cache.set(domain, data);
        return data;
      }

      // Classify the URL's domain and apply the block lists
      async guardedNavigate(url) {
        const domain = new URL(url).hostname;
        const data = await this.classify(domain);
        const pageType = data.page_type || "unknown";
        if (this.blockedTypes.has(pageType)) {
          return {
            allowed: false,
            reason: `Blocked page type: ${pageType}`
          };
        }
        // Each entry's first element looks like "Category name: <name>"
        const cats = (data.iab_classification || [])
          .map(c => c[0]?.replace("Category name: ", ""))
          .filter(Boolean);
        for (const cat of cats) {
          if (this.blockedCategories.has(cat.toLowerCase())) {
            return {
              allowed: false,
              reason: `Blocked category: ${cat}`
            };
          }
        }
        return { allowed: true, classification: data };
      }
    }

    // Wrap agent's navigation (top-level await requires an ES module)
    const middleware = new AgentNavigationMiddleware(
      "your_api_key",
      ["Adult", "Malware", "Gambling"],
      ["login", "checkout", "admin", "settings"]
    );
    const decision = await middleware.guardedNavigate(
      "https://competitor.com/pricing"
    );
    if (!decision.allowed) {
      console.log(`Navigation blocked: ${decision.reason}`);
    }

Domain Classification Flow

Domain extracted, cached lookup, policy evaluation in microseconds

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Domain Database 10M
$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license  |  Optional Updates: $1,599/year

  • 10M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global Popularity Rankings

AI Agent Domain Database 20M (Popular)
$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $2,999/year

  • 20M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

AI Agent Domain Database 50M (Maximum Coverage)
$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $4,999/year

  • 50M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499.

How Many Domains in Each Category?

Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same data your middleware queries for every navigation decision.

Database Analytics

Domain Distribution by Category in Our 102M Enterprise Database

[Chart: domain counts for the top 50 of 700+ IAB v3 categories, spanning Tier 1 through Tier 4, in the 102M Enterprise Database. Counts for the remaining 650+ categories are available through the Category Counter tool.]

Multi-Framework Middleware Compatibility

One middleware pattern, works with LangChain, CrewAI, AutoGen, and custom agents

The Middleware Pattern: Why It Works for Agent Navigation Control

Middleware is a battle-tested architectural pattern. Web servers use middleware for authentication, logging, rate limiting, and CORS. API gateways use middleware for request transformation and validation. The same pattern applies perfectly to AI agent navigation control: insert a thin layer between the agent's intent to navigate and the actual HTTP request, and use that layer to enforce policies based on domain categorization data.

The middleware pattern works for agent navigation because it is framework-agnostic, low-latency, and composable. Framework-agnostic means you build the middleware once and use it across every agent in your organization, regardless of whether they run on LangChain, CrewAI, AutoGen, or a custom framework. Low-latency means the middleware adds less than one millisecond to each request when using a local database. Composable means you can stack multiple middleware layers — categorization check, rate limiting, logging, authentication — to build a complete governance pipeline.
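
Composability can be illustrated with a small chain in which each layer either short-circuits with a response or hands the request to the next layer. This is a sketch under assumed conventions: the dict-shaped request and response, and the names `compose`, `logging_layer`, `category_layer`, and `rate_limit_layer`, are illustrative and not part of any framework.

```python
def compose(*layers):
    """Chain layers so each one may short-circuit or call the next.

    A layer is a function (request, next_layer) -> response.
    """
    def run(request, terminal):
        def dispatch(index, req):
            if index == len(layers):
                return terminal(req)  # the real HTTP call goes here
            return layers[index](req, lambda r: dispatch(index + 1, r))
        return dispatch(0, request)
    return run


def logging_layer(log):
    def layer(request, next_layer):
        response = next_layer(request)
        log.append((request["url"], response["status"]))
        return response
    return layer


def category_layer(blocked_domains):
    def layer(request, next_layer):
        if request["domain"] in blocked_domains:
            return {"status": "blocked", "reason": "category policy"}
        return next_layer(request)
    return layer


def rate_limit_layer(max_requests):
    state = {"count": 0}

    def layer(request, next_layer):
        state["count"] += 1
        if state["count"] > max_requests:
            return {"status": "blocked", "reason": "rate limit"}
        return next_layer(request)
    return layer
```

Placing the logging layer first means it observes the outcome of every request, including those rejected by a later layer.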

The Anatomy of a Navigation Middleware Request

When the agent decides to navigate to a URL, the middleware executes a four-step pipeline:

  • Step one: extract the domain from the target URL. The middleware parses the URL to isolate the hostname, stripping path, query parameters, and fragments.
  • Step two: query the categorization database. The middleware looks up the domain in the local database (Redis, SQLite, or in-memory dictionary) and receives the domain's IAB categories, web filtering categories, page type, reputation score, and popularity ranking.
  • Step three: evaluate against the policy. The middleware checks the classification data against the active policy rules: blocked categories, blocked page types, minimum reputation thresholds, and scope restrictions.
  • Step four: execute or reject. If the policy allows the request, the middleware passes it through to the underlying HTTP client. If the policy blocks it, the middleware returns a structured error to the agent without making the request.
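
The four steps can be sketched against a local SQLite table. Everything specific here is an assumption made for illustration: the table and column names, the sample rows, the reputation values, and the choice to deny domains that are missing from the database.

```python
import sqlite3
from urllib.parse import urlparse


def build_demo_db():
    """In-memory stand-in for a local categorization database.

    Schema and rows are invented for this example.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE domains (domain TEXT PRIMARY KEY, category TEXT, "
        "page_type TEXT, reputation REAL)"
    )
    db.executemany("INSERT INTO domains VALUES (?, ?, ?, ?)", [
        ("news.example", "News", "article", 6.2),
        ("casino.example", "Gambling", "homepage", 4.0),
        ("bank.example", "Personal Finance", "login", 7.5),
        ("spam.example", "News", "article", 0.4),
    ])
    return db


def evaluate(url, db, policy):
    """Run the four-step pipeline and return {"allow": bool, "reason": str}."""
    # Step one: extract the domain (path, query, and fragment are dropped).
    domain = urlparse(url).hostname
    if not domain:
        return {"allow": False, "reason": "no hostname in URL"}
    # Step two: query the local database.
    row = db.execute(
        "SELECT category, page_type, reputation FROM domains WHERE domain = ?",
        (domain,),
    ).fetchone()
    if row is None:
        # Assumption: unknown domains are denied rather than allowed.
        return {"allow": False, "reason": "unclassified domain"}
    category, page_type, reputation = row
    # Step three: evaluate against the policy.
    if category in policy["blocked_categories"]:
        return {"allow": False, "reason": f"category: {category}"}
    if page_type in policy["blocked_page_types"]:
        return {"allow": False, "reason": f"page type: {page_type}"}
    if reputation < policy["min_reputation"]:
        return {"allow": False, "reason": "reputation below threshold"}
    # Step four: the caller may now execute the request.
    return {"allow": True, "reason": "policy passed"}
```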

Caching for High-Throughput Agents

Agents that make dozens or hundreds of requests per session will frequently revisit the same domains. A well-designed middleware includes a local cache (a simple dictionary or LRU cache) that stores classification results by domain. The first request for a domain triggers a database lookup; subsequent requests for the same domain are served from the in-process cache, typically in under 0.01 milliseconds. For typical research agents, the cache hit rate can exceed 80% after the first few minutes of operation, in which case the database is consulted for fewer than 20% of requests. Bound the cache size and expire entries periodically so that reclassified domains are picked up.
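
A bounded LRU cache with hit-rate counters takes only a few lines with `collections.OrderedDict`. In this sketch `lookup` is a hypothetical callable standing in for the database query, and the default size limit is an arbitrary choice.

```python
from collections import OrderedDict


class ClassificationCache:
    """Bounded LRU cache keyed by domain, with hit-rate counters."""

    def __init__(self, lookup, max_entries=10_000):
        self.lookup = lookup            # hypothetical: domain -> record
        self.max_entries = max_entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, domain):
        if domain in self.entries:
            self.hits += 1
            self.entries.move_to_end(domain)   # mark as most recently used
            return self.entries[domain]
        self.misses += 1
        record = self.lookup(domain)
        self.entries[domain] = record
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)   # evict least recently used
        return record

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Exposing `hit_rate` makes it possible to measure the cache's effectiveness for a given agent workload instead of assuming it.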

Error Handling and Graceful Degradation

Middleware must handle failure gracefully. If the categorization database is temporarily unavailable (disk failure, Redis restart, API timeout for the fallback), the middleware should default to "deny" — blocking the request rather than allowing unclassified navigation. This fail-closed approach ensures that a database outage does not create an unfiltered window. The middleware should log the failure, alert the operations team, and provide the agent with a clear error message: "Navigation blocked: classification service temporarily unavailable. Retry in 30 seconds."
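
The fail-closed rule amounts to treating any exception from the classification step as a deny. A minimal sketch, assuming hypothetical `classify` and `is_allowed` callables supplied by the caller and using the standard `logging` module as a stand-in for alerting:

```python
import logging

logger = logging.getLogger("navigation_middleware")


def check_fail_closed(domain, classify, is_allowed, retry_after_seconds=30):
    """Return (allowed, message); any classification failure yields a deny.

    `classify` (domain -> record) and `is_allowed` (record -> bool) are
    hypothetical callables.
    """
    try:
        record = classify(domain)
    except Exception as exc:  # deliberately broad: any outage must block
        logger.error("classification failed for %s: %s", domain, exc)
        return False, (
            "Navigation blocked: classification service temporarily "
            f"unavailable. Retry in {retry_after_seconds} seconds."
        )
    if is_allowed(record):
        return True, "allowed"
    return False, "Navigation blocked by policy."
```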

Middleware in LangChain: Custom Callback Handler

LangChain provides a callback system that fires before and after tool executions. To integrate navigation middleware, create a custom callback handler that intercepts the WebBrowser tool's execution. In the on_tool_start callback, extract the target URL from the tool input, run it through the middleware's classification and policy check, and raise an exception if the URL is blocked. Set raise_error = True on the handler, because by default LangChain logs exceptions raised inside callbacks and lets the tool run anyway. The code that invokes the agent should catch the exception and report the blocked navigation back to the agent, which can then choose an alternative URL. This approach requires no changes to the agent's prompt or tools, so the middleware is invisible to the LLM.

Middleware in CrewAI: Pre-Task Hook

CrewAI organizes agent work into tasks. Before a task that involves web browsing executes, a pre-task hook (or an equivalent wrapper around the task's browsing tools, depending on what your CrewAI version exposes) can run the middleware's classification check on the task's target URLs. If any URL is blocked, the hook modifies the task to exclude the blocked URLs and logs the modification. This approach is cleaner than intercepting HTTP calls because it operates at the semantic level: the agent never even tries to navigate to a blocked URL, avoiding unnecessary retry loops.
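
The filtering step itself is independent of any framework and can be sketched as a plain function. `filter_task_urls` and `check_url` are hypothetical names, and how the filtered list is written back into a task depends on the framework and is not shown.

```python
def filter_task_urls(urls, check_url):
    """Split a task's URLs into (allowed, blocked) before the task runs.

    `check_url` is a hypothetical policy function: url -> (allowed, reason).
    Blocked entries keep their reason so the modification can be logged.
    """
    allowed, blocked = [], []
    for url in urls:
        ok, reason = check_url(url)
        if ok:
            allowed.append(url)
        else:
            blocked.append({"url": url, "reason": reason})
    return allowed, blocked
```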

Middleware as a Sidecar Proxy: Zero Code Changes

For organizations that cannot modify agent source code, because agents run third-party binaries or because code changes require lengthy approval cycles, the sidecar proxy pattern is ideal. Deploy the middleware as a lightweight HTTP proxy (using mitmproxy, Squid, or a custom proxy written in Go or Python) and configure the agent's environment to route all HTTP traffic through the proxy. The proxy intercepts each request, queries the categorization database, enforces the policy, and forwards allowed requests. For HTTPS traffic the proxy sees the target hostname in the CONNECT request, which is enough for domain-level policy without decrypting the traffic. This pattern works with any agent in any language without a single line of code change, provided the agent's HTTP client honors the HTTP_PROXY and HTTPS_PROXY environment variables.

Logging and Observability

The middleware is the ideal location for comprehensive navigation logging. Because every request passes through it, the middleware can record a complete timeline of agent web activity: every URL visited, every domain classification, every policy decision. These logs can be structured as JSON events and piped to any observability platform — Elasticsearch, Datadog, Splunk, or a simple file-based log. The structured format makes it easy to build dashboards, set up alerts for anomalous navigation patterns, and produce compliance reports showing that agent web access was governed by consistent, enforceable policies.
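
One common way to structure such logs is newline-delimited JSON, with one event per navigation decision. The field names in this sketch are an assumed schema chosen for illustration, not a format defined by the product.

```python
import json
from datetime import datetime, timezone


def navigation_event(url, domain, categories, page_type, action, reason=None):
    """Build one navigation decision as a JSON-serializable dict."""
    if action not in ("allow", "block"):
        raise ValueError(f"unknown action: {action!r}")
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "agent_navigation",
        "url": url,
        "domain": domain,
        "categories": list(categories),
        "page_type": page_type,
        "action": action,
        "reason": reason,
    }


def write_event(stream, event):
    """Append the event to a text stream as a single JSON line."""
    stream.write(json.dumps(event, sort_keys=True) + "\n")
```

Any file-like object works as the stream, so the same function can write to a local file or to a pipe read by a log shipper.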

Why Middleware Outperforms Prompt-Based Filtering

The alternative to middleware-based filtering is prompt-based filtering: instructing the LLM to "avoid visiting login pages" or "do not access adult content." Prompt-based filtering has three fatal flaws. First, it is non-deterministic: the same URL may be classified differently on consecutive calls because the LLM's judgment is probabilistic. Second, it is bypassable: adversarial prompt injections can override safety instructions. Third, it is invisible: there is no audit trail of which URLs were evaluated and which were blocked. Middleware-based filtering addresses all three problems: it is deterministic (the same database snapshot returns the same classification every time), out of reach of prompt injection (the middleware operates outside the LLM's control), and fully auditable (every decision is logged with context).

Composable Middleware Stack

Classification, policy, logging, and rate limiting in a single pipeline

Add Navigation Control to Your Agents Today

Deploy middleware powered by 102 million classified domains. Framework-agnostic, sub-millisecond overhead, production-ready. One-time purchase, perpetual license.
