Autonomous AI agents are browsing the open web — and without a reliable categorization layer, they have zero awareness of where they are navigating. Our 102 million domain database gives your agent harness the structured intelligence it needs to make real-time allow/block decisions based on IAB categories, page types, and site reputation signals.
Without a URL categorization layer, autonomous agents have no mechanism to distinguish between a benign product page and a corporate admin panel.
When an AI agent receives an instruction like "research competitor pricing," it needs to visit dozens of websites. Without URL categorization data, the agent has no way to know whether it is landing on a public marketing page, a login portal, a payment checkout flow, or an internal HR portal. Every uncategorized navigation event is a potential compliance incident, a data exposure risk, or a brand safety violation.
Our 102 million domain database transforms raw URLs into structured, actionable intelligence that your agent harness can consume in microseconds. Every domain comes pre-tagged with IAB v3 taxonomy categories, web filtering classifications, page-type labels (login, checkout, settings, pricing, careers, contact, and 15+ more), reputation scores, and popularity signals.
Instead of building your own classifier — which requires continuous training data, model maintenance, and latency overhead — you deploy a lookup table that covers 99.5% of the active internet. Your agent checks the database before every navigation event: green-light for approved categories, red-flag for blocked page types, yellow-hold for categories requiring human review.
Three integration patterns that turn a static database into a dynamic agent control plane
Deploy the full 102M database on-premise or in your VPC. Every URL the agent wants to visit gets checked against the local store in under 1ms. No external API calls, no latency penalty, no data leaving your network. The database ships as CSV or JSON — load it into Redis, PostgreSQL, SQLite, or any key-value store your agent stack already uses.
For domains not in your local cache, the API classifies any URL on demand. Send the domain, receive IAB categories, page types, reputation signals, and content sentiment in a single JSON response. Average latency under 200ms. Use this as a fallback for the long tail of newly registered or rarely visited domains.
Map database fields directly to your agent policy rules. IAB category "Illegal Content" → hard block. Page type "login" → block with audit log. Web filtering category "Adult" → block. Category "Business and Finance" → allow with monitoring. The mapping is deterministic — no probabilistic model in the decision path.
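This deterministic mapping can be sketched in a few lines. The category and page-type names below mirror the examples above, but the exact field values in your export may differ, so treat the table contents as illustrative:

```python
# Deterministic policy table: database fields map straight to actions,
# with no probabilistic model in the decision path.
CATEGORY_ACTIONS = {
    "Illegal Content": "block",
    "Adult": "block",
    "Business and Finance": "allow_monitored",
}
PAGE_TYPE_ACTIONS = {
    "login": "block_and_audit",
    "checkout": "block_and_audit",
}

def resolve_action(iab_category, page_type):
    """Page type takes precedence, then IAB category; unknowns go to review."""
    if page_type in PAGE_TYPE_ACTIONS:
        return PAGE_TYPE_ACTIONS[page_type]
    if iab_category in CATEGORY_ACTIONS:
        return CATEGORY_ACTIONS[iab_category]
    return "review"
```

Because the same inputs always produce the same action, two identical navigation attempts can never receive different policy decisions.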
Production-ready snippets to plug URL categorization into your agent harness
import http.client
import json
from urllib.parse import urlencode


class AgentURLFilter:
    """Middleware that checks every URL before an AI agent navigates."""

    BLOCKED_PAGE_TYPES = ["login", "checkout", "settings", "admin"]
    BLOCKED_CATEGORIES = ["Adult", "Illegal Content", "Malware"]

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def classify_url(self, target_url):
        # URL-encode the form body so special characters in the
        # target URL cannot break the request.
        payload = urlencode({
            "query": target_url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": 1,
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers,
        )
        res = self.conn.getresponse()
        return json.loads(res.read().decode("utf-8"))

    def should_allow(self, target_url):
        data = self.classify_url(target_url)
        categories = [
            c[0].split("Category name: ")[1]
            for c in data.get("iab_classification", [])
        ]
        page_type = data.get("page_type", "unknown")
        if page_type in self.BLOCKED_PAGE_TYPES:
            return False, f"Blocked page type: {page_type}"
        for cat in categories:
            for blocked in self.BLOCKED_CATEGORIES:
                if blocked.lower() in cat.lower():
                    return False, f"Blocked category: {cat}"
        return True, "Navigation approved"


# Usage in agent harness
url_filter = AgentURLFilter(api_key="your_api_key")
allowed, reason = url_filter.should_allow("https://example.com/admin")
if not allowed:
    print(f"Agent blocked: {reason}")
async function agentNavigationGuard(targetURL, policyRules) {
  const response = await fetch(
    "https://www.websitecategorizationapi.com" +
      "/api/iab/iab_web_content_filtering.php",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded"
      },
      body: new URLSearchParams({
        query: targetURL,
        api_key: policyRules.apiKey,
        data_type: "url",
        expanded_categories: "1"
      })
    }
  );
  const classification = await response.json();
  const filterCategory =
    classification.filtering_taxonomy?.[0]?.[0]
      ?.replace("Category name: ", "") || "Unknown";
  const decision = {
    url: targetURL,
    category: filterCategory,
    action: "allow",
    timestamp: new Date().toISOString()
  };
  if (policyRules.blockedCategories.includes(filterCategory)) {
    decision.action = "block";
  }
  return decision;
}
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →
Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same data your AI agent filtering rules will reference.
How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications
Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database
Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.
The shift from chat-based AI to agentic AI means language models are no longer passively answering questions — they are actively navigating websites, clicking buttons, filling forms, and making decisions on behalf of users. This transition creates an entirely new threat surface. A chatbot that hallucinates a URL is annoying; an agent that navigates to that URL and submits credentials is a security incident.
URL categorization databases address this gap by providing the structured metadata that agents lack natively. When an agent receives a URL — whether from its own web search, a user instruction, or a tool call — the categorization layer instantly resolves it to a known category, page type, and reputation score. This resolution happens deterministically, without model inference, which means zero hallucination risk in the decision path.
Beyond IAB content categories, page-type detection is the critical differentiator for agent filtering. Knowing that a domain belongs to the "Business and Finance" IAB category is useful for content filtering. Knowing that the specific page the agent is about to visit is a login page, a checkout page, or a settings panel is essential for security.
Our database classifies pages into 20+ distinct types: homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, legal, privacy policy, terms of service, blog, documentation, API reference, support, FAQ, forum, and product pages. Each page type can be mapped to a policy action — allow, block, flag for review, or log for audit.
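A page-type policy of this kind reduces to a small lookup plus an audit record per decision. The mapping below is illustrative, not the vendor's canonical policy; any page types it omits fall through to review:

```python
from datetime import datetime, timezone

# Illustrative mapping of page types to the four policy actions
# named above: allow, block, flag (human review), or log (audit only).
PAGE_TYPE_POLICY = {
    "homepage": "allow", "pricing": "allow", "blog": "allow",
    "documentation": "allow", "contact": "allow",
    "login": "block", "checkout": "block", "admin": "block",
    "settings": "flag", "signup": "flag",
    "legal": "log", "privacy policy": "log",
}

def audit_decision(url, page_type):
    """Resolve a page type to an action and emit an audit record."""
    action = PAGE_TYPE_POLICY.get(page_type, "flag")  # unknown types go to review
    return {
        "url": url,
        "page_type": page_type,
        "action": action,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Emitting the full record on every decision, not just on blocks, is what makes the later compliance audit trail possible.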
Some teams attempt to build URL filtering directly into their agent's prompt or use a secondary LLM to evaluate each URL. This approach has three fundamental problems. First, it introduces latency — every URL evaluation requires a model inference call, adding 500ms to 2 seconds to each navigation decision. Second, it is non-deterministic — the same URL may be classified differently on consecutive calls, creating inconsistent policy enforcement. Third, it is expensive — at $0.01 to $0.03 per evaluation, filtering 10,000 URLs per day costs $100 to $300 daily.
A database lookup eliminates all three problems. The data is pre-computed, so latency is sub-millisecond. The classification is static until the next database update, so policy enforcement is consistent. And the database is a one-time purchase, so the per-query cost drops to effectively zero after acquisition.
The IAB Content Taxonomy v3 organizes websites into a hierarchical structure with four tiers of increasing specificity. Tier 1 categories like "Technology & Computing" or "Business and Finance" provide broad domain awareness. Tier 4 categories like "Artificial Intelligence > Machine Learning > Natural Language Processing" provide granular topic resolution.
For agent filtering, the most effective approach is to define policy rules at multiple tiers simultaneously. Block all Tier 1 categories related to sensitive content (Adult, Illegal, Gambling). Allow specific Tier 2 categories that match the agent's task scope (e.g., "Business and Finance > Financial Services" for a financial research agent). Flag Tier 3 and Tier 4 categories for logging when they represent edge cases that may require human review.
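One way to implement multi-tier rules is prefix matching on the hierarchical category path, most specific rule first. This is a sketch under the assumption that tiers are joined with " > " separators, as in the taxonomy examples above:

```python
# Tiered policy: rules can target any depth of the IAB hierarchy.
# The most specific matching prefix wins.
TIER_RULES = {
    "Adult": "block",                                      # Tier 1 block
    "Business and Finance": "flag",                        # Tier 1 default
    "Business and Finance > Financial Services": "allow",  # Tier 2 override
}

def evaluate_tiers(category_path):
    """Walk from the most specific prefix to the least; first match wins."""
    parts = category_path.split(" > ")
    for depth in range(len(parts), 0, -1):
        prefix = " > ".join(parts[:depth])
        if prefix in TIER_RULES:
            return TIER_RULES[prefix]
    return "log"  # unmatched categories are logged for review
```

So a financial research agent lands on "allow" for anything under Financial Services while the rest of Business and Finance is flagged, matching the scoping strategy described above.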
In addition to IAB taxonomy, our database includes web filtering categories specifically designed for security and compliance use cases. These categories — such as Malware, Phishing, Spam, Adult, Gambling, Weapons, and Drugs — map directly to the blocking rules that enterprise web proxies and CASBs already enforce for human users. Extending these same categories to AI agents creates a consistent security posture across your entire organization.
The 102M domain database ships as a flat file — CSV or JSON — that you can ingest into any data store. Common deployment patterns include loading the data into Redis for sub-millisecond lookups, importing into PostgreSQL for SQL-based policy queries, or embedding a SQLite file directly alongside your agent runtime. For cloud-native deployments, teams often load the data into DynamoDB or Cloud Firestore for serverless agent architectures.
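For the SQLite pattern, ingestion is a straight CSV-to-table load with the domain as primary key. The column names below are a hypothetical three-field export, not the database's actual schema:

```python
import csv
import io
import sqlite3

# Hypothetical excerpt of a CSV export: domain, IAB category, page type.
SAMPLE_CSV = """domain,iab_category,page_type
example.com,Technology & Computing,homepage
shop.example.net,Shopping,checkout
"""

def load_domains(conn, csv_text):
    """Ingest a CSV export into a keyed SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domains "
        "(domain TEXT PRIMARY KEY, iab_category TEXT, page_type TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT OR REPLACE INTO domains VALUES (:domain, :iab_category, :page_type)",
        reader,
    )
    conn.commit()

def lookup(conn, domain):
    """Primary-key lookup; returns None on a miss so a fallback can fire."""
    row = conn.execute(
        "SELECT iab_category, page_type FROM domains WHERE domain = ?", (domain,)
    ).fetchone()
    return {"iab_category": row[0], "page_type": row[1]} if row else None
```

The same two functions work unchanged whether the SQLite file is embedded next to the agent runtime or swapped for a Redis or PostgreSQL client with an equivalent get-by-key call.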
Regardless of the storage backend, the integration pattern is the same: intercept the agent's navigation intent, extract the target URL, query the database, evaluate the result against your policy rules, and either allow or block the navigation before the agent's HTTP request fires.
No static database covers every domain on the internet. New domains are registered at a rate of approximately 50,000 per day. To handle the long tail of newly registered, rarely visited, or dynamically generated URLs, pair the offline database with our real-time API. When a URL lookup returns no match in the local database, the agent's middleware sends the URL to the API for on-demand classification. The API response includes the same IAB categories, page types, and reputation signals as the database — ensuring consistent policy evaluation regardless of the data source.
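The database-first, API-fallback flow can be expressed as a small resolver. The API client here is an injected callable rather than a concrete HTTP call, so the sketch stays independent of any particular endpoint:

```python
def classify_with_fallback(domain, local_db, api_classify, cache):
    """Resolve a domain against the offline database first, then the API.

    local_db: dict-like offline store keyed by domain.
    api_classify: callable for the real-time endpoint (injected so it
    can be stubbed in tests); assumed to return the same schema as
    the database rows.
    cache: dict memoizing API results to avoid repeat calls.
    """
    if domain in local_db:
        return local_db[domain], "local"
    if domain in cache:
        return cache[domain], "cache"
    result = api_classify(domain)  # long-tail, on-demand classification
    cache[domain] = result
    return result, "api"
```

Returning the data source alongside the classification lets the audit log record whether a decision came from the offline database or the live API.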
Whether you are building on LangChain, CrewAI, AutoGen, or a custom agent framework, the integration pattern follows the same middleware approach. In LangChain, implement a custom Tool that wraps the database lookup and returns a structured allow/block decision. In CrewAI, add a pre-navigation hook to the agent's browsing tool that checks the database before each HTTP request. In AutoGen, register a function call that the agent invokes before every URL visit. The key principle is that the categorization check must execute before the navigation — not after.
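The pre-navigation principle can be sketched framework-agnostically as a decorator that wraps whatever browse function the agent framework exposes; the check callable and exception name here are illustrative, not part of any framework's API:

```python
import functools

class NavigationBlocked(Exception):
    """Raised when the categorization check rejects a URL before navigation."""

def pre_navigation_guard(check):
    """Wrap a browse function so the check runs BEFORE the request fires.

    check(url) must return (allowed: bool, reason: str).
    """
    def decorator(browse):
        @functools.wraps(browse)
        def guarded(url, *args, **kwargs):
            allowed, reason = check(url)
            if not allowed:
                raise NavigationBlocked(reason)
            return browse(url, *args, **kwargs)
        return guarded
    return decorator
```

In LangChain the `check` would sit inside a custom Tool, in CrewAI inside the browsing tool's pre-navigation hook, and in AutoGen inside a registered function call; the decorator is the shared shape.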
The market for agent filtering is broad and growing rapidly as organizations move from pilot AI agent deployments to production. The primary buyers include enterprise security teams deploying browser-using agents like Anthropic's Computer Use, OpenAI's Operator, or Google's Project Mariner. These teams need to enforce the same URL filtering policies on agents that they already enforce on employees via web proxies and CASBs.
Platform vendors building agent orchestration tools need categorization data to offer their customers built-in governance controls. Without this data, their platforms ship with a "deploy and hope" security model that enterprise buyers will not accept.
Managed service providers operating AI agents on behalf of clients need URL categorization to prove compliance with client security policies and regulatory requirements. The database provides the audit trail: every domain the agent visited, its category, its page type, and the policy decision that was made.
An agent filtering database is only as good as its coverage. If 20% of the URLs an agent encounters return "unknown" from the database, your policy engine defaults to either blocking (which halts the agent's workflow) or allowing (which defeats the purpose of filtering). Our 102M domain database covers 99.5% of the active internet as measured by the Google Chrome User Experience Report. This means that for virtually every domain an agent will encounter in normal operation, the database already has a classification ready.
The remaining 0.5% — newly registered domains, parked pages, and extremely niche sites — are handled by the real-time API fallback, ensuring 100% coverage in practice.
Deploy URL categorization as the foundation of your AI agent governance strategy. One-time purchase, perpetual license, 102 million domains classified and ready.