A new category of enterprise software is forming around a single problem: controlling what AI agents can access on the open web. As organizations deploy autonomous agents for research, sales, customer support, and operations, the need for structured governance has outpaced the tooling available. URL categorization databases are emerging as the foundational data layer that every governance vendor needs to build on.
Enterprises have mature governance stacks for human web access — proxies, CASBs, DLP systems, and SWGs. None of these tools were designed for autonomous AI agents browsing the open internet at machine speed.
Enterprise adoption of browser-using AI agents — Anthropic Computer Use, OpenAI Operator, Google Project Mariner, and dozens of custom implementations — is accelerating faster than the security industry can respond. Organizations are deploying agents into production with no standardized way to control which websites the agent can visit, which page types it can interact with, or which content categories it should avoid entirely. The result is a governance vacuum: agents operate on the open web with the same unfettered access as a fresh browser install with no enterprise policies applied.
Regardless of which governance vendor emerges as the market leader, they will all need the same foundational data: a comprehensive, pre-classified database of domains mapped to content categories, page types, and reputation scores. This is the data layer that makes policy enforcement possible. Without it, any governance tool is reduced to simple domain allowlists and blocklists — a manual, brittle approach that cannot scale with the volume and diversity of agent web traffic.
Our 102 million domain database provides this foundation. Every domain is tagged with IAB v3 taxonomy categories (700+ labels across 4 tiers), web filtering classifications (Adult, Malware, Phishing, Gambling, and dozens more), page-type labels (login, checkout, admin, settings, pricing, documentation, and 15+ additional types), OpenPageRank reputation scores, and global popularity rankings. This is the raw material that governance vendors need to build policy engines, audit dashboards, and compliance reports for AI agent web access.
The market is splitting into distinct segments — all requiring domain classification data as their core intelligence layer
Prevention vendors focus on real-time threat blocking for agent browsing. They intercept agent HTTP requests, evaluate destination risk, and block malicious or policy-violating navigation before it occurs. Their core capability is inline enforcement: acting as a proxy or middleware between the agent runtime and the internet. They need URL categorization data to classify destinations in real time without building and maintaining their own classification models.
Detection and observability vendors provide visibility and analytics for agent web activity after the fact. They ingest agent traffic logs, enrich them with category metadata, and produce dashboards showing which content categories agents accessed, which page types they interacted with, and which policy violations occurred. They need URL categorization data to transform raw URL logs into structured, queryable intelligence that security teams can actually use for threat hunting and compliance auditing.
Compliance vendors help organizations demonstrate that their AI agents comply with industry regulations, internal policies, and client contracts. They provide audit trails, policy templates, and compliance reports showing what agents accessed and whether that access was authorized. They need URL categorization data to map agent activity to specific compliance requirements — for example, proving that no agent accessed adult content (CIPA compliance) or that all agent-visited domains were within an approved industry vertical.
Code patterns that governance vendors and enterprise teams use to integrate URL classification into agent policy engines
import http.client
import json
import urllib.parse


class GovernancePolicyEngine:
    """Evaluates agent navigation against governance policies
    using URL categorization data as the decision layer."""

    def __init__(self, api_key, policy_config):
        self.api_key = api_key
        self.policy = policy_config
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )
        self.audit_log = []

    def classify(self, url):
        # URL-encode the form fields so special characters in the
        # target URL (&, =, #) cannot corrupt the request body.
        payload = urllib.parse.urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers,
        )
        res = self.conn.getresponse()
        return json.loads(res.read().decode("utf-8"))

    def evaluate(self, url, agent_id, agent_role):
        data = self.classify(url)
        page_type = data.get("page_type", "unknown")
        # [-1] tolerates entries that lack the "Category name: " prefix
        # instead of raising IndexError.
        categories = [
            c[0].split("Category name: ")[-1]
            for c in data.get("iab_classification", [])
        ]

        # Check role-based category allowlist
        allowed_cats = self.policy.get(
            "role_allowlists", {}
        ).get(agent_role, [])
        blocked_types = self.policy.get(
            "blocked_page_types", []
        )

        decision = "allow"
        reason = "Within policy scope"
        if page_type in blocked_types:
            decision = "block"
            reason = f"Page type '{page_type}' is blocked"
        elif allowed_cats:
            match = any(
                any(a.lower() in c.lower() for a in allowed_cats)
                for c in categories
            )
            if not match:
                decision = "flag"
                reason = "Category outside agent role scope"

        record = {
            "agent_id": agent_id,
            "role": agent_role,
            "url": url,
            "categories": categories,
            "page_type": page_type,
            "decision": decision,
            "reason": reason,
        }
        self.audit_log.append(record)
        return record


# Define governance policy
policy = {
    "role_allowlists": {
        "research": ["Technology", "Business", "News"],
        "sales": ["Business", "Shopping", "Marketing"],
        "support": ["Technology", "Customer Service"],
    },
    "blocked_page_types": [
        "login", "checkout", "admin", "settings"
    ],
}

engine = GovernancePolicyEngine("your_key", policy)
result = engine.evaluate(
    "https://example.com/pricing",
    agent_id="agent-042",
    agent_role="research",
)
print(f"[{result['decision'].upper()}] {result['reason']}")
class ComplianceAuditTrail {
  constructor(apiKey, complianceFramework) {
    this.apiKey = apiKey;
    this.framework = complianceFramework;
    this.entries = [];
  }

  async recordNavigation(url, agentId, taskId) {
    const response = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: url,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    const data = await response.json();
    const filterCat =
      data.filtering_taxonomy?.[0]?.[0]
        ?.replace("Category name: ", "") || "Unknown";
    const violations = this.checkCompliance(
      filterCat, data.page_type
    );
    const entry = {
      timestamp: new Date().toISOString(),
      agentId, taskId, url,
      category: filterCat,
      pageType: data.page_type || "unknown",
      compliant: violations.length === 0,
      violations
    };
    this.entries.push(entry);
    return entry;
  }

  checkCompliance(category, pageType) {
    const violations = [];
    const rules = this.framework.rules || [];
    rules.forEach(rule => {
      if (rule.blockedCategories?.includes(category)) {
        violations.push(
          `${rule.name}: category '${category}' prohibited`
        );
      }
      if (rule.blockedPageTypes?.includes(pageType)) {
        violations.push(
          `${rule.name}: page type '${pageType}' prohibited`
        );
      }
    });
    return violations;
  }

  generateReport() {
    const total = this.entries.length;
    const compliant = this.entries
      .filter(e => e.compliant).length;
    return {
      framework: this.framework.name,
      totalNavigations: total,
      compliantNavigations: compliant,
      // Guard against division by zero when no navigations were recorded.
      complianceRate: total === 0
        ? "n/a"
        : ((compliant / total) * 100).toFixed(1) + "%",
      violations: this.entries.filter(e => !e.compliant)
    };
  }
}
Purpose-built domain databases for AI agent governance platforms. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →
Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the foundational data that governance vendors build on.
How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications
Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database
Charts display domain counts for the top 50 of the 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.
The AI agent governance market is being shaped by a convergence of three forces. First, the rapid proliferation of browser-using AI agents across enterprise workflows. Second, the regulatory pressure from the EU AI Act, NIST AI Risk Management Framework, and emerging state-level AI legislation in the US. Third, the realization among CISOs and compliance officers that existing web security tools — built for human users with browsers and SSO sessions — simply do not work for autonomous agents.
This convergence is creating urgent demand for a new category of enterprise software. The market is nascent — most solutions in 2025 and 2026 are internal tools built by early-adopter engineering teams. But the pattern is unmistakable: every team building agent governance reaches the same conclusion within weeks of starting. They need a comprehensive, pre-classified URL database to power their policy engine. They need content categories, page types, and reputation scores. They need coverage that matches the breadth of the internet the agent will encounter. And they need it in a format that can be deployed locally for sub-millisecond latency.
The first instinct of many engineering teams is to build their own URL classifier. They fine-tune a transformer model on labeled URL data, deploy it as a microservice, and wire it into their agent middleware. This approach works in demos and proofs of concept. It fails at production scale for three reasons.
First, coverage. A fine-tuned classifier can handle the domains it was trained on, but the internet has over 350 million registered domains. Training data that covers even 1% of that space requires 3.5 million labeled examples — a dataset that takes months and significant resources to curate. Our database covers 102 million domains, representing 99.5% of active internet traffic. No in-house classifier comes close to this coverage without years of investment.
Second, latency. Model inference for URL classification typically takes 50-200ms per URL. An agent visiting 100 domains in a session adds 5-20 seconds of classification overhead. A database lookup takes under 1ms. At scale — thousands of agents, millions of URLs per day — the latency difference between model inference and database lookup is the difference between a responsive agent and a sluggish one.
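The lookup-versus-inference gap can be illustrated with a local snapshot. The sketch below assumes a hypothetical SQLite export of the domain database; the schema, column names, and synthetic rows are illustrative, not the actual distribution format.

```python
import sqlite3
import time

# Hypothetical local snapshot of the domain database, loaded with
# synthetic rows. Schema and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE domains (domain TEXT PRIMARY KEY, category TEXT, page_type TEXT)"
)
conn.executemany(
    "INSERT INTO domains VALUES (?, ?, ?)",
    [(f"site{i}.example", "Technology", "homepage") for i in range(100_000)],
)
conn.commit()

# Time a single indexed lookup (PRIMARY KEY gives an index for free).
start = time.perf_counter()
row = conn.execute(
    "SELECT category FROM domains WHERE domain = ?", ("site42.example",)
).fetchone()
elapsed_ms = (time.perf_counter() - start) * 1000
print(row[0], f"{elapsed_ms:.3f} ms")  # indexed lookup, typically well under 1 ms
```

Compare this with a 50-200ms model inference per URL: the local path removes classification from the agent's critical latency budget entirely.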
Third, maintenance. Models drift. New domains appear daily. Content on existing domains changes. A production classifier requires continuous retraining, data pipeline maintenance, model monitoring, and infrastructure management. A database requires periodic updates — quarterly or annually — with no model infrastructure to maintain.
Content categories tell you what a website is about. Page types tell you what a page does. This distinction is critical for agent governance because the risk profile of a domain depends not just on its topic but on the specific functionality the agent encounters. A banking website (category: "Financial Services") has very different risk profiles for its marketing page (page type: "homepage") versus its login page (page type: "login") versus its funds transfer page (page type: "checkout").
Governance vendors who rely solely on domain-level categories cannot express policies like "agents may visit financial services websites but may not interact with login pages on those websites." This level of granularity requires page-type detection — a capability that our database provides for 20+ distinct page types. The combination of IAB categories plus page types gives governance vendors the multi-dimensional classification they need to build nuanced, production-grade policies.
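A policy of that shape combines both dimensions in one rule: a category allowlist at the domain level plus page-type restrictions inside approved domains. The `dual_policy` structure and `evaluate_rule` helper below are a minimal sketch under those assumptions, not a vendor API, and the labels are illustrative.

```python
# A two-dimensional rule: a category allowlist at the domain level plus
# page-type restrictions inside approved domains. Labels are illustrative.
dual_policy = {
    "allowed_categories": ["Financial Services"],
    "blocked_page_types_within": ["login", "checkout"],
}

def evaluate_rule(category, page_type, rule):
    """Allow category-approved domains, except on sensitive page types."""
    if category not in rule["allowed_categories"]:
        return ("block", f"category '{category}' not approved")
    if page_type in rule["blocked_page_types_within"]:
        return ("block", f"page type '{page_type}' restricted on approved domains")
    return ("allow", "within policy")

print(evaluate_rule("Financial Services", "homepage", dual_policy))  # allow
print(evaluate_rule("Financial Services", "login", dual_policy))     # block
```

Note that the same domain produces different decisions depending on the page type, which a category-only policy cannot express.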
Governance vendors integrating URL categorization data into their platforms have two deployment options. The first is an OEM licensing model where the vendor embeds the database directly into their product, distributing it to end customers as part of their governance platform. The second is a data feed model where the vendor queries our API for real-time classification, passing the cost to their customers as a usage-based fee. Both models are supported, and the choice depends on the vendor's architecture, pricing model, and customer deployment requirements.
For vendors building on-premise or VPC-deployed governance tools, the database license is the natural fit — it ships with the product, requires no external API dependency, and provides deterministic performance regardless of network conditions. For SaaS governance platforms, the API model provides always-current data without the need to manage database updates. Many vendors use both: the database for the top 10-20 million domains that account for 95%+ of agent traffic, and the API for the long tail of lesser-known domains.
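The hybrid pattern can be sketched as a two-tier resolver: consult the local database first and fall back to the API only on a miss. `LOCAL_DB` and `classify_via_api` below are stand-ins for the deployed snapshot and a real API client, not actual interfaces.

```python
# Hybrid resolution: local database for the high-traffic head of agent
# traffic, API fallback for the long tail. Both stores are stand-ins.
LOCAL_DB = {
    "example.com": {"category": "Technology", "page_type": "homepage"},
}

def classify_via_api(domain):
    # Placeholder for a real API call; returns a record in the same shape.
    return {"category": "Unknown", "page_type": "unknown", "source": "api"}

def resolve(domain):
    hit = LOCAL_DB.get(domain)
    if hit is not None:
        return {**hit, "source": "local"}  # sub-millisecond path
    return classify_via_api(domain)        # long-tail fallback

print(resolve("example.com")["source"])         # local
print(resolve("obscure-site.example")["source"])  # api
```

In production the API result would typically be cached back into the local store so repeat visits to long-tail domains also take the fast path.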
The agent governance market is splitting along several axes. On the horizontal axis, vendors differentiate by whether they focus on prevention (blocking agent navigation in real-time), detection (monitoring and alerting on agent activity), or compliance (audit trails and regulatory reporting). On the vertical axis, vendors specialize in specific industries — financial services agent governance, healthcare agent compliance, government agent security — each with unique regulatory requirements and risk profiles.
Across all of these segments, the common denominator is URL categorization data. A prevention vendor needs categories to decide what to block. A detection vendor needs categories to classify what happened. A compliance vendor needs categories to map activity to regulatory requirements. This makes URL categorization the pick-and-shovel play of the agent governance gold rush — regardless of which governance vendors win market share, the underlying data layer is essential for all of them.
Enterprise security teams evaluating agent governance solutions have clear expectations shaped by their experience with traditional web security tools. They expect comprehensive domain coverage — anything less than 95% of the domains their agents encounter must have a classification. They expect multi-dimensional classification — content categories alone are insufficient; page types, reputation scores, and popularity signals are table stakes. They expect deterministic policy enforcement — probabilistic or model-based classification introduces uncertainty that compliance teams will not accept. And they expect integration with existing security infrastructure — SIEM, SOAR, GRC, and web proxy tools.
These expectations favor governance solutions built on comprehensive, pre-classified databases over solutions that attempt to classify URLs on-the-fly using AI models. The enterprise market will not tolerate the inconsistency, latency, and coverage gaps that model-based classification introduces. This is why URL categorization databases are the foundation — not a feature — of the emerging agent governance market.
The EU AI Act's requirements for high-risk AI system monitoring, transparency, and human oversight are creating compliance obligations that only governance tools can fulfill. Article 14 requires "effective human oversight," which for browser-using agents means the ability to monitor, restrict, and audit agent web access. Article 9 requires "risk management throughout the AI system lifecycle," which necessitates continuous monitoring of agent behavior — including what websites agents access and what actions they take.
In the United States, the NIST AI Risk Management Framework similarly emphasizes the importance of monitoring AI system behavior and managing risks throughout deployment. State-level AI legislation in Colorado, Connecticut, and other states is adding sector-specific requirements for AI system governance. For organizations deploying AI agents in regulated industries — financial services, healthcare, education, government — the regulatory tailwind is transforming agent governance from a nice-to-have security investment into a mandatory compliance requirement.
The governance vendors who win this market will be those who ship the fastest, integrate the deepest, and cover the broadest set of enterprise requirements. All of them need a classification data layer to build on. Organizations that need agent governance today — before the vendor market matures — can deploy URL categorization databases directly as a foundational governance layer, building custom policy engines on top. When governance platforms mature and standardize, the same database provides the data feed those platforms consume. Either way, the 102 million domain database is the starting point for governing AI agent web access in production.
Whether you are building a governance platform or deploying agent controls internally, start with the data layer that every solution needs. 102 million classified domains, one-time purchase, perpetual license.