AI agents need structured, hierarchical domain classifications to enforce web browsing policies. The taxonomy provider you choose determines whether your agent governance is granular and auditable or coarse and brittle. Here is what to evaluate, why IAB taxonomy has become the industry standard, and how to integrate taxonomy data into your agent policy engine.
Most domain classification providers offer shallow, single-tier category lists that cannot support the nuanced policy rules that enterprise AI agent deployments require.
When teams evaluate domain taxonomy providers for AI agent policy, they quickly discover that most classification vendors were built for advertising, not security. Their taxonomies are designed to place ads next to relevant content, not to enforce granular allow/block decisions on autonomous agents. A taxonomy that groups "Banking" and "Cryptocurrency" under a single "Finance" label cannot distinguish between a corporate treasury research task and a speculative trading site visit. The result is policies that are either too permissive (allowing agents onto risky pages) or too restrictive (blocking legitimate research destinations).
The IAB (Interactive Advertising Bureau) Content Taxonomy v3 is a hierarchical, open-standard classification system with 700+ categories organized across four tiers. Unlike proprietary classification schemes, IAB taxonomy is maintained by an industry consortium, documented publicly, and adopted across thousands of platforms. When you build agent policy rules on IAB taxonomy, you are building on a stable, interoperable foundation that will not break when you switch vendors or upgrade your agent infrastructure.
Our database applies IAB taxonomy to 102 million domains and enriches each entry with web filtering categories, 20+ page-type labels, reputation scores, and popularity rankings. This means your agent policy engine can operate at any level of granularity — from broad Tier 1 rules ("block all Adult content") to surgical Tier 4 rules ("allow Financial Services > Banking > Commercial Banking but block Financial Services > Investing > Cryptocurrency") — all using a standardized, portable vocabulary.
Six critical dimensions that separate taxonomy providers built for agent governance from those built for advertising
A four-tier taxonomy with 700+ categories lets you write policies that distinguish between "Technology > Computing > Cloud Computing > Infrastructure as a Service" and a generic "Technology" label. Depth enables precision. Flat taxonomies force you to over-block or under-filter. Ask every provider: how many tiers does your taxonomy have, and how many leaf-node categories exist at the deepest level?
An agent filtering database is only useful if it covers the domains your agents actually visit. Providers with 1 to 10 million domains leave massive gaps. Our 102 million domain database covers 99.5% of the active internet measured by the Google Chrome User Experience Report. Every "Unknown" result in your policy engine is a decision you have to make without data — and in security, that default is usually "block," which kills agent productivity.
Domain-level categories answer "what is this site about?" but page-type metadata answers "what is this specific page designed to do?" A domain categorized as "Business and Finance" could serve a public investor relations page, a login portal, an admin dashboard, or a checkout flow. Page-type labels — login, checkout, settings, admin, careers, pricing, documentation — give your policy engine the granularity to block dangerous page types while allowing benign ones on the same domain.
Proprietary taxonomies create vendor lock-in. If your taxonomy provider uses custom category names like "BIZ_FIN_003" instead of the IAB standard "Business and Finance > Financial Services > Banking," then every policy rule, every audit report, and every compliance mapping is tied to that specific vendor. IAB taxonomy is an open standard — you can switch providers, merge datasets, or build custom overlays without rewriting your policy logic.
The internet changes constantly. Approximately 50,000 new domains are registered every day, and existing domains shift content regularly. A taxonomy provider that updates less often than quarterly will have stale classifications for millions of domains. Look for providers that offer quarterly database refreshes at minimum, with a real-time API fallback for domains not yet in the offline database. Our database offers quarterly updates with optional annual refresh subscriptions.
How the taxonomy data reaches your agent stack matters. Providers that only offer API-based access introduce latency and external dependencies into your agent's decision path. A bulk database download — CSV, JSON, or SQL dump — lets you load the data into your own infrastructure and query it locally in sub-millisecond time. Our database ships as a downloadable file you can ingest into Redis, PostgreSQL, SQLite, DynamoDB, or any key-value store.
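The local-lookup pattern described above can be sketched as follows. This is a minimal illustration, not the shipped schema: the column names `domain` and `iab_path`, the table name `taxonomy`, and the sample rows are all assumptions, so map them to the layout of the file you actually receive.

```python
import csv
import io
import sqlite3

# Hypothetical sample of a bulk CSV export; real column names may differ.
SAMPLE_CSV = """domain,iab_path
examplebank.com,Business and Finance > Financial Services > Banking
examplecoin.io,Business and Finance > Financial Services > Investing > Cryptocurrency
"""


def load_taxonomy(conn, csv_file):
    """Load domain -> IAB path rows into a local SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS taxonomy ("
        "domain TEXT PRIMARY KEY, iab_path TEXT NOT NULL)"
    )
    rows = [(r["domain"].lower(), r["iab_path"]) for r in csv.DictReader(csv_file)]
    conn.executemany(
        "INSERT OR REPLACE INTO taxonomy (domain, iab_path) VALUES (?, ?)", rows
    )
    conn.commit()
    return len(rows)


def lookup(conn, domain):
    """Return the IAB path for a domain, or None if it is not in the table."""
    row = conn.execute(
        "SELECT iab_path FROM taxonomy WHERE domain = ?", (domain.lower(),)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")  # use a file path for a persistent store
load_taxonomy(conn, io.StringIO(SAMPLE_CSV))
print(lookup(conn, "examplebank.com"))
print(lookup(conn, "unknown.example"))
```

Because `domain` is the primary key, each lookup is a single indexed read with no network hop. A miss returns `None`, which the policy engine can treat as "Unknown".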
Example code showing how IAB taxonomy tiers map to granular agent governance decisions
import http.client
import json
from urllib.parse import urlencode


class TaxonomyPolicyEngine:
    """Maps IAB taxonomy tiers to agent policy actions."""

    # Tier 1 hard blocks: entire verticals off-limits
    TIER1_BLOCKED = [
        "Adult Content", "Illegal Content",
        "Sensitive Topics", "Arms & Ammunition"
    ]

    # Tier 2 conditional rules: granular allow/block/review
    TIER2_RULES = {
        "Financial Services > Cryptocurrency": "block",
        "Financial Services > Banking": "allow",
        "Technology & Computing > Hacking": "block",
        "Health & Fitness > Pharmaceuticals": "review",
    }

    # Page types that override category decisions
    BLOCKED_PAGE_TYPES = [
        "login", "checkout", "admin", "settings"
    ]

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def classify(self, url):
        """POST the URL to the classification endpoint and return parsed JSON."""
        # urlencode escapes '&', '=' and '?' in the target URL so they
        # cannot corrupt the form-encoded body
        payload = urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers
        )
        res = self.conn.getresponse()
        return json.loads(res.read().decode("utf-8"))

    def evaluate_policy(self, url):
        """Return an action dict (block / allow / review) for the URL."""
        data = self.classify(url)
        page_type = data.get("page_type", "unknown")

        # Page-type override: block dangerous pages
        if page_type in self.BLOCKED_PAGE_TYPES:
            return {
                "action": "block",
                "reason": f"Page type '{page_type}' is restricted",
                "url": url
            }

        # Extract full taxonomy path
        categories = data.get("iab_classification", [])
        for cat_entry in categories:
            cat_path = cat_entry[0].replace(
                "Category name: ", ""
            )

            # Check Tier 1 blocks
            tier1 = cat_path.split(" > ")[0]
            if tier1 in self.TIER1_BLOCKED:
                return {
                    "action": "block",
                    "reason": f"Tier 1 category blocked: {tier1}",
                    "url": url
                }

            # Check Tier 2 rules (substring match on the full path)
            for rule_path, action in self.TIER2_RULES.items():
                if rule_path in cat_path:
                    return {
                        "action": action,
                        "reason": f"Tier 2 rule: {rule_path}",
                        "url": url
                    }

        return {"action": "allow", "reason": "No policy match", "url": url}


# Usage
if __name__ == "__main__":
    engine = TaxonomyPolicyEngine(api_key="your_api_key")
    result = engine.evaluate_policy("https://example.com/trading")
    print(f"Decision: {result['action']} ({result['reason']})")

// JavaScript (ES module; the usage example relies on top-level await)
class TaxonomyGuard {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.tier1Blocked = new Set([
      "Adult Content", "Illegal Content", "Malware"
    ]);
    this.tier2Rules = new Map([
      ["Cryptocurrency", "block"],
      ["Banking", "allow"],
      ["Hacking", "block"],
      ["Pharmaceuticals", "review"]
    ]);
  }

  async evaluate(targetURL) {
    const resp = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: targetURL,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    // Surface HTTP errors instead of trying to parse an error page as JSON
    if (!resp.ok) {
      throw new Error(`Classification request failed: ${resp.status}`);
    }
    const data = await resp.json();

    // Walk each taxonomy tier
    const categories = data.iab_classification || [];
    for (const entry of categories) {
      const path = entry[0].replace("Category name: ", "");
      const tiers = path.split(" > ");

      // Tier 1 hard block
      if (this.tier1Blocked.has(tiers[0])) {
        return { action: "block", tier: 1, match: tiers[0] };
      }

      // Tier 2+ granular rules (keyword match against any tier name)
      for (const [keyword, action] of this.tier2Rules) {
        if (tiers.some(t => t.includes(keyword))) {
          return { action, tier: 2, match: keyword };
        }
      }
    }
    return { action: "allow", tier: null, match: null };
  }
}

// Usage in agent middleware
const guard = new TaxonomyGuard("your_api_key");
const decision = await guard.evaluate("https://example.com");
if (decision.action === "block") {
  console.log(`Blocked at Tier ${decision.tier}: ${decision.match}`);
}

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499.
Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same taxonomy data your agent policy rules will reference.
[Charts: distribution of the 102 million domains in our main Enterprise Database across IAB v3 taxonomy classifications, Tier 1 through Tier 4. The charts show domain counts for the top 50 of 700+ categories; counts for the remaining 650+ categories are available through the Category Counter tool.]
When enterprise teams started deploying web-browsing AI agents in 2024 and 2025, they faced a fundamental vocabulary problem. Security policies need to reference categories — "block all Adult content," "allow Business and Finance," "flag Gambling for review" — but there was no universal agreement on what those category names should be, how they should be organized, or how deep the hierarchy should go. Teams that built policies on proprietary vendor taxonomies found themselves locked into a single classification provider, unable to switch without rewriting hundreds of policy rules.
The IAB Content Taxonomy solved this problem the same way it solved the equivalent problem in programmatic advertising a decade earlier: by establishing an open, hierarchical, industry-maintained standard that any vendor can implement and any buyer can adopt. Version 3 of the IAB taxonomy defines over 700 categories organized across four tiers of specificity. Tier 1 provides 28 broad verticals (Technology, Finance, Health, Education, etc.). Tier 2 breaks those into 150+ sub-verticals. Tiers 3 and 4 add increasingly specific sub-categories, enabling policy rules that can distinguish between "Business and Finance > Financial Services > Banking > Commercial Banking" and "Business and Finance > Financial Services > Investing > Cryptocurrency."
Consider a financial services company deploying an AI agent to research competitor products. A single-tier taxonomy would classify both a competitor's public marketing site and a cryptocurrency exchange under "Finance." The policy engine has no way to allow one and block the other. With IAB v3's four-tier hierarchy, the policy engine can write rules at the exact level of granularity needed: allow "Financial Services > Banking" at Tier 3, block "Financial Services > Investing > Cryptocurrency" at Tier 4, and flag "Financial Services > Insurance" for human review at Tier 3.
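One way to express the allow/block/review rules from this example is most-specific-prefix matching on the taxonomy path. The sketch below is illustrative only: the `resolve` helper, the `RULES` table, and the `"review"` default for unmatched paths are assumptions, not part of any provider's API.

```python
# Rules keyed by full taxonomy path; deeper paths are more specific.
RULES = {
    "Business and Finance > Financial Services > Banking": "allow",
    "Business and Finance > Financial Services > Insurance": "review",
    "Business and Finance > Financial Services > Investing > Cryptocurrency": "block",
}


def resolve(path, rules, default="review"):
    """Return (action, matched_depth) for the most specific matching rule.

    The default action for unmatched paths is an assumed policy choice.
    """
    tiers = [t.strip() for t in path.split(">")]
    # Try the full path first, then drop one tier at a time.
    for depth in range(len(tiers), 0, -1):
        candidate = " > ".join(tiers[:depth])
        if candidate in rules:
            return rules[candidate], depth
    return default, 0


print(resolve(
    "Business and Finance > Financial Services > Banking > Commercial Banking",
    RULES,
))
print(resolve(
    "Business and Finance > Financial Services > Investing > Cryptocurrency",
    RULES,
))
```

Matching whole tiers rather than substrings means a Tier 3 rule applies to every Tier 4 category beneath it, while a deeper rule can still override its parent.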
This depth also enables role-based access control for agents. A compliance research agent might have access to "Legal Services" Tier 2 categories, while a marketing agent is restricted to "Advertising and Marketing" sub-categories. The taxonomy hierarchy makes these permission boundaries explicit and auditable, rather than encoded in opaque prompt instructions that can be jailbroken.
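A role-scoped check along these lines might look like the following. The role names and the `ROLE_SCOPES` table are hypothetical, and the example paths are placeholders that have not been verified against the published IAB v3 category list.

```python
# Hypothetical role -> permitted category names. A role may visit any
# category whose path contains one of its permitted names as a tier.
ROLE_SCOPES = {
    "compliance_research": {"Legal Services"},
    "marketing": {"Advertising and Marketing"},
}


def is_permitted(role, category_path):
    """True if any tier of the path is in the role's permitted set."""
    tiers = [t.strip() for t in category_path.split(">")]
    allowed = ROLE_SCOPES.get(role, set())  # unknown roles get no access
    return any(tier in allowed for tier in tiers)


# Placeholder paths for illustration only.
print(is_permitted("marketing", "Business and Finance > Advertising and Marketing > Branding"))
print(is_permitted("marketing", "Business and Finance > Legal Services"))
```

Keeping the scopes in a plain data structure makes each agent's permission boundary reviewable in an audit without reading prompt text.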
A taxonomy is a vocabulary. A database is the dictionary that applies that vocabulary to every domain on the internet. You can have the most sophisticated taxonomy in the world, but if your provider classifies only 5 million domains, roughly 95% of the domains in a 102-million-domain dataset come back as "Unknown." Each "Unknown" result forces your policy engine into a default decision: block (which kills agent productivity) or allow (which defeats the purpose of filtering). Neither default is acceptable for production agent deployments.
Our database classifies 102 million domains using the IAB taxonomy, covering 99.5% of the active internet. This means that for virtually every URL your agent will encounter during normal operation, the taxonomy lookup returns a concrete classification that your policy engine can act on. The remaining 0.5% — newly registered domains, parked pages, and extremely niche sites — are handled by a real-time API fallback that classifies any URL on demand using the same IAB taxonomy, ensuring consistent policy enforcement regardless of the data source.
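The local-first, API-second flow can be sketched as below. Everything here is an assumption for illustration: `local_index` stands in for the offline database, `api_lookup` is any callable that wraps a real-time classification request, and `fake_api` simulates one without touching the network.

```python
def classify_with_fallback(domain, local_index, api_lookup, default_action="block"):
    """Look up a domain locally, then via the fallback, then apply a default."""
    path = local_index.get(domain.lower())
    if path is not None:
        return {"path": path, "source": "local", "action": None}
    try:
        path = api_lookup(domain)
    except Exception:
        # Broad catch is deliberate in this sketch: any fallback failure
        # is treated as "still unknown".
        path = None
    if path is not None:
        return {"path": path, "source": "api", "action": None}
    # No classification from either source: fall back to the default action.
    return {"path": None, "source": "none", "action": default_action}


LOCAL = {"examplebank.com": "Business and Finance > Financial Services > Banking"}


def fake_api(domain):
    """Stand-in for a real-time lookup; raises to simulate an outage."""
    if domain == "new.example":
        return "Technology & Computing"
    raise TimeoutError("simulated outage")


for d in ("examplebank.com", "new.example", "down.example"):
    print(d, classify_with_fallback(d, LOCAL, fake_api))
```

An `action` of `None` means a classification was found and the normal category rules should decide. Only a domain that is unknown to both sources receives the default.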
Domain-level IAB categories tell you what a website is about. Page-type labels tell you what a specific page is designed to do. This distinction is critical for agent governance because the same domain can serve pages with radically different risk profiles. A domain classified as "Technology > Computing > Cloud Computing" might serve a public documentation page (safe for any agent), a login portal (dangerous — the agent could attempt authentication), a pricing page (potentially sensitive competitive intelligence), or an admin console (critical — the agent could modify settings).
Our database includes 20+ page-type labels — homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, legal, privacy, terms, blog, documentation, API, support, FAQ, forum, and product — that let your policy engine make page-level decisions, not just domain-level decisions. A policy rule like "allow Technology domains except login and admin pages" requires both taxonomy categories and page-type metadata. Most taxonomy providers offer only the first half of that equation.
IAB taxonomy was designed for content classification. Web filtering categories were designed for security. Our database includes both, giving your agent policy engine two complementary lenses on every domain. Web filtering categories include Malware, Phishing, Spam, Adult, Gambling, Weapons, Drugs, Hacking, and Proxy/VPN — the same categories that enterprise web proxies and CASBs use to protect human users. Extending these same categories to AI agents ensures a consistent security posture across your entire organization, whether the web session is initiated by a person or an autonomous agent.
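A simple way to combine the two lenses is to let security categories override the IAB-based decision. The sketch below assumes that precedence; the choice of which web filtering categories count as hard blocks is a hypothetical policy, not a recommendation from the dataset itself.

```python
# Web filtering categories treated as hard security blocks (assumed policy).
SECURITY_BLOCK = {"Malware", "Phishing", "Spam", "Hacking", "Proxy/VPN"}


def combined_decision(iab_action, web_filter_categories):
    """Security categories override whatever the IAB-based rule decided."""
    hits = sorted(SECURITY_BLOCK.intersection(web_filter_categories))
    if hits:
        return {"action": "block", "reason": "web filter: " + ", ".join(hits)}
    return {"action": iab_action, "reason": "IAB rule"}


print(combined_decision("allow", ["Phishing", "Malware"]))
print(combined_decision("allow", []))
```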
One of the most underappreciated benefits of using IAB taxonomy for agent policy is stability. Proprietary taxonomies change at the vendor's discretion — categories get renamed, merged, or deleted, breaking every policy rule that referenced them. IAB taxonomy versions are maintained by a consortium with a formal change process. When IAB v3 replaced v2, the mapping between old and new categories was published and documented, allowing teams to migrate policy rules systematically rather than scrambling to identify what broke.
Our database includes both IAB v2 and v3 classifications for every domain, allowing teams to operate on whichever version their existing policy infrastructure uses and to plan their migration at their own pace. This dual-version approach eliminates the "rip and replace" risk that comes with single-taxonomy providers.
Categories and page types answer the "what" questions. Reputation and popularity signals answer the "how trustworthy" and "how well-known" questions. Our database enriches each domain with OpenPageRank scores (domain authority on a 0-10 scale) and global popularity rankings derived from the Google Chrome User Experience Report. These signals let your policy engine add nuance to category-based decisions. A domain classified as "Business and Finance" with a PageRank of 8 and a top-10,000 global ranking is likely a major financial institution. A domain with the same category but a PageRank of 1 and no ranking data is far more likely to be a scam site, a newly registered phishing domain, or a low-quality content farm.
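As a rough sketch of how these signals could temper a category decision, the function below downgrades an "allow" to "review" for low-reputation domains. The thresholds (`min_page_rank=3.0`, `trusted_rank=10_000`) and the function name are illustrative assumptions to be tuned against your own traffic.

```python
def adjust_for_reputation(action, page_rank, global_rank,
                          min_page_rank=3.0, trusted_rank=10_000):
    """Downgrade 'allow' to 'review' when reputation signals are weak.

    page_rank: 0-10 authority score, or None if missing.
    global_rank: popularity rank (1 = most popular), or None if unranked.
    """
    if action != "allow":
        return action  # never loosen a block or review decision
    if global_rank is not None and global_rank <= trusted_rank:
        return "allow"  # well-known domain
    if page_rank is None or page_rank < min_page_rank:
        return "review"  # weak or missing reputation data
    return "allow"


print(adjust_for_reputation("allow", page_rank=8, global_rank=5_000))
print(adjust_for_reputation("allow", page_rank=1, global_rank=None))
```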
The first mistake is evaluating providers on taxonomy size alone. A provider with 2,000 categories sounds impressive until you realize that most of those categories have fewer than 100 classified domains. Taxonomy breadth without domain coverage is an empty vocabulary. The second mistake is choosing a provider optimized for advertising use cases. Advertising taxonomies are designed to maximize ad relevance, not to enforce security policies. They often lack the security-focused categories (Malware, Phishing, Hacking) that agent policy engines need. The third mistake is ignoring delivery format. An API-only provider introduces latency and external dependencies into your agent's decision loop. For production agent deployments, you need a local database that your policy engine can query without leaving your network.
If your organization is already running agent policies on a proprietary taxonomy, migrating to IAB is a three-step process. First, create a mapping table between your existing categories and IAB v3 categories. Most proprietary taxonomies use 30 to 50 categories, making the mapping exercise manageable. Second, run both taxonomies in parallel for 30 days, comparing policy decisions to identify mismatches. Third, cut over to IAB-based rules once the parallel run confirms equivalence. Our team provides migration support, including pre-built mapping tables for the most common proprietary taxonomies used by web filtering vendors.
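The mapping and parallel-run steps can be sketched as follows. The legacy codes, the mapping entries, and the observation records are invented for illustration; a real parallel run would read both classifications from your logs.

```python
# Step 1: hypothetical mapping from legacy category codes to IAB v3 paths.
LEGACY_TO_IAB = {
    "BIZ_FIN_003": "Business and Finance > Financial Services > Banking",
    "BIZ_FIN_007": "Business and Finance > Financial Services > Investing > Cryptocurrency",
}
LEGACY_RULES = {"BIZ_FIN_003": "allow", "BIZ_FIN_007": "block"}


def translate_rules(legacy_rules, mapping):
    """Rewrite legacy rules onto IAB paths; report codes with no mapping."""
    translated, unmapped = {}, []
    for code, action in legacy_rules.items():
        if code in mapping:
            translated[mapping[code]] = action
        else:
            unmapped.append(code)
    return translated, unmapped


def find_mismatches(observations, legacy_rules, iab_rules, default="review"):
    """Step 2: compare both decisions for each (url, legacy_code, iab_path)."""
    mismatches = []
    for url, legacy_code, iab_path in observations:
        old = legacy_rules.get(legacy_code, default)
        new = iab_rules.get(iab_path, default)
        if old != new:
            mismatches.append({"url": url, "legacy": old, "iab": new})
    return mismatches


iab_rules, unmapped = translate_rules(LEGACY_RULES, LEGACY_TO_IAB)
observations = [
    ("https://bank.example/", "BIZ_FIN_003",
     "Business and Finance > Financial Services > Banking"),
    # The legacy vendor labeled this site as banking; IAB data says crypto.
    ("https://coin.example/", "BIZ_FIN_003",
     "Business and Finance > Financial Services > Investing > Cryptocurrency"),
]
mismatches = find_mismatches(observations, LEGACY_RULES, iab_rules)
print(unmapped, mismatches)
```

An empty mismatch list over the parallel-run window is the signal for step three, the cutover. Each remaining entry points to either a mapping gap or a genuine classification difference to review.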
Stop building agent policies on proprietary category lists. Deploy IAB taxonomy with 102 million pre-classified domains, 20+ page types, and web filtering categories — all in a single downloadable database.