Finding the Right Domain Taxonomy Provider for AI Agent Policy

The Problem: Flat Taxonomies Create Flat Policies

Most domain classification providers offer shallow, single-tier category lists that cannot support the nuanced policy rules that enterprise AI agent deployments require.

Why Generic Classification Falls Short for Agent Governance

When teams evaluate domain taxonomy providers for AI agent policy, they quickly discover that most classification vendors were built for advertising, not security. Their taxonomies are designed to place ads next to relevant content, not to enforce granular allow/block decisions on autonomous agents. A taxonomy that groups "Banking" and "Cryptocurrency" under a single "Finance" label cannot distinguish between a corporate treasury research task and a speculative trading site visit. The result is policies that are either too permissive (allowing agents onto risky pages) or too restrictive (blocking legitimate research destinations).

Single-tier categories: Flat lists with 30 to 50 categories force you into binary allow/block decisions with no granularity between "Finance" and "Cryptocurrency Trading Platforms"
No page-type metadata: Category alone does not tell you whether the agent is about to visit a login page, a checkout flow, or a public documentation site within the same domain
Inconsistent labeling: Providers with proprietary taxonomies create vendor lock-in — switching providers means rewriting every policy rule from scratch
Sparse coverage: Many providers classify only the top 1 to 5 million domains, leaving 95%+ of the internet as "Unknown" in your policy engine

The Solution: IAB Content Taxonomy as the Standard for Agent Policy

The IAB (Interactive Advertising Bureau) Content Taxonomy v3 is a hierarchical, open-standard classification system with 700+ categories organized across four tiers. Unlike proprietary classification schemes, IAB taxonomy is maintained by an industry consortium, documented publicly, and adopted across thousands of platforms. When you build agent policy rules on IAB taxonomy, you are building on a stable, interoperable foundation that will not break when you switch vendors or upgrade your agent infrastructure.

Our database applies IAB taxonomy to 102 million domains and enriches each entry with web filtering categories, 20+ page-type labels, reputation scores, and popularity rankings. This means your agent policy engine can operate at any level of granularity — from broad Tier 1 rules ("block all Adult content") to surgical Tier 4 rules ("allow Financial Services > Banking > Commercial Banking but block Financial Services > Investing > Cryptocurrency") — all using a standardized, portable vocabulary.

What to Evaluate in a Domain Taxonomy Provider

Six critical dimensions that separate taxonomy providers built for agent governance from those built for advertising

Taxonomy Depth and Granularity

A four-tier taxonomy with 700+ categories lets you write policies that distinguish between "Technology > Computing > Cloud Computing > Infrastructure as a Service" and a generic "Technology" label. Depth enables precision. Flat taxonomies force you to over-block or under-filter. Ask every provider: how many tiers does your taxonomy have, and how many leaf-node categories exist at the deepest level?
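
One way to check a provider's answer to that question is to count tiers and leaf nodes directly from a sample of its category list. The sketch below is a minimal illustration: the `taxonomy_stats` helper and the sample paths are hypothetical, and it assumes categories arrive as " > "-separated path strings.

```python
def taxonomy_stats(paths, sep=" > "):
    """Return (max_depth, leaf_count) for a list of category path strings."""
    split_paths = [tuple(p.split(sep)) for p in paths]
    all_nodes = set()
    parents = set()
    for parts in split_paths:
        for i in range(1, len(parts) + 1):
            all_nodes.add(parts[:i])      # every prefix is a node in the tree
            if i < len(parts):
                parents.add(parts[:i])    # prefixes with something below them
    leaves = all_nodes - parents
    max_depth = max((len(p) for p in split_paths), default=0)
    return max_depth, len(leaves)

# Hypothetical sample of a provider's category list
sample = [
    "Technology > Computing > Cloud Computing > Infrastructure as a Service",
    "Technology > Computing > Cloud Computing",
    "Technology > Computing",
    "Business and Finance > Financial Services > Banking",
]
depth, leaf_count = taxonomy_stats(sample)
print(depth, leaf_count)
```

Running the same function over each candidate provider's full category export gives a like-for-like comparison of depth and leaf counts.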

Domain Coverage Volume

An agent filtering database is only useful if it covers the domains your agents actually visit. Providers with 1 to 10 million domains leave massive gaps. Our 102 million domain database covers 99.5% of the active internet as measured by the Google Chrome User Experience Report. Every "Unknown" result in your policy engine is a decision you have to make without data. In security, the default for that decision is usually "block," which kills agent productivity.
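
Coverage is easiest to judge against your own traffic. As a rough sketch, assuming you have a log of domains your agents visited and a set of domains the provider classifies (both hypothetical here), the "Unknown" rate can be measured like this:

```python
def unknown_rate(visited_domains, classified):
    """Fraction of visited domains that have no classification."""
    visited = [d.lower().strip() for d in visited_domains if d.strip()]
    if not visited:
        return 0.0
    misses = sum(1 for d in visited if d not in classified)
    return misses / len(visited)

# Hypothetical data: a provider's classified set and an agent's visit log
classified = {"example.com", "example.org"}
log = ["example.com", "Example.org", "unlisted.example", "example.com"]
rate = unknown_rate(log, classified)
print(f"Unknown rate: {rate:.0%}")
```

The rate is computed per visit rather than per unique domain, so frequently visited unclassified domains weigh more heavily, which matches how often the policy engine would hit its default.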

Page-Type Metadata

Domain-level categories answer "what is this site about?" but page-type metadata answers "what is this specific page designed to do?" A domain categorized as "Business and Finance" could serve a public investor relations page, a login portal, an admin dashboard, or a checkout flow. Page-type labels — login, checkout, settings, admin, careers, pricing, documentation — give your policy engine the granularity to block dangerous page types while allowing benign ones on the same domain.

Open vs. Proprietary Taxonomy

Proprietary taxonomies create vendor lock-in. If your taxonomy provider uses custom category names like "BIZ_FIN_003" instead of the IAB standard "Business and Finance > Financial Services > Banking," then every policy rule, every audit report, and every compliance mapping is tied to that specific vendor. IAB taxonomy is an open standard — you can switch providers, merge datasets, or build custom overlays without rewriting your policy logic.

Update Frequency and Freshness

The internet changes constantly. Approximately 50,000 new domains are registered every day, and existing domains shift content regularly. A taxonomy provider that updates less often than quarterly will have stale classifications for millions of domains. Look for providers that offer quarterly database refreshes at minimum, with a real-time API fallback for domains not yet in the offline database. Our database offers quarterly updates with optional annual refresh subscriptions.

Delivery Format and Integration Ease

How the taxonomy data reaches your agent stack matters. Providers that only offer API-based access introduce latency and external dependencies into your agent's decision path. A bulk database download — CSV, JSON, or SQL dump — lets you load the data into your own infrastructure and query it locally in sub-millisecond time. Our database ships as a downloadable file you can ingest into Redis, PostgreSQL, SQLite, DynamoDB, or any key-value store.
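
As an illustration of the local-query approach, here is a minimal sketch that loads a CSV export into SQLite and looks up a domain. The column names (`domain`, `iab_path`, `page_type`) and the sample rows are assumptions for the example; the actual export layout may differ.

```python
import csv
import io
import sqlite3

# Hypothetical export layout; real column names may differ.
SAMPLE_CSV = """domain,iab_path,page_type
example.com,Business and Finance > Financial Services > Banking,homepage
example.org,Technology > Computing > Cloud Computing,documentation
"""

def load_taxonomy(conn, csv_file):
    """Create the lookup table and bulk-insert rows from a CSV file object."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domains ("
        "domain TEXT PRIMARY KEY, iab_path TEXT, page_type TEXT)"
    )
    rows = [
        (r["domain"].lower(), r["iab_path"], r["page_type"])
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT OR REPLACE INTO domains VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

def lookup(conn, domain):
    """Return (iab_path, page_type), or None when the domain is unclassified."""
    return conn.execute(
        "SELECT iab_path, page_type FROM domains WHERE domain = ?",
        (domain.lower(),),
    ).fetchone()

conn = sqlite3.connect(":memory:")
loaded = load_taxonomy(conn, io.StringIO(SAMPLE_CSV))
print(loaded, lookup(conn, "example.org"))
```

The primary key on `domain` gives an indexed point lookup, so the policy check never leaves the process. For a real file, pass an opened file handle instead of the `io.StringIO` sample.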

Integrating Taxonomy Data into Agent Policy Rules

Example code showing how IAB taxonomy tiers map to granular agent governance decisions

Python — Tiered Taxonomy Policy Engine

import http.client
import json
from urllib.parse import urlencode

class TaxonomyPolicyEngine:
    """Maps IAB taxonomy tiers to agent policy actions."""

    # Tier 1 hard blocks: entire verticals off-limits
    TIER1_BLOCKED = [
        "Adult Content", "Illegal Content",
        "Sensitive Topics", "Arms & Ammunition"
    ]

    # Tier 2 conditional rules: granular allow/block
    TIER2_RULES = {
        "Financial Services > Cryptocurrency": "block",
        "Financial Services > Banking": "allow",
        "Technology & Computing > Hacking": "block",
        "Health & Fitness > Pharmaceuticals": "review",
    }

    # Page types that override category decisions
    BLOCKED_PAGE_TYPES = [
        "login", "checkout", "admin", "settings"
    ]

    def __init__(self, api_key):
        self.api_key = api_key
        # A timeout keeps a stalled API call from hanging the agent
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com",
            timeout=10
        )

    def classify(self, url):
        # URL-encode the form body so "&", "=" or "?" inside the
        # target URL cannot corrupt the request parameters
        payload = urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers
        )
        res = self.conn.getresponse()
        return json.loads(res.read().decode("utf-8"))

    def evaluate_policy(self, url):
        data = self.classify(url)
        page_type = data.get("page_type", "unknown")

        # Page-type override: block dangerous pages
        if page_type in self.BLOCKED_PAGE_TYPES:
            return {
                "action": "block",
                "reason": f"Page type '{page_type}' is restricted",
                "url": url
            }

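        # Extract full taxonomy paths. Every category is checked before
        # a non-block decision is returned, so an early "allow" cannot
        # mask a later "block".
        categories = data.get("iab_classification", [])
        pending = None  # strictest non-block match seen so far
        for cat_entry in categories:
            cat_path = cat_entry[0].replace(
                "Category name: ", ""
            )

            # Check Tier 1 blocks
            tier1 = cat_path.split(" > ")[0]
            if tier1 in self.TIER1_BLOCKED:
                return {
                    "action": "block",
                    "reason": f"Tier 1 category blocked: {tier1}",
                    "url": url
                }

            # Check Tier 2 rules
            for rule_path, action in self.TIER2_RULES.items():
                if rule_path not in cat_path:
                    continue
                decision = {
                    "action": action,
                    "reason": f"Tier 2 rule: {rule_path}",
                    "url": url
                }
                if action == "block":
                    return decision
                # "review" outranks "allow"
                if pending is None or action == "review":
                    pending = decision

        return pending or {"action": "allow", "reason": "No policy match", "url": url}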

# Usage
engine = TaxonomyPolicyEngine(api_key="your_api_key")
result = engine.evaluate_policy("https://example.com/trading")
print(f"Decision: {result['action']} — {result['reason']}")

JavaScript — Taxonomy-Aware Navigation Guard

class TaxonomyGuard {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.tier1Blocked = new Set([
      "Adult Content", "Illegal Content", "Malware"
    ]);
    this.tier2Rules = new Map([
      ["Cryptocurrency", "block"],
      ["Banking", "allow"],
      ["Hacking", "block"],
      ["Pharmaceuticals", "review"]
    ]);
  }

  async evaluate(targetURL) {
    const resp = await fetch(
      "https://www.websitecategorizationapi.com" +
      "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: targetURL,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    const data = await resp.json();

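    // Walk each taxonomy tier
    const categories = data.iab_classification || [];
    let pending = null; // strictest non-block match seen so far
    for (const entry of categories) {
      const path = entry[0].replace("Category name: ", "");
      const tiers = path.split(" > ");

      // Tier 1 hard block
      if (this.tier1Blocked.has(tiers[0])) {
        return { action: "block", tier: 1, match: tiers[0] };
      }

      // Tier 2+ granular rules: "block" wins immediately, and
      // "review" outranks "allow" across all returned categories
      for (const [keyword, action] of this.tier2Rules) {
        if (!tiers.some(t => t.includes(keyword))) continue;
        const decision = { action, tier: 2, match: keyword };
        if (action === "block") return decision;
        if (!pending || action === "review") pending = decision;
      }
    }
    return pending || { action: "allow", tier: null, match: null };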
  }
}

// Usage in agent middleware (top-level await requires an ES module)
const guard = new TaxonomyGuard("your_api_key");
const decision = await guard.evaluate("https://example.com");
if (decision.action === "block") {
  console.log(`Blocked at Tier ${decision.tier}: ${decision.match}`);
}

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Database

AI Agent Domain Database 10M

$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license | Optional Updates: $1,599/year

10M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global Popularity Rankings

Popular

AI Agent Domain Database 20M

$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $2,999/year

20M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager

Maximum Coverage

AI Agent Domain Database 50M

$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $4,999/year

50M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499.

Why IAB Taxonomy Is the Standard for AI Agent Policy

When enterprise teams started deploying web-browsing AI agents in 2024 and 2025, they faced a fundamental vocabulary problem. Security policies need to reference categories — "block all Adult content," "allow Business and Finance," "flag Gambling for review" — but there was no universal agreement on what those category names should be, how they should be organized, or how deep the hierarchy should go. Teams that built policies on proprietary vendor taxonomies found themselves locked into a single classification provider, unable to switch without rewriting hundreds of policy rules.

The IAB Content Taxonomy solved this problem the same way it solved the equivalent problem in programmatic advertising a decade earlier: by establishing an open, hierarchical, industry-maintained standard that any vendor can implement and any buyer can adopt. Version 3 of the IAB taxonomy defines over 700 categories organized across four tiers of specificity. Tier 1 provides 28 broad verticals (Technology, Finance, Health, Education, etc.). Tier 2 breaks those into 150+ sub-verticals. Tiers 3 and 4 add increasingly specific sub-categories, enabling policy rules that can distinguish between "Business and Finance > Financial Services > Banking > Commercial Banking" and "Business and Finance > Financial Services > Investing > Cryptocurrency."

How Taxonomy Depth Translates to Policy Precision

Consider a financial services company deploying an AI agent to research competitor products. A single-tier taxonomy would classify both a competitor's public marketing site and a cryptocurrency exchange under "Finance." The policy engine has no way to allow one and block the other. With IAB v3's four-tier hierarchy, policy authors can write rules at the exact level of granularity needed: allow "Financial Services > Banking" at Tier 3, block "Financial Services > Investing > Cryptocurrency" at Tier 4, and flag "Financial Services > Insurance" for human review at Tier 3.
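
The three rules in that example can be expressed as a longest-prefix match over the category path, so the most specific rule wins. This is a minimal sketch, not the behavior of any particular policy engine: the `resolve` helper is hypothetical, the paths are abbreviated as in the paragraph above, and the "review" default for unmatched paths is an assumption.

```python
RULES = {
    ("Financial Services", "Banking"): "allow",
    ("Financial Services", "Investing", "Cryptocurrency"): "block",
    ("Financial Services", "Insurance"): "review",
}

def resolve(path, rules=RULES, default="review", sep=" > "):
    """Return the action of the longest rule that is a prefix of `path`."""
    parts = tuple(path.split(sep))
    # Try the full path first, then progressively shorter prefixes
    for end in range(len(parts), 0, -1):
        action = rules.get(parts[:end])
        if action is not None:
            return action
    return default  # assumed default when no rule applies

print(resolve("Financial Services > Banking > Commercial Banking"))
print(resolve("Financial Services > Investing > Cryptocurrency"))
```

Because deeper rules are tried first, a broad "allow" on a parent category can coexist with a narrow "block" on one of its children.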

This depth also enables role-based access control for agents. A compliance research agent might have access to "Legal Services" Tier 2 categories, while a marketing agent is restricted to "Advertising and Marketing" sub-categories. The taxonomy hierarchy makes these permission boundaries explicit and auditable, rather than encoded in opaque prompt instructions that can be jailbroken.
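
A sketch of how such role boundaries might look in code follows. The role names, the `role_allows` helper, and the sample sub-category are hypothetical, and paths are simplified to start at the scoped category.

```python
# Hypothetical role-to-category scopes
ROLE_SCOPES = {
    "compliance_research": ["Legal Services"],
    "marketing": ["Advertising and Marketing"],
}

def role_allows(role, category_path, scopes=ROLE_SCOPES, sep=" > "):
    """True if `category_path` equals or sits beneath a prefix granted to `role`."""
    for prefix in scopes.get(role, []):
        # Require the separator so a longer sibling name cannot match by accident
        if category_path == prefix or category_path.startswith(prefix + sep):
            return True
    return False

print(role_allows("marketing", "Advertising and Marketing > Email Marketing"))
print(role_allows("marketing", "Legal Services"))
```

Because the scopes are plain data, they can be reviewed and audited without reading any agent prompt.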

The Coverage Problem: Why Volume of Classified Domains Matters

A taxonomy is a vocabulary. A database is the dictionary that applies that vocabulary to every domain on the internet. You can have the most sophisticated taxonomy in the world, but if your provider only classifies 5 million domains, your policy engine will return "Unknown" for 95% of the URLs an agent encounters. Each "Unknown" result forces your policy engine into a default decision — block (which kills agent productivity) or allow (which defeats the purpose of filtering). Neither default is acceptable for production agent deployments.

Our database classifies 102 million domains using the IAB taxonomy, covering 99.5% of the active internet. This means that for virtually every URL your agent will encounter during normal operation, the taxonomy lookup returns a concrete classification that your policy engine can act on. The remaining 0.5% — newly registered domains, parked pages, and extremely niche sites — are handled by a real-time API fallback that classifies any URL on demand using the same IAB taxonomy, ensuring consistent policy enforcement regardless of the data source.
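
The local-first, API-fallback pattern described above might be wired up roughly as follows. This is a sketch under assumptions: `local_db` stands in for the offline database as a plain dict, and `remote_classify` is a hypothetical callable wrapping whatever real-time API you use.

```python
def classify_with_fallback(domain, local_db, remote_classify):
    """Look up `domain` locally; call `remote_classify` only on a miss."""
    key = domain.lower()
    hit = local_db.get(key)
    if hit is not None:
        return {"category": hit, "source": "local"}
    category = remote_classify(key)
    if category is None:
        return {"category": "Unknown", "source": "none"}
    local_db[key] = category  # cache so the next lookup stays local
    return {"category": category, "source": "api"}

# Hypothetical stand-ins for the offline database and the real-time API
local_db = {"example.com": "Business and Finance"}
calls = []

def fake_remote(domain):
    calls.append(domain)
    return "Technology"

first = classify_with_fallback("new.example", local_db, fake_remote)
second = classify_with_fallback("new.example", local_db, fake_remote)
print(first["source"], second["source"], len(calls))
```

Passing the remote classifier in as a function keeps the network dependency out of the hot path and makes the fallback easy to stub in tests.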

Page-Type Labels: The Missing Layer in Most Taxonomy Providers

Domain-level IAB categories tell you what a website is about. Page-type labels tell you what a specific page is designed to do. This distinction is critical for agent governance because the same domain can serve pages with radically different risk profiles. A domain classified as "Technology > Computing > Cloud Computing" might serve a public documentation page (safe for any agent), a login portal (dangerous — the agent could attempt authentication), a pricing page (potentially sensitive competitive intelligence), or an admin console (critical — the agent could modify settings).

Our database includes 20+ page-type labels — homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, legal, privacy, terms, blog, documentation, API, support, FAQ, forum, and product — that let your policy engine make page-level decisions, not just domain-level decisions. A policy rule like "allow Technology domains except login and admin pages" requires both taxonomy categories and page-type metadata. Most taxonomy providers offer only the first half of that equation.
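
The rule "allow Technology domains except login and admin pages" could be sketched as below. The `decide` helper is hypothetical, and it assumes the lookup has already returned a category path and a page-type label for the URL.

```python
RESTRICTED_PAGE_TYPES = {"login", "admin"}

def decide(category_path, page_type, allowed_tier1=("Technology",)):
    """Allow only permitted verticals, then veto restricted page types."""
    tier1 = category_path.split(" > ")[0]
    if tier1 not in allowed_tier1:
        return "block"
    if page_type in RESTRICTED_PAGE_TYPES:
        return "block"
    return "allow"

path = "Technology > Computing > Cloud Computing"
print(decide(path, "documentation"))
print(decide(path, "login"))
```

Both inputs are needed: with the category alone, the documentation page and the login portal on the same domain would receive the same decision.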

Web Filtering Categories: The Security Overlay

IAB taxonomy was designed for content classification. Web filtering categories were designed for security. Our database includes both, giving your agent policy engine two complementary lenses on every domain. Web filtering categories include Malware, Phishing, Spam, Adult, Gambling, Weapons, Drugs, Hacking, and Proxy/VPN — the same categories that enterprise web proxies and CASBs use to protect human users. Extending these same categories to AI agents ensures a consistent security posture across your entire organization, whether the web session is initiated by a person or an autonomous agent.

Building Policy Rules on a Stable Taxonomy Foundation

One of the most underappreciated benefits of using IAB taxonomy for agent policy is stability. Proprietary taxonomies change at the vendor's discretion — categories get renamed, merged, or deleted, breaking every policy rule that referenced them. IAB taxonomy versions are maintained by a consortium with a formal change process. When IAB v3 replaced v2, the mapping between old and new categories was published and documented, allowing teams to migrate policy rules systematically rather than scrambling to identify what broke.

Our database includes both IAB v2 and v3 classifications for every domain, allowing teams to operate on whichever version their existing policy infrastructure uses and to plan their migration at their own pace. This dual-version approach eliminates the "rip and replace" risk that comes with single-taxonomy providers.

Reputation and Popularity Signals: Context Beyond Categories

Categories and page types answer the "what" questions. Reputation and popularity signals answer the "how trustworthy" and "how well-known" questions. Our database enriches each domain with OpenPageRank scores (domain authority on a 0-10 scale) and global popularity rankings derived from the Google Chrome User Experience Report. These signals let your policy engine add nuance to category-based decisions. A domain classified as "Business and Finance" with a PageRank of 8 and a top-10,000 global ranking is likely a major financial institution. A domain with the same category but a PageRank of 1 and no ranking data is far more likely to be a scam site, a newly registered phishing domain, or a low-quality content farm.
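
As a rough illustration of combining these signals, the sketch below turns a PageRank score and an optional global rank into a coarse trust label. The `trust_tier` helper and every threshold in it are assumptions chosen for the example, not recommended values.

```python
def trust_tier(page_rank, global_rank,
               min_page_rank=5.0, low_page_rank=2.0, max_global_rank=100_000):
    """Coarse trust label from a 0-10 PageRank score and an optional global rank.

    `global_rank` is None when the domain has no ranking data.
    All thresholds are illustrative assumptions.
    """
    ranked = global_rank is not None and global_rank <= max_global_rank
    if page_rank >= min_page_rank and ranked:
        return "established"
    if page_rank < low_page_rank and global_rank is None:
        return "suspicious"
    return "unverified"

print(trust_tier(8, 9_500))   # high authority, well ranked
print(trust_tier(1, None))    # low authority, no ranking data
```

A policy engine could then require "established" before honoring an "allow" on sensitive categories and route "suspicious" domains to review.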

Common Mistakes When Selecting a Taxonomy Provider

The first mistake is evaluating providers on taxonomy size alone. A provider with 2,000 categories sounds impressive until you realize that most of those categories have fewer than 100 classified domains. Taxonomy breadth without domain coverage is an empty vocabulary. The second mistake is choosing a provider optimized for advertising use cases. Advertising taxonomies are designed to maximize ad relevance, not to enforce security policies. They often lack the security-focused categories (Malware, Phishing, Hacking) that agent policy engines need. The third mistake is ignoring delivery format. An API-only provider introduces latency and external dependencies into your agent's decision loop. For production agent deployments, you need a local database that your policy engine can query without leaving your network.

How to Migrate from a Proprietary Taxonomy to IAB

If your organization is already running agent policies on a proprietary taxonomy, migrating to IAB is a three-step process. First, create a mapping table between your existing categories and IAB v3 categories. Most proprietary taxonomies use 30 to 50 categories, making the mapping exercise manageable. Second, run both taxonomies in parallel for 30 days, comparing policy decisions to identify mismatches. Third, cut over to IAB-based rules once the parallel run confirms equivalence. Our team provides migration support, including pre-built mapping tables for the most common proprietary taxonomies used by web filtering vendors.
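
The first two steps can be sketched as a rule translation followed by a decision comparison. Everything here is illustrative: the helper names, the `MISC_999` code, the sample domains, and the "block" default are assumptions, and each observation is assumed to carry both the old vendor's code and the IAB path for the same domain.

```python
BANKING = "Business and Finance > Financial Services > Banking"
CRYPTO = "Business and Finance > Financial Services > Investing > Cryptocurrency"

def translate_rules(old_rules, mapping):
    """Step 1: rewrite proprietary-code rules as IAB-path rules."""
    translated, unmapped = {}, []
    for code, action in old_rules.items():
        if code in mapping:
            translated[mapping[code]] = action
        else:
            unmapped.append(code)  # needs a manual mapping decision
    return translated, unmapped

def find_mismatches(observations, old_rules, new_rules, default="block"):
    """Step 2: list domains where the two policies disagree."""
    out = []
    for domain, old_code, iab_path in observations:
        old_action = old_rules.get(old_code, default)
        new_action = new_rules.get(iab_path, default)
        if old_action != new_action:
            out.append((domain, old_action, new_action))
    return out

mapping = {"BIZ_FIN_003": BANKING}
old_rules = {"BIZ_FIN_003": "allow", "MISC_999": "review"}
new_rules, unmapped = translate_rules(old_rules, mapping)
observations = [
    ("bank.example", "BIZ_FIN_003", BANKING),
    ("coins.example", "BIZ_FIN_003", CRYPTO),
]
print(unmapped, find_mismatches(observations, old_rules, new_rules))
```

A mismatch is not always a regression. In this sample the flat taxonomy lumped a cryptocurrency site in with banking, and the deeper IAB path exposes a decision the old policy could not express.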

Build Agent Policy on an Industry-Standard Taxonomy

Stop building agent policies on proprietary category lists. Deploy IAB taxonomy with 102 million pre-classified domains, 20+ page types, and web filtering categories — all in a single downloadable database.
