The URL Category List You Need to Power AI Agent Guardrails

The Problem: Guardrails Without Data Are Theater

You can write the most elegant policy engine in the world, but if it has no category data to evaluate against, it is making decisions in the dark.

Most Category Lists Are Too Thin for Production

Teams building AI agent guardrails typically start with one of three approaches: a manually curated list of known-good and known-bad domains, a free URL categorization source with limited coverage, or an LLM-based classifier that evaluates each URL at runtime. All three fail at production scale for different reasons. Manual lists cover hundreds or thousands of domains when agents encounter millions. Free sources classify broad categories but lack the granularity for nuanced policy decisions. LLM classifiers add latency, cost, and non-determinism to every navigation event.

Coverage gaps: A category list covering 1 million domains sounds large until your agent hits the other 100 million active domains on the internet and gets "unknown" for every one.
Granularity deficit: Knowing a domain is "News" is not enough. Is it mainstream news, satire, opinion, tabloid, or state-sponsored media? Guardrail decisions need tier-level specificity.
Missing page types: Content categories tell you what a site is about but not what the page does. Login, checkout, admin, and settings pages require distinct policy treatment regardless of content topic.
Stale data: Category lists that update annually miss the approximately 50,000 new domains registered daily, so the coverage gap widens with every day between refreshes.

The Solution: A Multi-Dimensional Category List at Internet Scale

A production-grade URL category list for AI agent guardrails needs three dimensions of classification. First, content categories from the IAB Content Taxonomy v3, which organizes websites into a four-tier hierarchy of 700+ categories — from broad verticals like "Technology and Computing" down to specific topics like "Artificial Intelligence > Machine Learning > Computer Vision." Second, web filtering categories that label domains by threat and sensitivity type — Malware, Phishing, Adult, Gambling, Weapons, Drugs, and 25+ additional labels used by enterprise web proxies and CASBs worldwide.

Third, page-type classifications that identify the functional purpose of each page — homepage, blog, product, pricing, documentation, login, checkout, admin, settings, and 12+ additional types. Combined, these three dimensions produce a category list that answers three questions simultaneously for every URL: what is this site about, is it dangerous, and what kind of page is the agent about to interact with? Our database pre-computes all three dimensions for 102 million domains, delivering the answer in under one millisecond.
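To make the three dimensions concrete, here is a minimal sketch of what one classification record could look like in code. The class name, field names, and the "Uncategorized" filtering label are illustrative assumptions, not the actual database schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UrlClassification:
    """One record per URL. Field names are hypothetical, not the real schema."""
    iab_path: tuple[str, ...]   # content dimension, Tier 1 first
    filtering_label: str        # threat and sensitivity dimension
    page_type: str              # functional dimension

    def answers(self) -> dict[str, str]:
        # The three questions a guardrail asks about every URL.
        return {
            "what_is_it_about": " > ".join(self.iab_path),
            "is_it_dangerous": self.filtering_label,
            "what_kind_of_page": self.page_type,
        }

record = UrlClassification(
    iab_path=("Technology and Computing", "Artificial Intelligence"),
    filtering_label="Uncategorized",  # placeholder label, assumed for the example
    page_type="pricing",
)
print(record.answers())
```

Keeping the record immutable means a guardrail decision can be logged alongside the exact classification it was based on.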

Three Category Dimensions Every Guardrail Needs

Content categories, filtering labels, and page types working together for complete coverage

IAB Content Taxonomy (700+ Categories)

The IAB Content Taxonomy v3 is the industry standard for website content classification. Its four-tier hierarchy lets you write policy rules at exactly the granularity you need. Block all Tier 1 "Sensitive Subjects" categories with a single rule. Allow Tier 3 "Financial Services > Banking > Personal Banking" while blocking "Financial Services > Cryptocurrency." The taxonomy is maintained by the Interactive Advertising Bureau and widely adopted across the digital advertising ecosystem, which keeps category definitions consistent across vendors.

Web Filtering Labels (30+ Types)

Web filtering categories address the security and compliance dimension that IAB categories do not cover. A domain classified as IAB "Technology and Computing" could be a legitimate software company or a malware distribution platform. Web filtering labels like Malware, Phishing, Spam, Adult, Gambling, Weapons, and Drugs add the threat-assessment layer that guardrail systems require. These labels align with the categories that Zscaler, Palo Alto Networks, and Cisco Umbrella use in their enterprise web proxies, enabling consistent policy across human and agent traffic.

Page Type Classifications (20+ Types)

Content and filtering categories operate at the domain level. Page types operate at the page level, identifying whether the agent is about to land on a blog post, a product page, a login form, a checkout flow, or an admin panel. This distinction is critical for guardrails because the same domain can host pages with vastly different risk profiles. A company's marketing blog is safe for agent reading; its employee login portal is not. Page-type classification bridges this gap with 20+ functional labels that map directly to policy actions.

Integration Code for Category-Based Guardrails

Production-ready snippets to wire category lists into your agent guardrail pipeline

Python — Multi-Dimensional Category Guardrail

import http.client
import json
import urllib.parse

class CategoryGuardrail:
    """Three-dimensional category evaluation for AI agent guardrails."""

    BLOCKED_FILTERING = [
        "Malware", "Phishing", "Spam", "Adult", "Gambling",
        "Weapons", "Drugs", "Hate Speech", "Illegal Content"
    ]
    BLOCKED_PAGE_TYPES = ["login", "checkout", "admin", "settings"]
    BLOCKED_IAB_TIER1 = ["Sensitive Subjects", "Illegal Content"]

    def __init__(self, api_key):
        self.api_key = api_key
        # A timeout keeps a stalled request from hanging the agent.
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com", timeout=10
        )

    def classify(self, url):
        # urlencode escapes "&", "=", and "#" inside the target URL so they
        # cannot corrupt the form-encoded request body.
        payload = urllib.parse.urlencode({
            "query": url, "api_key": self.api_key,
            "data_type": "url", "expanded_categories": "1",
        })
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        self.conn.request("POST",
            "/api/iab/iab_web_content_filtering.php",
            payload, headers)
        response = self.conn.getresponse()
        body = response.read().decode("utf-8")
        if response.status != 200:
            raise RuntimeError(f"Classification failed: HTTP {response.status}")
        return json.loads(body)

    def evaluate(self, url):
        data = self.classify(url)
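        page_type = str(data.get("page_type") or "unknown").lower()
        # Strip the "Category name: " prefix without assuming it is present.
        iab_cats = [c[0].replace("Category name: ", "")
                    for c in data.get("iab_classification") or [] if c]
        # Guard against a missing or empty filtering_taxonomy list.
        filtering = data.get("filtering_taxonomy") or [[""]]
        filter_cat = filtering[0][0] if filtering[0] else ""
        filter_name = filter_cat.replace("Category name: ", "")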

        # Dimension 1: Web filtering threat check
        for blocked in self.BLOCKED_FILTERING:
            if blocked.lower() in filter_name.lower():
                return {"action": "block", "reason": f"Filtering: {filter_name}"}

        # Dimension 2: Page type check
        if page_type in self.BLOCKED_PAGE_TYPES:
            return {"action": "block", "reason": f"Page type: {page_type}"}

        # Dimension 3: IAB content check
        for cat in iab_cats:
            for blocked in self.BLOCKED_IAB_TIER1:
                if blocked.lower() in cat.lower():
                    return {"action": "block", "reason": f"IAB: {cat}"}

        return {"action": "allow", "categories": iab_cats,
                "page_type": page_type, "filter": filter_name}

# Usage
guardrail = CategoryGuardrail(api_key="your_api_key")
result = guardrail.evaluate("https://example.com/pricing")
print(f"Decision: {result['action']}")

JavaScript — Category List Policy Evaluator

class CategoryPolicyEvaluator {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.blockedFiltering = new Set([
      "Malware", "Phishing", "Spam", "Adult", "Gambling"
    ]);
    this.blockedPageTypes = new Set([
      "login", "checkout", "admin", "settings", "signup"
    ]);
  }

  async evaluate(url) {
    const res = await fetch(
      "https://www.websitecategorizationapi.com" +
      "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: new URLSearchParams({
          query: url, api_key: this.apiKey,
          data_type: "url", expanded_categories: "1"
        })
      }
    );
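    // Surface HTTP errors instead of trying to parse an error page as JSON.
    if (!res.ok) throw new Error(`Classification failed: HTTP ${res.status}`);
    const data = await res.json();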
    const pageType = data.page_type || "unknown";
    const filterCat = data.filtering_taxonomy?.[0]?.[0]
      ?.replace("Category name: ", "") || "";

    // Checks two dimensions: filtering label first, then page type.
    // IAB content rules are not applied in this evaluator.
    if ([...this.blockedFiltering].some(b =>
        filterCat.toLowerCase().includes(b.toLowerCase())))
      return { action: "block", dimension: "filtering", detail: filterCat };

    if (this.blockedPageTypes.has(pageType))
      return { action: "block", dimension: "page_type", detail: pageType };

    return { action: "allow", pageType, filterCategory: filterCat };
  }
}

AI Agent Database Pricing

The complete URL category list for AI agent guardrails. IAB taxonomy, web filtering labels, page types, and reputation data. One-time purchase with perpetual license.

AI Agent Database

AI Agent Domain Database 10M

$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license | Optional Updates: $1,599/year

10M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global Popularity Rankings


Popular

AI Agent Domain Database 20M

$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $2,999/year

20M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager


Maximum Coverage

AI Agent Domain Database 50M

$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $4,999/year

50M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager


Also available: Enterprise URL Database up to 102M domains from $2,499.

Anatomy of a Production-Grade URL Category List

A URL category list for AI agent guardrails is fundamentally different from a category list built for ad targeting, content moderation, or SEO analysis. Ad-targeting lists optimize for marketing relevance — they need to know whether a user visiting a domain is likely to be interested in sports equipment. Content moderation lists optimize for safety — they need to flag domains hosting harmful material. SEO lists optimize for competitive intelligence — they need to identify domains ranking for specific keyword clusters.

Agent guardrail lists must serve all three of these purposes at once and add a dimension that none of them address: functional page-type awareness. An agent guardrail needs to know the content topic (is this a technology site?), the safety classification (is this a malware site?), the reputation quality (is this a trustworthy site?), and the page function (is this a login page?). A category list that covers fewer than these four dimensions leaves guardrails with blind spots that agents will eventually hit.

The IAB Content Taxonomy v3 Explained

The IAB Content Taxonomy v3 is maintained by the Interactive Advertising Bureau, the industry body that sets standards for digital advertising. Version 3 introduces a four-tier hierarchy that provides categorization at multiple levels of granularity. Tier 1 contains 29 top-level categories like "Technology and Computing," "Business and Finance," "Health and Fitness," and "Sensitive Subjects." Each Tier 1 category branches into Tier 2 subcategories — "Technology and Computing" branches into "Computing," "Consumer Electronics," "Robotics," and ten additional subcategories. Tier 2 branches into Tier 3, and Tier 3 into Tier 4, producing a total of over 700 distinct category paths.

For agent guardrails, the multi-tier structure enables policy rules at exactly the right granularity. A broad blocking rule at Tier 1 — "block all Sensitive Subjects domains" — catches adult content, illegal activities, and controversial topics with a single rule. A narrow allowance rule at Tier 4 — "allow Technology and Computing > Computing > Artificial Intelligence > Machine Learning" — permits the agent to research ML-specific content while blocking the broader technology category if needed. The tiered structure means you never have to choose between precision and coverage.
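One way to combine a broad block with a narrow allowance is longest-prefix matching over the category path, sketched below. The rule table and the `decide` function are hypothetical, and falling back to "allow" when no rule matches is an assumption you may want to invert.

```python
Path = tuple[str, ...]

# Hypothetical rules keyed by IAB path prefix. A longer prefix is more specific.
RULES: dict[Path, str] = {
    ("Sensitive Subjects",): "block",
    ("Technology and Computing",): "block",
    ("Technology and Computing", "Computing",
     "Artificial Intelligence", "Machine Learning"): "allow",
}

def decide(path: Path, default: str = "allow") -> str:
    """Return the action of the longest rule prefix that matches the path."""
    for depth in range(len(path), 0, -1):
        action = RULES.get(path[:depth])
        if action is not None:
            return action
    return default  # assumed fallback when no rule matches

ml_path = ("Technology and Computing", "Computing",
           "Artificial Intelligence", "Machine Learning")
print(decide(ml_path))                                   # allow
print(decide(("Technology and Computing", "Robotics")))  # block
```

Because the most specific rule wins, the Tier 4 allowance overrides the Tier 1 block without any rule-ordering tricks.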

Web Filtering Categories: The Security Dimension

IAB content categories describe what a website is about. Web filtering categories describe what a website does — specifically, whether it poses a security or compliance threat. The web filtering taxonomy used in our database aligns with the categories deployed by major enterprise web proxies including Zscaler, Palo Alto Networks, and Cisco Umbrella. This alignment is intentional: it allows organizations to extend their existing web proxy policies to AI agent traffic without building a separate category mapping.

The filtering categories most relevant to agent guardrails include Malware (domains distributing malicious software), Phishing (domains impersonating legitimate services), Spam (domains distributing unsolicited content), Adult (domains hosting sexually explicit content), Gambling (domains hosting gambling operations), Weapons (domains selling or promoting weapons), Drugs (domains selling or promoting controlled substances), Hate Speech (domains promoting hate-based ideologies), and Illegal Content (domains hosting content that violates applicable laws). Each category represents a hard-block candidate in most enterprise agent deployments.

Page Types: The Missing Dimension

Content categories and filtering labels both operate at the domain level. They tell you what the entire site is about and whether the site poses a security threat. What they cannot tell you is what the specific page the agent is about to visit does. A single domain — say, a SaaS company — hosts a public marketing page, a documentation hub, a customer login portal, a billing checkout flow, and an internal admin dashboard. All five pages share the same domain and therefore the same IAB and filtering categories. But they have vastly different risk profiles for agent interaction.

Page-type classification closes this gap by labeling each page with its functional purpose. Our database classifies pages into 20+ types: homepage, about, contact, pricing, careers, blog, documentation, product, login, signup, checkout, settings, admin, legal, privacy policy, terms of service, API reference, support, FAQ, and forum. Each page type maps to a specific guardrail action — read, restrict, or block — enabling per-page policy enforcement on top of per-domain category rules.
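The read, restrict, or block mapping can be expressed as a small lookup layered on top of the domain-level decision. The assignment of page types to actions below is an illustrative assumption (the text names the actions but not which type gets which), as is the choice to restrict page types that are not listed.

```python
# Hypothetical assignment of page types to guardrail actions.
PAGE_TYPE_ACTIONS = {
    "blog": "read", "documentation": "read", "pricing": "read", "faq": "read",
    "signup": "restrict", "settings": "restrict", "forum": "restrict",
    "login": "block", "checkout": "block", "admin": "block",
}

def page_action(domain_action: str, page_type: str) -> str:
    """Apply the per-page rule on top of the per-domain decision."""
    if domain_action == "block":
        return "block"  # a blocked domain stays blocked for every page
    # Unlisted page types are restricted rather than read freely (assumption).
    return PAGE_TYPE_ACTIONS.get(page_type.lower(), "restrict")

print(page_action("allow", "blog"))   # read
print(page_action("allow", "login"))  # block
```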

Coverage Depth: Why 102 Million Domains Matters

A category list is only as useful as its coverage. If your list classifies 5 million domains but your agent encounters 20 million distinct domains in a month, at least 75% of those domains fall through to the "unknown" fallback, and the guardrail decides blind every time the agent visits one of them. Our 102 million domain database covers 99.5% of the active internet as measured by the Google Chrome User Experience Report, which means that for virtually every domain an agent will encounter during normal operation, the category list has a pre-computed classification ready.
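The coverage arithmetic above can be checked with a short sketch. The first function counts distinct domains under the best-case assumption that every classified domain is one the agent actually meets. The share of navigation events that hit "unknown" depends on how visits are distributed, which the second function illustrates with made-up visit counts.

```python
def unknown_domain_share(classified: int, encountered: int) -> float:
    """Share of distinct encountered domains with no classification (best case)."""
    covered = min(classified, encountered)
    return 1 - covered / encountered

def unknown_event_share(visits: dict[str, int], classified: set[str]) -> float:
    """Share of navigation events that land on an unclassified domain."""
    total = sum(visits.values())
    unknown = sum(n for domain, n in visits.items() if domain not in classified)
    return unknown / total

print(f"{unknown_domain_share(5_000_000, 20_000_000):.0%}")  # 75%

# Made-up traffic: one popular classified domain, two rare unclassified ones.
visits = {"a.example": 90, "b.example": 5, "c.example": 5}
print(unknown_event_share(visits, classified={"a.example"}))  # 0.1
```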

The remaining 0.5% consists of newly registered domains (less than 7 days old), parked pages with no content, and extremely niche sites with near-zero traffic. For these edge cases, the real-time API provides on-demand classification using the same taxonomy, ensuring 100% effective coverage in practice. The combination of offline database and online API means your guardrails never return "unknown" — every URL gets a definitive classification.
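The offline-first, API-second lookup described here could be wired together roughly as follows. This is a sketch under assumptions: `fetch_live` stands in for whatever client calls the real-time API, the record fields are hypothetical, and the sketch keeps an explicit "unknown" result for the case where the live call itself fails.

```python
from typing import Callable, Optional

Classification = dict[str, str]

def make_lookup(offline: dict[str, Classification],
                fetch_live: Callable[[str], Optional[Classification]]):
    """Build a lookup that prefers the offline table and falls back to a live call."""
    def lookup(domain: str) -> Classification:
        hit = offline.get(domain)
        if hit is not None:
            return {**hit, "source": "offline"}
        live = fetch_live(domain)      # hypothetical client, returns None on failure
        if live is not None:
            offline[domain] = live     # cache so the next lookup stays local
            return {**live, "source": "api"}
        return {"filtering": "unknown", "page_type": "unknown", "source": "none"}
    return lookup

# Stub data and a stub live classifier, for illustration only.
table = {"example.com": {"filtering": "Uncategorized", "page_type": "homepage"}}
def stub_live(domain: str) -> Optional[Classification]:
    if domain == "new.example":
        return {"filtering": "Uncategorized", "page_type": "blog"}
    return None

lookup = make_lookup(table, stub_live)
print(lookup("example.com")["source"])  # offline
```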

Category List Maintenance and Update Cycles

The internet is not static, and neither is a production category list. Domains change ownership, alter their content focus, get compromised by malicious actors, or go offline entirely. A category list that was accurate six months ago has accumulated classification drift that grows worse over time. Our optional annual update subscription provides quarterly refreshes that re-classify all 102 million domains, incorporate newly registered domains, update web filtering threat intelligence, and refine PageRank and popularity scores based on the latest link graph and traffic data.

For organizations that require more frequent updates, the real-time API serves as a continuous update mechanism. When the offline database returns a classification for a domain, the guardrail harness can optionally verify it against the API on a sampling basis, for example by checking 1% of navigations against the live classifier to detect classification drift. This hybrid approach maintains the sub-millisecond latency of the offline database while incorporating the freshness of the live API.
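A sampling check like this can be sketched in a few lines. The `DriftSampler` class is hypothetical, `fetch_live` is a stand-in for the live classifier call, and comparing a single label per domain is a simplification.

```python
import random
from typing import Callable, Optional

class DriftSampler:
    """Re-check a random sample of offline answers against a live classifier."""

    def __init__(self, fetch_live: Callable[[str], str], rate: float = 0.01,
                 rng: Optional[random.Random] = None):
        self.fetch_live = fetch_live   # hypothetical live-API client
        self.rate = rate               # 0.01 means roughly 1% of navigations
        self.rng = rng or random.Random()
        self.checked = 0
        self.drifted = []              # (domain, offline_label, live_label)

    def observe(self, domain: str, offline_label: str) -> None:
        # random() is in [0, 1), so rate=1.0 samples everything and 0.0 nothing.
        if self.rng.random() >= self.rate:
            return
        self.checked += 1
        live_label = self.fetch_live(domain)
        if live_label != offline_label:
            self.drifted.append((domain, offline_label, live_label))

# Stub live classifier that labels everything "Malware", for illustration only.
sampler = DriftSampler(fetch_live=lambda domain: "Malware", rate=1.0)
sampler.observe("a.example", "Malware")
sampler.observe("b.example", "Technology")
print(sampler.drifted)
```

In a real deployment the live call would likely run off the navigation path (a queue or background task) so that sampling never adds latency to the guardrail decision itself.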

Mapping Categories to Guardrail Actions

The final step in deploying a URL category list for agent guardrails is defining the mapping from categories to actions. This mapping is the policy layer — it translates raw classification data into operational decisions. A typical enterprise mapping defines three action types: allow (the agent can navigate freely), restrict (the agent can navigate but with limited capabilities), and block (the agent cannot navigate and an audit log entry is generated).

The category-to-action mapping is defined declaratively — typically as a JSON or YAML configuration file — and evaluated deterministically by the guardrail engine. There is no model inference in the decision path, no prompt evaluation, and no probabilistic output. The URL is classified, the classification is matched against the policy mapping, and the action is executed. This deterministic pipeline ensures that the same URL always produces the same guardrail decision, which is a requirement for audit compliance and a prerequisite for enterprise adoption.
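As an illustration of the declarative approach, the sketch below loads a JSON policy and evaluates it as a pure lookup. The policy layout, key names, and the "strictest matching action wins" rule are assumptions made for the example, not a documented configuration format.

```python
import json

# Hypothetical policy file contents; keys and layout are illustrative.
POLICY_JSON = """
{
  "default": "allow",
  "rules": {
    "filtering": {"Malware": "block", "Phishing": "block", "Gambling": "block"},
    "page_type": {"login": "block", "checkout": "block", "settings": "restrict"},
    "iab_tier1": {"Sensitive Subjects": "block"}
  }
}
"""

SEVERITY = {"allow": 0, "restrict": 1, "block": 2}

def evaluate(policy: dict, classification: dict) -> str:
    """Pure lookup: collect every matching rule and return the strictest action."""
    matched = [
        policy["rules"][dimension][value]
        for dimension, value in classification.items()
        if value in policy["rules"].get(dimension, {})
    ]
    return max(matched, key=SEVERITY.__getitem__, default=policy["default"])

policy = json.loads(POLICY_JSON)
decision = evaluate(policy, {
    "filtering": "Uncategorized",          # placeholder label
    "page_type": "settings",
    "iab_tier1": "Technology and Computing",
})
print(decision)  # restrict
```

The function has no I/O and no randomness, so the same classification and policy always yield the same action.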


Integrating Category Lists with Popular Agent Frameworks

Whether you build on LangChain, CrewAI, AutoGen, or a custom agent orchestration layer, integrating the category list follows the same middleware pattern. The category database is loaded into a fast key-value store (Redis, SQLite, or in-memory dictionary). A pre-navigation hook intercepts every URL the agent intends to visit. The hook queries the category store, evaluates the result against the policy mapping, and returns an allow or block decision to the agent runtime. The entire check completes in under one millisecond, adding negligible latency to the agent's workflow while providing deterministic, auditable guardrail enforcement for every navigation event.
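The middleware pattern can be sketched independently of any framework. The example below uses an in-memory SQLite table as the key-value store; the table layout, the blocked-label set, and the choice to block unknown domains are assumptions, and how the hook is registered with a given agent framework is not shown.

```python
import sqlite3
from urllib.parse import urlsplit

def build_store(rows):
    """Load (domain, filtering_label) rows into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE categories (domain TEXT PRIMARY KEY, filtering TEXT)")
    db.executemany("INSERT INTO categories VALUES (?, ?)", rows)
    return db

BLOCKED_LABELS = {"Malware", "Phishing"}  # hypothetical policy

def pre_navigation_hook(db, url: str) -> dict:
    """Called before the agent visits a URL; returns an allow or block decision."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    row = db.execute(
        "SELECT filtering FROM categories WHERE domain = ?", (host,)
    ).fetchone()
    label = row[0] if row else "unknown"
    # Failing closed on unknown domains is an assumption, not a requirement.
    blocked = label in BLOCKED_LABELS or label == "unknown"
    return {"domain": host, "label": label, "action": "block" if blocked else "allow"}

store = build_store([("example.com", "Technology"), ("bad.example", "Malware")])
print(pre_navigation_hook(store, "https://www.example.com/pricing"))
```

The hook returns a plain dictionary so the same decision can be handed to the agent runtime and written to an audit log.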

Get the Category List Your Guardrails Need

700+ IAB categories, 30+ filtering labels, 20+ page types, 102 million domains. The most comprehensive URL category list built specifically for AI agent guardrails. One-time purchase, perpetual license.
