A guardrail without a category list is just a suggestion. When your AI agent encounters a URL, it needs a definitive answer: what kind of site is this? Our category list spans 700+ IAB content categories, 30+ web filtering labels, and 20+ page-type classifications — covering 102 million domains so your guardrails have the data density to make real-time enforcement decisions.
You can write the most elegant policy engine in the world, but if it has no category data to evaluate against, it is making decisions in the dark.
Teams building AI agent guardrails typically start with one of three approaches: a manually curated list of known-good and known-bad domains, a free URL categorization source with limited coverage, or an LLM-based classifier that evaluates each URL at runtime. All three fail at production scale for different reasons. Manual lists cover hundreds or thousands of domains when agents encounter millions. Free sources classify broad categories but lack the granularity for nuanced policy decisions. LLM classifiers add latency, cost, and non-determinism to every navigation event.
A production-grade URL category list for AI agent guardrails needs three dimensions of classification. First, content categories from the IAB Content Taxonomy v3, which organizes websites into a four-tier hierarchy of 700+ categories — from broad verticals like "Technology and Computing" down to specific topics like "Artificial Intelligence > Machine Learning > Computer Vision." Second, web filtering categories that label domains by threat and sensitivity type — Malware, Phishing, Adult, Gambling, Weapons, Drugs, and 25+ additional labels used by enterprise web proxies and CASBs worldwide.
Third, page-type classifications that identify the functional purpose of each page — homepage, blog, product, pricing, documentation, login, checkout, admin, settings, and 12+ additional types. Combined, these three dimensions produce a category list that answers three questions simultaneously for every URL: what is this site about, is it dangerous, and what kind of page is the agent about to interact with? Our database pre-computes all three dimensions for 102 million domains, delivering the answer in under one millisecond.
Content categories, filtering labels, and page types working together for complete coverage
The IAB Content Taxonomy v3 is the industry standard for website content classification. Its four-tier hierarchy lets you write policy rules at exactly the granularity you need. Block all Tier 1 "Sensitive Subjects" categories with a single rule. Allow Tier 3 "Financial Services > Banking > Personal Banking" while blocking "Financial Services > Cryptocurrency." The taxonomy is maintained by the Interactive Advertising Bureau and widely adopted across the digital advertising ecosystem, which keeps category definitions consistent across vendors.
Web filtering categories address the security and compliance dimension that IAB categories do not cover. A domain classified as IAB "Technology and Computing" could be a legitimate software company or a malware distribution platform. Web filtering labels like Malware, Phishing, Spam, Adult, Gambling, Weapons, and Drugs add the threat-assessment layer that guardrail systems require. These labels align with the categories that Zscaler, Palo Alto Networks, and Cisco Umbrella use in their enterprise web proxies, enabling consistent policy across human and agent traffic.
Content and filtering categories operate at the domain level. Page types operate at the page level, identifying whether the agent is about to land on a blog post, a product page, a login form, a checkout flow, or an admin panel. This distinction is critical for guardrails because the same domain can host pages with vastly different risk profiles. A company's marketing blog is safe for agent reading; its employee login portal is not. Page-type classification bridges this gap with 20+ functional labels that map directly to policy actions.
Production-ready snippets to wire category lists into your agent guardrail pipeline
import http.client
import json
from urllib.parse import urlencode


class CategoryGuardrail:
    """Three-dimensional category evaluation for AI agent guardrails."""

    BLOCKED_FILTERING = [
        "Malware", "Phishing", "Spam", "Adult", "Gambling",
        "Weapons", "Drugs", "Hate Speech", "Illegal Content"
    ]
    BLOCKED_PAGE_TYPES = ["login", "checkout", "admin", "settings"]
    BLOCKED_IAB_TIER1 = ["Sensitive Subjects", "Illegal Content"]

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def classify(self, url):
        # URL-encode the form body so "&" or "=" inside the target URL
        # cannot corrupt the request parameters.
        payload = urlencode({
            "query": url, "api_key": self.api_key,
            "data_type": "url", "expanded_categories": "1",
        })
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        self.conn.request("POST",
                          "/api/iab/iab_web_content_filtering.php",
                          payload, headers)
        return json.loads(
            self.conn.getresponse().read().decode("utf-8")
        )

    def evaluate(self, url):
        data = self.classify(url)
        page_type = data.get("page_type", "unknown")
        # Strip the "Category name: " prefix; [-1] avoids an IndexError
        # when the prefix is missing.
        iab_cats = [c[0].split("Category name: ")[-1]
                    for c in data.get("iab_classification", []) if c]
        # Fall back to an empty label if the key is missing or the list is empty.
        filtering = data.get("filtering_taxonomy") or [[""]]
        filter_cat = filtering[0][0] if filtering[0] else ""
        filter_name = filter_cat.replace("Category name: ", "")

        # Dimension 1: Web filtering threat check
        for blocked in self.BLOCKED_FILTERING:
            if blocked.lower() in filter_name.lower():
                return {"action": "block", "reason": f"Filtering: {filter_name}"}

        # Dimension 2: Page type check
        if page_type in self.BLOCKED_PAGE_TYPES:
            return {"action": "block", "reason": f"Page type: {page_type}"}

        # Dimension 3: IAB content check
        for cat in iab_cats:
            for blocked in self.BLOCKED_IAB_TIER1:
                if blocked.lower() in cat.lower():
                    return {"action": "block", "reason": f"IAB: {cat}"}

        return {"action": "allow", "categories": iab_cats,
                "page_type": page_type, "filter": filter_name}


# Usage
guardrail = CategoryGuardrail(api_key="your_api_key")
result = guardrail.evaluate("https://example.com/pricing")
print(f"Decision: {result['action']}")
class CategoryPolicyEvaluator {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.blockedFiltering = new Set([
      "Malware", "Phishing", "Spam", "Adult", "Gambling"
    ]);
    this.blockedPageTypes = new Set([
      "login", "checkout", "admin", "settings", "signup"
    ]);
  }

  async evaluate(url) {
    const res = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: { "Content-Type": "application/x-www-form-urlencoded" },
        body: new URLSearchParams({
          query: url, api_key: this.apiKey,
          data_type: "url", expanded_categories: "1"
        })
      }
    );
    const data = await res.json();
    const pageType = data.page_type || "unknown";
    const filterCat = data.filtering_taxonomy?.[0]?.[0]
      ?.replace("Category name: ", "") || "";

    // Three-dimensional evaluation
    if ([...this.blockedFiltering].some(b =>
      filterCat.toLowerCase().includes(b.toLowerCase())))
      return { action: "block", dimension: "filtering", detail: filterCat };
    if (this.blockedPageTypes.has(pageType))
      return { action: "block", dimension: "page_type", detail: pageType };
    return { action: "allow", pageType, filterCategory: filterCat };
  }
}
The complete URL category list for AI agent guardrails. IAB taxonomy, web filtering labels, page types, and reputation data. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499.
Explore the depth of our URL category list — search any IAB or Web Filtering category to see domain counts, PageRank distributions, and popularity tiers.
How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications
Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database
Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.
A URL category list for AI agent guardrails is fundamentally different from a category list built for ad targeting, content moderation, or SEO analysis. Ad-targeting lists optimize for marketing relevance — they need to know whether a user visiting a domain is likely to be interested in sports equipment. Content moderation lists optimize for safety — they need to flag domains hosting harmful material. SEO lists optimize for competitive intelligence — they need to identify domains ranking for specific keyword clusters.
Agent guardrail lists must serve all three of those goals at once, and they add a fourth dimension that none of those lists address: functional page-type awareness. An agent guardrail needs to know the content topic (is this a technology site?), the safety classification (is this a malware site?), the reputation quality (is this a trustworthy site?), and the page function (is this a login page?). Any category list that covers fewer than these four dimensions leaves guardrails with blind spots that agents will inevitably encounter.
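The four dimensions can be pictured as one record per URL. The following is a minimal sketch, not the database's actual schema: the `UrlClassification` type, its field names, and the numeric reputation score are all assumptions made for illustration. It shows how a guardrail could detect which dimensions are missing before it makes a decision.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class UrlClassification:
    """Hypothetical per-URL record; field names are illustrative only."""
    content_topic: Optional[str] = None   # IAB category path
    safety_label: Optional[str] = None    # web filtering label
    reputation: Optional[float] = None    # assumed numeric trust score
    page_type: Optional[str] = None       # functional page type


def blind_spots(record: UrlClassification) -> List[str]:
    """List the dimensions that have no value for this URL."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


partial = UrlClassification(content_topic="Technology and Computing",
                            safety_label="Malware")
print(blind_spots(partial))
```

A guardrail could treat any non-empty result as a reason to restrict rather than allow.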
The IAB Content Taxonomy v3 is maintained by the Interactive Advertising Bureau, the industry body that sets standards for digital advertising. Version 3 introduces a four-tier hierarchy that provides categorization at multiple levels of granularity. Tier 1 contains 29 top-level categories like "Technology and Computing," "Business and Finance," "Health and Fitness," and "Sensitive Subjects." Each Tier 1 category branches into Tier 2 subcategories — "Technology and Computing" branches into "Computing," "Consumer Electronics," "Robotics," and ten additional subcategories. Tier 2 branches into Tier 3, and Tier 3 into Tier 4, producing a total of over 700 distinct category paths.
For agent guardrails, the multi-tier structure enables policy rules at exactly the right granularity. A broad blocking rule at Tier 1 — "block all Sensitive Subjects domains" — catches adult content, illegal activities, and controversial topics with a single rule. A narrow allowance rule at Tier 4 — "allow Technology and Computing > Computing > Artificial Intelligence > Machine Learning" — permits the agent to research ML-specific content while blocking the broader technology category if needed. The tiered structure means you never have to choose between precision and coverage.
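One way to implement tiered rules is longest-prefix matching over the category path, so the most specific rule wins. This is a sketch under stated assumptions: paths are taken to be " > "-separated strings, and the `RULES` table and `resolve_action` function are hypothetical names, not part of any published API.

```python
from typing import Dict, Tuple

# Hypothetical rule table keyed by category path, one tuple element per tier.
RULES: Dict[Tuple[str, ...], str] = {
    ("Sensitive Subjects",): "block",
    ("Technology and Computing",): "block",
    ("Technology and Computing", "Computing",
     "Artificial Intelligence", "Machine Learning"): "allow",
}


def split_path(path: str) -> Tuple[str, ...]:
    """Split 'A > B > C' into ('A', 'B', 'C')."""
    return tuple(part.strip() for part in path.split(">"))


def resolve_action(path: str, default: str = "allow") -> str:
    """Return the action of the most specific rule that is a prefix of the path."""
    tiers = split_path(path)
    # Try the full path first, then progressively shorter prefixes.
    for depth in range(len(tiers), 0, -1):
        action = RULES.get(tiers[:depth])
        if action is not None:
            return action
    return default


ml_path = ("Technology and Computing > Computing > "
           "Artificial Intelligence > Machine Learning")
print(resolve_action(ml_path))
print(resolve_action("Technology and Computing > Robotics"))
```

Here the Tier 4 allowance overrides the Tier 1 block for the machine learning path, while every other technology subcategory still resolves to the broader block rule.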
IAB content categories describe what a website is about. Web filtering categories describe what a website does — specifically, whether it poses a security or compliance threat. The web filtering taxonomy used in our database aligns with the categories deployed by major enterprise web proxies including Zscaler, Palo Alto Networks, and Cisco Umbrella. This alignment is intentional: it allows organizations to extend their existing web proxy policies to AI agent traffic without building a separate category mapping.
The filtering categories most relevant to agent guardrails include Malware (domains distributing malicious software), Phishing (domains impersonating legitimate services), Spam (domains distributing unsolicited content), Adult (domains hosting sexually explicit content), Gambling (domains hosting gambling operations), Weapons (domains selling or promoting weapons), Drugs (domains selling or promoting controlled substances), Hate Speech (domains promoting hate-based ideologies), and Illegal Content (domains hosting content that violates applicable laws). Each category represents a hard-block candidate in most enterprise agent deployments.
Content categories and filtering labels both operate at the domain level. They tell you what the entire site is about and whether the site poses a security threat. What they cannot tell you is what the specific page the agent is about to visit does. A single domain — say, a SaaS company — hosts a public marketing page, a documentation hub, a customer login portal, a billing checkout flow, and an internal admin dashboard. All five pages share the same domain and therefore the same IAB and filtering categories. But they have vastly different risk profiles for agent interaction.
Page-type classification closes this gap by labeling each page with its functional purpose. Our database classifies pages into 20+ types: homepage, about, contact, pricing, careers, blog, documentation, product, login, signup, checkout, settings, admin, legal, privacy policy, terms of service, API reference, support, FAQ, and forum. Each page type maps to a specific guardrail action — read, restrict, or block — enabling per-page policy enforcement on top of per-domain category rules.
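Layering a per-page action on top of a per-domain decision can be sketched as "take the stricter of the two." The text names the three actions but not which page type gets which, so the assignments in `PAGE_TYPE_ACTIONS` are assumptions, as are the `combine` function and the choice to restrict unrecognized page types.

```python
# Assumed page-type-to-action assignments, for illustration only.
PAGE_TYPE_ACTIONS = {
    "homepage": "read", "blog": "read", "documentation": "read",
    "pricing": "read", "faq": "read",
    "contact": "restrict", "signup": "restrict", "forum": "restrict",
    "login": "block", "checkout": "block", "admin": "block",
    "settings": "block",
}

# Higher number means stricter action.
SEVERITY = {"read": 0, "restrict": 1, "block": 2}


def combine(domain_action: str, page_type: str) -> str:
    """Return the stricter of the domain-level and page-level actions."""
    # Unrecognized page types default to "restrict" (an assumption).
    page_action = PAGE_TYPE_ACTIONS.get(page_type, "restrict")
    return max(domain_action, page_action, key=SEVERITY.__getitem__)


print(combine("read", "blog"))
print(combine("read", "login"))
print(combine("block", "blog"))
```

With this rule, a trusted domain's login page is still blocked, and a blocked domain stays blocked no matter how harmless the page type looks.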
A category list is only as useful as its coverage. If your list classifies 5 million domains but your agent encounters 20 million distinct domains in a month, at least 15 million of them (75%) hit the "unknown" fallback, which means your guardrail is making blind decisions for three-quarters of the domains it sees. Our 102 million domain database covers 99.5% of the active internet as measured by the Google Chrome User Experience Report, which means that for virtually every domain an agent will encounter during normal operation, the category list has a pre-computed classification ready.
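The coverage arithmetic is easy to check. The sketch below uses a hypothetical helper, `unknown_share`, and assumes the best case in which every classified domain is one the agent actually visits. The result is therefore a lower bound on the share of distinct domains that go unclassified. It is counted per domain, not per navigation event, since popular domains account for more events.

```python
def unknown_share(classified: int, encountered: int) -> float:
    """Lower bound on the fraction of encountered domains with no classification.

    Assumes every classified domain is among the encountered ones.
    """
    covered = min(classified, encountered)
    return 1 - covered / encountered


share = unknown_share(classified=5_000_000, encountered=20_000_000)
print(f"{share:.0%}")  # 75%
```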
The remaining 0.5% consists of newly registered domains (less than 7 days old), parked pages with no content, and extremely niche sites with near-zero traffic. For these edge cases, the real-time API provides on-demand classification using the same taxonomy, ensuring 100% effective coverage in practice. The combination of offline database and online API means your guardrails never return "unknown" — every URL gets a definitive classification.
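The offline-first, API-fallback pattern can be sketched as follows. The `lookup` function, the record shape, and the label values are assumptions, and `fake_live_classifier` is a stub standing in for the real-time API so the example runs without network access.

```python
from typing import Callable, Dict

Record = Dict[str, str]


def lookup(domain: str,
           offline_db: Dict[str, Record],
           classify_live: Callable[[str], Record]) -> Record:
    """Return the offline record if present, otherwise ask the live classifier."""
    record = offline_db.get(domain)
    source = "offline"
    if record is None:
        record = classify_live(domain)
        offline_db[domain] = record  # cache so the next lookup stays local
        source = "api"
    return {**record, "source": source}


def fake_live_classifier(domain: str) -> Record:
    """Hypothetical stub for the real-time API call."""
    return {"filtering": "Uncategorized", "page_type": "unknown"}


db = {"example.com": {"filtering": "Safe", "page_type": "homepage"}}
print(lookup("example.com", db, fake_live_classifier)["source"])
print(lookup("new-site.example", db, fake_live_classifier)["source"])
```

Caching the live result locally means a domain pays the API round trip only once.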
The internet is not static, and neither is a production category list. Domains change ownership, alter their content focus, get compromised by malicious actors, or go offline entirely. A category list that was accurate six months ago has accumulated classification drift that grows worse over time. Our optional annual update subscription provides quarterly refreshes that re-classify all 102 million domains, incorporate newly registered domains, update web filtering threat intelligence, and refine PageRank and popularity scores based on the latest link graph and traffic data.
For organizations that require more frequent updates, the real-time API serves as a continuous update mechanism. When the offline database returns a classification for a domain, the harness can optionally verify it against the API on a sampling basis — checking 1% of navigations against the live classifier to detect classification drift. This hybrid approach maintains the sub-millisecond latency of the offline database while incorporating the freshness of the live API.
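A sampling-based drift check might look like the sketch below. The `DriftSampler` class and its attributes are hypothetical, the live classifier is passed in as a plain callable so no real API call is implied, and the random source is injectable so the behavior can be tested.

```python
import random
from typing import Callable, List, Optional, Tuple


class DriftSampler:
    """Re-check a random fraction of offline classifications against a live source."""

    def __init__(self, classify_live: Callable[[str], str],
                 rate: float = 0.01,
                 rng: Optional[random.Random] = None):
        self.classify_live = classify_live
        self.rate = rate
        self.rng = rng or random.Random()
        self.checked = 0
        self.mismatches: List[Tuple[str, str, str]] = []

    def observe(self, domain: str, offline_category: str) -> bool:
        """Maybe verify one navigation; return True if it was sampled."""
        if self.rng.random() >= self.rate:
            return False  # not sampled: the offline answer is used as-is
        self.checked += 1
        live_category = self.classify_live(domain)
        if live_category != offline_category:
            self.mismatches.append((domain, offline_category, live_category))
        return True


# Stub live classifier that disagrees with the offline label (illustrative only).
sampler = DriftSampler(lambda domain: "Malware", rate=1.0)
sampler.observe("compromised.example", "Technology and Computing")
print(sampler.mismatches)
```

In practice the verification would run off the hot path, for example from a queue, so that sampled navigations keep the latency of the offline lookup.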
The final step in deploying a URL category list for agent guardrails is defining the mapping from categories to actions. This mapping is the policy layer — it translates raw classification data into operational decisions. A typical enterprise mapping defines three action types: allow (the agent can navigate freely), restrict (the agent can navigate but with limited capabilities), and block (the agent cannot navigate and an audit log entry is generated).
The category-to-action mapping is defined declaratively — typically as a JSON or YAML configuration file — and evaluated deterministically by the guardrail engine. There is no model inference in the decision path, no prompt evaluation, and no probabilistic output. The URL is classified, the classification is matched against the policy mapping, and the action is executed. This deterministic pipeline ensures that the same URL always produces the same guardrail decision, which is a requirement for audit compliance and a prerequisite for enterprise adoption.
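A declarative mapping and its deterministic evaluation can be sketched in a few lines. The JSON schema here (`default`, `rules`, `field`, `in`, `action`) is an invented illustration, not a documented format, and rule order is assumed to be first-match-wins.

```python
import json

# Hypothetical policy file contents; the schema is an assumption.
POLICY_JSON = """
{
  "default": "restrict",
  "rules": [
    {"field": "filtering", "in": ["Malware", "Phishing"], "action": "block"},
    {"field": "page_type", "in": ["login", "checkout", "admin"], "action": "block"},
    {"field": "page_type", "in": ["signup"], "action": "restrict"},
    {"field": "iab_tier1", "in": ["Technology and Computing"], "action": "allow"}
  ]
}
"""


def decide(classification: dict, policy: dict) -> str:
    """Return the action of the first matching rule, else the policy default."""
    for rule in policy["rules"]:
        if classification.get(rule["field"]) in rule["in"]:
            return rule["action"]
    return policy["default"]


policy = json.loads(POLICY_JSON)
page = {"filtering": "Safe", "page_type": "blog",
        "iab_tier1": "Technology and Computing"}
print(decide(page, policy))
print(decide({**page, "page_type": "login"}, policy))
```

Because `decide` is a pure function of the classification and the policy, the same inputs always yield the same action, which is the property audit logging relies on.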
Whether you build on LangChain, CrewAI, AutoGen, or a custom agent orchestration layer, integrating the category list follows the same middleware pattern. The category database is loaded into a fast key-value store (Redis, SQLite, or in-memory dictionary). A pre-navigation hook intercepts every URL the agent intends to visit. The hook queries the category store, evaluates the result against the policy mapping, and returns an allow, restrict, or block decision to the agent runtime. The entire check completes in under one millisecond, adding negligible latency to the agent's workflow while providing deterministic, auditable guardrail enforcement for every navigation event.
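The middleware pattern can be sketched without tying it to any agent framework. The example below uses an in-memory SQLite store. The table layout, the `pre_navigation_hook` name, the sample rows, and the choice to block unclassified domains are assumptions for illustration, and the restrict action is omitted for brevity.

```python
import sqlite3
from urllib.parse import urlparse

BLOCKED_FILTERING = {"Malware", "Phishing"}


def build_store(rows):
    """Load (domain, filtering, iab_tier1) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE categories "
        "(domain TEXT PRIMARY KEY, filtering TEXT, iab_tier1 TEXT)"
    )
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", rows)
    return conn


def pre_navigation_hook(conn, url: str) -> dict:
    """Decide whether the agent may visit the URL; called before each navigation."""
    domain = (urlparse(url).hostname or "").lower()
    row = conn.execute(
        "SELECT filtering, iab_tier1 FROM categories WHERE domain = ?",
        (domain,),
    ).fetchone()
    if row is None:
        # Assumed fail-closed default for domains missing from the store.
        return {"action": "block", "reason": "unclassified domain"}
    filtering, iab_tier1 = row
    if filtering in BLOCKED_FILTERING:
        return {"action": "block", "reason": f"Filtering: {filtering}"}
    return {"action": "allow", "reason": f"IAB: {iab_tier1}"}


store = build_store([
    ("example.com", "Safe", "Technology and Computing"),
    ("bad.example", "Malware", "Technology and Computing"),
])
print(pre_navigation_hook(store, "https://example.com/pricing"))
print(pre_navigation_hook(store, "https://bad.example/download"))
```

The hook matches on the exact hostname, so a production version would also need to normalize subdomains such as `www.` before the lookup.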
700+ IAB categories, 30+ filtering labels, 20+ page types, 102 million domains. The most comprehensive URL category list built specifically for AI agent guardrails. One-time purchase, perpetual license.