Autonomous AI agents are making thousands of web navigation decisions per hour without any structured awareness of what they are accessing. A streaming content category feed powered by our 102 million domain database gives your governance layer the real-time intelligence it needs to monitor, filter, and control every agent action before it reaches the open internet.
Most agent governance frameworks assume you already have a reliable stream of category intelligence. In reality, most teams are flying blind.
Enterprise teams often build agent governance around static configuration files or manually curated blocklists. These lists go stale within days. New domains appear at a rate of 50,000 per day, existing domains change content, and entire industries shift categories during mergers and acquisitions. A blocklist you created last quarter might still reference domains that have been parked, sold, or repurposed entirely.
Instead of treating domain categorization as a one-time data export, treat it as a continuous feed. Our 102 million domain database becomes the upstream source for your agent governance pipeline. You subscribe to category updates, ingest them into your policy engine, and every agent in your fleet instantly inherits the latest classification intelligence. When a domain changes from "News" to "Gambling" after an acquisition, your feed reflects that change in the next update cycle.
The feed delivers IAB v3 taxonomy categories, web filtering classifications, page-type labels, reputation scores, and popularity rankings — all the signals your policy engine needs to make deterministic allow/block/review decisions. No model inference required, no probabilistic guessing, no stale data.
Three feed architectures that transform static data into a living governance layer
Download the full 102M database and ingest it into your local data store — Redis, PostgreSQL, Elasticsearch, or a cloud warehouse. Schedule quarterly refresh downloads to keep your category data current. This approach is ideal for air-gapped environments or teams that need complete control over data residency.
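As a minimal sketch of the bulk-load step, using SQLite as a stand-in for Redis or PostgreSQL and assuming a simple CSV export format (the column names here are illustrative, not the actual feed schema):

```python
import csv
import io
import sqlite3

# Hypothetical feed rows standing in for the exported database file.
FEED_CSV = """domain,iab_category,reputation
example.com,News,92
casino-site.example,Gambling,41
"""

def bulk_load(conn, feed_file):
    """Load the full category feed into a local SQLite store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS categories ("
        "domain TEXT PRIMARY KEY, iab_category TEXT, reputation INTEGER)"
    )
    rows = (
        (r["domain"], r["iab_category"], int(r["reputation"]))
        for r in csv.DictReader(feed_file)
    )
    # INSERT OR REPLACE makes the load idempotent across refresh cycles.
    conn.executemany(
        "INSERT OR REPLACE INTO categories VALUES (?, ?, ?)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
bulk_load(conn, io.StringIO(FEED_CSV))
```

The same pattern scales to a quarterly refresh: re-running the load over a new export simply overwrites stale rows in place.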
After the initial bulk load, receive incremental updates that contain only the domains whose categories have changed since your last sync. Delta feeds reduce bandwidth and processing overhead by 95%, letting you maintain a fresh local copy without re-ingesting 102 million records each cycle.
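A delta update reduces to upserts and deletes against the local store. A sketch assuming an action-tagged record format (illustrative only, not the actual delta schema):

```python
import sqlite3

# Hypothetical delta records: each carries an action so the consumer
# can upsert changed domains and drop removed ones.
DELTA = [
    {"action": "upsert", "domain": "example.com", "category": "Gambling"},
    {"action": "upsert", "domain": "new-site.example", "category": "News"},
    {"action": "delete", "domain": "parked-domain.example"},
]

def apply_delta(conn, records):
    """Apply an incremental feed update to the local category store."""
    for rec in records:
        if rec["action"] == "upsert":
            conn.execute(
                "INSERT INTO categories (domain, category) VALUES (?, ?) "
                "ON CONFLICT(domain) DO UPDATE SET category = excluded.category",
                (rec["domain"], rec["category"]),
            )
        elif rec["action"] == "delete":
            conn.execute(
                "DELETE FROM categories WHERE domain = ?", (rec["domain"],)
            )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (domain TEXT PRIMARY KEY, category TEXT)")
conn.execute("INSERT INTO categories VALUES ('example.com', 'News')")
conn.execute("INSERT INTO categories VALUES ('parked-domain.example', 'Parked')")
apply_delta(conn, DELTA)
```

Because upserts are idempotent, a delta batch can be safely replayed after a failed sync without corrupting the store.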
For domains not yet in your local feed, the real-time API classifies any URL on demand and returns the same structured response — IAB categories, page types, reputation scores — that you receive in the feed. Use this as a fallback layer to achieve 100% coverage beyond the 102M base.
Production-ready snippets to ingest category feeds into your agent governance pipeline
import http.client
import json


class CategoryFeedConsumer:
    """Consumes category data from the 102M database and
    maintains a local governance cache for agent decisions."""

    GOVERNANCE_ACTIONS = {
        "Adult": "block",
        "Malware": "block",
        "Illegal Content": "block",
        "Gambling": "review",
        "Social Networking": "monitor",
    }

    def __init__(self, api_key):
        self.api_key = api_key
        self.category_cache = {}
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def fetch_category(self, domain):
        payload = (
            f"query={domain}"
            f"&api_key={self.api_key}"
            f"&data_type=url"
            f"&expanded_categories=1"
        )
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers,
        )
        res = self.conn.getresponse()
        data = json.loads(res.read().decode("utf-8"))
        self.category_cache[domain] = data
        return data

    def evaluate_governance(self, domain):
        data = self.category_cache.get(domain)
        if not data:
            data = self.fetch_category(domain)
        # Strip the "Category name: " prefix defensively; replace() is a
        # no-op if the prefix is absent, so a format change cannot crash
        # the governance path.
        categories = [
            c[0].replace("Category name: ", "")
            for c in data.get("iab_classification", [])
        ]
        for cat in categories:
            for pattern, action in self.GOVERNANCE_ACTIONS.items():
                if pattern.lower() in cat.lower():
                    return action, f"Policy: {action} for {cat}"
        return "allow", "No governance rule triggered"


# Usage in governance pipeline
feed = CategoryFeedConsumer(api_key="your_api_key")
action, reason = feed.evaluate_governance("example.com")
print(f"Governance decision: {action} — {reason}")
class CategoryFeedHandler {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.feedCache = new Map();
    this.governanceRules = new Map([
      ["Adult", "block"],
      ["Malware", "block"],
      ["Gambling", "review"],
      ["Phishing", "block"]
    ]);
  }

  async enrichDomain(domain) {
    if (this.feedCache.has(domain)) {
      return this.feedCache.get(domain);
    }
    const response = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: domain,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    const classification = await response.json();
    this.feedCache.set(domain, classification);
    return classification;
  }

  async applyGovernance(domain) {
    const data = await this.enrichDomain(domain);
    const filterCat =
      data.filtering_taxonomy?.[0]?.[0]
        ?.replace("Category name: ", "") || "Unknown";
    for (const [pattern, action] of this.governanceRules) {
      if (filterCat.includes(pattern)) {
        return { domain, category: filterCat, action };
      }
    }
    return { domain, category: filterCat, action: "allow" };
  }
}
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →
Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same data your content category feed delivers to your governance engine.
How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications
Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database
Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.
The rise of agentic AI represents a fundamental shift in how organizations interact with the internet. Instead of a human employee navigating websites one tab at a time, an AI agent can open hundreds of connections simultaneously, following link chains across domains, executing search queries, parsing results, and making autonomous decisions about which sites to visit next. This scale of autonomous web activity is unprecedented — and it demands a governance model that operates at the same speed and scale.
Traditional web filtering solutions were designed for human browsing patterns: one user, one browser session, a few hundred page views per day. An agentic AI deployment can generate thousands of URL requests per minute across a fleet of agent instances. The filtering layer must match this throughput, which is why a pre-loaded category feed — rather than per-request API calls — is the optimal architecture for agent governance at scale.
The most robust category feed architecture begins with a bulk load of the complete 102 million domain database into a local data store. This initial ingestion creates the baseline category intelligence that every agent instance can query with sub-millisecond latency. No external API call is needed for domains that exist in the local store, which eliminates both network latency and single-point-of-failure risk.
After the initial load, the feed switches to delta updates. Instead of re-ingesting 102 million records every refresh cycle, you receive only the records that have changed — new domains added, existing domains re-categorized, or domains removed. This incremental approach reduces processing overhead by 95% while keeping your local store current. For most deployments, quarterly full refreshes combined with ongoing delta updates provide the optimal balance between freshness and efficiency.
Once the category feed is loaded into your local store, every governance decision becomes a deterministic lookup. When an agent signals intent to navigate to a URL, the governance middleware extracts the domain, queries the local category store, and receives a structured response containing the IAB taxonomy classification, web filtering category, page type, reputation score, and popularity ranking. The middleware then evaluates this data against your policy rules and returns an allow, block, or review decision — all within microseconds.
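The lookup flow above can be sketched as a small middleware function. This is a minimal illustration, with an in-memory dict standing in for the Redis or PostgreSQL store and made-up record fields and thresholds:

```python
from urllib.parse import urlparse

# Illustrative local store; in production this would be a lookup
# against the bulk-loaded feed.
CATEGORY_STORE = {
    "example.com": {"iab": "News", "page_type": "article", "reputation": 92},
    "casino-site.example": {"iab": "Gambling", "page_type": "landing", "reputation": 41},
}

BLOCKED_CATEGORIES = {"Gambling", "Adult", "Malware"}
MIN_REPUTATION = 50  # illustrative threshold

def governance_decision(url):
    """Deterministic allow/block/review lookup against the local feed store."""
    domain = urlparse(url).hostname
    record = CATEGORY_STORE.get(domain)
    if record is None:
        # Unknown domain: escalate, or call the real-time API fallback.
        return "review"
    if record["iab"] in BLOCKED_CATEGORIES:
        return "block"
    if record["reputation"] < MIN_REPUTATION:
        return "review"
    return "allow"

print(governance_decision("https://example.com/story"))     # allow
print(governance_decision("https://casino-site.example/"))  # block
```

Every decision is a pure function of the stored record and the rules, which is what makes it both fast and reproducible in an audit.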
This deterministic approach is fundamentally different from model-based filtering, where a secondary LLM evaluates each URL. Model-based filtering introduces latency (500ms to 2 seconds per evaluation), non-determinism (the same URL may receive different classifications on consecutive calls), and cost ($0.01 to $0.03 per evaluation at scale). A feed-based lookup eliminates all three of these problems simultaneously.
The category feed is not a policy engine — it is the data source that policy engines consume. Your policy engine defines the rules: which IAB categories are allowed, which web filtering categories are blocked, which page types require human review, and which reputation scores trigger enhanced monitoring. The feed provides the raw category intelligence; the policy engine applies your organizational logic to that intelligence.
This separation of concerns — data source versus decision engine — is critical for maintainability. When your organization changes its policies (for example, deciding to allow agents to access social media sites that were previously blocked), you update the policy engine rules without touching the category feed. When the category data changes (for example, a domain migrating from "News" to "Gambling"), the feed updates automatically without requiring policy rule changes.
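The separation can be made concrete by keeping policy as plain configuration that is evaluated against feed records. A hedged sketch, with illustrative rule names and record fields:

```python
# Policy rules live in configuration, separate from the feed data they
# evaluate; the structure below is illustrative, not a mandated schema.
POLICY = {
    "blocked_categories": {"Gambling", "Malware"},
    "review_page_types": {"login", "checkout"},
    "monitor_below_reputation": 60,
}

def evaluate(record, policy):
    """Apply organizational policy to a feed-supplied category record."""
    if record["category"] in policy["blocked_categories"]:
        return "block"
    if record["page_type"] in policy["review_page_types"]:
        return "review"
    if record["reputation"] < policy["monitor_below_reputation"]:
        return "monitor"
    return "allow"

record = {"category": "Social Networking", "page_type": "feed", "reputation": 85}
print(evaluate(record, POLICY))  # allow under the current policy

# A policy change: block social networking without touching the feed.
POLICY["blocked_categories"].add("Social Networking")
print(evaluate(record, POLICY))  # block
```

Note that the feed record never changed between the two calls; only the policy configuration did, which is exactly the maintainability property described above.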
Domain categories are not static. A news website might add a gambling section. A legitimate business domain might be compromised and begin hosting malware. A social media platform might launch a financial services product. These category changes happen continuously across the internet, and your governance layer must reflect them.
Our 102M database is continuously re-evaluated using machine learning classifiers that analyze page content, link graphs, DNS records, and traffic patterns. When a domain's category changes, the change propagates through the feed pipeline within the next update cycle. For organizations that require near-real-time category freshness, the API fallback layer provides on-demand re-classification of any domain, bypassing the feed update cycle entirely.
Enterprise deployments often run dozens or hundreds of concurrent agent instances, each making independent navigation decisions. A centralized category feed architecture ensures that every agent instance references the same category data, eliminating policy drift that would occur if each agent maintained its own independent classification logic. The recommended pattern is a shared Redis or PostgreSQL instance that serves as the category store, with each agent querying it over the local network.
For globally distributed deployments, replicate the category store across regions. The 102M database compresses to approximately 8GB, making it practical to deploy regional replicas in every availability zone where your agents operate. This architecture provides sub-millisecond lookup latency regardless of the agent's geographic location.
Every category lookup generates a structured log entry: the timestamp, the requesting agent instance, the target domain, the resolved category, the page type, and the governance decision. These log entries form the audit trail that compliance teams need to demonstrate that your AI agents are operating within policy boundaries. For regulated industries — financial services, healthcare, government — this audit trail is not optional; it is a regulatory requirement.
The feed-based architecture makes audit trails inherently consistent. Because every agent references the same category data from the same feed, the audit logs tell a coherent story. If a domain was categorized as "Financial Services" at the time the agent visited it, the audit log reflects that exact classification — not a probabilistic guess that might differ if re-evaluated later.
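A structured audit entry of this kind can be emitted as one JSON line per lookup. A minimal sketch; the field names are illustrative rather than a required schema:

```python
import json
from datetime import datetime, timezone

def audit_entry(agent_id, domain, category, page_type, decision):
    """Build a structured audit record for one category lookup."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
        "domain": domain,
        "category": category,      # classification at the time of the visit
        "page_type": page_type,
        "decision": decision,
    }

entry = audit_entry(
    "agent-17", "example.com", "Financial Services", "article", "allow"
)
log_line = json.dumps(entry)  # append to your audit sink of choice
print(log_line)
```

Because the category field records the classification as it stood at lookup time, replaying the log later reproduces the exact decision context, regardless of how the domain is categorized today.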
Organizations that deploy category feeds early gain a structural advantage over competitors that attempt to build agent governance ad hoc. The feed provides a consistent, auditable, and scalable foundation for agent governance that can be extended as new governance requirements emerge. When regulators publish new rules about AI agent web access — and they will — organizations with feed-based governance can implement compliance changes by updating policy rules, not rebuilding infrastructure.
The 102M database covers 99.5% of the active internet. This coverage level means that your governance layer can make informed decisions about virtually every domain your agents will encounter, without falling back to expensive and slow model-based classification. The feed does not replace your policy engine — it empowers it with the structured data it needs to operate at the speed and scale of agentic AI.
Deploy the 102M domain database as your agent governance feed. One-time purchase, perpetual license, continuous category intelligence for every agent in your fleet.