OpenAI Operator, Anthropic Computer Use, and Google Project Mariner are navigating the open web autonomously. Without a structured domain allowlist, these agents visit any URL they encounter — including admin panels, payment gateways, and sensitive internal tools. Our 102 million domain categorization database lets you build deterministic allowlists grounded in IAB categories, page types, and reputation scores so your operator-style agents visit only pre-approved destinations.
When you deploy an agent that controls a web browser, every URL on the internet becomes a potential destination. Without an allowlist, there is no mechanism to constrain where the agent goes.
Operator-style agents like OpenAI Operator and Anthropic Computer Use are designed to complete multi-step tasks that involve navigating websites, clicking links, filling forms, and extracting data. When a user instructs an agent to "compare enterprise SaaS pricing," the agent may visit dozens of domains — and without a domain allowlist, it has no way to distinguish between a vendor's public pricing page, a competitor's internal wiki that happens to be indexed, a phishing clone of a legitimate vendor, or an adult content site that ranks for an ambiguous query.
Instead of manually curating a list of approved URLs — which becomes stale within weeks and cannot scale beyond a few hundred entries — you build your allowlist dynamically from a pre-classified domain database. Our 102 million domain database tags every domain with IAB v3 taxonomy categories, web filtering classifications, page-type labels (login, checkout, settings, pricing, careers, product, and 15+ more), reputation scores, and global popularity rankings.
Your allowlist becomes a set of rules: allow all domains categorized as "Technology & Computing" with page type "pricing" or "product." Allow domains in "Business and Finance" with popularity rank under 100,000. Block any domain with page type "login," "admin," or "checkout" regardless of category. The database provides the structured data; your policy engine enforces the rules. The result is a deterministic allowlist that covers millions of domains without manual curation, updates automatically with each database refresh, and enforces consistent policy across every agent session.
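Such a policy can be expressed as plain data plus a small evaluation function. The sketch below is illustrative only — the field names (`category`, `page_type`, `rank`) are hypothetical stand-ins for whatever schema your database export uses:

```python
# Hypothetical declarative allowlist policy: category inclusion rules,
# universal page-type exclusions, and a popularity ceiling.
ALLOWLIST_POLICY = {
    "allow_categories": {
        "technology & computing": {"page_types": ["pricing", "product"]},
        "business and finance": {"max_popularity_rank": 100_000},
    },
    "block_page_types": ["login", "admin", "checkout"],
}


def evaluate(domain_record, policy):
    """Return True if a pre-classified domain record passes the policy."""
    # Universal exclusions win over any category approval.
    if domain_record["page_type"] in policy["block_page_types"]:
        return False
    rule = policy["allow_categories"].get(domain_record["category"])
    if rule is None:
        return False
    if "page_types" in rule and domain_record["page_type"] not in rule["page_types"]:
        return False
    if "max_popularity_rank" in rule and domain_record["rank"] > rule["max_popularity_rank"]:
        return False
    return True


print(evaluate({"category": "technology & computing",
                "page_type": "pricing", "rank": 5000}, ALLOWLIST_POLICY))  # True
print(evaluate({"category": "technology & computing",
                "page_type": "login", "rank": 5000}, ALLOWLIST_POLICY))    # False
```

Because the policy is data rather than code, your security team can review and version it independently of the agent harness.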
Three approaches to constructing and enforcing domain allowlists using pre-classified categorization data
Define which IAB categories your agent is permitted to access. A financial research agent gets access to "Business and Finance," "Technology & Computing," and "News" categories. A recruiting agent gets "Careers" and "Education." Every domain in the 102M database that matches your approved categories is automatically included in the allowlist. No manual URL entry required — the database does the work.
Even within approved categories, certain page types should remain off-limits. A domain categorized as "Business and Finance" may have a public pricing page (allowed) and a login portal (blocked). Page-type labels let you create exclusion rules that apply universally: block "login," "checkout," "admin," and "settings" page types across all categories. The agent can browse the approved domain but cannot reach sensitive functional pages.
Layer reputation and popularity signals on top of category rules. Only allow domains with an OpenPageRank score above a threshold, or restrict access to domains within the top 1 million by global popularity. This eliminates newly registered domains, parked pages, and low-reputation sites from the allowlist even if they technically belong to an approved category. Reputation gating adds a second defense layer.
Production-ready snippets to enforce category-based allowlists in your agent harness
```python
import http.client
import json
from urllib.parse import urlencode, urlparse


class WhitelistManager:
    """Manages a category-based domain allowlist for
    Operator and Computer Use agents."""

    BLOCKED_PAGE_TYPES = [
        "login", "checkout", "settings",
        "admin", "signup", "password_reset"
    ]

    def __init__(self, api_key, approved_categories,
                 min_pagerank=0, max_popularity_rank=None):
        self.api_key = api_key
        self.approved_categories = [
            c.lower() for c in approved_categories
        ]
        self.min_pagerank = min_pagerank
        self.max_popularity_rank = max_popularity_rank
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )
        self._cache = {}

    def _classify(self, domain):
        if domain in self._cache:
            return self._cache[domain]
        # urlencode() escapes the values, so unusual characters in the
        # domain or key cannot corrupt the request body.
        payload = urlencode({
            "query": domain,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload, headers
        )
        res = self.conn.getresponse()
        data = json.loads(res.read().decode("utf-8"))
        self._cache[domain] = data
        return data

    def is_domain_whitelisted(self, url):
        """Check if a URL passes the allowlist policy.
        Returns (allowed, reason) tuple."""
        domain = urlparse(url).netloc or url
        data = self._classify(domain)

        # Page-type exclusions are evaluated first.
        page_type = data.get("page_type", "unknown")
        if page_type in self.BLOCKED_PAGE_TYPES:
            return False, f"Blocked page type: {page_type}"

        # Extract category names; split defensively so an unexpected
        # label format cannot raise an IndexError.
        categories = []
        for c in data.get("iab_classification", []):
            label = c[0] if isinstance(c, (list, tuple)) else str(c)
            name = label.split("Category name: ")[-1]
            categories.append(name.lower())

        # Check against approved categories
        matched = any(
            any(approved in cat
                for approved in self.approved_categories)
            for cat in categories
        )
        if not matched:
            return False, f"No approved category match: {categories}"

        # Reputation gate
        pagerank = data.get("open_pagerank", 0)
        if pagerank < self.min_pagerank:
            return False, (
                f"PageRank {pagerank} below "
                f"minimum {self.min_pagerank}"
            )

        # Popularity gate
        if self.max_popularity_rank:
            rank = data.get("global_rank", 999999999)
            if rank > self.max_popularity_rank:
                return False, (
                    f"Rank {rank} exceeds maximum "
                    f"{self.max_popularity_rank}"
                )

        return True, "Domain whitelisted - approved"


# Usage with OpenAI Operator / Computer Use agent
wl = WhitelistManager(
    api_key="your_api_key",
    approved_categories=[
        "technology", "business and finance",
        "news", "education"
    ],
    min_pagerank=3,
    max_popularity_rank=1000000
)
allowed, reason = wl.is_domain_whitelisted(
    "https://techcrunch.com/pricing"
)
print(f"Allowed: {allowed}, Reason: {reason}")
```
```javascript
async function validateAgentNavigation(targetURL, allowlistPolicy) {
  // Classify the target domain
  const response = await fetch(
    "https://www.websitecategorizationapi.com" +
      "/api/iab/iab_web_content_filtering.php",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded"
      },
      body: new URLSearchParams({
        query: targetURL,
        api_key: allowlistPolicy.apiKey,
        data_type: "url",
        expanded_categories: "1"
      })
    }
  );
  const classification = await response.json();

  // Extract IAB categories
  const iabCategories = (classification.iab_classification || [])
    .map(c => c[0]?.replace("Category name: ", "") || "")
    .filter(Boolean);

  // Extract page type and filtering category
  const pageType = classification.page_type || "unknown";
  const filterCategory =
    classification.filtering_taxonomy?.[0]?.[0]
      ?.replace("Category name: ", "") || "Unknown";
  const pageRank = classification.open_pagerank || 0;

  const decision = {
    url: targetURL,
    categories: iabCategories,
    pageType: pageType,
    filterCategory: filterCategory,
    pageRank: pageRank,
    action: "block",
    reason: "No matching allowlist rule",
    timestamp: new Date().toISOString()
  };

  // Check page type exclusions first
  if (allowlistPolicy.blockedPageTypes.includes(pageType)) {
    decision.reason = `Blocked page type: ${pageType}`;
    return decision;
  }

  // Check against approved categories
  const categoryMatch = iabCategories.some(cat =>
    allowlistPolicy.approvedCategories.some(approved =>
      cat.toLowerCase().includes(approved.toLowerCase())
    )
  );
  if (!categoryMatch) {
    decision.reason =
      `Categories ${iabCategories.join(", ")} not in allowlist`;
    return decision;
  }

  // Check minimum reputation threshold
  if (pageRank < (allowlistPolicy.minPageRank || 0)) {
    decision.reason = `PageRank ${pageRank} below threshold`;
    return decision;
  }

  decision.action = "allow";
  decision.reason = "Domain passes allowlist policy";
  return decision;
}

// Example usage for Operator-style agent (top-level await assumes
// this runs in an ES module or other async context)
const policy = {
  apiKey: "your_api_key",
  approvedCategories: [
    "Technology", "Business", "News", "Education"
  ],
  blockedPageTypes: [
    "login", "checkout", "admin", "settings", "signup"
  ],
  minPageRank: 3
};
const result = await validateAgentNavigation(
  "https://example.com/products", policy
);
if (result.action === "block") {
  console.log(`Navigation blocked: ${result.reason}`);
}
```
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →
[Interactive category counter and distribution charts: domain counts from the 102M Enterprise Database across IAB v3 Tier 1 through Tier 4 classifications. The charts show the top 50 of 700+ categories; the category counter covers the remaining 650+.]
The first generation of AI agents operated within sandboxed environments — they could query APIs, search the web, and return text, but they never controlled a browser. That changed with the release of OpenAI Operator, Anthropic Computer Use, and Google Project Mariner. These agents see the screen, move the cursor, click buttons, type into input fields, and navigate between pages exactly as a human would. The implication is profound: every website on the public internet is now an attack surface that your agent can reach, and every page the agent visits creates a record, a cookie, a server log, and potentially a compliance event.
Traditional blocklists — lists of known-bad domains — are insufficient for this threat model. A blocklist catches domains that have already been identified as malicious or inappropriate. It does nothing about the millions of benign-but-irrelevant domains that an agent should not visit for a specific task. An allowlist inverts the model: instead of listing what is bad, you list what is approved. Everything not on the allowlist is denied by default. This deny-by-default posture is the only architecture that provides meaningful containment for autonomous browser agents.
The challenge with allowlists has always been scale. Manually curating a list of approved URLs is feasible for a few hundred entries but collapses at thousands. A domain categorization database solves this by converting manual curation into declarative rules. Instead of listing individual URLs, you declare which categories, page types, and reputation thresholds constitute your approved perimeter. The database resolves those rules against 102 million pre-classified domains, producing an effective allowlist of millions of entries that is maintained automatically.
OpenAI Operator is designed to complete multi-step tasks in a web browser on behalf of the user. The agent receives a natural-language instruction, plans a sequence of browser actions, and executes them — opening URLs, clicking links, reading page content, filling forms, and navigating between tabs. At each step, the agent decides which URL to visit next based on its understanding of the task and the current page content.
The allowlist integration point is between the agent's navigation decision and the browser's HTTP request. When the Operator agent decides to navigate to a new URL, the middleware intercepts the request, extracts the target domain, queries the categorization database, evaluates the result against the allowlist policy, and either permits or blocks the navigation. If the domain is blocked, the middleware returns a structured message to the agent explaining why — "domain not in approved categories" or "blocked page type: login" — so the agent can adjust its plan rather than simply failing.
This architecture is transparent to the agent itself. The Operator agent does not need to be aware of the allowlist; it simply receives a navigation failure and replans. This separation of concerns means you can update your allowlist policy without modifying the agent's code, prompts, or configuration. The policy is externalized into the middleware layer where your security team controls it.
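A minimal sketch of that interception layer follows. Everything here is a hypothetical stand-in: `navigate` represents whatever low-level navigation call your harness exposes, and `check_policy` is any function returning an `(allowed, reason)` tuple like the snippets above:

```python
class NavigationDenied(Exception):
    """Raised when the allowlist middleware blocks a navigation request."""


def allowlist_middleware(navigate, check_policy):
    """Wrap the agent's navigate(url) call with a policy gate.

    The agent never sees the policy itself - it only receives a
    structured failure it can replan around.
    """
    def guarded_navigate(url):
        allowed, reason = check_policy(url)
        if not allowed:
            # Surface the reason so the agent can adjust its plan
            # instead of failing opaquely.
            raise NavigationDenied(f"{url}: {reason}")
        return navigate(url)
    return guarded_navigate


# Demo with stub implementations.
def fake_navigate(url):
    return f"loaded {url}"


def fake_policy(url):
    if "login" in url:
        return False, "blocked page type: login"
    return True, "approved"


nav = allowlist_middleware(fake_navigate, fake_policy)
print(nav("https://example.com/pricing"))  # loaded https://example.com/pricing
try:
    nav("https://example.com/login")
except NavigationDenied as e:
    print(f"blocked: {e}")
```

Because the policy check is injected, swapping a remote API call for a local database lookup requires no change to the agent-facing wrapper.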
Anthropic Computer Use operates at a lower level of abstraction than Operator. Instead of issuing high-level browser commands, Computer Use agents see pixel-level screenshots of the screen and generate mouse and keyboard actions. The agent literally sees what a human would see and interacts with the interface using the same input mechanisms. This makes Computer Use agents extraordinarily flexible — they can operate any application, not just web browsers — but it also makes them harder to constrain.
For Computer Use agents, the allowlist must be enforced at the network level or through a browser extension that intercepts navigation events. Because the agent is generating raw mouse clicks and keystrokes, there is no high-level "navigate to URL" command to intercept in the agent's action stream. Instead, you monitor the browser's actual navigation events. When the browser begins loading a new URL, the allowlist middleware checks the target domain and either allows the page to load or redirects to a block page. The agent sees the block page in its next screenshot and adjusts its behavior accordingly.
This network-level enforcement is where the categorization database provides the most value. Each URL check must resolve in under 10 milliseconds to avoid disrupting the agent's visual feedback loop. A local database lookup — the 102M domain database loaded into Redis or an in-memory hash table — satisfies this latency requirement trivially. A remote API call would not.
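The hot path reduces to a constant-time lookup once the database is resident in memory. This sketch uses a plain dict with a hypothetical two-entry slice of the data; in production the full export would be loaded into Redis or a process-local structure at startup:

```python
import time

# Hypothetical slice of the pre-classified domain database.
DOMAIN_DB = {
    "techcrunch.com": {
        "category": "news", "page_type": "homepage", "rank": 1200
    },
    "example-saas.com": {
        "category": "technology & computing",
        "page_type": "pricing", "rank": 85000
    },
}


def lookup(domain):
    """O(1) local lookup - no network round trip in the hot path."""
    return DOMAIN_DB.get(domain)


start = time.perf_counter()
record = lookup("techcrunch.com")
elapsed_ms = (time.perf_counter() - start) * 1000
print(record["category"], f"{elapsed_ms:.4f} ms")  # comfortably under 10 ms
```

Unknown domains return `None`, which a deny-by-default policy treats as a block.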
The most effective allowlist strategy is task-specific: each agent task gets its own allowlist policy tailored to the domains it legitimately needs to access. A financial research task gets access to "Business and Finance," "News," and "Technology & Computing" categories. A competitive intelligence task gets "Business and Finance" and "Shopping" but not "News" (to avoid the agent getting distracted by current events). A recruiting task gets "Careers," "Education," and "Business and Finance > Human Resources."
The IAB Content Taxonomy v3 enables this granularity with its four-tier hierarchy. Tier 1 provides 29 broad categories for coarse-grained control. Tier 2 breaks these into approximately 200 subcategories. Tier 3 and Tier 4 provide progressively finer distinctions. You can mix tiers in a single allowlist policy: allow all of Tier 1 "Technology & Computing" (broad access to tech sites), allow only Tier 2 "Business and Finance > Financial Services" (narrow access to financial sites), and block all of Tier 1 "Adult Content" (broad block on sensitive categories).
The database supports this by providing every domain's full category hierarchy. A single domain may have multiple categories assigned — for example, a fintech company's website might be tagged as both "Technology & Computing > Artificial Intelligence" and "Business and Finance > Financial Services." Your allowlist policy evaluates all assigned categories: if any one matches an approved category, the domain passes. If none match, it is blocked.
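A prefix match over the category path is one simple way to evaluate mixed-tier rules against a domain's full set of assigned categories. The `" > "` path separator here is an assumption about the export format:

```python
# Hypothetical policy mixing taxonomy tiers: a broad Tier 1 allow, a
# narrow Tier 2 allow, and a broad Tier 1 block.
APPROVED_PREFIXES = [
    "technology & computing",                     # whole Tier 1 branch
    "business and finance > financial services",  # one Tier 2 branch only
]
BLOCKED_PREFIXES = ["adult content"]


def category_allowed(domain_categories):
    """Pass if any assigned category matches an approved prefix
    and none matches a blocked prefix."""
    cats = [c.lower() for c in domain_categories]
    if any(c.startswith(b) for c in cats for b in BLOCKED_PREFIXES):
        return False
    return any(c.startswith(a) for c in cats for a in APPROVED_PREFIXES)


# A fintech site tagged with two categories: either match is enough.
print(category_allowed([
    "Technology & Computing > Artificial Intelligence",
    "Business and Finance > Financial Services",
]))  # True
print(category_allowed(["Business and Finance > Economy"]))  # False
```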
Category-based allowlisting controls which domains the agent can visit. Page-type exclusions control which pages on those domains the agent can access. This distinction is crucial because many approved domains contain pages that should be off-limits to autonomous agents. A SaaS vendor's website might be categorized as "Technology & Computing" and appear on your allowlist, but its login page, admin panel, and billing settings page should still be blocked.
Our database classifies pages into more than 20 distinct types: homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, legal, privacy policy, terms of service, blog, documentation, API reference, support, FAQ, forum, and product pages. For allowlist enforcement, the recommended default exclusion set includes login, signup, checkout, settings, admin, and password-reset page types. These functional pages represent the highest-risk surfaces for agent interaction — they involve authentication, payment, or configuration changes that an autonomous agent should never perform without explicit human authorization.
Page-type exclusions apply universally across all categories. Even if a domain belongs to your most trusted category with the highest reputation score, the agent should not access its login page. This universal application simplifies your policy definition: you define category-based inclusion rules and page-type-based exclusion rules separately, and the middleware evaluates both in sequence.
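The evaluation order matters: exclusions run before inclusions, so a blocked page type wins even on an approved domain. A compact sketch of that sequencing, with hypothetical category and page-type values:

```python
BLOCKED_PAGE_TYPES = {"login", "signup", "checkout",
                      "settings", "admin", "password_reset"}
APPROVED_CATEGORIES = {"technology & computing"}


def decide(category, page_type):
    """Exclusions first, inclusions second: a blocked page type
    overrides any category approval."""
    if page_type in BLOCKED_PAGE_TYPES:
        return "block", f"page type {page_type} excluded"
    if category.lower() in APPROVED_CATEGORIES:
        return "allow", "category approved"
    return "block", "category not approved"


print(decide("Technology & Computing", "login"))    # blocked despite category
print(decide("Technology & Computing", "pricing"))  # allowed
```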
Not every domain in an approved IAB category should be on the allowlist. A domain registered yesterday that happens to be categorized as "Technology & Computing" is not as trustworthy as techcrunch.com, even though both share the same category. Reputation and popularity signals add a quality dimension to category-based allowlisting.
OpenPageRank scores range from 0 to 10, with higher scores indicating greater domain authority. Setting a minimum PageRank threshold of 3 or 4 eliminates the vast majority of low-quality, newly registered, and spammy domains. Global popularity rankings — derived from the Google Chrome User Experience Report — indicate how frequently real users visit each domain. Restricting the allowlist to domains within the top 1 million by global popularity provides broad coverage while excluding the long tail of rarely visited sites.
Combining category rules, page-type exclusions, and reputation thresholds produces a layered allowlist that is simultaneously broad (covering millions of domains) and precise (excluding low-quality sites and sensitive page types). The database provides all three signal types for every domain, so your policy engine evaluates them in a single lookup.
A static allowlist becomes stale as new domains are registered, existing domains change categories, and your agent's task scope evolves. The database-driven approach solves this by separating the policy (which categories and page types are approved) from the data (which domains belong to which categories). When you purchase a database update — available quarterly — your allowlist automatically expands to include newly classified domains and removes domains whose categories have changed.
Your policy rules remain the same. If you approve "Technology & Computing" today and receive a database update next quarter that adds 50,000 newly classified technology domains, all 50,000 are automatically included in your allowlist without any configuration change. Conversely, if a domain's category changes from "Technology & Computing" to "Adult Content," it is automatically excluded from your allowlist on the next database refresh. This automation is essential for maintaining allowlist accuracy at the scale of millions of domains.
Every allowlist decision — allow or block — should be logged with the full context: the target URL, the domain's categories, page type, reputation score, the policy rule that matched, and the timestamp. This audit trail serves three purposes. First, it enables your security team to review agent behavior and verify that the allowlist is functioning correctly. Second, it provides evidence of compliance for regulatory audits — you can demonstrate that your AI agent was constrained to approved domains for every navigation event. Third, it powers allowlist refinement: by analyzing blocked domains, you can identify categories or domains that should be added to the allowlist, and by analyzing allowed domains, you can identify patterns that suggest the allowlist is too permissive.
The categorization database makes these audit logs rich and actionable. Instead of logging raw URLs, you log structured data: domain, IAB category, page type, PageRank score, popularity rank, and policy decision. This structured data can be aggregated, visualized, and queried to produce security dashboards that show agent navigation patterns, category distribution of visited domains, blocked navigation attempts by reason, and allowlist coverage gaps.
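One JSON line per decision is enough to feed a dashboard or SIEM. The field names below are hypothetical; align them with whatever schema your logging pipeline expects:

```python
import datetime
import json


def log_decision(url, record, action, reason):
    """Emit one structured audit entry per allowlist decision."""
    entry = {
        "timestamp": datetime.datetime.now(
            datetime.timezone.utc
        ).isoformat(),
        "url": url,
        "category": record.get("category"),
        "page_type": record.get("page_type"),
        "pagerank": record.get("pagerank"),
        "popularity_rank": record.get("rank"),
        "action": action,
        "reason": reason,
    }
    return json.dumps(entry)


line = log_decision(
    "https://example.com/login",
    {"category": "technology & computing", "page_type": "login",
     "pagerank": 6, "rank": 42000},
    "block", "blocked page type: login",
)
print(line)
```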
Enterprise deployments often run multiple agents with different roles, each requiring a different allowlist. A financial analysis agent, a competitive intelligence agent, a customer support agent, and a recruiting agent all need access to different slices of the internet. The database-driven allowlist architecture supports this naturally: each agent gets its own policy configuration specifying approved categories, excluded page types, and reputation thresholds. All agents query the same underlying database, but each evaluates the results against its own policy rules.
This multi-agent architecture also supports hierarchical policies. A global policy defines universal blocks — no agent may access adult content, malware domains, or login pages. Agent-specific policies define additional category approvals on top of the global base. This inheritance model ensures that security-critical rules are enforced consistently across all agents while allowing task-specific flexibility at the individual agent level.
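The inheritance model can be sketched as a global base policy checked before any agent-specific grants — global blocks are evaluated first and cannot be overridden. Agent names and categories here are hypothetical:

```python
# Global base policy every agent inherits.
GLOBAL_POLICY = {
    "blocked_categories": {"adult content"},
    "blocked_page_types": {"login", "admin", "checkout"},
}

# Per-agent category grants layered on top of the global base.
AGENT_POLICIES = {
    "financial_research": {
        "approved_categories": {"business and finance", "news"}
    },
    "recruiting": {
        "approved_categories": {"careers", "education"}
    },
}


def is_allowed(agent, category, page_type):
    """Global blocks win; then the agent's own grants apply."""
    category = category.lower()
    if category in GLOBAL_POLICY["blocked_categories"]:
        return False
    if page_type in GLOBAL_POLICY["blocked_page_types"]:
        return False
    agent_policy = AGENT_POLICIES.get(agent, {})
    return category in agent_policy.get("approved_categories", set())


print(is_allowed("financial_research", "News", "homepage"))  # True
print(is_allowed("recruiting", "News", "homepage"))          # False
print(is_allowed("recruiting", "Careers", "login"))          # False: global block wins
```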
Deploying a category-based allowlist for your operator-style agents requires three steps. First, acquire the AI Agent Domain Database — the 10M tier covers the most popular domains, while the 20M and 50M tiers provide progressively broader coverage. Second, load the database into your preferred data store (Redis for speed, PostgreSQL for query flexibility, SQLite for simplicity). Third, implement the middleware layer that intercepts agent navigation requests, queries the database, and enforces your allowlist policy. The code snippets above provide production-ready starting points for both Python and JavaScript agent stacks.
Once deployed, your agents operate within a defined perimeter. Every navigation request is validated. Every blocked domain is logged. Every policy decision is deterministic and auditable. The result is an agent deployment that your security, compliance, and legal teams can approve — not because the agent is perfectly safe, but because the allowlist provides a verifiable, enforceable boundary around its web access.
Deploy category-based domain allowlists for OpenAI Operator, Anthropic Computer Use, and any autonomous browser agent. One-time purchase, perpetual license, 102 million domains classified and ready.