WebsiteCategorizationAPI

How to Whitelist Domains for Operator and Computer Use Agents

OpenAI Operator, Anthropic Computer Use, and Google Project Mariner are navigating the open web autonomously. Without a structured domain allowlist, these agents can visit any URL they encounter, including admin panels, payment gateways, and sensitive internal tools. Our 102 million domain categorization database lets you build deterministic allowlists grounded in IAB categories, page types, and reputation scores so your operator-style agents only visit pre-approved destinations.

  • 102M Classified Domains
  • 700+ IAB Categories
  • 20+ Page Types
  • 99.5% Internet Coverage

The Problem: Operator-Style Agents Browse Without Boundaries

When you deploy an agent that controls a web browser, every URL on the internet becomes a potential destination. Without an allowlist, there is no mechanism to constrain where the agent goes.

Unbounded Agent Navigation Is an Enterprise Risk

Operator-style agents like OpenAI Operator and Anthropic Computer Use are designed to complete multi-step tasks that involve navigating websites, clicking links, filling forms, and extracting data. When a user instructs an agent to "compare enterprise SaaS pricing," the agent may visit dozens of domains — and without a domain allowlist, it has no way to distinguish between a vendor's public pricing page, a competitor's internal wiki that happens to be indexed, a phishing clone of a legitimate vendor, or an adult content site that ranks for an ambiguous query.

  • Credential exposure: Agents navigate to login and authentication pages, potentially triggering MFA flows or submitting cached credentials to unverified domains
  • Data leakage via forms: Without page-type awareness, agents may encounter and interact with contact forms, feedback surveys, or data collection pages on unapproved domains
  • Compliance violations: Agents accessing gambling, adult content, or regulated financial sites can trigger compliance incidents even if the agent did not interact with the content
  • Uncontrolled footprint: Every domain visit by a Computer Use agent leaves HTTP fingerprints, cookies, and server logs that your security team cannot retroactively scrub

The Solution: Category-Based Domain Allowlists from a 102M Classified Database

Instead of manually curating a list of approved URLs — which becomes stale within weeks and cannot scale beyond a few hundred entries — you build your allowlist dynamically from a pre-classified domain database. Our 102 million domain database tags every domain with IAB v3 taxonomy categories, web filtering classifications, page-type labels (login, checkout, settings, pricing, careers, product, and 15+ more), reputation scores, and global popularity rankings.

Your allowlist becomes a set of rules: allow all domains categorized as "Technology & Computing" with page type "pricing" or "product." Allow domains in "Business and Finance" with a popularity rank under 100,000. Block any page whose page type is "login," "admin," or "checkout," regardless of category. The database provides the structured data; your policy engine enforces the rules. The result is a deterministic allowlist that covers millions of domains without manual curation, updates automatically with each database refresh, and enforces consistent policy across every agent session.
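The three example rules above can be written as an ordered rule list. The sketch below is illustrative, not the product's policy engine: the record fields (`categories`, `page_type`, `global_rank`) are a hypothetical shape for one classified URL, and the evaluation order (first matching rule wins, no match means block) is an assumption chosen to implement deny-by-default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                              # "allow" or "block"
    category: Optional[str] = None           # None matches any category
    page_types: Optional[frozenset] = None   # None matches any page type
    max_rank: Optional[int] = None           # rank must be strictly below this

    def matches(self, record: dict) -> bool:
        if self.category is not None and self.category not in record["categories"]:
            return False
        if self.page_types is not None and record["page_type"] not in self.page_types:
            return False
        if self.max_rank is not None and record.get("global_rank", float("inf")) >= self.max_rank:
            return False
        return True


# Block rules are listed first so they override any allow rule.
POLICY = [
    Rule("block", page_types=frozenset({"login", "admin", "checkout"})),
    Rule("allow", category="Technology & Computing",
         page_types=frozenset({"pricing", "product"})),
    Rule("allow", category="Business and Finance", max_rank=100_000),
]


def evaluate(record: dict, policy=POLICY) -> str:
    """Return the action of the first matching rule; deny by default."""
    for rule in policy:
        if rule.matches(record):
            return rule.action
    return "block"


# Hypothetical record for one classified URL
print(evaluate({"categories": ["Technology & Computing"],
                "page_type": "pricing", "global_rank": 5000}))
```

Putting block rules at the top of the list keeps the universal exclusions from being shadowed by a broader allow rule.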

Figure: Domain allowlist visualization. Approved domains form a trusted perimeter around agent operations.

How to Build Allowlists for Operator-Style Agents

Three approaches to constructing and enforcing domain allowlists using pre-classified categorization data

Category-Based Allowlisting

Define which IAB categories your agent is permitted to access. A financial research agent gets access to "Business and Finance," "Technology & Computing," and "News" categories. A recruiting agent gets "Careers" and "Education." Every domain in the 102M database that matches your approved categories is automatically included in the allowlist. No manual URL entry required — the database does the work.
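A minimal sketch of building a per-agent allowlist from category data, assuming the database is exported as a CSV with one `domain,category` row per category assignment. That layout, the sample rows (reserved `.example` domains), and the `AGENT_CATEGORIES` mapping are hypothetical; the mapping simply mirrors the financial research and recruiting examples in the paragraph above.

```python
import csv
import io

# Hypothetical export: one row per (domain, category) assignment
SAMPLE_EXPORT = """domain,category
bank.example,Business and Finance
devtools.example,Technology & Computing
news.example,News
casino.example,Gambling
"""

# Hypothetical per-agent category approvals
AGENT_CATEGORIES = {
    "financial_research": {"Business and Finance", "Technology & Computing", "News"},
    "recruiting": {"Careers", "Education"},
}


def build_allowlist(export_file, approved_categories):
    """Return every domain with at least one category in approved_categories."""
    allowed = set()
    for row in csv.DictReader(export_file):
        if row["category"] in approved_categories:
            allowed.add(row["domain"])
    return allowed


allowlist = build_allowlist(io.StringIO(SAMPLE_EXPORT),
                            AGENT_CATEGORIES["financial_research"])
print(sorted(allowlist))
```

Because the function streams rows with `csv.DictReader`, the same approach works on a multi-gigabyte export opened from disk.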

Page-Type Exclusion Rules

Even within approved categories, certain page types should remain off-limits. A domain categorized as "Business and Finance" may have a public pricing page (allowed) and a login portal (blocked). Page-type labels let you create exclusion rules that apply universally: block "login," "checkout," "admin," and "settings" page types across all categories. The agent can browse the approved domain but cannot reach sensitive functional pages.

Reputation-Gated Access

Layer reputation and popularity signals on top of category rules. Only allow domains with an OpenPageRank score above a threshold, or restrict access to domains within the top 1 million by global popularity. This eliminates newly registered domains, parked pages, and low-reputation sites from the allowlist even if they technically belong to an approved category. Reputation gating adds a second defense layer.

Figure: Agent routing through allowlist gates. Each navigation request is validated against approved category rules.

Integration Code for Domain Allowlisting

Starting-point snippets for enforcing category-based allowlists in your agent harness

Python — WhitelistManager for Agent Navigation

```python
import http.client
import json
from urllib.parse import urlencode


class WhitelistManager:
    """Manages a category-based domain allowlist for Operator and Computer Use agents."""

    BLOCKED_PAGE_TYPES = {
        "login", "checkout", "settings", "admin", "signup", "password_reset"
    }

    def __init__(self, api_key, approved_categories, min_pagerank=0,
                 max_popularity_rank=None):
        self.api_key = api_key
        self.approved_categories = [c.lower() for c in approved_categories]
        self.min_pagerank = min_pagerank
        self.max_popularity_rank = max_popularity_rank
        self._cache = {}

    def _classify(self, url):
        # Cache per full URL: the page type depends on the path, not only the domain
        if url in self._cache:
            return self._cache[url]
        # urlencode escapes "&", "=", "?" and other characters found in real URLs
        payload = urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        # A fresh connection per request, so one failed request cannot break later ones
        conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com", timeout=10
        )
        try:
            conn.request(
                "POST", "/api/iab/iab_web_content_filtering.php",
                payload, headers
            )
            res = conn.getresponse()
            body = res.read().decode("utf-8")
            if res.status != 200:
                raise RuntimeError(f"Classification failed: HTTP {res.status}")
            data = json.loads(body)
        finally:
            conn.close()
        self._cache[url] = data
        return data

    def is_domain_whitelisted(self, url):
        """Check if a URL passes the allowlist policy.
        Returns an (allowed, reason) tuple."""
        try:
            data = self._classify(url)
        except (OSError, http.client.HTTPException, RuntimeError, ValueError) as exc:
            # Fail closed: if classification fails, block the navigation
            return False, f"Classification error: {exc}"

        # 1. Page-type exclusions apply regardless of category
        page_type = data.get("page_type", "unknown")
        if page_type in self.BLOCKED_PAGE_TYPES:
            return False, f"Blocked page type: {page_type}"

        # 2. Category match (strip the "Category name: " prefix if present)
        categories = [
            c[0].replace("Category name: ", "").lower()
            for c in data.get("iab_classification", []) if c
        ]
        matched = any(
            approved in cat
            for cat in categories
            for approved in self.approved_categories
        )
        if not matched:
            return False, f"No approved category match: {categories}"

        # 3. Reputation gate
        pagerank = data.get("open_pagerank", 0)
        if pagerank < self.min_pagerank:
            return False, (
                f"PageRank {pagerank} below minimum {self.min_pagerank}"
            )

        # 4. Popularity gate (lower rank = more popular; unranked domains fail)
        if self.max_popularity_rank is not None:
            rank = data.get("global_rank", float("inf"))
            if rank > self.max_popularity_rank:
                return False, (
                    f"Rank {rank} exceeds maximum {self.max_popularity_rank}"
                )

        return True, "Domain whitelisted - approved"


# Usage with an OpenAI Operator / Computer Use agent harness
wl = WhitelistManager(
    api_key="your_api_key",
    approved_categories=["technology", "business and finance", "news", "education"],
    min_pagerank=3,
    max_popularity_rank=1000000,
)
allowed, reason = wl.is_domain_whitelisted("https://techcrunch.com/pricing")
print(f"Allowed: {allowed}, Reason: {reason}")
```

JavaScript — Agent Navigation Allowlist Validator

```javascript
async function validateAgentNavigation(targetURL, allowlistPolicy) {
  // Classify the full target URL so the page type reflects the path
  const response = await fetch(
    "https://www.websitecategorizationapi.com" +
      "/api/iab/iab_web_content_filtering.php",
    {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({
        query: targetURL,
        api_key: allowlistPolicy.apiKey,
        data_type: "url",
        expanded_categories: "1"
      })
    }
  );

  // Fail closed: if classification fails, block rather than allow
  if (!response.ok) {
    return {
      url: targetURL,
      action: "block",
      reason: `Classification failed: HTTP ${response.status}`,
      timestamp: new Date().toISOString()
    };
  }
  const classification = await response.json();

  // Extract IAB categories, stripping the "Category name: " prefix
  const iabCategories = (classification.iab_classification || [])
    .map(c => c?.[0]?.replace("Category name: ", "") || "")
    .filter(Boolean);

  // Extract page type, filtering category, and reputation
  const pageType = classification.page_type || "unknown";
  const filterCategory =
    classification.filtering_taxonomy?.[0]?.[0]
      ?.replace("Category name: ", "") || "Unknown";
  const pageRank = classification.open_pagerank || 0;

  // Default decision is "block" (deny by default)
  const decision = {
    url: targetURL,
    categories: iabCategories,
    pageType: pageType,
    filterCategory: filterCategory,
    pageRank: pageRank,
    action: "block",
    reason: "No matching allowlist rule",
    timestamp: new Date().toISOString()
  };

  // Check page-type exclusions first
  if (allowlistPolicy.blockedPageTypes.includes(pageType)) {
    decision.reason = `Blocked page type: ${pageType}`;
    return decision;
  }

  // Check against approved categories (case-insensitive substring match)
  const categoryMatch = iabCategories.some(cat =>
    allowlistPolicy.approvedCategories.some(approved =>
      cat.toLowerCase().includes(approved.toLowerCase())
    )
  );
  if (!categoryMatch) {
    decision.reason =
      `Categories ${iabCategories.join(", ")} not in allowlist`;
    return decision;
  }

  // Check minimum reputation threshold
  if (pageRank < (allowlistPolicy.minPageRank || 0)) {
    decision.reason = `PageRank ${pageRank} below threshold`;
    return decision;
  }

  decision.action = "allow";
  decision.reason = "Domain passes allowlist policy";
  return decision;
}

// Example usage for an Operator-style agent.
// Top-level await requires an ES module (e.g. a .mjs file).
const policy = {
  apiKey: "your_api_key",
  approvedCategories: ["Technology", "Business", "News", "Education"],
  blockedPageTypes: ["login", "checkout", "admin", "settings", "signup"],
  minPageRank: 3
};

const result = await validateAgentNavigation(
  "https://example.com/products",
  policy
);
if (result.action === "block") {
  console.log(`Navigation blocked: ${result.reason}`);
}
```

Figure: Approved domain trust rings. Concentric trust tiers, from core-approved to reputation-gated to blocked.

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Database
AI Agent Domain Database 10M
$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license  |  Optional Updates: $1,599/year

  • 10M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global Popularity Rankings
  • Priority Enterprise Support
Popular
AI Agent Domain Database 20M
$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $2,999/year

  • 20M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager
Maximum Coverage
AI Agent Domain Database 50M
$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $4,999/year

  • 50M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

Also available: Enterprise URL Database tiers up to 102M domains, from $2,499.

Domain Distribution by Category in Our 102M Enterprise Database

Figure: Domain counts for the top 50 of the 700+ IAB v3 categories (Tier 1 through Tier 4) in the 102M Enterprise Database.

Figure: Category filter pipeline. Domains flow through IAB category gates into allowlist tiers.

Why Operator-Style Agents Need Structured Allowlists

The first generation of AI agents operated within sandboxed environments — they could query APIs, search the web, and return text, but they never controlled a browser. That changed with the release of OpenAI Operator, Anthropic Computer Use, and Google Project Mariner. These agents see the screen, move the cursor, click buttons, type into input fields, and navigate between pages exactly as a human would. The implication is profound: every website on the public internet is now an attack surface that your agent can reach, and every page the agent visits creates a record, a cookie, a server log, and potentially a compliance event.

Traditional blocklists — lists of known-bad domains — are insufficient for this threat model. A blocklist catches domains that have already been identified as malicious or inappropriate. It does nothing about the millions of benign-but-irrelevant domains that an agent should not visit for a specific task. An allowlist inverts the model: instead of listing what is bad, you list what is approved. Everything not on the allowlist is denied by default. This deny-by-default posture is the only architecture that provides meaningful containment for autonomous browser agents.

The challenge with allowlists has always been scale. Manually curating a list of approved URLs is feasible for a few hundred entries but collapses at thousands. A domain categorization database solves this by converting manual curation into declarative rules. Instead of listing individual URLs, you declare which categories, page types, and reputation thresholds constitute your approved perimeter. The database resolves those rules against 102 million pre-classified domains, producing an effective allowlist of millions of entries that is maintained automatically.

How OpenAI Operator Agent Navigation Works and Where Allowlists Fit

OpenAI Operator is designed to complete multi-step tasks in a web browser on behalf of the user. The agent receives a natural-language instruction, plans a sequence of browser actions, and executes them — opening URLs, clicking links, reading page content, filling forms, and navigating between tabs. At each step, the agent decides which URL to visit next based on its understanding of the task and the current page content.

The allowlist integration point is between the agent's navigation decision and the browser's HTTP request. When the Operator agent decides to navigate to a new URL, the middleware intercepts the request, extracts the target domain, queries the categorization database, evaluates the result against the allowlist policy, and either permits or blocks the navigation. If the domain is blocked, the middleware returns a structured message to the agent explaining why — "domain not in approved categories" or "blocked page type: login" — so the agent can adjust its plan rather than simply failing.
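The interception step can be sketched as a pure function that the harness calls before each navigation. Here `classify` is a hypothetical stand-in for whatever lookup you use (local database or API), assumed to return a dict with `categories` and `page_type`, or `None` for unclassified URLs. Unclassified URLs are denied, consistent with deny-by-default, and every outcome carries a message the agent can read and replan from.

```python
from urllib.parse import urlsplit


def guard_navigation(url, classify, approved_categories, blocked_page_types):
    """Return (allowed, message_for_agent) for a proposed navigation."""
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False, "invalid URL: no host"
    record = classify(url)  # hypothetical lookup; None = unclassified
    if record is None:
        return False, f"{host}: domain not in approved categories (unclassified)"
    if record["page_type"] in blocked_page_types:
        return False, f"{host}: blocked page type: {record['page_type']}"
    if not set(record["categories"]) & approved_categories:
        return False, f"{host}: domain not in approved categories"
    return True, f"{host}: allowed"


# Hypothetical classification data keyed by URL
FAKE_DB = {
    "https://docs.example/pricing": {"categories": ["Technology & Computing"], "page_type": "pricing"},
    "https://docs.example/login": {"categories": ["Technology & Computing"], "page_type": "login"},
}
print(guard_navigation("https://docs.example/login", FAKE_DB.get,
                       {"Technology & Computing"}, {"login", "checkout"}))
```

Returning a readable reason instead of raising lets the agent treat a block as information for replanning rather than as a crash.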

This architecture is transparent to the agent itself. The Operator agent does not need to be aware of the allowlist; it simply receives a navigation failure and replans. This separation of concerns means you can update your allowlist policy without modifying the agent's code, prompts, or configuration. The policy is externalized into the middleware layer where your security team controls it.

Anthropic Computer Use and the Unique Challenges of Screen-Level Control

Anthropic Computer Use operates at a lower level of abstraction than Operator. Instead of issuing high-level browser commands, Computer Use agents see pixel-level screenshots of the screen and generate mouse and keyboard actions. The agent literally sees what a human would see and interacts with the interface using the same input mechanisms. This makes Computer Use agents extraordinarily flexible — they can operate any application, not just web browsers — but it also makes them harder to constrain.

For Computer Use agents, the allowlist must be enforced at the network level or through a browser extension that intercepts navigation events. Because the agent is generating raw mouse clicks and keystrokes, there is no high-level "navigate to URL" command to intercept in the agent's action stream. Instead, you monitor the browser's actual navigation events. When the browser begins loading a new URL, the allowlist middleware checks the target domain and either allows the page to load or redirects to a block page. The agent sees the block page in its next screenshot and adjusts its behavior accordingly.
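For screen-level agents, the decision has to surface as something the agent can see. A minimal sketch, assuming your proxy, browser extension, or browser-automation route handler can call a hook on each top-level navigation and replace the response body; the hook name `on_navigation` and the block-page wording are hypothetical. It returns `None` to let the page load, or HTML for a block page that will appear in the agent's next screenshot.

```python
import html
from urllib.parse import urlsplit

# Plain, high-contrast text is easier for a screenshot-driven agent to read.
BLOCK_PAGE = (
    "<html><head><title>Navigation blocked</title></head><body>"
    "<h1>Navigation blocked by allowlist policy</h1>"
    "<p>Domain: {host}</p><p>Reason: {reason}</p>"
    "<p>Choose a different source to continue the task.</p>"
    "</body></html>"
)


def on_navigation(url, is_allowed):
    """Hypothetical hook: None lets the page load; a string is served instead."""
    host = urlsplit(url).hostname or ""
    allowed, reason = is_allowed(url)
    if allowed:
        return None
    # Escape untrusted values before inserting them into HTML
    return BLOCK_PAGE.format(host=html.escape(host), reason=html.escape(reason))


def example_policy(url):
    # Stand-in policy: only docs.example is approved
    return urlsplit(url).hostname == "docs.example", "domain not in approved categories"


print(on_navigation("https://other.example/", example_policy))
```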

This network-level enforcement is where the categorization database provides the most value. Each URL check should resolve in under 10 milliseconds to avoid disrupting the agent's visual feedback loop. A local database lookup, with the 102M domain database loaded into Redis or an in-memory hash table, satisfies this latency requirement easily. A remote API call, which adds a network round trip to every navigation, generally cannot guarantee it.
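A minimal sketch of the in-memory option: a Python `dict` keyed by domain, with a timing loop to check per-lookup latency. The record tuple `(category, page_rank)` and the synthetic `.example` domains are hypothetical. Falling back from a subdomain to its parent domain (for example `www.site5.example` to `site5.example`) is an assumption about how you want unlisted subdomains handled.

```python
import time


def load_index(rows):
    """Build an in-memory hash table: domain -> (category, page_rank)."""
    return {domain.lower(): (category, rank) for domain, category, rank in rows}


def lookup_host(index, host):
    """Try the exact host, then each parent domain (never the bare TLD)."""
    labels = host.lower().split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in index:
            return index[candidate]
    return None


# Synthetic data: 100,000 hypothetical domains
index = load_index(
    (f"site{i}.example", "Technology & Computing", 4.0) for i in range(100_000)
)

start = time.perf_counter()
for i in range(10_000):
    lookup_host(index, f"www.site{i}.example")
per_lookup_ms = (time.perf_counter() - start) / 10_000 * 1000
print(f"{per_lookup_ms:.4f} ms per lookup")
```

Memory, not latency, is the constraint at 102M entries; that is where an external store such as Redis becomes the more practical choice.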

Building Task-Specific Allowlists with IAB Categories

The most effective allowlist strategy is task-specific: each agent task gets its own allowlist policy tailored to the domains it legitimately needs to access. A financial research task gets access to "Business and Finance," "News," and "Technology & Computing" categories. A competitive intelligence task gets "Business and Finance" and "Shopping" but not "News" (to avoid the agent getting distracted by current events). A recruiting task gets "Careers," "Education," and "Business and Finance > Human Resources."

The IAB Content Taxonomy v3 enables this granularity with its four-tier hierarchy. Tier 1 provides 29 broad categories for coarse-grained control. Tier 2 breaks these into approximately 200 subcategories. Tier 3 and Tier 4 provide progressively finer distinctions. You can mix tiers in a single allowlist policy: allow all of Tier 1 "Technology & Computing" (broad access to tech sites), allow only Tier 2 "Business and Finance > Financial Services" (narrow access to financial sites), and block all of Tier 1 "Adult Content" (broad block on sensitive categories).

The database supports this by providing every domain's full category hierarchy. A single domain may have multiple categories assigned — for example, a fintech company's website might be tagged as both "Technology & Computing > Artificial Intelligence" and "Business and Finance > Financial Services." Your allowlist policy evaluates all assigned categories: if any one matches an approved category, the domain passes. If none match, it is blocked.
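Mixing tiers can be sketched by storing each category as a tier path and treating a rule as matching the category itself or any descendant. The `" > "` separator mirrors the notation used above and is an assumption about how paths are stored. Letting block rules win over allow rules when a domain carries both kinds of category is also a design choice made here, not a documented behavior; without any block match, the any-match rule described above applies.

```python
SEP = " > "


def is_within(category: str, rule: str) -> bool:
    """True if category equals rule or is a descendant of it in the tier path."""
    return category == rule or category.startswith(rule + SEP)


def evaluate_categories(categories, allow_rules, block_rules):
    # Assumed precedence: any block match overrides allow matches
    if any(is_within(c, b) for c in categories for b in block_rules):
        return "block"
    # Any single approved category is enough to pass
    if any(is_within(c, a) for c in categories for a in allow_rules):
        return "allow"
    return "block"  # deny by default


ALLOW = ["Technology & Computing", "Business and Finance > Financial Services"]
BLOCK = ["Adult Content"]

fintech = [
    "Technology & Computing > Artificial Intelligence",
    "Business and Finance > Financial Services",
]
print(evaluate_categories(fintech, ALLOW, BLOCK))
```

Matching on `rule + SEP` rather than a bare prefix keeps a rule for one category from accidentally matching a sibling whose name happens to start with the same text.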

Page-Type Exclusions: The Critical Second Layer

Category-based allowlisting controls which domains the agent can visit. Page-type exclusions control which pages on those domains the agent can access. This distinction is crucial because many approved domains contain pages that should be off-limits to autonomous agents. A SaaS vendor's website might be categorized as "Technology & Computing" and appear on your allowlist, but its login page, admin panel, and billing settings page should still be blocked.

Our database classifies pages into more than 20 distinct types, including homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, legal, privacy policy, terms of service, blog, documentation, API reference, support, FAQ, forum, and product pages. For allowlist enforcement, the recommended default exclusion set includes the login, signup, checkout, settings, admin, and password-reset page types. These functional pages represent the highest-risk surfaces for agent interaction: they involve authentication, payment, or configuration changes that an autonomous agent should never perform without explicit human authorization.

Page-type exclusions apply universally across all categories. Even if a domain belongs to your most trusted category with the highest reputation score, the agent should not access its login page. This universal application simplifies your policy definition: you define category-based inclusion rules and page-type-based exclusion rules separately, and the middleware evaluates both in sequence.

Reputation and Popularity Scoring as Allowlist Qualifiers

Not every domain in an approved IAB category should be on the allowlist. A domain registered yesterday that happens to be categorized as "Technology & Computing" is not as trustworthy as techcrunch.com, even though both share the same category. Reputation and popularity signals add a quality dimension to category-based allowlisting.

OpenPageRank scores range from 0 to 10, with higher scores indicating greater domain authority. Setting a minimum PageRank threshold of 3 or 4 eliminates the vast majority of low-quality, newly registered, and spammy domains. Global popularity rankings — derived from the Google Chrome User Experience Report — indicate how frequently real users visit each domain. Restricting the allowlist to domains within the top 1 million by global popularity provides broad coverage while excluding the long tail of rarely visited sites.
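The two qualifiers can be sketched as one gate. The defaults follow the thresholds suggested above (minimum PageRank 3, top 1 million by popularity); the parameter names are hypothetical. Treating a missing score or rank as a failure is an assumption that keeps the gate deny-by-default for unscored domains.

```python
from typing import Optional


def passes_reputation(page_rank: Optional[float], global_rank: Optional[int],
                      min_page_rank: float = 3.0,
                      max_global_rank: int = 1_000_000) -> bool:
    """PageRank is 0-10 (higher is better); global rank is 1..N (lower is better)."""
    if page_rank is None or page_rank < min_page_rank:
        return False
    if global_rank is None or global_rank > max_global_rank:
        return False
    return True


print(passes_reputation(5.2, 1200))
```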

Combining category rules, page-type exclusions, and reputation thresholds produces a layered allowlist that is simultaneously broad (covering millions of domains) and precise (excluding low-quality sites and sensitive page types). The database provides all three signal types for every domain, so your policy engine evaluates them in a single lookup.

Dynamic Allowlist Updates Without Manual Intervention

A static allowlist becomes stale as new domains are registered, existing domains change categories, and your agent's task scope evolves. The database-driven approach solves this by separating the policy (which categories and page types are approved) from the data (which domains belong to which categories). When you load a database update (available quarterly), your effective allowlist automatically expands to include newly classified domains and drops domains whose categories have changed.

Your policy rules remain the same. If you approve "Technology & Computing" today and receive a database update next quarter that adds 50,000 newly classified technology domains, all 50,000 are automatically included in your allowlist without any configuration change. Conversely, if a domain's category changes from "Technology & Computing" to "Adult Content," it is automatically excluded from your allowlist on the next database refresh. This automation is essential for maintaining allowlist accuracy at the scale of millions of domains.
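A minimal sketch of that separation: the policy is only a set of approved categories, each snapshot maps domain to category, and the effective allowlist is recomputed on refresh. The snapshot shape and the `.example` domains are hypothetical; the diff shows what changed so it can be reviewed or logged.

```python
def effective_allowlist(snapshot, approved_categories):
    """snapshot: dict of domain -> category. The policy is just the approved set."""
    return {d for d, cat in snapshot.items() if cat in approved_categories}


def diff_on_refresh(old_snapshot, new_snapshot, approved_categories):
    """Return (added, removed) domains after a database refresh, same policy."""
    old = effective_allowlist(old_snapshot, approved_categories)
    new = effective_allowlist(new_snapshot, approved_categories)
    return new - old, old - new


POLICY = {"Technology & Computing"}
q1 = {"a.example": "Technology & Computing", "b.example": "Technology & Computing"}
q2 = {
    "a.example": "Technology & Computing",
    "b.example": "Adult Content",           # category changed: drops out
    "c.example": "Technology & Computing",  # newly classified: added
}
added, removed = diff_on_refresh(q1, q2, POLICY)
print(added, removed)
```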

Audit Logging and Compliance for Allowlist Enforcement

Every allowlist decision — allow or block — should be logged with the full context: the target URL, the domain's categories, page type, reputation score, the policy rule that matched, and the timestamp. This audit trail serves three purposes. First, it enables your security team to review agent behavior and verify that the allowlist is functioning correctly. Second, it provides evidence of compliance for regulatory audits — you can demonstrate that your AI agent was constrained to approved domains for every navigation event. Third, it powers allowlist refinement: by analyzing blocked domains, you can identify categories or domains that should be added to the allowlist, and by analyzing allowed domains, you can identify patterns that suggest the allowlist is too permissive.

The categorization database makes these audit logs rich and actionable. Instead of logging raw URLs, you log structured data: domain, IAB category, page type, PageRank score, popularity rank, and policy decision. This structured data can be aggregated, visualized, and queried to produce security dashboards that show agent navigation patterns, category distribution of visited domains, blocked navigation attempts by reason, and allowlist coverage gaps.
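A minimal sketch of structured audit logging as JSON Lines, one decision per line, plus one of the aggregations mentioned above (blocked attempts by rule). The field names and rule identifiers are hypothetical; in production you would write to an append-only file or log pipeline rather than an in-memory buffer.

```python
import io
import json
from collections import Counter
from datetime import datetime, timezone


def log_decision(stream, *, url, domain, categories, page_type,
                 page_rank, popularity_rank, decision, rule):
    """Append one allow/block decision to `stream` as a JSON line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url, "domain": domain, "categories": categories,
        "page_type": page_type, "page_rank": page_rank,
        "popularity_rank": popularity_rank,
        "decision": decision, "rule": rule,
    }
    stream.write(json.dumps(entry) + "\n")
    return entry


def blocked_by_rule(lines):
    """Count block decisions per policy rule, e.g. for a security dashboard."""
    counts = Counter()
    for line in lines:
        entry = json.loads(line)
        if entry["decision"] == "block":
            counts[entry["rule"]] += 1
    return counts


log = io.StringIO()
log_decision(log, url="https://docs.example/login", domain="docs.example",
             categories=["Technology & Computing"], page_type="login",
             page_rank=4.5, popularity_rank=1200,
             decision="block", rule="blocked_page_type")
log_decision(log, url="https://docs.example/pricing", domain="docs.example",
             categories=["Technology & Computing"], page_type="pricing",
             page_rank=4.5, popularity_rank=1200,
             decision="allow", rule="approved_category")
log_decision(log, url="https://casino.example/", domain="casino.example",
             categories=["Gambling"], page_type="homepage",
             page_rank=2.0, popularity_rank=900_000,
             decision="block", rule="category_not_approved")
print(blocked_by_rule(log.getvalue().splitlines()))
```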

Multi-Agent Allowlist Architectures

Enterprise deployments often run multiple agents with different roles, each requiring a different allowlist. A financial analysis agent, a competitive intelligence agent, a customer support agent, and a recruiting agent all need access to different slices of the internet. The database-driven allowlist architecture supports this naturally: each agent gets its own policy configuration specifying approved categories, excluded page types, and reputation thresholds. All agents query the same underlying database, but each evaluates the results against its own policy rules.

This multi-agent architecture also supports hierarchical policies. A global policy defines universal blocks — no agent may access adult content, malware domains, or login pages. Agent-specific policies define additional category approvals on top of the global base. This inheritance model ensures that security-critical rules are enforced consistently across all agents while allowing task-specific flexibility at the individual agent level.
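The inheritance model can be sketched with an immutable policy type and a merge function. The field names and the category label "Malware" are hypothetical. The merge rules encode the guarantee described above: an agent policy can add approvals and tighten limits, but it cannot approve a globally blocked category or relax a global page-type block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    approved_categories: frozenset = frozenset()
    blocked_categories: frozenset = frozenset()
    blocked_page_types: frozenset = frozenset()
    min_page_rank: float = 0.0


def inherit(base: Policy, agent: Policy) -> Policy:
    """Layer an agent policy on a global base without weakening it."""
    blocked = base.blocked_categories | agent.blocked_categories
    return Policy(
        # Approvals accumulate, but global blocks always remove them
        approved_categories=(base.approved_categories | agent.approved_categories) - blocked,
        blocked_categories=blocked,
        blocked_page_types=base.blocked_page_types | agent.blocked_page_types,
        min_page_rank=max(base.min_page_rank, agent.min_page_rank),
    )


GLOBAL = Policy(
    blocked_categories=frozenset({"Adult Content", "Malware"}),
    blocked_page_types=frozenset({"login"}),
)
financial_agent = inherit(GLOBAL, Policy(
    approved_categories=frozenset({"Business and Finance", "Adult Content"}),
    blocked_page_types=frozenset({"checkout"}),
    min_page_rank=3,
))
print(financial_agent)
```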

Getting Started: From Zero to Allowlist in Under an Hour

Deploying a category-based allowlist for your operator-style agents requires three steps. First, acquire the AI Agent Domain Database: the 10M tier covers the most popular domains, and the 20M tier provides broader coverage. Second, load the database into your preferred data store (Redis for speed, PostgreSQL for query flexibility, SQLite for simplicity). Third, implement the middleware layer that intercepts agent navigation requests, queries the database, and enforces your allowlist policy. The code snippets above provide starting points for both Python and JavaScript agent stacks.
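For the SQLite option, loading and querying can be sketched with the standard-library `sqlite3` module. The CSV columns (`domain,category,page_rank,global_rank`) and the table layout are hypothetical; adapt them to the actual export. Declaring `domain` as the primary key gives an index, so each middleware lookup is a single indexed query.

```python
import csv
import io
import sqlite3

# Hypothetical export layout
SAMPLE_CSV = """domain,category,page_rank,global_rank
docs.example,Technology & Computing,4.5,1200
bank.example,Business and Finance,5.1,800
"""


def load_database(conn, csv_file):
    """Create the table (if needed) and bulk-insert rows from a CSV export."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS domains ("
        " domain TEXT PRIMARY KEY,"  # primary key = indexed lookups
        " category TEXT NOT NULL,"
        " page_rank REAL,"
        " global_rank INTEGER)"
    )
    rows = (
        (r["domain"].lower(), r["category"], float(r["page_rank"]), int(r["global_rank"]))
        for r in csv.DictReader(csv_file)
    )
    conn.executemany("INSERT OR REPLACE INTO domains VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def lookup_domain(conn, domain):
    """Return (category, page_rank, global_rank) or None if unclassified."""
    return conn.execute(
        "SELECT category, page_rank, global_rank FROM domains WHERE domain = ?",
        (domain.lower(),),
    ).fetchone()


conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
load_database(conn, io.StringIO(SAMPLE_CSV))
print(lookup_domain(conn, "Docs.Example"))
```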

Once deployed, your agents operate within a defined perimeter. Every navigation request is validated. Every blocked domain is logged. Every policy decision is deterministic and auditable. The result is an agent deployment that your security, compliance, and legal teams can approve — not because the agent is perfectly safe, but because the allowlist provides a verifiable, enforceable boundary around its web access.

Figure: Agent security perimeter. Allowlist enforcement creates an auditable boundary around agent navigation.

Build Your Agent Allowlist Today

Deploy category-based domain allowlists for OpenAI Operator, Anthropic Computer Use, and any autonomous browser agent. One-time purchase, perpetual license, 102 million domains classified and ready.
