WebsiteCategorizationAPI

Category-Based Blocking for Agentic Browsers and Autonomous Agents

Agentic browsers operate at machine speed, visiting hundreds of pages per minute without human oversight. Category-based blocking gives your orchestration layer a deterministic, sub-millisecond mechanism to intercept every navigation request and enforce granular allow/block rules derived from IAB content categories, web filtering classifications, and page-type labels across 102 million pre-classified domains.

  • 102M Classified Domains
  • 700+ IAB Categories
  • 20+ Page Types
  • 99.5% Internet Coverage

The Problem: Agentic Browsers Ignore Content Boundaries

Without category-level enforcement, an agentic browser session treats every URL identically — a news article receives the same access privileges as a banking portal or an HR system login page.

Flat Access Models Create Compound Risk

When you deploy a browser-using AI agent — whether through Anthropic's Computer Use, OpenAI's Operator, or a Playwright-based custom agent — the default behavior is to navigate anywhere the task requires. There is no built-in concept of "this domain is appropriate for your task" versus "this domain is off-limits." The agent treats the entire internet as a single, undifferentiated surface. This flat access model creates compound risk because a single misguided navigation can cascade: the agent visits a phishing domain, encounters a deceptive login form, and attempts to authenticate with cached credentials — all within seconds, far faster than any human can intervene.

  • Category drift: An agent researching "financial software" follows a link to an actual online banking portal, encountering real transaction interfaces that trigger compliance violations
  • Malware exposure: Search results include SEO-poisoned links to domains categorized as malware distributors — without category awareness, the agent navigates directly to them
  • Sensitive content ingestion: Agents scraping industry news inadvertently visit domains hosting adult, gambling, or extremist content, creating brand safety and legal exposure
  • Authentication hazard: Without page-type blocking, agents reach SSO portals, admin consoles, and credential management pages where any interaction is a security incident

The Solution: Category-Granular Blocking Rules Derived from a Pre-Classified Database

Category-based blocking replaces the flat access model with a layered permission structure. Before every navigation event, the agent's browser harness queries a local copy of the 102 million domain database. The database returns the domain's IAB content categories (up to Tier 4 specificity), web filtering classification (Adult, Malware, Phishing, etc.), page-type labels (login, checkout, admin, settings), reputation score, and global popularity rank. Your policy engine evaluates these fields against a ruleset you define once and enforce automatically.

The blocking rules are deterministic: within a given database release, the same URL always produces the same category, which always triggers the same policy action. There is no probabilistic classification in the decision path, no model inference latency, and no hallucination risk. A domain classified as "Gambling" keeps that label until a database update reclassifies it, ensuring consistent enforcement across every agent session.
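As a rough illustration of the lookup-then-decide flow described above, the following sketch uses a tiny in-memory dictionary in place of the real database. The record layout, the field names (`web_filter`, `iab`), the `.test` domains, and the default-deny choice for unknown hosts are all assumptions made for the example, not the product's actual schema.

```python
from urllib.parse import urlsplit

# Hypothetical slice of a local domain database; field names are assumptions.
DOMAIN_DB = {
    "example-casino.test": {"web_filter": "Gambling", "iab": ["Gambling"]},
    "example-news.test": {"web_filter": "News", "iab": ["News and Politics"]},
}

HARD_BLOCK = {"Adult", "Malware", "Phishing", "Gambling"}


def lookup(url):
    """Return the record for the URL's host, falling back to parent domains."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try "www.example.test" first, then "example.test".
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in DOMAIN_DB:
            return DOMAIN_DB[candidate]
    return None


def decide(url):
    """Map a URL to "allow" or "block" using only static table lookups."""
    record = lookup(url)
    if record is None:
        return "block"  # assumption: unknown domains are denied by default
    return "block" if record["web_filter"] in HARD_BLOCK else "allow"


print(decide("https://www.example-news.test/story"))
print(decide("https://example-casino.test/"))
```

Because the decision is a pure function of the URL and the table contents, repeated calls with the same inputs return the same action for as long as the table is unchanged.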

Browser Navigation Blocking Grid

Visualizing category-based allow/block decisions across agentic browser sessions

How Category-Based Blocking Works in Practice

Three enforcement layers that transform raw categorization data into agentic browser policy

Hard Block by Web Filtering Category

Web filtering categories map directly to security policy. Domains classified as Malware, Phishing, Adult, Gambling, Weapons, or Illegal Content receive an unconditional hard block. The agent's navigation request is intercepted and terminated before any HTTP connection is established. No exceptions, no overrides, no human review path — these categories represent absolute prohibitions for autonomous agents.

Conditional Block by IAB Category

IAB content categories enable task-scoped filtering. A financial research agent might be allowed to visit "Business and Finance > Financial Services" domains but blocked from "Health & Fitness > Alternative Medicine." These rules are defined per agent profile or per task, creating contextual blocking that adapts to the agent's current objective without modifying the underlying database.

Page-Type Blocking for Interaction Safety

Even when a domain's content category is allowed, specific page types may be off-limits. Login pages, checkout flows, admin panels, and settings pages are blocked by default for all agent sessions. This prevents the agent from interacting with authentication, payment, or configuration interfaces regardless of the site's content classification. A "Shopping" domain is allowed for product research, but its checkout page is blocked.

Category Filter Cascade

URLs passing through layered category filters: Web Filtering, IAB, Page Type

Category-Based Blocking Code

Example snippets to enforce category blocking in your agentic browser sessions

Python — Category Blocking Engine for Browser Agents

import http.client
import json
from urllib.parse import urlencode


class CategoryBlockingEngine:
    """Enforces category-based blocking for agentic browser sessions."""

    HARD_BLOCK_CATEGORIES = [
        "Adult", "Malware", "Phishing", "Gambling",
        "Weapons", "Illegal Content", "Hate Speech"
    ]
    BLOCKED_PAGE_TYPES = [
        "login", "checkout", "settings", "admin", "signup"
    ]

    def __init__(self, api_key, allowed_iab_categories=None):
        self.api_key = api_key
        self.allowed_iab = allowed_iab_categories or []
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def evaluate_url(self, target_url):
        # urlencode escapes "&", "=" and "?" inside the target URL so they
        # cannot corrupt the form-encoded request body.
        payload = urlencode({
            "query": target_url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers
        )
        res = self.conn.getresponse()
        return json.loads(res.read().decode("utf-8"))

    def enforce_blocking(self, target_url):
        data = self.evaluate_url(target_url)
        page_type = data.get("page_type", "unknown")
        filtering = data.get("filtering_taxonomy", [])
        iab_cats = [
            c[0].split("Category name: ")[1]
            for c in data.get("iab_classification", [])
        ]

        # Layer 1: Hard block by web filtering category
        for f in filtering:
            cat_name = f[0].split("Category name: ")[1]
            if cat_name in self.HARD_BLOCK_CATEGORIES:
                return {"action": "block", "layer": "web_filter",
                        "reason": f"Hard block: {cat_name}"}

        # Layer 2: Page-type blocking
        if page_type in self.BLOCKED_PAGE_TYPES:
            return {"action": "block", "layer": "page_type",
                    "reason": f"Blocked page type: {page_type}"}

        # Layer 3: IAB category allowlist (substring match, case-insensitive)
        if self.allowed_iab:
            matched = any(
                any(a.lower() in c.lower() for a in self.allowed_iab)
                for c in iab_cats
            )
            if not matched:
                return {"action": "block", "layer": "iab_scope",
                        "reason": "Outside allowed IAB categories"}

        return {"action": "allow", "layer": "passed",
                "reason": "All category checks passed"}


# Usage
engine = CategoryBlockingEngine(
    api_key="your_api_key",
    allowed_iab_categories=["Technology", "Business", "Science"]
)
result = engine.enforce_blocking("https://example.com/admin")
print(f"Decision: {result['action']} - {result['reason']}")

JavaScript — Browser Agent Category Gate

class BrowserAgentCategoryGate {
  constructor(apiKey, blockedCategories, blockedPageTypes) {
    this.apiKey = apiKey;
    this.blockedCategories = blockedCategories || [
      "Adult", "Malware", "Phishing", "Gambling"
    ];
    this.blockedPageTypes = blockedPageTypes || [
      "login", "checkout", "admin", "settings"
    ];
    this.auditLog = [];
  }

  async classifyAndDecide(targetURL) {
    const response = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: targetURL,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    const data = await response.json();
    const pageType = data.page_type || "unknown";
    // Only the first web filtering category in the response is inspected.
    const filterCat = data.filtering_taxonomy?.[0]?.[0]
      ?.replace("Category name: ", "") || "Unknown";

    const decision = {
      url: targetURL,
      filterCategory: filterCat,
      pageType: pageType,
      action: "allow",
      blockedBy: null,
      timestamp: new Date().toISOString()
    };

    if (this.blockedCategories.includes(filterCat)) {
      decision.action = "block";
      decision.blockedBy = `Category: ${filterCat}`;
    } else if (this.blockedPageTypes.includes(pageType)) {
      decision.action = "block";
      decision.blockedBy = `Page type: ${pageType}`;
    }

    this.auditLog.push(decision);
    return decision;
  }
}

Blocked vs. Allowed Domain Traffic

Red particles blocked by category rules, green particles navigate freely

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Database
AI Agent Domain Database 10M
$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license  |  Optional Updates: $1,599/year

  • 10M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global Popularity Rankings

AI Agent Domain Database 20M (Popular)
$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $2,999/year

  • 20M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

AI Agent Domain Database 50M (Maximum Coverage)
$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $4,999/year

  • 50M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →

How Many Domains in Each Category?

Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same data your category blocking rules will reference.

Database Analytics

Domain Distribution by Category in Our 102M Enterprise Database

How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications

Top 50 IAB v3 Categories

Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database

Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.

Category Blocking Heatmap

Blocked categories pulse red while allowed categories glow green

Building a Complete Category Blocking Architecture for Agentic Browsers

Agentic browsers represent the next evolution of AI agent deployment. Unlike API-only agents that interact with structured endpoints, agentic browsers render full web pages, execute JavaScript, click links, fill forms, and navigate complex multi-step workflows across the open internet. This capability unlocks enormous productivity — and introduces equally enormous risk. A browser-based agent performing competitive research might follow a series of links that leads from a legitimate industry blog to a domain hosting malware. Without category-based blocking, the agent cannot distinguish between these two destinations.

The fundamental challenge is that URL strings carry no semantic information. The domain "example-analytics-platform.com" could be a legitimate SaaS product, a phishing site impersonating an analytics vendor, or a newly registered domain with no content at all. Category-based blocking resolves this ambiguity by pre-associating every domain with structured metadata — content categories, web filtering labels, page types, and reputation indicators — that the agent's browser harness can evaluate in microseconds.

Designing Blocking Rulesets for Different Agent Personas

Not every agent should have the same blocking rules. A financial research agent needs access to banking, investment, and fintech domains that would be off-limits for a marketing content agent. A legal research agent needs access to court records, law firm websites, and regulatory databases that are irrelevant to a product development agent. Category-based blocking supports this diversity through agent personas — named configurations that map specific IAB categories and page types to allow/block/review actions.

Each persona defines three lists: an allowlist of IAB categories the agent is explicitly permitted to visit, a blocklist of web filtering categories that are unconditionally blocked (typically shared across all personas), and a review list of categories that trigger a hold-for-human-approval action. When the agent's browser session begins, the orchestrator loads the appropriate persona and applies its rules to every subsequent navigation event. Switching personas mid-session is supported but logged as an audit event to maintain accountability.
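The persona model above might be sketched as follows. The class names, the category labels, the rule that a review match takes precedence over an allow match, and the default block for categories on neither list are illustrative assumptions rather than a documented schema.

```python
from dataclasses import dataclass, field

# Assumption: web filtering labels that every persona blocks outright.
SHARED_BLOCKLIST = frozenset({"Malware", "Phishing", "Adult", "Gambling"})


@dataclass
class Persona:
    name: str
    allow_iab: frozenset                 # IAB categories the agent may visit
    review_iab: frozenset = frozenset()  # categories held for human approval

    def evaluate(self, web_filter, iab_categories):
        """Return "block", "review", or "allow" for one navigation."""
        if web_filter in SHARED_BLOCKLIST:
            return "block"
        cats = set(iab_categories)
        if cats & self.review_iab:
            return "review"  # assumption: review outranks allow
        if cats & self.allow_iab:
            return "allow"
        return "block"       # assumption: unlisted categories are denied


@dataclass
class AgentSession:
    persona: Persona
    audit_log: list = field(default_factory=list)

    def switch_persona(self, new_persona):
        # A mid-session switch is permitted but always recorded.
        self.audit_log.append({
            "event": "persona_switch",
            "from": self.persona.name,
            "to": new_persona.name,
        })
        self.persona = new_persona


finance = Persona("finance", frozenset({"Business and Finance"}),
                  frozenset({"Personal Finance"}))
legal = Persona("legal", frozenset({"Law"}))
session = AgentSession(finance)
print(session.persona.evaluate("Finance", ["Business and Finance"]))
session.switch_persona(legal)
print(session.audit_log)
```

Keeping the shared blocklist outside the persona means a persona change can widen or narrow the allowlist without ever relaxing the unconditional blocks.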

Category Blocking at the Browser Engine Level

The most effective implementation of category-based blocking intercepts navigation requests at the browser engine level, before the HTTP connection is established. In Playwright-based agent stacks, this means registering a route handler with page.route() that intercepts every outgoing request, including the navigation requests triggered by page.goto(), extracts the target URL, queries the categorization database, and either lets the request continue or aborts it with an error code. In Puppeteer, the equivalent mechanism uses page.setRequestInterception(true) combined with a request event handler. In Selenium, a proxy server sits between the browser instance and the internet, performing the category check on every outbound request.

The key architectural requirement is that the blocking decision must be made in-line, holding the request until the check resolves. Out-of-band category checks that run after the page has already loaded provide audit logging but not protection: by the time the check completes, the agent has already rendered and potentially interacted with the target page. Pre-navigation interception ensures the agent never sees blocked content.
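A minimal sketch of in-path interception for Playwright's Python API is shown below. The handler relies only on `route.request.url`, `route.continue_()`, and `route.abort()`, which Playwright's `Route` object provides. The `is_allowed` function and the `BLOCKED_HOSTS` set are hypothetical stand-ins for a real category lookup.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for a category lookup against the local database.
BLOCKED_HOSTS = {"malware.example.test"}


def is_allowed(url):
    return (urlsplit(url).hostname or "").lower() not in BLOCKED_HOSTS


def make_route_handler(allow_fn):
    """Build a handler that resolves every request before it leaves the browser."""
    def handle(route):
        if allow_fn(route.request.url):
            route.continue_()               # release the request unchanged
        else:
            route.abort("blockedbyclient")  # fail the request inside the browser
    return handle


# Registration with Playwright's sync API would look like this:
#   page.route("**/*", make_route_handler(is_allowed))
# The "**/*" pattern matches every request, so iframe and subresource loads
# pass through the same check as top-level navigations.
```

Because the handler must call either `continue_()` or `abort()` before the request proceeds, the category decision is always made ahead of the page load rather than after it.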

Handling Edge Cases: Redirects, Iframes, and Dynamic Content

Category blocking must account for several edge cases that simple URL matching cannot handle. HTTP redirects can take the agent from an allowed domain to a blocked domain in a single navigation event. The blocking engine must evaluate the final URL after all redirects resolve, not just the initial URL. This requires either following redirects manually before allowing navigation or intercepting each redirect hop individually.

Embedded iframes present another challenge: a page with an allowed category might contain an iframe loading content from a domain in a blocked category. Effective category blocking extends to all resource loads, not just top-level navigation. Similarly, JavaScript-initiated navigations (window.location changes, dynamically injected links) must be intercepted by the same category evaluation pipeline. history.pushState needs separate handling: it changes the URL without issuing a network request, so request interception alone will not see it, and page-type rules need a navigation-event hook to re-evaluate the new URL. A comprehensive implementation hooks into both the navigation API and the network request API to cover all entry points.
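The hop-by-hop redirect evaluation described above could look roughly like this. The `next_location` callable is a hypothetical hook: in a real harness it would issue the request with automatic redirects disabled and return the `Location` header, or `None` when the response is final. The hop limit and the fail-closed behaviour are assumptions.

```python
from urllib.parse import urljoin


def check_redirect_chain(start_url, next_location, is_allowed, max_hops=10):
    """Evaluate every hop of a redirect chain; return (allowed, chain)."""
    chain = []
    url = start_url
    for _ in range(max_hops + 1):
        chain.append(url)
        if not is_allowed(url):
            return False, chain       # stop at the first blocked hop
        location = next_location(url)
        if location is None:
            return True, chain        # final URL reached and allowed
        url = urljoin(url, location)  # resolve relative Location values
    return False, chain               # too many redirects: fail closed


# Example with a canned redirect table instead of live HTTP requests.
redirects = {
    "https://a.example.test/": "https://b.example.test/",
    "https://b.example.test/": "/final",
}
allowed, chain = check_redirect_chain(
    "https://a.example.test/",
    redirects.get,
    lambda u: "blocked" not in u,
)
print(allowed, chain)
```

Checking each hop, not only the final URL, also stops the agent from touching a blocked intermediate domain that merely bounces traffic onward.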

Performance Considerations for Real-Time Blocking

Category-based blocking sits directly in the agent's navigation critical path, which means latency matters enormously. Every millisecond of blocking-decision latency adds directly to the agent's task completion time. When an agent visits 200 pages per session, even 50ms per lookup adds 10 seconds of overhead. The 102M domain database deployed locally in Redis or an in-memory hash map delivers sub-millisecond lookups, keeping the total overhead under 200ms for an entire session.
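The overhead figures above follow from simple multiplication, assuming one lookup per page visit (a page that triggers several checked requests would scale accordingly):

```python
def session_overhead_seconds(pages, lookup_ms):
    """Total lookup overhead for a session, assuming one lookup per page."""
    return pages * lookup_ms / 1000


print(session_overhead_seconds(200, 50))  # 200 pages at 50 ms each -> 10.0
print(session_overhead_seconds(200, 1))   # 200 pages at 1 ms each  -> 0.2
```

At 1 ms per lookup the session total is 0.2 seconds, which is the upper bound the sub-millisecond figure implies.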

For domains not in the local database — typically newly registered or extremely niche sites — the real-time API fallback adds approximately 200ms per lookup. To minimize the impact, implement a local cache for API responses: once a domain is classified via the API, cache the result for 24 to 72 hours so subsequent visits to the same domain hit the local cache instead of the API. This hybrid approach delivers the coverage of 102 million pre-classified domains plus the flexibility of on-demand classification for the long tail.
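One way to sketch the hybrid lookup is a local table backed by a time-limited cache of API results. The class name, the `api_lookup` callable, and the record shape are hypothetical; the 24-hour default sits at the low end of the 24 to 72 hour range mentioned above.

```python
import time


class HybridClassifier:
    """Local database first, then a TTL cache, then the remote API."""

    def __init__(self, local_db, api_lookup, ttl_seconds=24 * 3600,
                 clock=time.monotonic):
        self.local_db = local_db      # dict: domain -> record
        self.api_lookup = api_lookup  # hypothetical callable: domain -> record
        self.ttl = ttl_seconds
        self.clock = clock            # injectable so tests can control time
        self._cache = {}              # domain -> (expires_at, record)

    def classify(self, domain):
        record = self.local_db.get(domain)
        if record is not None:
            return record             # pre-classified: no network call
        now = self.clock()
        cached = self._cache.get(domain)
        if cached is not None and cached[0] > now:
            return cached[1]          # fresh API result from an earlier visit
        record = self.api_lookup(domain)
        self._cache[domain] = (now + self.ttl, record)
        return record
```

Only the first visit to a long-tail domain pays the API round trip; later visits inside the TTL window are served from memory.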

Audit Logging and Compliance Reporting

Every blocking decision should be logged with the full classification context: the target URL, the resolved category, the page type, the reputation score, the policy rule that matched, and the resulting action. This audit trail serves three purposes. First, it provides forensic evidence for incident investigation — if an agent accessed a domain that later turns out to be malicious, the log shows exactly when and why the access was allowed. Second, it enables policy refinement — by analyzing blocked URLs over time, security teams can identify false positives (legitimate domains incorrectly blocked) and false negatives (risky domains that slipped through). Third, it satisfies compliance requirements for regulated industries where all agent activity must be documented and auditable.
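A log entry carrying the fields listed above might be serialized as one JSON object per line. The field names and the JSON Lines format are assumptions chosen for the example, not a format the source prescribes.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class BlockingDecision:
    url: str
    category: str
    page_type: str
    reputation_score: float
    matched_rule: str
    action: str
    timestamp: str = ""  # filled in at serialization time when left empty


def to_log_line(decision):
    """Serialize one decision as a single JSON line with a UTC timestamp."""
    record = asdict(decision)
    if not record["timestamp"]:
        record["timestamp"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)


line = to_log_line(BlockingDecision(
    url="https://shop.example.test/checkout",
    category="Shopping",
    page_type="checkout",
    reputation_score=0.82,
    matched_rule="page_type:checkout",
    action="block",
))
print(line)
```

Recording the matched rule alongside the action is what lets a later review separate false positives from correctly enforced policy.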

Category Blocking vs. URL Blocklists: Why Categories Win

Traditional URL blocklists — flat files containing specific domains to block — are brittle and incomplete. They require constant manual curation, they cannot block domains that have not yet been identified as problematic, and they provide no semantic context about why a domain is blocked. Category-based blocking addresses all three limitations. The classification is derived from the domain's actual content and metadata, not from a manually curated list. New domains are classified automatically when they appear in the database's quarterly updates. And the category label itself provides the semantic context that policy engines need to make nuanced decisions — "Gambling" tells the policy engine far more than a raw domain name ever could.

Who Benefits from Category-Based Browser Blocking

Enterprise security teams are the primary buyers. They already operate web proxies and CASBs that enforce category-based filtering for human users. Extending the same category enforcement to AI agents creates a unified security posture. Without this extension, agents become an unmonitored backdoor that bypasses every web filtering policy the organization has invested in.

Platform vendors building agentic browser products need built-in category blocking to satisfy enterprise procurement requirements. Large enterprises are unlikely to deploy an agentic browser that cannot enforce content category restrictions. The 102M domain database provides the classification data these platforms need without building their own classifier.

Managed AI service providers operating browser agents on behalf of clients need category blocking to demonstrate compliance with client policies. The blocking logs serve as proof that every agent navigation was evaluated against the client's approved category ruleset — a critical requirement for SOC 2, ISO 27001, and industry-specific compliance frameworks.

The Coverage Advantage: 102 Million Pre-Classified Domains

The effectiveness of category-based blocking scales directly with database coverage. A database covering 10 million domains will return "unknown" for a significant portion of URLs an agent encounters, forcing the policy engine to make default allow/block decisions without category context. Our 102M domain database covers 99.5% of the active internet as measured by the Google Chrome User Experience Report. This means that for virtually every domain an agentic browser will navigate to in normal operation, the category classification is already available — no API call needed, no classification delay, no ambiguity in the policy decision.

Agentic Browser Defense Layer

Category-based shields protecting every browser navigation event

Deploy Category-Based Blocking for Your Agentic Browsers

Enforce granular content category rules on every URL your browser agents visit. 102 million pre-classified domains, sub-millisecond lookups, deterministic policy enforcement.
