WebsiteCategorizationAPI

A Blocklist API That Restricts AI Agents by Domain Category

Autonomous AI agents need hard boundaries. A blocklist API powered by domain categorization data lets you define which website categories, page types, and reputation tiers are off-limits — and enforce those restrictions at the API layer before the agent's HTTP request ever fires. No prompt engineering. No model-based guessing. Deterministic blocking backed by 102 million classified domains.

Block dangerous categories  |  102M classified domains  |  <1ms block decision  |  20+ page types

The Problem: Agents Without Blocklists Visit Everything

An AI agent with unrestricted web access is a liability. Without a blocklist, the agent treats every domain on the internet as fair game — including malware sites, adult content, login portals, and internal admin panels.

The Cost of Not Having a Domain Blocklist

When you deploy an AI agent to perform web-based tasks — competitive research, lead generation, market analysis, content aggregation — that agent will follow links, chase redirects, and explore domains that no human would visit intentionally. Without a domain-level blocklist, you are relying entirely on the agent's language model to make safety decisions about URLs. This is fundamentally unreliable because language models were not trained to evaluate domain safety — they were trained to generate text.

  • Malware exposure: Agents following links from web search results can land on domains serving drive-by downloads, cryptominers, or browser exploits — infecting the agent's runtime environment
  • Credential harvesting: Phishing domains that mimic legitimate services trick agents into submitting form data, potentially leaking API keys or session tokens embedded in the agent's context
  • Brand safety violations: An enterprise agent that visits adult, gambling, or extremist content creates liability for the organization — even if the agent was simply following a redirect chain
  • Regulatory non-compliance: Agents accessing domains in sanctioned jurisdictions, or domains hosting regulated content (financial advice, medical guidance), can trigger compliance violations

The Solution: An API-Driven Blocklist Powered by Domain Categorization

Our blocklist API wraps a 102 million domain categorization database in a simple REST interface. Before an agent navigates to any URL, your middleware sends the domain to the blocklist API. The API returns the domain's IAB content category, web filtering category, page type, and reputation score. Your policy engine evaluates the response against your blocklist rules and returns an allow or block decision — all in under 200ms for API calls, or under 1ms when using the local database.

The blocklist is fully configurable. Block entire IAB categories (Adult, Illegal Content, Gambling). Block specific page types (login, checkout, admin, settings). Block domains below a reputation threshold. Block domains outside a specific country or language. The rules are declarative — defined in a policy file, not embedded in prompts — which means they are auditable, version-controlled, and consistent across every agent in your fleet.
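As a rough illustration of what a declarative policy could look like, the sketch below parses a JSON policy document and lists the rules a classification violates. The key names (`blocked_categories`, `blocked_page_types`, `min_reputation`) and the reputation scale are assumptions made for this example, not a documented policy schema.

```python
import json
from dataclasses import dataclass

# Hypothetical policy document; in practice it would live in a
# version-controlled file. All key names are assumptions.
POLICY_JSON = """
{
  "blocked_categories": ["Adult", "Illegal Content", "Gambling"],
  "blocked_page_types": ["login", "checkout", "admin", "settings"],
  "min_reputation": 3.0
}
"""


@dataclass(frozen=True)
class BlocklistPolicy:
    blocked_categories: frozenset
    blocked_page_types: frozenset
    min_reputation: float


def load_policy(text):
    """Parse a JSON policy document into an immutable policy object."""
    raw = json.loads(text)
    return BlocklistPolicy(
        blocked_categories=frozenset(raw.get("blocked_categories", [])),
        blocked_page_types=frozenset(raw.get("blocked_page_types", [])),
        min_reputation=float(raw.get("min_reputation", 0.0)),
    )


def violations(policy, category, page_type, reputation):
    """Return the policy rules that a classification violates."""
    found = []
    if category in policy.blocked_categories:
        found.append(f"category:{category}")
    if page_type in policy.blocked_page_types:
        found.append(f"page_type:{page_type}")
    if reputation < policy.min_reputation:
        found.append(f"reputation<{policy.min_reputation}")
    return found


policy = load_policy(POLICY_JSON)
print(violations(policy, "News", "login", 5.2))
```

Because the policy is plain data, a change to the blocklist shows up as a reviewable diff instead of an edit to a prompt.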

Figure: Blocklist API Gateway. Domain requests filtered through category-based blocking rules.

How the Blocklist API Enforces Domain Restrictions

Three blocking strategies that turn domain categorization into agent-level access control

Category-Based Blocking

Define blocklist rules using IAB content taxonomy categories. Block all domains classified as "Adult Content," "Illegal Activities," "Malware/Spyware," or "Gambling." The API evaluates every URL against your category blocklist and returns a block decision with the triggering category. There is no ambiguity at decision time: if the domain is classified in a blocked category, the agent cannot visit it.

Page-Type Blocking

Block specific page types regardless of domain category. Login pages, checkout pages, admin panels, and settings pages represent high-risk navigation targets for AI agents. Even on a trusted domain, an agent that reaches a login page can trigger security alerts, lock accounts, or inadvertently submit credentials. Page-type blocking prevents these scenarios at the API layer.

Reputation-Based Blocking

Set minimum reputation thresholds for agent navigation. Domains with low PageRank scores, no global ranking, or minimal web presence are more likely to be spam, parked, or malicious. The API returns reputation signals for every domain, and your policy engine can block any domain that falls below your configured trust threshold.

Figure: Block/Allow Decision Engine. Real-time domain evaluation against multi-layer blocklist rules.

Blocklist API Integration Code

Production-ready snippets to implement domain blocking in your agent harness

Python — Category-Based Blocklist Middleware

    import http.client
    import json
    from urllib.parse import urlencode, urlparse


    class BlocklistAPI:
        """API-driven blocklist for AI agent domain restriction."""

        HOST = "www.websitecategorizationapi.com"
        PATH = "/api/iab/iab_web_content_filtering.php"

        def __init__(self, api_key, blocklist_config):
            self.api_key = api_key
            self.config = blocklist_config
            self.decision_log = []

        def check_domain(self, target_url):
            """Classify the URL's domain and return the parsed JSON response."""
            domain = urlparse(target_url).hostname or ""
            # urlencode escapes any special characters in the values
            payload = urlencode({
                "query": domain,
                "api_key": self.api_key,
                "data_type": "domain",
                "expanded_categories": "1",
            })
            headers = {"Content-Type": "application/x-www-form-urlencoded"}
            # Open a connection per call so a stale socket is never reused
            conn = http.client.HTTPSConnection(self.HOST, timeout=10)
            try:
                conn.request("POST", self.PATH, payload, headers)
                res = conn.getresponse()
                return json.loads(res.read().decode("utf-8"))
            finally:
                conn.close()

        @staticmethod
        def _category_name(entry):
            """Strip the "Category name: " prefix from a taxonomy entry."""
            return entry[0].split("Category name: ")[-1] if entry else ""

        def evaluate(self, target_url):
            data = self.check_domain(target_url)

            # Extract categories
            iab_cats = [
                self._category_name(c)
                for c in data.get("iab_classification", [])
            ]
            filters = data.get("filtering_taxonomy") or [[""]]
            filter_cat = self._category_name(filters[0])
            page_type = data.get("page_type", "unknown")

            # Check category blocklist
            for cat in iab_cats:
                if cat in self.config["blocked_categories"]:
                    return self._block(target_url, f"Blocked category: {cat}")

            # Check web filter blocklist
            if filter_cat in self.config["blocked_filters"]:
                return self._block(target_url, f"Blocked filter: {filter_cat}")

            # Check page type blocklist
            if page_type in self.config["blocked_page_types"]:
                return self._block(target_url, f"Blocked page type: {page_type}")

            return self._allow(target_url, iab_cats, page_type)

        def _block(self, url, reason):
            decision = {"url": url, "action": "BLOCK", "reason": reason}
            self.decision_log.append(decision)
            return decision

        def _allow(self, url, categories, page_type):
            decision = {"url": url, "action": "ALLOW",
                        "categories": categories, "page_type": page_type}
            self.decision_log.append(decision)
            return decision


    # Configuration
    blocklist = BlocklistAPI("your_api_key", {
        "blocked_categories": ["Adult", "Illegal Content", "Malware"],
        "blocked_filters": ["Adult", "Gambling", "Weapons"],
        "blocked_page_types": ["login", "checkout", "admin", "settings"],
    })

    result = blocklist.evaluate("https://example.com/admin/panel")
    print(result)
    # If the API reports page_type "admin", the decision carries
    # action "BLOCK" and reason "Blocked page type: admin".

JavaScript — Express.js Blocklist Endpoint

    // Requires Node.js 18+ for the built-in fetch API
    const express = require("express");

    const app = express();
    app.use(express.json());

    const BLOCKED_CATEGORIES = new Set([
      "Adult", "Illegal Content", "Malware", "Phishing", "Gambling"
    ]);
    const BLOCKED_PAGE_TYPES = new Set([
      "login", "checkout", "admin", "settings", "signup"
    ]);

    app.post("/api/blocklist/check", async (req, res) => {
      const { url, agentId } = req.body || {};
      if (typeof url !== "string" || url === "") {
        return res.status(400).json({ error: "url is required" });
      }

      let classification;
      try {
        const response = await fetch(
          "https://www.websitecategorizationapi.com" +
            "/api/iab/iab_web_content_filtering.php",
          {
            method: "POST",
            headers: { "Content-Type": "application/x-www-form-urlencoded" },
            body: new URLSearchParams({
              query: url,
              api_key: process.env.CATEGORIZATION_API_KEY,
              data_type: "url",
              expanded_categories: "1"
            })
          }
        );
        classification = await response.json();
      } catch (err) {
        // Fail closed: an unhandled rejection would otherwise leave the request hanging
        return res.status(502).json({
          action: "BLOCK", reason: "Classification lookup failed"
        });
      }

      const filterCat =
        classification.filtering_taxonomy?.[0]?.[0]
          ?.replace("Category name: ", "") || "Unknown";
      const pageType = classification.page_type || "unknown";

      if (BLOCKED_CATEGORIES.has(filterCat)) {
        return res.json({ action: "BLOCK", reason: `Category: ${filterCat}` });
      }
      if (BLOCKED_PAGE_TYPES.has(pageType)) {
        return res.json({ action: "BLOCK", reason: `Page type: ${pageType}` });
      }

      return res.json({
        action: "ALLOW",
        category: filterCat,
        pageType: pageType,
        agentId: agentId
      });
    });

    app.listen(3000, () => console.log("Blocklist API on :3000"));

Figure: Domain Firewall Barrier. Blocked domains deflected; approved domains pass through.

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Database
AI Agent Domain Database 10M
$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license  |  Optional Updates: $1,599/year

  • 10M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global Popularity Rankings

AI Agent Domain Database 20M (Popular)
$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $2,999/year

  • 20M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

AI Agent Domain Database 50M (Maximum Coverage)
$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $4,999/year

  • 50M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →

How Many Domains in Each Category?

Search any IAB or Web Filtering category to see how many domains your blocklist API would cover from our 102M Enterprise Database.

Database Analytics

Domain Distribution by Category in Our 102M Enterprise Database

How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications

Top 50 IAB v3 Categories

Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database

Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.

Figure: Blocklist Rules Cascade. Category rules, page-type rules, and reputation rules evaluated in sequence.

Building a Production-Grade Blocklist API for AI Agents

A blocklist API for AI agents is not a simple keyword filter. It is a policy enforcement layer that must handle millions of domain lookups per day, return decisions in single-digit milliseconds, maintain consistent classifications across time, and produce audit-grade logs for every block and allow decision. Building this system on top of a pre-classified domain database gives you a head start — the classification work is already done, and your engineering effort focuses on the policy logic and the API infrastructure.

The architecture has three components: a domain classification store (the 102M domain database loaded into Redis or PostgreSQL), a policy engine (a rules evaluator that maps classifications to block/allow decisions), and an API layer (a REST or gRPC endpoint that agents call before every navigation event). The database provides the facts. The policy engine provides the rules. The API provides the interface.
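One way to picture the three components is the minimal sketch below. The class names, the record shape, and the fail-closed handling of unclassified domains are assumptions chosen for illustration; an in-memory dict stands in for Redis or PostgreSQL.

```python
from urllib.parse import urlparse


class ClassificationStore:
    """Facts: maps a domain to its classification record."""

    def __init__(self, records):
        self._records = records  # stand-in for Redis or PostgreSQL

    def lookup(self, domain):
        return self._records.get(domain)


class PolicyEngine:
    """Rules: maps a classification record to a decision."""

    def __init__(self, blocked_categories):
        self._blocked = set(blocked_categories)

    def decide(self, record):
        if record is None:
            # Assumption: unclassified domains fail closed
            return "BLOCK", "unclassified domain"
        if record["category"] in self._blocked:
            return "BLOCK", f"category {record['category']}"
        return "ALLOW", "no rule matched"


class BlocklistService:
    """Interface: what a REST or gRPC handler would call per navigation."""

    def __init__(self, store, engine):
        self._store = store
        self._engine = engine

    def check(self, url):
        domain = (urlparse(url).hostname or "").lower()
        action, reason = self._engine.decide(self._store.lookup(domain))
        return {"url": url, "domain": domain, "action": action, "reason": reason}


store = ClassificationStore({
    "casino.example": {"category": "Gambling"},
    "news.example": {"category": "News"},
})
service = BlocklistService(store, PolicyEngine(["Gambling"]))
print(service.check("https://casino.example/slots"))
```

Keeping the store, the rules, and the interface in separate objects lets each be swapped or tested without touching the others.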

Category-Level Blocking: The First Layer of Defense

Category-level blocking is the broadest and most impactful layer. By blocking entire IAB categories, you eliminate large swaths of the internet that agents should never visit. The standard enterprise blocklist includes Adult Content, Illegal Activities, Malware and Spyware, Phishing, Spam, Gambling, Weapons, Drugs, and Hate Speech. These categories represent domains that are dangerous, inappropriate, or legally problematic regardless of the agent's task.

Beyond the universal blocklist, task-specific category blocks add a second layer. A financial research agent might be blocked from Shopping and Entertainment categories to keep it focused on financial sources. A content marketing agent might be blocked from Competitor sites to prevent accidental content scraping. The blocklist API accepts category rules as configuration — no code changes required to add or remove blocked categories.
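The layering of universal and task-specific blocks could be expressed as plain configuration, as in the sketch below. The role names and the `Competitor` label are hypothetical; the universal list simply mirrors the categories named above.

```python
UNIVERSAL_BLOCKLIST = frozenset({
    "Adult Content", "Illegal Activities", "Malware and Spyware", "Phishing",
    "Spam", "Gambling", "Weapons", "Drugs", "Hate Speech",
})

# Hypothetical role names and task-specific additions
TASK_BLOCKLISTS = {
    "financial_research": frozenset({"Shopping", "Entertainment"}),
    "content_marketing": frozenset({"Competitor"}),
}


def effective_blocklist(role):
    """Universal categories plus any task-specific ones for the role."""
    return UNIVERSAL_BLOCKLIST | TASK_BLOCKLISTS.get(role, frozenset())


def is_blocked(role, category):
    """True when the category is off-limits for an agent with this role."""
    return category in effective_blocklist(role)


print(is_blocked("financial_research", "Shopping"))
print(is_blocked("content_marketing", "Shopping"))
```

Adding or removing a blocked category is then an edit to the data structure, not to the evaluation logic.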

Page-Type Blocking: Preventing High-Risk Interactions

Even on trusted domains, certain page types are dangerous for autonomous agents. A login page on your company's own website is a trusted domain — but an agent that navigates to that login page and attempts to interact with it could lock accounts, trigger security alerts, or expose credentials. Page-type blocking operates independently of category blocking: even if the domain category is "allow," the page type can trigger a block.

The critical page types to block for most agent deployments are: login, signup, checkout, payment, settings, admin, dashboard, and account management pages. Our database classifies pages into 20+ distinct types, giving your blocklist API granular control over which interactions are permitted and which are forbidden.
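The independence of the two layers can be sketched as a small function that takes the category decision as input and applies the page-type rule on top. The page-type labels below are assumed spellings; the labels your database actually returns may differ.

```python
# Assumed label spellings for the page types listed above
BLOCKED_PAGE_TYPES = frozenset({
    "login", "signup", "checkout", "payment",
    "settings", "admin", "dashboard", "account",
})


def apply_page_type_rule(category_action, page_type):
    """Combine a category decision with the page-type rule.

    A category-level BLOCK always stands. A category-level ALLOW is
    downgraded to BLOCK when the page type is on the blocklist.
    """
    if category_action == "BLOCK":
        return "BLOCK"
    if page_type in BLOCKED_PAGE_TYPES:
        return "BLOCK"
    return "ALLOW"


# A trusted domain's login page is still blocked
print(apply_page_type_rule("ALLOW", "login"))
```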

Reputation-Based Blocking: Filtering Low-Trust Domains

Not every dangerous domain fits neatly into a blocked category. Newly registered domains, parked pages, and domains with no web presence are often used for spam, phishing, or malware distribution — but they may not yet be classified into explicit danger categories. Reputation-based blocking catches these domains by setting minimum thresholds for PageRank scores and global popularity rankings. A domain with a PageRank of 0 and no global ranking is statistically much more likely to be malicious than a domain with established authority and traffic.
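A threshold check along these lines might look like the following sketch. The default thresholds, the parameter names, and the choice to treat missing signals as failing are all assumptions to tune for your own deployment.

```python
from typing import Optional


def reputation_decision(page_rank: Optional[float],
                        global_rank: Optional[int],
                        min_page_rank: float = 2.0,
                        max_global_rank: int = 5_000_000):
    """Block domains whose reputation signals fall below the thresholds.

    Missing signals (None) count as failing, which is an assumption.
    """
    if page_rank is None or page_rank < min_page_rank:
        return "BLOCK", "page rank below threshold"
    if global_rank is None or global_rank > max_global_rank:
        return "BLOCK", "no or low global ranking"
    return "ALLOW", "reputation ok"


print(reputation_decision(0.0, None))
print(reputation_decision(5.4, 12_000))
```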

Local Database vs. Remote API: Choosing Your Blocklist Architecture

The blocklist API can operate in two modes. In local mode, the full 102M domain database is loaded into a local data store (Redis, PostgreSQL, SQLite), and the blocklist API performs lookups against this local store. Local mode delivers sub-millisecond response times and zero external dependencies. In remote mode, the blocklist API calls our classification API for each domain lookup. Remote mode requires no local data storage but adds 100-200ms of network latency per lookup.

The recommended architecture is hybrid: use the local database for the 99.5% of domains it covers, and fall back to the remote API for the 0.5% of domains not in the local store. This hybrid approach delivers the speed of local lookups with the coverage of real-time classification.
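The hybrid lookup can be sketched with an in-memory SQLite table standing in for the local database and a stub standing in for the remote API. The table layout and the `remote_lookup` callable are assumptions for illustration only.

```python
import sqlite3


def build_local_store(rows):
    """Create an in-memory SQLite table standing in for the local database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE domains (domain TEXT PRIMARY KEY, category TEXT)")
    db.executemany("INSERT INTO domains VALUES (?, ?)", rows)
    return db


def classify(db, domain, remote_lookup):
    """Return (category, source), trying the local store first."""
    row = db.execute(
        "SELECT category FROM domains WHERE domain = ?", (domain,)
    ).fetchone()
    if row is not None:
        return row[0], "local"
    # Local miss: fall back to the remote classifier (a network call in practice)
    return remote_lookup(domain), "remote"


def fake_remote(domain):
    """Stub for the remote classification API."""
    return "Unknown"


db = build_local_store([("news.example", "News")])
print(classify(db, "news.example", fake_remote))
print(classify(db, "new-site.example", fake_remote))
```

Returning the source alongside the category makes it easy to measure how often the fallback path is actually taken.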

Blocklist API Design Patterns for Agent Frameworks

Different agent frameworks require different integration patterns for the blocklist API. In LangChain, the blocklist check is implemented as a Tool wrapper that intercepts the browsing tool's URL input, queries the blocklist API, and either passes the URL through or raises a ToolException that the agent handles gracefully. In CrewAI, the blocklist integrates as a pre-task hook that validates all URLs in the task's context before execution begins. In custom agent frameworks, the blocklist API is called as middleware in the HTTP client layer — every outbound request passes through the blocklist check before the TCP connection is established.
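For the custom-framework case, the middleware pattern might look like the sketch below, where a guard wraps whatever function performs the request. `GuardedClient`, `BlockedDomainError`, and the injected callables are hypothetical names; no real agent framework API is used here.

```python
from urllib.parse import urlparse


class BlockedDomainError(Exception):
    """Raised instead of issuing a request to a blocked domain."""


class GuardedClient:
    """Wraps a fetch function so every URL is checked before any request."""

    def __init__(self, is_blocked, fetch):
        self._is_blocked = is_blocked  # callable: domain -> bool
        self._fetch = fetch            # callable: url -> response

    def get(self, url):
        domain = (urlparse(url).hostname or "").lower()
        # URLs without a host are refused as well
        if not domain or self._is_blocked(domain):
            raise BlockedDomainError(f"navigation to {url!r} denied")
        return self._fetch(url)


blocked = {"malware.example"}
client = GuardedClient(lambda d: d in blocked, lambda url: f"fetched {url}")
print(client.get("https://docs.example/guide"))
```

Because the check runs before `fetch` is called, a blocked URL never reaches the network layer, and the agent sees an ordinary exception it can handle.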

Monitoring and Alerting on Blocklist Activity

A blocklist API generates valuable security telemetry. Every block decision reveals an attempt by an agent to visit a restricted domain. Patterns in block decisions can reveal prompt injection attacks (where malicious instructions direct agents to dangerous URLs), misconfigured agent tasks (where the agent's research scope is too broad), or emerging threats (where previously uncategorized domains start appearing in agent navigation attempts). Feed blocklist logs into your SIEM or observability platform, and set alerts for unusual block rates, new blocked domains, and attempts to access high-severity categories like Malware or Phishing.
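As a rough example of turning decision logs into alerts, the sketch below scans a window of decisions for an unusual block rate and for attempts on high-severity categories. The record fields and the 20% threshold are assumptions; a real deployment would forward these signals to its SIEM.

```python
from collections import Counter


def block_rate_alerts(decisions, max_block_rate=0.2,
                      high_severity=("Malware", "Phishing")):
    """Return human-readable alerts for a window of decision records."""
    alerts = []
    if not decisions:
        return alerts
    blocks = [d for d in decisions if d["action"] == "BLOCK"]
    rate = len(blocks) / len(decisions)
    if rate > max_block_rate:
        alerts.append(f"block rate {rate:.0%} exceeds {max_block_rate:.0%}")
    # Count blocked attempts per high-severity category
    severe = Counter(
        d["category"] for d in blocks if d.get("category") in high_severity
    )
    for category, count in sorted(severe.items()):
        alerts.append(f"{count} attempt(s) to reach {category} domains")
    return alerts


window = [
    {"action": "ALLOW", "category": "News"},
    {"action": "BLOCK", "category": "Phishing"},
    {"action": "ALLOW", "category": "Business"},
    {"action": "BLOCK", "category": "Gambling"},
]
for alert in block_rate_alerts(window):
    print(alert)
```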

Why Database-Backed Blocklists Beat Model-Based Filtering

Some teams attempt to build blocklist logic directly into their agent's system prompt: "Do not visit adult websites. Do not visit login pages. Do not visit malware domains." This approach fails for three reasons. First, the language model has no reliable mechanism to determine whether a URL is an adult site, a login page, or a malware domain — it can only guess based on the URL string. Second, prompt-based rules are bypassable through prompt injection — a malicious website can include instructions that override the agent's system prompt. Third, prompt-based filtering produces no audit trail — there is no structured log of what was blocked or why.

A database-backed blocklist API solves all three problems. The database provides ground-truth classifications based on actual website content analysis, not URL string patterns. The API operates outside the model's context window, so instructions injected into the agent's context cannot change its rules. And every block decision is logged with full context (domain, category, page type, policy rule, and timestamp), creating an auditable record of every filtering decision.
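A structured log entry carrying those fields could be produced as in the sketch below. The field names and the JSON-lines format are assumptions, not a prescribed log schema.

```python
import json
from datetime import datetime, timezone


def audit_record(domain, category, page_type, policy_rule, action, now=None):
    """Serialize one filtering decision as a single JSON line."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": now.isoformat(),
        "domain": domain,
        "category": category,
        "page_type": page_type,
        "policy_rule": policy_rule,
        "action": action,
    }, sort_keys=True)


print(audit_record("casino.example", "Gambling", "homepage",
                   "blocked_categories", "BLOCK"))
```

One JSON object per line keeps the log easy to append to and easy for a log pipeline to parse.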

Figure: Threat Deflection System. Malicious domains intercepted and neutralized at the API layer.

Build Your Agent Blocklist on Classified Domain Data

Deploy a blocklist API backed by 102 million pre-classified domains. Category blocking, page-type blocking, and reputation filtering — all in one database. One-time purchase, perpetual license.
