AI agents are browsing the web autonomously, but without a denylist layer they have no guardrails preventing visits to harmful, sensitive, or policy-violating domains. Our API-first denylist service leverages a 102 million domain database with IAB categories, page-type labels, and reputation signals to give your agent harness real-time deny/allow decisions on every URL before navigation happens.
Human-curated blocklists worked when employees browsed the web at human speed. Autonomous agents visit hundreds of URLs per minute, and manual lists are always outdated.
Traditional denylist approaches rely on security teams manually adding domains to a spreadsheet or a firewall rule. This model collapses when an AI agent enters the picture. An agent tasked with "research supply chain risks across 500 vendors" will generate hundreds of unique URL visits within minutes. Each visit to an uncategorized domain is a blind spot your denylist never anticipated. The problem compounds across multiple agents running concurrently — a fleet of ten agents can generate thousands of navigation events per hour, each one a potential policy violation that your static list never accounted for.
Instead of maintaining a list of individual domains, define deny rules at the category level. Block all domains classified as "Adult Content" — that single rule covers over 2 million domains in our database. Block all pages with type "login" or "checkout" — that rule prevents agent interaction with authentication and payment flows across the entire internet. The API resolves any URL against our 102 million domain database and returns the category, page type, and reputation data your deny engine needs to make an instant decision.
This is not a static file you download and forget. The API provides real-time classification with sub-200ms latency, backed by a database that receives quarterly refreshes. Your deny rules stay expressed as categories, and the database handles the mapping from categories to the millions of individual domains they encompass. You define policy once; we maintain the domain-to-category mappings at scale.
Three integration models for embedding category-based deny rules into your agent architecture
Instead of listing individual domains, express your denylist as a set of blocked categories. "Block Adult," "Block Malware," "Block Gambling" — each rule covers tens of thousands to millions of domains automatically. When the API classifies a URL, your agent checks the returned category against your deny set. If it matches, navigation is blocked before the HTTP request fires. No manual domain enumeration required.
Beyond content categories, deny specific page types regardless of the domain. Block all "login" pages to prevent credential exposure. Block "checkout" pages to prevent unauthorized purchases. Block "settings" and "admin" pages to prevent configuration changes. Page-type rules apply universally — even if a domain itself is allowed, a blocked page type on that domain still triggers the deny action.
Add reputation scoring to your deny logic. Domains with low PageRank scores, no global popularity ranking, or recently registered WHOIS dates can be automatically denied. This catches the long tail of suspicious domains that may not yet be categorized as malicious but exhibit patterns consistent with phishing, typosquatting, or disposable infrastructure used for credential harvesting.
Production-ready snippets for building a category-driven denylist into your agent pipeline
import http.client
import json
from urllib.parse import urlencode


class DenylistAPI:
    """Category-driven denylist that blocks URLs by IAB category
    and page type before the agent navigates."""

    DENIED_CATEGORIES = [
        "Adult", "Illegal Content", "Malware",
        "Gambling", "Weapons", "Drugs"
    ]
    DENIED_PAGE_TYPES = ["login", "checkout", "admin", "settings"]
    MIN_REPUTATION_SCORE = 2  # Block low-reputation domains

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def check_denylist(self, target_url):
        # URL-encode the form body so special characters in the target URL
        # survive transport intact
        payload = urlencode({
            "query": target_url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": 1,
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers
        )
        res = self.conn.getresponse()
        data = json.loads(res.read().decode("utf-8"))

        # Extract classification fields, tolerating entries that lack
        # the "Category name: " prefix
        categories = [
            c[0].split("Category name: ")[-1]
            for c in data.get("iab_classification", [])
        ]
        page_type = data.get("page_type", "unknown")
        reputation = data.get("open_page_rank", 0)

        # Check category deny rules
        for cat in categories:
            for denied in self.DENIED_CATEGORIES:
                if denied.lower() in cat.lower():
                    return {
                        "action": "deny",
                        "reason": f"Category denied: {cat}",
                        "url": target_url,
                        "rule_type": "category"
                    }

        # Check page-type deny rules
        if page_type in self.DENIED_PAGE_TYPES:
            return {
                "action": "deny",
                "reason": f"Page type denied: {page_type}",
                "url": target_url,
                "rule_type": "page_type"
            }

        # Check reputation threshold (unranked domains default to 0
        # and are therefore denied)
        if reputation < self.MIN_REPUTATION_SCORE:
            return {
                "action": "deny",
                "reason": f"Low reputation: {reputation}",
                "url": target_url,
                "rule_type": "reputation"
            }

        return {"action": "allow", "url": target_url}


# Usage in agent harness
denylist = DenylistAPI(api_key="your_api_key")
result = denylist.check_denylist("https://sketchy-site.com/login")
if result["action"] == "deny":
    print(f"Blocked: {result['reason']}")
class AgentDenylistGateway {
  constructor(apiKey, denyConfig) {
    this.apiKey = apiKey;
    this.deniedCategories = denyConfig.categories || [];
    this.deniedPageTypes = denyConfig.pageTypes || [];
    this.minReputation = denyConfig.minReputation || 0;
  }

  async evaluate(targetURL) {
    const response = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: targetURL,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    const data = await response.json();
    const filterCat =
      data.filtering_taxonomy?.[0]?.[0]
        ?.replace("Category name: ", "") || "Unknown";
    const pageType = data.page_type || "unknown";
    const reputation = data.open_page_rank || 0;

    // Check deny rules in priority order
    if (this.deniedCategories.includes(filterCat)) {
      return { action: "deny", rule: "category", detail: filterCat };
    }
    if (this.deniedPageTypes.includes(pageType)) {
      return { action: "deny", rule: "page_type", detail: pageType };
    }
    // Unranked domains default to 0 and are denied when a threshold is set
    if (reputation < this.minReputation) {
      return { action: "deny", rule: "reputation", detail: reputation };
    }
    return { action: "allow", url: targetURL };
  }
}

// Configure and use
const gateway = new AgentDenylistGateway("your_api_key", {
  categories: ["Adult", "Malware", "Gambling"],
  pageTypes: ["login", "checkout", "admin"],
  minReputation: 3
});
const verdict = await gateway.evaluate("https://example.com/admin");
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database with up to 102M domains, from $2,499.
Use the Category Counter tool to search any IAB or Web Filtering category and see how many domains it covers in the 102M Enterprise Database: the same data your denylist rules will reference. The accompanying charts show how those 102 million domains are distributed across IAB v3 taxonomy classifications, spanning Tier 1 through Tier 4. The charts display domain counts for the top 50 of the 700+ categories; the Category Counter reports counts for the remaining 650+.
The concept of a denylist is as old as web filtering itself. The earliest web proxies maintained text files with lists of blocked domains, updated weekly by security analysts. This approach worked when the average employee visited fifty websites per day and the internet had a few million active sites. It does not work when an autonomous AI agent can visit fifty websites per minute and the active web spans hundreds of millions of domains. The fundamental limitation of domain-level denylists is that they scale linearly — every new domain that should be blocked requires a new entry, a human decision, and a deployment cycle.
Category-based denylists solve this scaling problem by inverting the model. Instead of listing individual domains, you list the categories of content you want blocked. A single rule — "deny all Adult content" — immediately covers the 2.3 million adult domains in our database, plus every newly classified domain that falls into that category in future updates. The denylist grows automatically as the database grows, without any action from your team.
A well-structured denylist rule contains three components: the matching criterion, the action, and the audit metadata. The matching criterion specifies what triggers the rule — an IAB category, a web filtering category, a page type, a reputation threshold, or a combination. The action specifies what happens when the rule matches — hard block (the agent receives an error and cannot navigate), soft block (the agent is warned and the event is logged, but navigation proceeds), or redirect (the agent is sent to a safe fallback URL). The audit metadata captures the timestamp, the agent identity, the URL that was evaluated, and the rule that fired.
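To make that structure concrete, here is a minimal sketch of a rule and its audit record in Python. The DenyRule and AuditRecord types and their field names are illustrative, not a prescribed schema:

from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical types illustrating the three components:
# matching criterion, action, and audit metadata.

@dataclass
class DenyRule:
    # Matching criterion: what triggers the rule
    match_field: str              # "iab_category", "page_type", "reputation"
    match_value: str              # e.g. "Adult Content", "login"
    # Action: what happens when the rule matches
    action: str = "hard_block"    # "hard_block" | "soft_block" | "redirect"
    redirect_url: str | None = None

@dataclass
class AuditRecord:
    # Audit metadata captured every time a rule fires
    timestamp: str
    agent_id: str
    url: str
    rule: DenyRule

def apply_rule(rule: DenyRule, agent_id: str, url: str) -> AuditRecord:
    """Record which rule fired, for which agent, against which URL."""
    return AuditRecord(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_id=agent_id,
        url=url,
        rule=rule,
    )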
This three-part structure enables granular policy enforcement without creating overly complex rule sets. A typical enterprise deployment uses between ten and thirty deny rules that collectively cover millions of domains. The most common rules target IAB categories like "Adult Content," "Illegal Content," and "Sensitive Topics," web filtering categories like "Malware," "Phishing," and "Spam," and page types like "login," "checkout," "admin," and "settings."
An effective denylist for AI agents is not a static file that ships with the agent binary. It is a service — an API endpoint that the agent's middleware calls before every navigation event. The API-first approach provides three critical advantages that static files cannot match. First, centralized policy management: update a deny rule once in the API, and every agent in your fleet immediately enforces it. No redeployment, no file distribution, no version drift between agents. Second, real-time classification: when the agent encounters a domain not in its local cache, the API classifies it on demand and evaluates it against the current deny rules before the agent navigates. Third, audit logging: every deny decision flows through a single API, creating a centralized audit trail that security and compliance teams can query, alert on, and report from.
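One way to wire this into an agent harness is a thin pre-navigation gate, sketched below under the assumption that your harness exposes such a hook. It reuses the DenylistAPI class from the Python snippet above; the caching and logging choices are illustrative:

import logging

# Illustrative middleware: gate every navigation through the denylist API,
# cache verdicts per URL, and log each decision for the audit trail.
logger = logging.getLogger("denylist.audit")

class NavigationGate:
    def __init__(self, denylist, agent_id):
        self.denylist = denylist   # DenylistAPI instance from the snippet above
        self.agent_id = agent_id
        self.cache = {}            # URL -> verdict; swap for an LRU/TTL cache

    def before_navigate(self, url):
        """Call from the agent's pre-navigation hook. Returns True to proceed."""
        verdict = self.cache.get(url)
        if verdict is None:
            verdict = self.denylist.check_denylist(url)
            self.cache[url] = verdict
        # Centralized audit trail: one log line per decision
        logger.info("agent=%s url=%s action=%s reason=%s",
                    self.agent_id, url, verdict["action"],
                    verdict.get("reason", "-"))
        return verdict["action"] == "allow"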
Our database provides two complementary classification systems. The IAB Content Taxonomy v3 organizes domains into 700+ content categories across four hierarchical tiers — from broad categories like "Technology & Computing" down to narrow topics like "Artificial Intelligence > Machine Learning > Natural Language Processing." The Web Filtering taxonomy provides security-focused categories like "Malware," "Phishing," "Adult," "Gambling," "Weapons," and "Drugs." For denylist purposes, the two systems serve different needs. IAB categories are ideal for content policy enforcement — blocking agents from browsing competitor websites, political content, or off-topic domains. Web filtering categories are ideal for security policy enforcement — blocking agents from visiting known-malicious domains, adult content, or illegal marketplaces.
The most robust deny configurations use both systems simultaneously. A financial services company might deny all Web Filtering categories related to security threats (Malware, Phishing, Spam) AND deny IAB categories unrelated to financial research (Entertainment, Sports, Gaming) AND deny page types associated with data entry (login, checkout, contact forms). This layered approach creates a multi-dimensional deny surface that is far more effective than any single classification system alone.
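A layered profile along those lines might look like the following sketch; the structure and category names are illustrative, not a required schema:

# Illustrative layered deny profile for a financial services deployment,
# combining both taxonomies plus page types.
FINANCIAL_RESEARCH_DENY_PROFILE = {
    # Security layer: Web Filtering categories
    "web_filtering_categories": ["Malware", "Phishing", "Spam"],
    # Content-policy layer: IAB categories outside the research scope
    "iab_categories": ["Entertainment", "Sports", "Gaming"],
    # Interaction layer: page types associated with data entry
    "page_types": ["login", "checkout", "contact"],
}

def matches_profile(profile, web_cats, iab_cats, page_type):
    """Deny if any layer matches the URL's classification."""
    return (
        any(c in profile["web_filtering_categories"] for c in web_cats)
        or any(c in profile["iab_categories"] for c in iab_cats)
        or page_type in profile["page_types"]
    )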
Every denylist system must account for false positives — legitimate domains or pages that are incorrectly blocked by a deny rule. In human web filtering, false positives result in an employee submitting an unblock request and waiting hours for IT to review it. In agent web filtering, false positives halt the agent's workflow immediately, potentially causing task failure. The cost of a false positive in agent filtering is therefore much higher than in human filtering, and the denylist architecture must be designed to minimize them.
Three strategies reduce false positive impact. First, use allow-overrides: maintain a short allowlist of specific domains that are explicitly permitted regardless of their category. If a deny rule would block a known-good domain, the allowlist takes precedence. Second, implement soft-block mode for ambiguous categories: instead of hard-blocking categories like "News" or "Social Media" that might contain both relevant and irrelevant content, log the visit and flag it for review without halting the agent. Third, leverage page-type context: a domain categorized as "Social Media" might be denied generally, but if the specific page is a "documentation" or "API reference" page type, the page-type signal can override the category deny rule.
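The three strategies compose naturally into a single evaluation order. Here is a minimal sketch, with all domain names, categories, and thresholds invented for illustration:

# Illustrative evaluation order for reducing false-positive impact.
ALLOWLIST = {"github.com", "docs.python.org"}               # allow-overrides
SOFT_BLOCK_CATEGORIES = {"News", "Social Media"}            # log, don't halt
OVERRIDE_PAGE_TYPES = {"documentation", "api_reference"}    # page-type context

def evaluate_with_overrides(domain, category, page_type, denied_categories):
    # 1. Allow-overrides take precedence over every deny rule.
    if domain in ALLOWLIST:
        return {"action": "allow", "rule": "allowlist"}
    # 2. Page-type context can override a category deny.
    if category in denied_categories and page_type in OVERRIDE_PAGE_TYPES:
        return {"action": "allow", "rule": "page_type_override"}
    # 3. Ambiguous categories are soft-blocked: logged, agent proceeds.
    if category in SOFT_BLOCK_CATEGORIES:
        return {"action": "soft_block", "rule": "ambiguous_category"}
    # 4. Everything else falls through to the normal hard-block rules.
    if category in denied_categories:
        return {"action": "deny", "rule": "category"}
    return {"action": "allow", "rule": "default"}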
Enterprise deployments rarely run a single agent. They run fleets of agents with different roles, permissions, and task scopes. A research agent needs broader web access than a customer service agent. A financial analysis agent needs access to banking domains that a marketing agent should never visit. The denylist API must support role-based deny configurations — different deny rule sets for different agent identities.
The implementation pattern is straightforward: each agent identity maps to a deny profile, and the deny profile specifies the set of blocked categories, page types, and reputation thresholds for that role. When the agent calls the denylist API, it passes its identity along with the target URL. The API evaluates the URL against the deny profile for that specific agent identity and returns the appropriate action. This enables fine-grained, per-agent deny policies without duplicating infrastructure or creating separate API endpoints for each agent.
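A sketch of that pattern follows, with the profile contents and agent identities invented for illustration:

# Illustrative per-role deny profiles: each agent identity maps to its own
# set of blocked categories, page types, and reputation threshold.
DENY_PROFILES = {
    "research_agent": {
        "categories": {"Malware", "Phishing", "Adult"},
        "page_types": {"login", "checkout"},
        "min_reputation": 1,
    },
    "customer_service_agent": {
        "categories": {"Malware", "Phishing", "Adult", "Gambling", "Weapons"},
        "page_types": {"login", "checkout", "admin", "settings"},
        "min_reputation": 3,
    },
}

def evaluate_for_agent(agent_id, category, page_type, reputation):
    """Evaluate a URL's classification against the caller's deny profile."""
    profile = DENY_PROFILES[agent_id]
    if category in profile["categories"]:
        return "deny"
    if page_type in profile["page_types"]:
        return "deny"
    if reputation < profile["min_reputation"]:
        return "deny"
    return "allow"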
Latency is the critical performance metric for a denylist API because it sits directly in the agent's navigation path. Every millisecond the API adds to the navigation decision is a millisecond the agent waits before proceeding. Our API delivers sub-200ms response times for real-time classification requests. For organizations that need even lower latency, the database download option enables local lookups in under 1ms by loading the full 102M domain dataset into Redis, PostgreSQL, or SQLite.
Throughput is the secondary concern. A fleet of ten agents, each making two hundred URL visits per hour, generates two thousand denylist evaluations per hour. The API handles this volume without rate limiting on standard enterprise plans. For burst workloads — such as an agent performing a large-scale web scrape — the local database approach eliminates API throughput concerns entirely, as the lookup happens against a local data store with no network overhead.
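For the local-lookup path, here is a minimal sketch against SQLite; the table layout is an assumption about how the downloaded dataset might be loaded, not a shipped schema:

import sqlite3

# Assumed local table loaded from the downloaded dataset:
#   domains(domain TEXT PRIMARY KEY, category TEXT,
#           page_type TEXT, page_rank INTEGER)
conn = sqlite3.connect("domain_db.sqlite")

def local_lookup(domain):
    """Sub-millisecond classification lookup against the local database."""
    row = conn.execute(
        "SELECT category, page_type, page_rank FROM domains WHERE domain = ?",
        (domain,),
    ).fetchone()
    if row is None:
        # Unknown domain: fall back to the real-time API (or deny by default).
        return None
    category, page_type, page_rank = row
    return {"category": category, "page_type": page_type, "page_rank": page_rank}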
The primary buyers of denylist APIs for agent browsing fall into four categories. Enterprise security teams deploying browser-using agents need to extend their existing web filtering policies to cover AI agents. They already block malicious and inappropriate content for employees; the denylist API gives them the same control over autonomous agents. Platform vendors building agent orchestration tools need built-in deny capabilities to make their platforms enterprise-ready — without a denylist layer, their product cannot pass enterprise security review. Managed service providers running agents on behalf of clients need configurable deny rules per client, with audit logs that prove compliance with each client's acceptable use policy. Compliance teams in regulated industries need to demonstrate that AI agents cannot access categories of content prohibited by their regulatory framework — a denylist API with audit logging provides the evidence trail for regulatory audits.
Stop maintaining static blocklists. Define deny rules by category and let a 102 million domain database handle the rest. One-time purchase, perpetual license, instant coverage.