Knowing that a domain belongs to "Business and Finance" tells you what the site is about. Knowing that the specific page your agent is about to visit is a login page, a checkout flow, or an admin panel tells you what the agent could do there. Page-type classification adds a critical dimension to agent policy — enabling rules like "allow finance sites but block their login pages" that IAB categories alone cannot express.
Blocking an entire category blocks thousands of useful pages along with the few dangerous ones.
A domain classified as "Business and Finance > Banking" hosts dozens of page types: marketing pages, product descriptions, rate calculators, customer support FAQs, branch locator maps, and, critically, login portals, account dashboards, fund transfer interfaces, and admin panels. Category-level filtering treats all of these pages identically. If you allow the "Banking" category, you allow login pages. If you block it, you lose access to publicly available rate information, branch locations, and financial product comparisons that your agent legitimately needs for research.
Page-type classification adds a second axis to your agent policy. Instead of "allow or block this category," you can write rules like "allow Business and Finance domains except login, checkout, and admin page types." This surgical precision means your agent can research banking products, compare interest rates, and read financial news — while being blocked from authentication portals, payment flows, and administrative interfaces on those same domains.
Our database classifies pages into 20+ distinct types: homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, account, password_reset, legal, privacy_policy, terms_of_service, blog, documentation, api_reference, support, faq, forum, and product pages. Each type maps to a specific risk level and a recommended policy action, giving your policy engine the granularity it needs for production agent deployments.
Three risk tiers that organize page types into clear policy categories
Page types that represent interactive surfaces where agent action could cause harm. Login and signup pages involve authentication, where an agent may attempt to enter credentials. Checkout and payment pages involve financial transactions. Admin and settings pages provide control over system configuration. Account pages display personal data. Password reset pages could trigger security workflows at the target organization. These page types should be hard-blocked for all agents regardless of category scope.
Page types that are generally safe for reading but may contain sensitive information. Contact pages contain organizational information that could be used for social engineering. Careers pages reveal organizational structure. Legal, privacy policy, and terms of service pages contain binding language. These types are allowed but logged with enhanced detail for audit purposes.
Page types designed for public consumption and information sharing. Homepage, about, blog, documentation, api_reference, support, faq, forum, product, and pricing pages are built for visitors — including automated ones. These types are allowed with standard logging. They represent the vast majority of pages an agent will encounter during legitimate research tasks.
Implement granular page-type rules in your agent's navigation pipeline
import http.client
import json
from urllib.parse import urlencode


class PageTypePolicyEngine:
    """Policy engine that combines IAB categories with
    page-type labels for granular agent navigation rules."""

    # Page types grouped by risk tier
    BLOCK_TYPES = {
        "login", "signup", "checkout", "admin",
        "settings", "password_reset", "account"
    }
    MONITOR_TYPES = {
        "contact", "careers", "legal",
        "privacy_policy", "terms_of_service"
    }
    ALLOW_TYPES = {
        "homepage", "about", "blog", "documentation",
        "api_reference", "support", "faq", "forum",
        "product", "pricing"
    }

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def classify(self, url):
        # URL-encode the form body so that "&", "=" or "#" inside
        # the target URL cannot corrupt the request parameters.
        payload = urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload, headers
        )
        return json.loads(
            self.conn.getresponse().read().decode("utf-8")
        )

    def evaluate(self, url):
        """Two-dimensional policy evaluation:
        category scope + page type risk tier."""
        data = self.classify(url)
        page_type = data.get("page_type", "unknown")
        # [-1] avoids an IndexError if the "Category name: "
        # prefix is missing from an entry.
        categories = [
            c[0].split("Category name: ")[-1]
            for c in data.get("iab_classification", [])
        ]

        # Page type takes priority over category
        if page_type in self.BLOCK_TYPES:
            return {
                "action": "block",
                "reason": f"High-risk page type: {page_type}",
                "page_type": page_type,
                "categories": categories,
                "risk_tier": "high"
            }
        if page_type in self.MONITOR_TYPES:
            return {
                "action": "allow_monitored",
                "reason": f"Medium-risk page type: {page_type}",
                "page_type": page_type,
                "categories": categories,
                "risk_tier": "medium"
            }
        if page_type in self.ALLOW_TYPES:
            return {
                "action": "allow",
                "reason": f"Safe page type: {page_type}",
                "page_type": page_type,
                "categories": categories,
                "risk_tier": "low"
            }
        # Unknown or unlisted page types are not assumed safe:
        # allow them, but with enhanced logging.
        return {
            "action": "allow_monitored",
            "reason": f"Unrecognized page type: {page_type}",
            "page_type": page_type,
            "categories": categories,
            "risk_tier": "unknown"
        }


# Usage
engine = PageTypePolicyEngine(api_key="your_api_key")
urls = [
    "https://bank.com/personal/savings",
    "https://bank.com/login",
    "https://bank.com/admin/dashboard",
    "https://techblog.com/articles/ai-trends",
]
for url in urls:
    result = engine.evaluate(url)
    print(
        f"[{result['action'].upper()}] {url} "
        f"(type={result['page_type']}, "
        f"risk={result['risk_tier']})"
    )
class PageTypePolicyGuard {
  constructor(apiKey) {
    this.apiKey = apiKey;
    // High-risk types; "account" is included to match the
    // BLOCK_TYPES set in the Python engine.
    this.blockTypes = new Set([
      "login", "signup", "checkout", "admin",
      "settings", "password_reset", "account"
    ]);
    this.monitorTypes = new Set([
      "contact", "careers", "legal",
      "privacy_policy", "terms_of_service"
    ]);
  }

  async classifyAndEvaluate(targetURL) {
    const resp = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type":
            "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: targetURL,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    // Fail loudly instead of parsing an error page as JSON.
    if (!resp.ok) {
      throw new Error(
        `Classification request failed: HTTP ${resp.status}`
      );
    }
    const data = await resp.json();
    const pageType = data.page_type || "unknown";

    if (this.blockTypes.has(pageType)) {
      return {
        allowed: false,
        action: "block",
        pageType: pageType,
        reason: `High-risk: ${pageType} page`
      };
    }

    // Pages without a label are not assumed safe:
    // allow them, but with enhanced logging.
    const monitored =
      this.monitorTypes.has(pageType) ||
      pageType === "unknown";
    return {
      allowed: true,
      action: monitored
        ? "allow_monitored" : "allow",
      pageType: pageType,
      reason: monitored
        ? `Monitored: ${pageType} page`
        : `Safe: ${pageType} page`
    };
  }
}

// Guard agent navigation with page-type awareness.
// Top-level await requires an ES module context.
const guard = new PageTypePolicyGuard("your_key");
const decision = await guard.classifyAndEvaluate(
  "https://saas-app.com/settings/billing"
);
console.log(decision);
// Example result, assuming the API labels this URL "settings":
// { allowed: false, action: "block",
//   pageType: "settings",
//   reason: "High-risk: settings page" }
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499.
[Charts: distribution of the 102 million domains in the main Enterprise Database across IAB v3 taxonomy classifications, Tier 1 through Tier 4. The charts show domain counts for the top 50 of 700+ categories.]
Most conversations about AI agent web governance focus on domain categories — blocking adult sites, restricting access to malware domains, limiting agents to specific industry verticals. These category-level controls are necessary but insufficient. The real risk in agent navigation is not the domain; it is the page. An agent visiting bloomberg.com is fine. An agent visiting bloomberg.com/login is a problem. An agent visiting your-internal-app.com/admin is a crisis. Page-type classification bridges this gap by labeling each page with its functional purpose, enabling policies that operate at the page level rather than the domain level.
Consider a practical scenario: your financial research agent needs to access banking websites to compare interest rates, analyze product offerings, and track market trends. Without page-type classification, you have two choices — allow the entire "Banking" category (and accept that the agent might hit login pages) or block it (and lose legitimate research capabilities). With page-type classification, you write a single rule: "allow Banking category, block page types login, checkout, admin, settings." The agent can read publicly available product pages, blog posts, and rate tables while being prevented from interacting with authentication flows, payment systems, and administrative interfaces.
Our database classifies pages into the following functional types, each representing a distinct interaction pattern and risk profile.
Homepage: the main landing page of a domain, designed for public visitors.
About: organizational information pages.
Contact: pages with contact forms, email addresses, and physical addresses.
Pricing: pages displaying product or service pricing.
Careers: job listing and recruitment pages.
Login: authentication pages requiring username and password entry.
Signup: account registration pages.
Checkout: payment and transaction pages.
Settings: account or system configuration pages.
Admin: administrative control panels.
Account: user account dashboard pages.
Password reset: credential recovery pages.
Legal: legal notice and disclaimer pages.
Privacy policy: data protection and privacy pages.
Terms of service: contractual terms pages.
Blog: article and content pages.
Documentation: technical documentation pages.
API reference: API endpoint documentation.
Support: customer support and help pages.
FAQ: frequently asked questions.
Forum: community discussion pages.
Product: product description and listing pages.
The most powerful agent policies combine both dimensions: category scope and page-type restrictions. This two-dimensional model creates a policy matrix where each cell represents a specific combination of "what the site is about" and "what the page does." A financial research agent might have a policy matrix like this:
Technology + any page type = allow
Finance + blog/documentation/pricing/product = allow
Finance + login/checkout/admin = block
Healthcare + any page type = block (outside scope)
Adult + any page type = block (global rule)
This matrix is easy to define, easy to audit, and easy to explain to compliance reviewers.
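The matrix above can be expressed as a small lookup table. The sketch below is illustrative only: the `MATRIX` layout, the `"*"` wildcard, and the `decide` function are assumptions made for this example, not part of the database or API. It also makes one choice the text leaves open: any cell that is not listed (for example, Finance + settings, or a category outside the matrix) is blocked by default.

```python
# Illustrative policy matrix for a financial research agent.
# Keys are category names; values map page types to actions.
# "*" stands for "any page type". All names here are assumptions.
ANY = "*"

MATRIX = {
    "Technology": {ANY: "allow"},
    "Finance": {
        "blog": "allow", "documentation": "allow",
        "pricing": "allow", "product": "allow",
        "login": "block", "checkout": "block", "admin": "block",
    },
    "Healthcare": {ANY: "block"},  # outside the agent's scope
    "Adult": {ANY: "block"},       # global rule
}

DEFAULT_ACTION = "block"  # assumption: unlisted cells are denied


def decide(category: str, page_type: str) -> str:
    """Return the action for one (category, page type) cell."""
    row = MATRIX.get(category)
    if row is None:
        return DEFAULT_ACTION
    if page_type in row:
        return row[page_type]
    # Fall back to the row's wildcard, then to the global default.
    return row.get(ANY, DEFAULT_ACTION)


if __name__ == "__main__":
    cells = [("Technology", "forum"), ("Finance", "pricing"),
             ("Finance", "login"), ("Healthcare", "blog")]
    for category, page_type in cells:
        print(category, page_type, "->", decide(category, page_type))
```

Keeping the matrix as data rather than as nested conditionals means a compliance reviewer can read the policy without reading the code that applies it.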
Login pages are the most critical page type to detect and block for AI agents. When an agent encounters a login page, several dangerous scenarios can unfold. If the agent has access to stored credentials (through environment variables, secret managers, or configuration files), it may attempt to authenticate — potentially violating the target service's terms of use, triggering rate limiters, or locking legitimate accounts. Even without credentials, the agent may generate plausible-looking credential pairs from its training data and submit them, creating failed login events that trigger security alerts at the target organization.
Our login page detection identifies authentication pages by analyzing URL patterns, page structure, and form field indicators. The detection covers standard login pages (/login, /signin, /auth), SSO portals, OAuth flows, and custom authentication implementations. When the database labels a page as "login," your agent's middleware can block the navigation before any data is sent to the target server.
Checkout pages represent financial risk. An agent that reaches a checkout page might fill form fields with data from its context, potentially initiating purchases, entering credit card numbers, or submitting billing information. Even if the agent does not have access to real payment data, interacting with checkout flows can create partial orders, abandoned carts that trigger marketing emails, or fraud alerts at payment processors. Page-type classification identifies checkout pages (/checkout, /cart, /payment, /billing) and blocks agent navigation before the risk materializes.
Administrative pages are the highest-risk interaction surface on the web. An agent that navigates to an admin panel — whether through a crawled link, a misconfigured redirect, or an adversarial prompt injection — could potentially modify system settings, access sensitive data, create or delete user accounts, or alter security configurations. Even viewing an admin page exposes organizational structure and system architecture information. Our page-type classification detects admin pages (/admin, /dashboard, /panel, /manage, /console) and enables hard-block policies that prevent any agent from reaching administrative interfaces, regardless of the domain's category.
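The path patterns quoted in the three sections above (/login, /signin, /auth; /checkout, /cart, /payment, /billing; /admin, /dashboard, /panel, /manage, /console) suggest a simple local pre-check. The sketch below is a URL-path heuristic written for illustration only. It is not the database's detection method, which also analyzes page structure and form fields, and the function name `guess_page_type` and the segment table are assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Path segments quoted in the text, mapped to page-type labels.
# The mapping itself is an assumption made for this sketch.
SEGMENT_TYPES = {
    "login": "login", "signin": "login", "auth": "login",
    "checkout": "checkout", "cart": "checkout",
    "payment": "checkout", "billing": "checkout",
    "admin": "admin", "dashboard": "admin", "panel": "admin",
    "manage": "admin", "console": "admin",
}


def guess_page_type(url: str) -> Optional[str]:
    """Return a high-risk page type if a whole path segment matches.

    Returns None when no segment matches, which means "no opinion",
    not "safe".
    """
    path = urlparse(url).path.lower()
    for segment in path.split("/"):
        if segment in SEGMENT_TYPES:
            return SEGMENT_TYPES[segment]
    return None


if __name__ == "__main__":
    for url in ["https://bank.com/login",
                "https://bank.com/admin/dashboard",
                "https://bank.com/personal/savings"]:
        print(url, "->", guess_page_type(url))
```

Matching whole path segments rather than substrings keeps a URL such as /blog/login-tips from being flagged as a login page. A heuristic like this can only flag obvious cases; SSO portals and custom authentication flows need the database or API label.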
For pages not covered by the static database — dynamically generated URLs, single-page applications with hash routes, or newly created pages — the real-time API provides on-demand page-type classification. Submit any URL to the API and receive its page-type label within 200 milliseconds. The API analyzes URL structure, path patterns, query parameters, and known page-type indicators to determine the page's functional purpose. Use the API as a fallback for the local database, ensuring that even unknown URLs receive page-type classification before the agent navigates.
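The lookup order described above, local database first and real-time API second, might look like the following sketch. Here `local_db` is a plain dictionary standing in for the static database, and `api_lookup` is a caller-supplied function standing in for the real-time API. Both are placeholders, and the actual storage format and response fields may differ.

```python
from typing import Callable, Dict


def page_type_for(
    url: str,
    local_db: Dict[str, str],
    api_lookup: Callable[[str], str],
) -> str:
    """Look up a page type locally, falling back to the API on a miss."""
    if url in local_db:
        return local_db[url]
    try:
        page_type = api_lookup(url)
    except Exception:
        # Assumption: if the API call fails, report "unknown" so the
        # caller can apply its own default policy before navigating.
        return "unknown"
    local_db[url] = page_type  # remember the answer for next time
    return page_type


if __name__ == "__main__":
    db = {"https://bank.com/login": "login"}

    def fake_api(url: str) -> str:
        # Stand-in for the real-time API; always answers "pricing".
        return "pricing"

    print(page_type_for("https://bank.com/login", db, fake_api))
    print(page_type_for("https://bank.com/#/rates", db, fake_api))
```

Writing API answers back into the local store is a design choice for this sketch: repeated visits to the same dynamic URL then cost one API call instead of many.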
Start with a baseline policy that blocks all high-risk page types globally: login, signup, checkout, admin, settings, password_reset, and account. This baseline protects against the most dangerous interactions regardless of domain category. Then layer category-specific rules on top: allow your agent's target categories (Technology, Finance, News) while keeping the page-type blocks in place. Finally, add monitoring rules for medium-risk page types (contact, careers, legal) that log enhanced detail without blocking. This layered approach — global blocks, category scopes, monitoring rules — creates a comprehensive governance framework that is both protective and permissive enough for productive agent operation.
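A minimal sketch of the three layers, evaluated in the order given above. The function name `layered_decision`, the `allowed_categories` parameter, and the action strings are assumptions chosen to match the earlier examples. The monitored set uses the five medium-risk types from the Python engine.

```python
from typing import Iterable, Set

# Layer 1: high-risk page types blocked on every domain.
GLOBAL_BLOCK = {"login", "signup", "checkout", "admin",
                "settings", "password_reset", "account"}
# Layer 3: medium-risk page types that are allowed but logged.
MONITORED = {"contact", "careers", "legal",
             "privacy_policy", "terms_of_service"}


def layered_decision(page_type: str,
                     categories: Iterable[str],
                     allowed_categories: Set[str]) -> str:
    """Apply global blocks, then category scope, then monitoring."""
    # Layer 1: global page-type blocks win over everything else.
    if page_type in GLOBAL_BLOCK:
        return "block"
    # Layer 2: the domain must fall inside the agent's category scope.
    if not allowed_categories.intersection(categories):
        return "block"
    # Layer 3: medium-risk page types get enhanced logging.
    if page_type in MONITORED:
        return "allow_monitored"
    return "allow"


if __name__ == "__main__":
    scope = {"Technology", "Finance", "News"}
    print(layered_decision("login", ["Finance"], scope))
    print(layered_decision("careers", ["Technology"], scope))
    print(layered_decision("pricing", ["Finance"], scope))
```

Because the global block is checked first, adding a new category to the scope can never re-enable a login or checkout page by accident.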
Deploy page-type classification across 102 million domains. 20+ page types, IAB taxonomy, reputation scores. One-time purchase, perpetual license.