AI agents that browse the web inevitably encounter login forms, payment gateways, and authentication screens. Without page-type classification, they cannot distinguish a public product page from a checkout flow. Our database tags 20+ page types across 102 million domains, enabling your agent harness to block interactions with sensitive flows before they start.
Login pages and checkout flows are among the highest-risk destinations on the web for autonomous agents. Without explicit page-type labels, every form is just HTML.
When an AI agent is instructed to "compare prices for enterprise SaaS tools," it will follow links across dozens of vendor sites. Some of those links lead to public pricing pages. Others lead to login screens that demand credentials, SSO portals that trigger OAuth flows, and checkout pages that present payment forms with credit card fields. The agent has no native understanding of these distinctions. It sees form fields and interacts with them — potentially submitting data, triggering account lockouts, or initiating unauthorized transactions.
Our domain database includes page-type labels for every classified URL. Login pages, checkout pages, signup forms, admin panels, settings screens, and 15+ additional types are tagged at the domain and path level. When your agent's navigation middleware queries the database, the response includes a page_type field that your policy engine can evaluate instantly. Login detected? Block navigation and log the event. Checkout detected? Redirect the agent to the public pricing page instead. Settings or admin panel? Hard block with an alert to the security team.
This classification is deterministic: for a given database version, the same URL always returns the same page type, which avoids the inconsistency of model-based page detection. It is also pre-computed, so the lookup adds only sub-millisecond latency to the agent's decision path. Your agent harness gains the ability to treat different page types with different levels of trust, just as enterprise web proxies treat different content categories differently for human users.
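As a rough illustration of why a pre-computed label is cheap and repeatable to look up, the sketch below assumes an extract of the database has been loaded into memory as a mapping from (domain, path) to a page-type label. The table layout, the `PAGE_TYPES` name, and the `lookup_page_type` helper are hypothetical and do not describe the product's actual schema.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory extract of the database: (domain, path) -> label.
PAGE_TYPES = {
    ("store.example.com", "/checkout/payment"): "checkout",
    ("store.example.com", "/pricing"): "pricing",
    ("app.example.com", "/portal"): "login",
}


def lookup_page_type(url, table=PAGE_TYPES):
    """Return the stored label for a URL, or "unknown" if it is not in the table."""
    parts = urlsplit(url)
    domain = (parts.hostname or "").lower()
    # Normalize a trailing slash so "/portal/" and "/portal" hit the same row.
    path = parts.path.rstrip("/") or "/"
    return table.get((domain, path), "unknown")


print(lookup_page_type("https://store.example.com/checkout/payment"))
```

Because the lookup is a plain dictionary read with no model call, repeating it with the same table and the same URL gives the same label every time.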
Three layers of protection that prevent agents from interacting with sensitive page types
Every URL tagged with page type "login" is blocked before the agent's HTTP request fires. This covers SSO portals, OAuth authorization endpoints, multi-factor authentication screens, and custom login forms. The database recognizes login pages across major platforms and identity providers, including WordPress, Shopify, Salesforce, Okta, and Auth0, as well as thousands of custom implementations.
Checkout pages are identified by the "checkout" page-type label. This includes shopping cart pages, payment form pages, order confirmation screens, and subscription signup flows. When an agent encounters a checkout page type, the policy engine blocks interaction and optionally redirects to the corresponding product or pricing page for data extraction only.
Pages classified as "admin," "settings," or "dashboard" receive automatic hard blocks. These pages represent the highest-risk interaction surfaces: they contain account management controls, configuration panels, and data management interfaces that an agent should never reach. Every blocked attempt is logged with the full URL, page type, and timestamp for security auditing.
Production-ready middleware that prevents agents from reaching sensitive page types
import http.client
import json
from urllib.parse import urlencode


class PageTypeBlocker:
    """Blocks AI agent navigation to login, checkout, and admin pages."""

    SENSITIVE_PAGE_TYPES = [
        "login", "checkout", "signup", "settings",
        "admin", "dashboard", "account", "payment"
    ]
    # Where to send the agent instead of a blocked page type
    REDIRECT_MAP = {
        "checkout": "pricing",
        "login": "homepage",
        "signup": "about"
    }

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )
        self.blocked_log = []

    def detect_page_type(self, target_url):
        # urlencode escapes reserved characters (&, =, ?) in the target URL
        # so they cannot corrupt the form-encoded request body
        payload = urlencode({
            "query": target_url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1"
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers
        )
        res = self.conn.getresponse()
        # Read the full body so the connection can be reused for the next call
        return json.loads(res.read().decode("utf-8"))

    def evaluate_navigation(self, target_url):
        data = self.detect_page_type(target_url)
        page_type = data.get("page_type", "unknown")
        if page_type in self.SENSITIVE_PAGE_TYPES:
            # Record every blocked attempt for later audit
            self.blocked_log.append({
                "url": target_url,
                "page_type": page_type,
                "action": "blocked"
            })
            redirect = self.REDIRECT_MAP.get(page_type)
            return {
                "allowed": False,
                "reason": f"Sensitive page type: {page_type}",
                "redirect_to": redirect
            }
        return {"allowed": True, "page_type": page_type}


# Usage in agent harness
blocker = PageTypeBlocker(api_key="your_api_key")
result = blocker.evaluate_navigation(
    "https://store.example.com/checkout/payment"
)
if not result["allowed"]:
    print(f"Blocked: {result['reason']}")
// Page types the agent must never interact with
const SENSITIVE_TYPES = new Set([
  "login", "checkout", "signup", "admin",
  "settings", "dashboard", "payment"
]);

async function sensitivePageGuard(targetURL, apiKey) {
  const response = await fetch(
    "https://www.websitecategorizationapi.com" +
      "/api/iab/iab_web_content_filtering.php",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded"
      },
      // URLSearchParams handles escaping of the target URL
      body: new URLSearchParams({
        query: targetURL,
        api_key: apiKey,
        data_type: "url",
        expanded_categories: "1"
      })
    }
  );
  // Fail closed: if the lookup itself fails, do not let the agent proceed
  if (!response.ok) {
    throw new Error(`Classification lookup failed: HTTP ${response.status}`);
  }
  const classification = await response.json();
  const pageType = classification.page_type || "unknown";
  if (SENSITIVE_TYPES.has(pageType)) {
    return {
      allowed: false,
      pageType,
      reason: `Blocked: ${pageType} page detected`,
      timestamp: new Date().toISOString()
    };
  }
  return { allowed: true, pageType, url: targetURL };
}

// Example: guard every agent navigation event
async function agentNavigate(url, apiKey) {
  const check = await sensitivePageGuard(url, apiKey);
  if (!check.allowed) {
    console.warn(`Agent blocked from ${check.pageType} page`);
    return null; // prevent navigation
  }
  return fetch(url); // proceed with request
}
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499.
Chart: how the 102 million domains in our main Enterprise Database are distributed across IAB v3 taxonomy classifications, spanning Tier 1 through Tier 4. Domain counts are shown for the top 50 of 700+ categories.
The rise of browser-using AI agents — including Anthropic's Computer Use, OpenAI's Operator, and Google's Project Mariner — has created a new class of security risk that traditional web filtering was never designed to address. These agents do not just fetch and parse HTML. They interact with web pages: clicking buttons, filling forms, scrolling through content, and following multi-step workflows. When those workflows lead to login pages or checkout flows, the agent's interactions cross from harmless data collection into potentially dangerous territory.
A login page is not just another web page. It is a trust boundary. When a human user reaches a login page, they make a conscious decision about whether to enter credentials. An AI agent has no such judgment. Without page-type classification in the decision path, the agent treats a login form identically to a search form or a newsletter signup. The implications range from embarrassing (the agent submitting gibberish into a login form) to catastrophic (the agent using cached credentials from a password manager to authenticate on an unauthorized site).
Checkout pages present a different but equally severe risk profile. These pages typically contain payment form fields (credit card number, expiration date, CVV), billing address inputs, and purchase confirmation buttons. An agent that reaches a checkout page without page-type awareness may interact with these fields in unpredictable ways. Even if the agent does not have access to payment credentials, its interactions with checkout forms can trigger fraud detection systems, create ghost orders, or expose session tokens that contain cart and pricing data.
The challenge is that checkout pages do not follow a universal URL pattern. Some sites use /checkout, others use /cart/payment, and many use dynamically generated paths that change with each session. URL pattern matching alone cannot reliably detect checkout flows. Our database solves this by classifying pages based on content analysis, not URL patterns: every domain in the 102M database has its checkout, login, and other sensitive pages pre-tagged regardless of URL structure.
Many teams attempt to block login and checkout pages using simple URL pattern rules: block any URL containing "/login", "/signin", "/checkout", or "/payment". This approach fails for three reasons. First, it misses login pages that use non-standard paths: many SaaS applications use paths like "/app", "/portal", or "/access" for authentication. Second, it generates false positives: a blog post published at "/login-troubleshooting-guide" would be incorrectly blocked because its path contains "/login". Third, it cannot detect checkout pages that use JavaScript-rendered forms with clean URLs like "/step-2" or "/confirm".
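The failure modes of pattern rules are easy to reproduce. The sketch below implements the substring rule described above and runs it against a handful of made-up URLs; the sample URLs and the "actual" labels attached to them are illustrative assumptions, not database output.

```python
import re

# The simple rule: flag any URL containing one of these path tokens.
NAIVE_PATTERN = re.compile(r"/(login|signin|checkout|payment)", re.IGNORECASE)


def naive_is_sensitive(url):
    """Return True if the URL matches the simple substring rule."""
    return bool(NAIVE_PATTERN.search(url))


# Hypothetical URLs paired with what each page actually is.
samples = [
    ("https://blog.example.com/login-troubleshooting-guide", "article"),
    ("https://app.example.com/portal", "login"),
    ("https://shop.example.com/step-2", "checkout"),
    ("https://shop.example.com/checkout", "checkout"),
]

for url, actual in samples:
    flagged = naive_is_sensitive(url)
    correct = flagged == (actual in {"login", "checkout"})
    print(f"{url}: flagged={flagged} actual={actual} correct={correct}")
```

Only the last sample is handled correctly: the article is a false positive, and the "/portal" login page and "/step-2" checkout page are both missed.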
Database-driven page-type classification eliminates these problems. Each domain's pages are classified based on actual page content analysis — the presence of password fields, payment form elements, OAuth redirect patterns, and CAPTCHA challenges. The classification is stored as a simple label ("login", "checkout", "admin") that your policy engine can evaluate without any regex matching or URL parsing.
Not all sensitive page types require the same policy action. A well-designed agent governance framework uses tiered responses based on page-type classification. Login pages should receive a hard block — the agent should never reach an authentication form under any circumstances. Checkout pages may receive a soft block with redirection — the agent is redirected to the corresponding pricing or product page where it can extract the data it needs without interacting with payment forms. Settings and admin pages should receive a hard block with an alert — these represent the highest-risk interaction surfaces and any attempt to reach them indicates either a misconfigured agent or a prompt injection attack.
Signup pages fall into a gray area. Some organizations allow agents to create accounts on approved services (for example, signing up for a free API trial). Others block all signup interactions to prevent unauthorized account creation. The page-type classification enables both policies — you define the rules, the database provides the labels.
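One way to express these tiers is a small lookup table from page-type label to policy action, with the signup rule left configurable. This is a minimal sketch: the `Action` names, the `DEFAULT_POLICY` table, and the `decide` helper are hypothetical, and allowing unlisted page types by default is a design assumption that a stricter deployment might invert.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    REDIRECT = "redirect"
    HARD_BLOCK = "hard_block"
    HARD_BLOCK_ALERT = "hard_block_alert"


# Tiers as described in the text; the exact label set is an assumption.
DEFAULT_POLICY = {
    "login": Action.HARD_BLOCK,
    "checkout": Action.REDIRECT,
    "settings": Action.HARD_BLOCK_ALERT,
    "admin": Action.HARD_BLOCK_ALERT,
}


def decide(page_type, signup_action=Action.HARD_BLOCK, policy=DEFAULT_POLICY):
    """Map a page-type label to a policy action."""
    if page_type == "signup":
        # Gray area: each organization supplies its own rule.
        return signup_action
    # Page types without an explicit rule are allowed in this sketch.
    return policy.get(page_type, Action.ALLOW)


print(decide("checkout"))
print(decide("signup", signup_action=Action.FLAG))
```

An organization that permits trial signups on approved services would pass `signup_action=Action.ALLOW` or `Action.FLAG`; one that forbids account creation keeps the default.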
The integration pattern for page-type blocking is straightforward regardless of your agent framework. In LangChain, wrap the database lookup in a custom Tool that the agent calls before every browser navigation. In CrewAI, implement a pre-navigation hook that checks page types and returns a block signal when sensitive types are detected. In AutoGen, register a function that intercepts the agent's browse action and evaluates the target URL against the database. The critical requirement is that the page-type check must execute before the browser navigates, not after the page has loaded. Pre-navigation blocking prevents the agent from ever rendering the login or checkout page, so there is no form for it to interact with.
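The pre-navigation requirement can be sketched without any framework-specific API: wrap whatever callable performs navigation so that the classification runs first. Everything here is hypothetical scaffolding; `guard_navigation`, `fake_navigate`, and `fake_classify` are stand-ins for your framework's browse action and your database lookup.

```python
class NavigationBlocked(Exception):
    """Raised when the pre-navigation check rejects a URL."""


def guard_navigation(navigate, classify, blocked_types):
    """Wrap a navigate(url) callable so the page-type check runs first."""
    def guarded(url):
        page_type = classify(url)
        if page_type in blocked_types:
            # The wrapped navigate() is never called for blocked URLs.
            raise NavigationBlocked(f"{page_type} page blocked: {url}")
        return navigate(url)
    return guarded


# Stand-ins for a real browser action and a real database lookup.
visited = []


def fake_navigate(url):
    visited.append(url)
    return f"loaded {url}"


def fake_classify(url):
    return "login" if url.endswith("/portal") else "pricing"


safe_navigate = guard_navigation(
    fake_navigate, fake_classify, {"login", "checkout"}
)
print(safe_navigate("https://vendor.example.com/pricing"))
```

The guarded callable is what would be registered as the agent's tool, hook, or function, so the unguarded navigation path is never exposed to the agent.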
Consider an AI agent tasked with "researching competitor pricing for CRM software." The agent searches for CRM vendors, visits their websites, and follows links to pricing pages. Without page-type blocking, the agent may click a "Start Free Trial" button (a signup page), follow a redirect to a login page (if the user already has an account), or reach a checkout page with pricing tiers and payment fields. Each of these interactions represents a policy violation — the agent was supposed to read pricing data, not create accounts or interact with payment forms.
With page-type blocking in place, the agent's navigation middleware checks each URL against the database before loading the page. The pricing page (page type: "pricing") is allowed. The signup page (page type: "signup") is blocked or flagged. The login page (page type: "login") is hard-blocked. The checkout page (page type: "checkout") is redirected to the pricing page. The agent completes its task — extracting pricing data — without ever interacting with a sensitive page type.
Every page-type classification decision should be logged for audit purposes. The log entry should include the target URL, the classified page type, the policy action taken (allow, block, redirect, flag), the timestamp, and the agent identifier. This audit trail serves multiple purposes: it demonstrates compliance with internal security policies, it provides evidence for incident investigations if an agent does reach a sensitive page, and it generates data for tuning your policy rules over time.
Our database makes this logging straightforward because every classification is a deterministic lookup — the same URL always returns the same page type. This consistency means your audit logs are reproducible: given the same database version and the same URL, any reviewer can verify that the correct policy action was taken.
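A minimal sketch of an audit record carrying the fields described in this section, serialized as one JSON object per line. The `AuditEntry` and `make_entry` names and the JSON Lines format are assumptions chosen for illustration; any structured log format with the same fields would serve.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    url: str
    page_type: str
    action: str      # "allow", "block", "redirect", or "flag"
    agent_id: str
    timestamp: str   # ISO 8601, UTC


def make_entry(url, page_type, action, agent_id, now=None):
    """Build one audit record; `now` can be injected for reproducible tests."""
    now = now or datetime.now(timezone.utc)
    return AuditEntry(url, page_type, action, agent_id, now.isoformat())


def to_json_line(entry):
    """Serialize an entry as a single JSON line with stable key order."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = make_entry(
    "https://store.example.com/checkout", "checkout", "redirect", "agent-7"
)
print(to_json_line(entry))
```

Adding a database version field to the record would let a reviewer replay the lookup against the same data when verifying a past decision.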
Login and checkout pages exist on every type of website — e-commerce platforms, SaaS applications, banking portals, social media networks, government services, healthcare portals, and educational institutions. Our 102M domain database covers all of these verticals. Shopify checkout pages, WordPress login forms, Salesforce SSO portals, Stripe payment pages, Okta authentication screens, and thousands of custom-built authentication systems are all pre-classified and ready for your policy engine to consume.
This breadth of coverage is essential because AI agents do not restrict themselves to a single vertical. An agent researching "employee benefits programs" may visit HR platforms, insurance company websites, banking portals, and government benefit sites — each with its own login and checkout implementation. The database ensures that regardless of where the agent navigates, the page-type classification is available for every domain it encounters.
Deploy page-type classification to prevent AI agents from interacting with authentication forms, payment flows, and admin panels. 20+ page types, 102 million domains, sub-millisecond lookups.