Authentication pages are the single most dangerous destination an AI agent can reach. Login portals, SSO redirects, OAuth flows, and password reset screens expose credential entry points that autonomous agents should never interact with. Our 102-million-domain database includes page-type detection that identifies login, signup, and authentication pages -- enabling your agent harness to block these interactions before they happen.
An autonomous agent browsing the web has no built-in understanding that a URL leading to a login form is fundamentally different from a URL leading to a product page.
When an AI agent encounters a login page during a browsing task, multiple failure modes emerge simultaneously. The agent may attempt to fill in the username and password fields using data from its context window -- potentially submitting real credentials to the wrong site. It may trigger multi-factor authentication flows that send unexpected verification codes to employees. It may create new accounts on services without authorization. And in SSO environments, a single accidental interaction with an identity provider redirect can cascade across dozens of connected applications.
Our database classifies pages into 20+ distinct types, including dedicated labels for login, signup, authentication, SSO, and password reset pages. When your agent harness intercepts a navigation request, it queries the database for the target URL's page type. If the page type matches any authentication-related label, the harness blocks the navigation before the agent's HTTP request reaches the server -- zero contact with the authentication surface.
This pre-navigation blocking is fundamentally different from post-load content analysis. The agent never receives the HTML of the login page, never sees the form fields, and never has the opportunity to interact with authentication elements. The block happens at the URL resolution layer, not the rendering layer, so the credential-submission, MFA, and account-creation failure modes described above never get the chance to occur on a blocked page.
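A minimal Python sketch of that ordering, assuming a harness in which every navigation passes through a single function. `guarded_navigate`, `classify`, and `fetch` are hypothetical names rather than part of any product API, and the label set is an assumption to be aligned with the labels your copy of the database actually uses. What it illustrates is that the decision is made from the URL alone, so a blocked URL is never requested from the target server.

```python
from typing import Callable

# Assumed label set; align it with the labels your database emits.
AUTH_PAGE_TYPES = frozenset({
    "login", "signup", "authentication", "sso", "password_reset",
})


class NavigationBlocked(Exception):
    """Raised when the harness refuses a navigation request."""


def guarded_navigate(
    url: str,
    classify: Callable[[str], str],
    fetch: Callable[[str], str],
) -> str:
    """Look up the page type first; call fetch() only if it is not an auth page."""
    page_type = classify(url)  # page-type lookup: nothing is sent to the target server
    if page_type in AUTH_PAGE_TYPES:
        # The agent gets an error instead of HTML, so there is no form to fill in.
        raise NavigationBlocked(f"{url} blocked: page type {page_type!r}")
    return fetch(url)  # the only point where a request reaches the target
```

The agent-facing tool wraps `guarded_navigate`, so the browser or HTTP client is unreachable except through the check.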
Three layers of protection between your AI agents and authentication surfaces
The database tags pages that present username/password forms, OAuth consent screens, SAML redirects, and multi-factor verification prompts. This includes not just obvious /login URLs but also dynamic login modals, embedded authentication widgets, and third-party identity provider redirects. The classification covers the full spectrum of authentication UX patterns across 102 million domains.
Single sign-on redirects are particularly dangerous because they chain across multiple domains. An agent that follows a /auth/saml redirect lands on an identity provider like Okta, Azure AD, or Ping -- and any interaction there affects every application in the SSO federation. The database identifies identity provider domains and SSO redirect endpoints, enabling the harness to break the redirect chain before it reaches the identity provider.
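The database is the primary signal for identity provider domains, but a harness can also recognize many SSO hops from the redirect target URL itself, because the protocols place their requests in the query string: the SAML 2.0 HTTP-Redirect binding carries a `SAMLRequest` parameter, and an OAuth 2.0 / OpenID Connect authorization request carries `response_type` and `client_id`. The sketch below is an illustrative, complementary heuristic, not a description of how the database classifies pages; `looks_like_sso_request` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def looks_like_sso_request(url: str) -> bool:
    """Heuristic: does this URL carry a SAML or OAuth/OIDC authorization request?"""
    params = parse_qs(urlsplit(url).query)
    # SAML 2.0 HTTP-Redirect binding: the AuthnRequest travels as SAMLRequest.
    if "SAMLRequest" in params:
        return True
    # OAuth 2.0 / OIDC authorization request: response_type and client_id
    # are both required parameters.
    return "response_type" in params and "client_id" in params
```

The check only sees what is in the URL. SAML's HTTP-POST binding sends `SAMLRequest` in a form body, which this function cannot observe, so it supplements the page-type lookup rather than replacing it.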
Account creation pages present a different but equally serious risk. An agent that fills out a registration form can create unauthorized accounts, agree to terms of service on behalf of the organization, and generate identity records that are difficult to track and remediate. Page-type detection identifies signup, registration, and account creation pages across all major web platforms.
Production-ready snippets that block agents from authentication pages using page-type detection
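```python
import http.client
import json
from urllib.parse import urlencode, urlparse


class AuthPageBlocker:
    """Blocks AI agents from reaching authentication pages."""

    AUTH_PAGE_TYPES = frozenset({
        "login", "signup", "authentication", "sso",
        "password_reset", "registration", "oauth",
        "mfa_verification", "account_creation",
    })

    IDENTITY_PROVIDER_DOMAINS = (
        "login.microsoftonline.com", "accounts.google.com",
        "auth0.com", "okta.com", "onelogin.com",
    )

    API_HOST = "www.websitecategorizationapi.com"
    API_PATH = "/api/iab/iab_web_content_filtering.php"

    def __init__(self, api_key, timeout=10):
        self.api_key = api_key
        self.timeout = timeout

    def _is_identity_provider(self, hostname):
        # Match the IdP domain itself or any subdomain of it
        # (e.g. "acme.okta.com"), but not look-alikes such as
        # "notokta.com" that a plain substring test would accept.
        return any(
            hostname == idp or hostname.endswith("." + idp)
            for idp in self.IDENTITY_PROVIDER_DOMAINS
        )

    def check_auth_page(self, target_url):
        # Quick check against known IdP domains; .hostname is
        # lowercased and excludes any port or userinfo.
        hostname = urlparse(target_url).hostname or ""
        if self._is_identity_provider(hostname):
            return True, "Known identity provider domain"

        # URL-encode the form body so '&' or '=' inside target_url
        # cannot corrupt the other parameters.
        payload = urlencode({
            "query": target_url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {"Content-Type": "application/x-www-form-urlencoded"}

        # A fresh connection per call avoids reusing a connection left
        # in a bad state by an earlier failed request.
        conn = http.client.HTTPSConnection(self.API_HOST, timeout=self.timeout)
        try:
            conn.request("POST", self.API_PATH, payload, headers)
            res = conn.getresponse()
            status = res.status
            body = res.read().decode("utf-8")
        except (OSError, http.client.HTTPException, ValueError) as exc:
            # Fail closed: a URL that cannot be classified is blocked.
            return True, f"Classification failed: {exc}"
        finally:
            conn.close()

        if status != 200:
            return True, f"Classification failed (HTTP {status})"
        try:
            data = json.loads(body)
        except ValueError:
            return True, "Classification failed (invalid JSON)"

        page_type = data.get("page_type", "unknown")
        if page_type in self.AUTH_PAGE_TYPES:
            return True, f"Auth page detected: {page_type}"
        return False, "Page is not an authentication surface"


# Usage in agent middleware
blocker = AuthPageBlocker(api_key="your_api_key")
is_auth, reason = blocker.check_auth_page(
    "https://app.example.com/auth/login"
)
if is_auth:
    print(f"Navigation blocked: {reason}")
```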
```javascript
// Page-type labels treated as authentication surfaces (one shared list
// instead of two slightly different inline lists).
const AUTH_PAGE_TYPES = new Set([
  "login", "signup", "sso", "authentication", "password_reset"
]);

class SSORedirectInterceptor {
  constructor(apiKey) {
    this.apiKey = apiKey;
    // Obvious auth URL patterns. They do not block on their own (they
    // also match pages such as /help/login-troubleshooting); a match is
    // only used to label the block reason.
    this.authPatterns = [
      /\/login/i, /\/signin/i, /\/auth\//i,
      /\/sso\//i, /\/oauth/i, /\/saml/i
    ];
  }

  async shouldBlockNavigation(targetURL) {
    // Classify exactly once per navigation (the API was previously
    // called twice for URLs that matched a pattern but were not auth pages).
    let pageType;
    try {
      const classification = await this.classify(targetURL);
      pageType = classification.page_type || "unknown";
    } catch (err) {
      // Fail closed: if the page cannot be classified, do not navigate.
      return {
        blocked: true,
        reason: `Classification failed: ${err.message}`,
        url: targetURL
      };
    }

    if (AUTH_PAGE_TYPES.has(pageType)) {
      const matchedPattern = this.authPatterns.some(p => p.test(targetURL));
      return {
        blocked: true,
        // Distinguish obvious auth URLs from non-obvious auth pages.
        reason: matchedPattern
          ? `Auth page type: ${pageType}`
          : `Auth surface detected: ${pageType}`,
        url: targetURL
      };
    }
    return { blocked: false, url: targetURL };
  }

  async classify(targetURL) {
    const res = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: targetURL,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    if (!res.ok) {
      throw new Error(`Classification request failed: HTTP ${res.status}`);
    }
    return res.json();
  }
}
```
Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.
10 Million Domains with Page-Type Intelligence
One-time purchase: Perpetual license | Optional Updates: $1,599/year
20 Million Domains with Full Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $2,999/year
50 Million Domains with Complete Intelligence Suite
One-time purchase: Perpetual license | Optional Updates: $4,999/year
Also available: Enterprise URL Database up to 102M domains from $2,499.
Chart: distribution of the 102 million domains in our Enterprise Database across IAB v3 taxonomy classifications, Tier 1 through Tier 4 (top 50 of 700+ categories shown).
Authentication pages represent a uniquely dangerous class of web destinations for AI agents because they combine three risk factors that do not exist together on any other page type. First, they accept credential input -- usernames, passwords, tokens, and biometric prompts. Second, they have persistent side effects -- successful authentication creates sessions, issues tokens, and establishes identity bindings that persist long after the page interaction ends. Third, they are federated -- a single authentication event on an identity provider can propagate access across dozens of downstream applications through SSO protocols like SAML, OIDC, and OAuth 2.0.
No other page type combines all three of these properties. A product page accepts no credentials. A contact form has limited side effects. A blog post is not federated. Authentication pages are uniquely positioned at the intersection of credential handling, persistent state creation, and cross-application propagation -- which is precisely why they demand a dedicated blocking strategy in any agent governance architecture.
Agents do not deliberately seek out login pages. They arrive at authentication surfaces through four common pathways. The first is link following -- an agent researching a topic follows a link that redirects to a login wall. Many content sites gate articles behind authentication; the agent does not know this until it arrives at the login page. The second pathway is search results -- search engines return URLs that land on authentication-gated pages, especially for enterprise SaaS products where the public-facing page is the login screen.
The third pathway is form submission redirect -- after submitting a form on a public page, the site redirects the agent to a registration or login page as a next step. The fourth pathway is SSO redirect chains -- the agent visits a URL on application A, which redirects to identity provider B for authentication, which may further redirect to application C. Each redirect in the chain lands the agent on a new authentication surface that must be detected and blocked independently.
A naive approach to authentication blocking is to maintain a regex list of URL patterns -- /login, /signin, /auth, /sso, /oauth -- and block any URL that matches. This approach fails for three reasons. First, it produces false negatives: many authentication pages do not follow standard URL conventions. Enterprise SSO pages use custom paths (/workforce/identity, /access/verify, /portal/entry). Legacy applications use numeric IDs (/page?id=37). Single-page applications use hash routes (/#/authenticate). No regex list can anticipate every URL pattern used by 102 million domains.
Second, it produces false positives: the path /login appears in documentation pages (/docs/api/login-endpoint), blog posts (/blog/how-to-fix-login-issues), and support articles (/help/login-troubleshooting). Blocking every URL containing "login" as a substring would block legitimate content pages that the agent needs to access.
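Both failure directions are easy to reproduce. This sketch implements the naive rule literally, as a regex over the full URL string built from the patterns listed above, and runs it against the example paths from the two preceding paragraphs (the hostnames are placeholders). It demonstrates the problem; it is not a recommended filter.

```python
import re

# The naive rule: block any URL containing one of these path fragments.
NAIVE_AUTH_PATTERN = re.compile(r"/(login|signin|auth|sso|oauth)", re.IGNORECASE)


def naive_is_auth(url: str) -> bool:
    return NAIVE_AUTH_PATTERN.search(url) is not None


# Authentication pages with custom paths: all missed (false negatives).
custom_auth_paths = [
    "https://corp.example.com/workforce/identity",
    "https://corp.example.com/access/verify",
    "https://corp.example.com/portal/entry",
    "https://legacy.example.com/page?id=37",
]

# Content pages that merely mention login: all flagged (false positives).
content_pages = [
    "https://docs.example.com/docs/api/login-endpoint",
    "https://help.example.com/help/login-troubleshooting",
]

for url in custom_auth_paths + content_pages:
    print(naive_is_auth(url), url)
```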
Third, it cannot handle redirect chains: the initial URL may look benign (/dashboard), but the server responds with a 302 redirect to an SSO provider. Pattern matching operates on the input URL, not on the redirect target, so it misses the authentication surface entirely.
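Handling redirect chains means checking each hop's target before requesting it, rather than letting the HTTP client follow redirects automatically. A minimal sketch, assuming a hypothetical `get_redirect(url)` callable that requests one URL without following redirects and returns its `Location` header (or `None`); Python's `http.client` fits this role because it never follows redirects on its own. `is_auth_url` is likewise a hypothetical stand-in for the page-type lookup.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin


def check_redirect_chain(
    start_url: str,
    get_redirect: Callable[[str], Optional[str]],
    is_auth_url: Callable[[str], bool],
    max_hops: int = 10,
) -> Tuple[bool, List[str]]:
    """Return (blocked, chain); every hop is checked before it is requested."""
    chain = [start_url]
    url = start_url
    if is_auth_url(url):
        return True, chain
    for _ in range(max_hops):
        location = get_redirect(url)   # request `url` without following redirects
        if location is None:
            return False, chain        # final page reached, no auth hop seen
        url = urljoin(url, location)   # Location may be relative
        chain.append(url)
        if is_auth_url(url):
            return True, chain         # stop before the auth hop is ever requested
    return True, chain                 # too many hops: fail closed
```

The `/dashboard` case from the paragraph above is the first hop here: the dashboard request is made, but the SSO target in its `Location` header is evaluated and blocked before any request is sent to it.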
Our database solves these problems by classifying pages based on their actual content and function, not their URL structure. The classification engine analyzes the rendered page content, form elements, button labels, meta tags, and semantic structure to determine whether a page serves an authentication function. This analysis is performed offline during database creation and stored as a page-type label alongside each domain entry. The result is a deterministic, pre-computed classification that your harness can query in sub-millisecond time.
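A pre-computed classification can be held in memory as a plain dictionary, which is what keeps per-navigation lookups cheap. The sketch assumes a hypothetical CSV export with `url` and `page_type` columns; the actual delivery format of the database may differ, so treat the column names and the normalization rules as placeholders. A `None` result means the URL is not in the local copy and the caller should fall back to another source.

```python
import csv
from typing import Dict, Optional, TextIO
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, and default an empty path to "/"."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(), parts.netloc.lower(),
        parts.path or "/", parts.query, parts.fragment,
    ))


def load_page_types(f: TextIO) -> Dict[str, str]:
    """Build {normalized_url: page_type} from a CSV with url,page_type columns."""
    return {
        normalize_url(row["url"]): row["page_type"]
        for row in csv.DictReader(f)
    }


def lookup_page_type(table: Dict[str, str], url: str) -> Optional[str]:
    """Average O(1) dict lookup; None means 'not in the local database'."""
    return table.get(normalize_url(url))
```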
The page-type taxonomy includes specific labels for login, signup, password reset, SSO redirect, OAuth consent, MFA verification, and account settings pages. Each label maps directly to a blocking rule in your policy engine. There is no ambiguity: if the page type is "login," the page serves an authentication function and should be blocked for agent access.
A particularly insidious risk arises when agents have access to credential stores, environment variables, or configuration files that contain usernames and passwords. If such an agent reaches a login page, it may attempt to fill in the credential fields using data from its context window -- effectively performing an automated credential submission that the user never authorized. This is not a theoretical risk; it has been demonstrated in research environments with browser-using agents that have access to password managers or .env files.
Blocking authentication pages at the URL resolution layer eliminates this risk entirely. The agent never receives the HTML of the login page, never parses the form fields, and never has the opportunity to match credential fields against data in its context window. The block occurs before any page content is fetched, which means the credential leakage pathway is closed at the network level rather than at the application level.
Our database includes comprehensive coverage of enterprise identity provider domains. Okta, Azure Active Directory (login.microsoftonline.com), Google Workspace (accounts.google.com), Ping Identity, OneLogin, Auth0, AWS Cognito, and dozens of other identity platforms are classified with authentication-specific page types. This means your harness can block agent access to identity providers regardless of which downstream application initiated the SSO redirect.
The database also covers self-hosted identity solutions. Organizations running Keycloak, Authentik, Authelia, or custom SAML/OIDC providers on their own domains benefit from the same page-type classification. As long as the domain is in the 102 million domain database, the page type is available for policy evaluation.
The most robust deployments use a defense-in-depth strategy with three layers of authentication page blocking. The first layer is the domain database lookup -- the primary protection, which catches 99% of authentication pages through pre-computed page-type classification. The second layer is a real-time API fallback -- for domains not in the local database, the API classifies the page on demand and returns the page type for policy evaluation. The third layer is a URL pattern heuristic -- a lightweight regex check that runs first, as a fast-path optimization, and blocks obvious authentication URLs (/login, /auth, /sso) before the database lookup completes. Because broad patterns also match documentation and support pages, this fast path should be limited to unambiguous paths, with everything else falling through to the first two layers.
Each layer compensates for the blind spots of the others. The database provides broad coverage with sub-millisecond local lookups. The API handles the long tail of new and niche domains. The pattern heuristic provides near-instant blocking for unambiguous authentication URLs. Together, the three layers close the gaps that any single layer would leave, whichever of the pathways described above brought the agent to the authentication page.
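The three layers can be combined in one decision function. A minimal sketch, with `local_lookup` and `api_classify` as hypothetical callables standing in for the local database and the real-time API; the label set and the anchored fast-path regex are assumptions. The regex only matches paths that begin with an auth segment, so documentation URLs such as /docs/api/login-endpoint fall through to the database instead of being blocked outright, and an API failure blocks the navigation (fail closed) rather than allowing it.

```python
import re
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit

# Assumed label set; align it with the labels your database emits.
AUTH_PAGE_TYPES = frozenset({
    "login", "signup", "authentication", "sso",
    "password_reset", "oauth", "mfa_verification",
})

# Fast path: only paths that start with an auth segment, e.g. /login or
# /sso/start. Broad substring patterns would also hit help and docs pages.
OBVIOUS_AUTH_PATH = re.compile(r"^/(login|signin|sso|oauth)(/|$)", re.IGNORECASE)


def should_block(
    url: str,
    local_lookup: Callable[[str], Optional[str]],
    api_classify: Callable[[str], str],
) -> Tuple[bool, str]:
    """Return (blocked, source:page_type) using pattern, then database, then API."""
    # Layer 3, run first: unambiguous auth URLs need no lookup at all.
    if OBVIOUS_AUTH_PATH.match(urlsplit(url).path):
        return True, "pattern"

    # Layer 1: pre-computed page type from the local database.
    page_type = local_lookup(url)
    source = "database"

    # Layer 2: real-time API for URLs missing from the local copy.
    if page_type is None:
        try:
            page_type = api_classify(url)
            source = "api"
        except Exception:
            # Fail closed: an unclassifiable URL is not navigated to.
            return True, "api-error"

    return page_type in AUTH_PAGE_TYPES, f"{source}:{page_type}"
```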
Every blocked authentication navigation should generate an alert to your security operations center. The alert should include the agent ID, the target URL, the detected page type, and the timestamp. High-frequency blocking events -- an agent repeatedly hitting login pages -- may indicate a misconfigured task, a compromised agent prompt, or an adversarial prompt injection attempt designed to steer the agent toward credential entry points. Your monitoring system should track blocking rates per agent and trigger escalation when the rate exceeds a baseline threshold.
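A minimal sketch of the alerting and rate-tracking logic, assuming a hypothetical `emit` callback that forwards each alert to your SOC pipeline. The alert fields follow the list above; the sliding-window length and the per-agent threshold are placeholder values to be replaced with a baseline measured from your own agents.

```python
import time
from collections import defaultdict, deque
from dataclasses import asdict, dataclass
from typing import Callable, Deque, Dict, Optional


@dataclass
class BlockAlert:
    agent_id: str
    target_url: str
    page_type: str
    timestamp: float


class BlockMonitor:
    """Emits one alert per blocked navigation and tracks per-agent block rates."""

    def __init__(
        self,
        emit: Callable[[dict], None],
        window_seconds: float = 300.0,  # placeholder window
        max_blocks: int = 5,            # placeholder baseline per window
    ) -> None:
        self.emit = emit
        self.window_seconds = window_seconds
        self.max_blocks = max_blocks
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record_block(
        self, agent_id: str, target_url: str, page_type: str,
        now: Optional[float] = None,
    ) -> bool:
        """Send the alert; return True if the agent should be escalated."""
        ts = time.time() if now is None else now
        self.emit(asdict(BlockAlert(agent_id, target_url, page_type, ts)))

        # Keep only this agent's block timestamps inside the sliding window.
        events = self._events[agent_id]
        events.append(ts)
        while events and events[0] <= ts - self.window_seconds:
            events.popleft()
        return len(events) > self.max_blocks
```

Escalation is per agent, so one agent repeatedly steered toward login pages (for example by a prompt injection) is flagged without raising the alert level for the rest of the fleet.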
Deploy page-type detection to keep your AI agents away from login, SSO, and credential entry pages. One-time purchase, perpetual license, 102 million domains with 20+ page-type labels.