Detecting Page Intent to Control What AI Agents Can Do

The Problem: Category Alone Does Not Tell You What a Page Wants You to Do

Two URLs on the same domain can have radically different intents. Knowing the domain's category is not enough -- you need to know the page's purpose.

Domain-Level Categories Miss the Page-Level Threat

Consider a banking website classified under "Financial Services." The homepage is a marketing page -- safe for an agent to read. The /login page is an authentication form -- dangerous for an agent to interact with. The /transfer page is a payment form -- catastrophically dangerous for an agent to submit. The /careers page is a job listing -- completely benign. All four pages share the same domain, the same IAB category, and the same web filtering classification. Without page intent detection, your policy engine treats them identically -- which means you either block the entire domain (losing access to the safe pages) or allow the entire domain (exposing the dangerous pages).

Login pages accept credentials: An agent that fills in login fields may submit real credentials, trigger MFA alerts, or lock accounts
Payment pages accept financial data: An agent that interacts with a checkout form may initiate unauthorized transactions
Settings pages modify configurations: An agent that changes account settings may alter security configurations, notification preferences, or access permissions
Data entry pages collect PII: An agent that fills forms on contact, signup, or survey pages may submit sensitive personal information

The Solution: Page Intent Detection Enables Per-Page Policy Enforcement

Our database classifies pages into 20+ intent-based types: homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, legal, privacy policy, terms of service, blog, documentation, API reference, support, FAQ, forum, product, and more. Each page type represents a distinct functional intent -- what the page expects the visitor to do. Your policy engine maps each intent to a control action: read-only pages (blog, docs, FAQ) are allowed, interactive pages (contact, support) are allowed with monitoring, transactional pages (login, checkout, settings) are blocked, and administrative pages (admin) are hard-blocked with alerts.

This semantic layer transforms your policy engine from a binary allow/block system operating at the domain level into a nuanced, intent-aware system operating at the page level. The agent can access the safe pages on a domain while being prevented from reaching the dangerous ones -- maximizing the agent's utility while minimizing risk.
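The four tiers described above can be sketched as an ordered enum, so that a stricter action always compares higher than a looser one. This is a minimal illustration, not the product's policy engine: the intent label strings, the `alert` callback, and the choice to block intents that are missing from the table (fail closed) are assumptions.

```python
from enum import IntEnum
from typing import Callable, Optional


class Action(IntEnum):
    """Control actions ordered by severity, so max() returns the strictest."""
    ALLOW = 0
    MONITOR = 1
    BLOCK = 2
    BLOCK_AND_ALERT = 3


# Illustrative intent labels; tier assignments follow the mapping above.
TIER_BY_INTENT = {
    "blog": Action.ALLOW,
    "documentation": Action.ALLOW,
    "faq": Action.ALLOW,
    "contact": Action.MONITOR,
    "support": Action.MONITOR,
    "login": Action.BLOCK,
    "checkout": Action.BLOCK,
    "settings": Action.BLOCK,
    "admin": Action.BLOCK_AND_ALERT,
}


def decide(intent: str, alert: Optional[Callable[[str], None]] = None) -> Action:
    """Return the control action for a page intent, firing an alert if required."""
    # Assumption: intents missing from the table fail closed (BLOCK).
    action = TIER_BY_INTENT.get(intent, Action.BLOCK)
    if action is Action.BLOCK_AND_ALERT and alert is not None:
        alert(f"agent tried to reach a page with intent {intent!r}")
    return action


alerts = []
print(decide("faq").name)                   # ALLOW
print(decide("admin", alerts.append).name)  # BLOCK_AND_ALERT
print(alerts)
```

Because `Action` is ordered, a rule that produces several decisions for the same page can combine them with `max()` and always land on the strictest one.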

Page Intent Types and Agent Control Actions

How each page intent maps to a specific agent control policy

Read-Only Intents: Allow

Pages with informational intent -- blog posts, documentation, FAQ pages, about pages, privacy policies, and product descriptions -- are designed for consumption, not interaction. Agents can safely read these pages without risk of triggering side effects. Your policy engine should allow access to these page types across all domain categories, enabling agents to gather information freely from the safe portions of the web.

Interactive Intents: Monitor

Pages with interactive intent -- contact forms, support tickets, feedback widgets, and comment sections -- accept user input but do not handle credentials or financial data. These pages represent a moderate risk: an agent might submit information that creates a support ticket, posts a public comment, or sends a message on behalf of the organization. Your policy engine should allow access with monitoring, logging every interaction for review.
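"Allow with monitoring" can be as simple as writing an audit record before each interaction runs. The sketch below keeps records in memory and serializes them as JSON Lines; the `InteractionAuditLog` class, the `monitored` wrapper, and the record fields are hypothetical, and a production system would write to durable storage instead.

```python
import json
import time
from typing import Any, Callable, Dict, List, TypeVar

T = TypeVar("T")


class InteractionAuditLog:
    """In-memory, append-only record of agent interactions on monitored pages."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def record(self, agent_id: str, url: str, intent: str, interaction: str) -> None:
        self.records.append({
            "ts": time.time(),
            "agent_id": agent_id,
            "url": url,
            "intent": intent,
            "interaction": interaction,
        })

    def to_jsonl(self) -> str:
        # One JSON object per line, a common format for log pipelines.
        return "\n".join(json.dumps(r, sort_keys=True) for r in self.records)


def monitored(log: InteractionAuditLog, agent_id: str, url: str, intent: str,
              interaction: str, perform: Callable[[], T]) -> T:
    """Write the audit record first, so no interaction runs unrecorded."""
    log.record(agent_id, url, intent, interaction)
    return perform()


log = InteractionAuditLog()
text = monitored(log, "agent-7", "https://example.com/contact", "contact",
                 "read", lambda: "page text")
print(log.to_jsonl())
```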

Transactional Intents: Block

Pages with transactional intent -- login forms, signup pages, checkout flows, payment gateways, and account settings -- handle credentials, financial data, or configuration changes. These pages must be hard-blocked for all agent access. No agent should ever reach a page whose intent is to collect a password, process a payment, or modify account settings. Page intent detection identifies these pages regardless of URL structure.

Page Intent Detection Code

Integrate page intent detection into your agent middleware for fine-grained controls

Python -- Intent-Aware Agent Controller

```python
import http.client
import json
from urllib.parse import urlencode


class IntentAwareController:
    """Controls agent actions based on detected page intent."""

    INTENT_POLICIES = {
        # Read-only intents: safe to access
        "homepage": "allow",
        "about": "allow",
        "blog": "allow",
        "documentation": "allow",
        "faq": "allow",
        "pricing": "allow",
        "product": "allow",
        "legal": "allow",
        "privacy_policy": "allow",
        "terms_of_service": "allow",
        "careers": "allow",
        # Interactive intents: allow with monitoring
        "contact": "monitor",
        "support": "monitor",
        "forum": "monitor",
        # Transactional intents: always block
        "login": "block",
        "signup": "block",
        "checkout": "block",
        "payment": "block",
        "settings": "block",
        "admin": "block",
        "password_reset": "block",
    }

    # An unrecognized intent could be a login or payment page, so it gets
    # no permissions until it is classified (fail closed).
    DEFAULT_POLICY = "block"

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )

    def evaluate_page_intent(self, target_url):
        data = self._classify(target_url)
        page_type = data.get("page_type", "unknown")
        policy = self.INTENT_POLICIES.get(page_type, self.DEFAULT_POLICY)

        return {
            "url": target_url,
            "detected_intent": page_type,
            "action": policy,
            "allowed_interactions": self._get_permissions(policy),
            "categories": [
                # Entries look like "Category name: <name>"; keep the name.
                c[0].split("Category name: ")[-1]
                for c in data.get("iab_classification", [])
            ],
        }

    def _get_permissions(self, policy):
        if policy == "allow":
            return ["read", "extract_text", "follow_links"]
        if policy == "monitor":
            return ["read", "extract_text"]
        return []

    def _classify(self, url):
        # urlencode escapes "&", "?" and "=" inside the target URL, which
        # would otherwise corrupt the form-encoded body.
        payload = urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        self.conn.request(
            "POST", "/api/iab/iab_web_content_filtering.php", payload, headers
        )
        response = self.conn.getresponse()
        body = response.read().decode("utf-8")  # read fully so the connection can be reused
        if response.status != 200:
            raise RuntimeError(f"Classification failed: HTTP {response.status}")
        return json.loads(body)


controller = IntentAwareController(api_key="your_key")
result = controller.evaluate_page_intent("https://bank.com/account/settings")
print(f"Intent: {result['detected_intent']}")
print(f"Action: {result['action']}")
print(f"Permissions: {result['allowed_interactions']}")
```

JavaScript -- Page Intent Router for Agent Gateway

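```javascript
class PageIntentRouter {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.blockIntents = new Set([
      "login", "signup", "checkout", "payment",
      "settings", "admin", "password_reset"
    ]);
    this.monitorIntents = new Set([
      "contact", "support", "forum"
    ]);
    this.allowIntents = new Set([
      "homepage", "about", "blog", "documentation", "faq", "pricing",
      "product", "legal", "privacy_policy", "terms_of_service", "careers"
    ]);
  }

  async routeByIntent(targetURL, agentId) {
    const response = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        // URLSearchParams percent-encodes the target URL for the form body.
        body: new URLSearchParams({
          query: targetURL,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    if (!response.ok) {
      throw new Error(`Classification failed: HTTP ${response.status}`);
    }
    const classification = await response.json();

    const intent = classification.page_type || "unknown";
    let action;
    let permissions;

    if (this.blockIntents.has(intent)) {
      action = "block";
      permissions = [];
    } else if (this.monitorIntents.has(intent)) {
      action = "monitor";
      permissions = ["read", "extract_text"];
    } else if (this.allowIntents.has(intent)) {
      action = "allow";
      permissions = ["read", "extract_text", "follow_links"];
    } else {
      // Fail closed: unrecognized intents get no permissions.
      action = "block";
      permissions = [];
    }

    return {
      url: targetURL,
      intent,
      action,
      permissions,
      agentId,
      timestamp: new Date().toISOString()
    };
  }
}
```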

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Domain Database 10M

$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license | Optional Updates: $1,599/year

10M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global Popularity Rankings

AI Agent Domain Database 20M

$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $2,999/year

20M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager

AI Agent Domain Database 50M

$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license | Optional Updates: $4,999/year

50M+ Categorized Domains
IAB Taxonomies v2 & v3
20+ Page Type Labels
Web Filtering Categories
OpenPageRank Scores
Global & Country Rankings
Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499.

Understanding Page Intent as a Security Primitive

Page intent is the functional purpose of a web page -- what the page is designed to make the visitor do. A login page intends to collect credentials. A checkout page intends to process a payment. A blog post intends to deliver information. A settings page intends to modify configurations. Understanding this intent is essential for agent governance because the risk profile of a page is determined not by its domain or its content category, but by what the page asks the visitor to do.

Traditional web security tools operate at the domain or URL level. They can tell you that a domain belongs to the "Financial Services" category. But they cannot tell you that the specific page the agent is about to visit is a wire transfer form versus a quarterly earnings report. Both pages live on the same domain, in the same category. Their risk profiles could not be more different. Page intent detection closes this gap by adding a semantic layer that resolves each page's functional purpose independently of its domain classification.
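To make the domain-versus-page gap concrete, here is a small sketch comparing the two decisions over a handful of URLs on one domain. The `bank.example` URLs, their labels (including `payment` and `investor_relations` as identifiers), and the fail-closed treatment of unlabeled pages are assumptions for illustration, not output from the database.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical pre-computed page labels for one domain.
PAGE_INTENTS = {
    "https://bank.example/": "homepage",
    "https://bank.example/login": "login",
    "https://bank.example/transfer": "payment",
    "https://bank.example/careers": "careers",
    "https://bank.example/investors/q3-report": "investor_relations",
}
BLOCKED_INTENTS = {"login", "signup", "checkout", "payment", "settings", "admin"}
BLOCKED_DOMAINS = {"bank.example"}  # what a domain-only filter is forced to do


def domain_only_blocks(url: str) -> bool:
    """Domain-level filtering: one verdict for every page on the host."""
    return urlsplit(url).hostname in BLOCKED_DOMAINS


def page_level_blocks(url: str) -> bool:
    """Page-level filtering: the verdict depends on the page's own intent."""
    intent: Optional[str] = PAGE_INTENTS.get(url)
    # Assumption: pages without a label fail closed.
    return intent is None or intent in BLOCKED_INTENTS


for url in PAGE_INTENTS:
    print(url, "domain-only:", domain_only_blocks(url), "page-level:", page_level_blocks(url))
```

Here the domain-only filter blocks all five URLs, while the page-level check blocks only the login and transfer pages.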

The Complete Page Intent Taxonomy

Our database classifies pages into a comprehensive taxonomy of 20+ intent types. Informational intents include homepage, about, blog, documentation, FAQ, pricing, careers, press releases, and investor relations pages. These pages are designed for consumption -- the visitor reads content but does not submit data or initiate transactions. Interactive (service) intents include contact forms, support portals, forum threads, and comment sections. These pages accept user input that creates lightweight interactions -- a message, a ticket, a post -- but do not handle credentials or financial data.

Transactional intents include login, signup, checkout, payment, settings, admin, and password reset pages. These pages handle sensitive data (credentials, financial instruments, configuration parameters) and create persistent side effects (sessions, accounts, purchases, setting changes). Administrative intents include dashboard, analytics, CMS, and control panel pages. These pages provide access to system-level functionality that agents should never interact with.

Why Page Intent Cannot Be Inferred from URLs Alone

A common misconception is that page intent can be derived from URL patterns -- /login means login, /checkout means checkout, /blog means blog. This heuristic works for a subset of pages but fails spectacularly for the general case. Modern web applications use opaque URL structures (/app/page?id=47), hash-based routing (/#/authenticate), or entirely parameterized paths (/v2/flow/step3). A page at /dashboard might be a public marketing dashboard, not an admin panel. A page at /account might be a public account FAQ, not a settings page.
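The sketch below is a deliberately naive path-pattern classifier of the kind described above, written only to show where it breaks. The regex list is hypothetical. Because `urlsplit().path` excludes both the query string and the `#` fragment, hash-routed and parameterized URLs get no label at all, while a public `/dashboard` page is confidently mislabeled.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical URL heuristics, shown only to illustrate their blind spots.
URL_PATTERNS = [
    (re.compile(r"/(login|signin)\b"), "login"),
    (re.compile(r"/checkout\b"), "checkout"),
    (re.compile(r"/blog\b"), "blog"),
    (re.compile(r"/(dashboard|admin)\b"), "admin"),
]


def guess_intent_from_url(url: str) -> Optional[str]:
    """Return the first matching label, or None if no pattern fires."""
    path = urlsplit(url).path  # the ?query and #fragment are not part of .path
    for pattern, intent in URL_PATTERNS:
        if pattern.search(path):
            return intent
    return None


for url in [
    "https://shop.example/checkout",       # works
    "https://app.example/app/page?id=47",  # opaque path
    "https://app.example/#/authenticate",  # hash-based routing
    "https://app.example/v2/flow/step3",   # parameterized flow
    "https://corp.example/dashboard",      # public page, mislabeled
]:
    print(url, "->", guess_intent_from_url(url))
```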

Our classification engine does not rely on URL patterns. It analyzes the rendered page content -- form elements, button labels, input field types, meta tags, semantic HTML structure, and contextual text -- to determine the page's functional intent. This content-based analysis is performed offline during database creation and stored as a pre-computed label, so your harness gets the intent classification without any runtime page rendering or content analysis.

Combining Page Intent with Domain Categories for Compound Policies

The most powerful policy configurations combine page intent with domain category to create compound rules. Consider the rule: "Allow agents to access blog and documentation pages on Financial Services domains, but block login and checkout pages on those same domains." This rule enables the agent to research financial products, read analyst reports, and browse documentation -- while preventing it from logging into banking portals or initiating transactions.

Another example: "Allow agents to access all page types on domains in the Technology and Computing category (trusted reference sites), but restrict agents to read-only page types (blog, docs, FAQ) on domains in the Shopping category (e-commerce sites)." This rule enables full browsing of tech documentation while preventing agents from interacting with shopping carts, checkout flows, or account creation forms on e-commerce platforms.
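The two compound rules above can be sketched as a per-category allow-list consulted after a global transactional block. The category strings, intent labels, and the fail-closed default for categories without a rule are assumptions. This sketch also keeps the blanket block on transactional pages (from the Transactional Intents section) in front of the category rules, so "all page types" on Technology and Computing domains means all non-transactional types.

```python
from typing import Dict, FrozenSet, Optional

TRANSACTIONAL: FrozenSet[str] = frozenset(
    {"login", "signup", "checkout", "payment", "settings", "admin", "password_reset"}
)
READ_ONLY: FrozenSet[str] = frozenset({"blog", "documentation", "faq"})

# None means "every non-transactional page type is allowed".
CATEGORY_RULES: Dict[str, Optional[FrozenSet[str]]] = {
    "Technology and Computing": None,
    "Shopping": READ_ONLY,
    "Financial Services": frozenset({"blog", "documentation"}),
}


def compound_decision(category: str, intent: str) -> str:
    """Combine domain category and page intent into 'allow' or 'block'."""
    if intent in TRANSACTIONAL:
        return "block"  # transactional pages are blocked on every domain
    if category not in CATEGORY_RULES:
        return "block"  # assumption: categories without a rule fail closed
    allowed = CATEGORY_RULES[category]
    return "allow" if allowed is None or intent in allowed else "block"


print(compound_decision("Financial Services", "documentation"))  # allow
print(compound_decision("Shopping", "checkout"))                  # block
```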

Permission Gradients Beyond Binary Allow/Block

Page intent detection enables a permission gradient that goes beyond binary allow/block. For each page intent, you can define a specific set of interactions the agent is permitted to perform. On blog pages (read-only intent), the agent can read text, extract data, and follow links. On contact pages (interactive intent), the agent can read text and extract data but cannot submit forms. On login pages (transactional intent), the agent receives zero permissions -- the page is never loaded.

This gradient model is particularly useful for browser-using agents like Anthropic Computer Use or OpenAI Operator, which have the ability to click buttons, fill forms, and interact with page elements. The permission gradient controls not just whether the agent can see the page, but what it can do once it gets there. Read-only permissions prevent form submission even on pages that contain forms, adding a defense-in-depth layer below the page-type block.
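A browser-using agent can route every action through a single permission check, which is where the gradient is enforced. In this sketch the permission sets mirror the controller example earlier on this page; the interaction name `submit_form` and the `InteractionDenied` exception are hypothetical. Because form submission appears in no permission set, it is denied even on "allow" pages that happen to contain forms.

```python
from typing import Dict, FrozenSet

# Permission sets per policy, mirroring the controller example earlier on this page.
PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "allow": frozenset({"read", "extract_text", "follow_links"}),
    "monitor": frozenset({"read", "extract_text"}),
    "block": frozenset(),
}


class InteractionDenied(PermissionError):
    """Raised when an agent attempts an interaction its policy does not grant."""


def is_permitted(policy: str, interaction: str) -> bool:
    # Unknown policies get an empty permission set.
    return interaction in PERMISSIONS.get(policy, frozenset())


def authorize(policy: str, interaction: str) -> None:
    """Call before executing any browser action; raises if it is not granted."""
    if not is_permitted(policy, interaction):
        raise InteractionDenied(f"{interaction!r} is not permitted under policy {policy!r}")


authorize("allow", "follow_links")  # passes silently
try:
    authorize("allow", "submit_form")
except InteractionDenied as err:
    print("denied:", err)
```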

Handling Pages with Mixed Intents

Some pages combine multiple intents on a single URL. A product page might include a "Buy Now" button alongside product descriptions. A blog post might embed a login widget in the sidebar. A documentation page might include a "Sign up for API access" form. For these mixed-intent pages, the classification engine assigns the dominant intent -- the primary functional purpose of the page -- while flagging the presence of secondary transactional elements in the metadata.

Your policy engine can consume both the primary intent and the secondary flags. If the primary intent is "product" (informational) but the page contains a checkout widget, the policy engine can allow the page with a "no form interaction" constraint, enabling the agent to read the product information while preventing it from clicking the purchase button. This nuanced handling maximizes agent utility on mixed-intent pages.
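The mixed-intent handling described above can be sketched as a two-step decision: pick the action from the dominant intent, then add constraints when secondary transactional elements are flagged. The record shape, the `secondary_flags` field, and the flag names are hypothetical, since the exact metadata format is not specified here.

```python
from typing import Any, Dict, List

READ_ONLY = {"homepage", "about", "blog", "documentation", "faq", "pricing", "product", "careers"}
INTERACTIVE = {"contact", "support", "forum"}
# Hypothetical names for secondary transactional elements found on a page.
TRANSACTIONAL_FLAGS = {"checkout_widget", "login_widget", "signup_form"}


def mixed_intent_policy(record: Dict[str, Any]) -> Dict[str, Any]:
    """Decide from the dominant intent, then tighten for secondary flags."""
    primary = record.get("page_type", "unknown")
    flags = set(record.get("secondary_flags", []))  # hypothetical field name

    if primary in READ_ONLY:
        action, permissions = "allow", ["read", "extract_text", "follow_links"]
    elif primary in INTERACTIVE:
        action, permissions = "monitor", ["read", "extract_text"]
    else:
        # Transactional, administrative, or unknown dominant intent.
        return {"action": "block", "permissions": [], "constraints": []}

    constraints: List[str] = []
    if flags & TRANSACTIONAL_FLAGS:
        # The page stays readable, but the browser layer must not touch its forms.
        constraints.append("no_form_interaction")
    return {"action": action, "permissions": permissions, "constraints": constraints}


print(mixed_intent_policy({"page_type": "product", "secondary_flags": ["checkout_widget"]}))
```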

Real-World Impact: Reducing False Blocks by 80%

Organizations that implement page intent detection alongside domain-level categorization typically see an 80% reduction in false blocks compared to domain-only filtering. Without page intent, blocking a banking domain means the agent cannot access any page on that domain -- including the public blog, the API documentation, and the investor relations section. With page intent, only the login, checkout, and settings pages are blocked, while informational pages remain accessible. This dramatically increases the agent's effective browsing surface without increasing risk.

Deploying Page Intent Detection in Your Agent Stack

Page intent data is included in every domain entry in the 10M, 20M, and 50M AI Agent databases and the 102M Enterprise database. The data ships as a page_type field alongside the IAB categories, web filtering classifications, and reputation scores. Load the database into your data store of choice (Redis, PostgreSQL, SQLite, DynamoDB) and query the page_type field in your agent middleware alongside the category fields. The query pattern is identical -- one lookup per domain, sub-millisecond response time, all fields returned in a single record.
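A minimal sketch of the lookup step using Python's built-in `sqlite3` module. The table name, column names, and keying records by full URL are assumptions; the actual export format may differ, including whether records are keyed by domain or by URL. The point is that intent resolution becomes a single indexed read, with no page rendering at request time.

```python
import sqlite3
from typing import Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a few hypothetical pre-computed labels."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages (url TEXT PRIMARY KEY, page_type TEXT NOT NULL, iab_category TEXT)"
    )
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?, ?)",
        [
            ("https://bank.example/login", "login", "Financial Services"),
            ("https://bank.example/blog/rates", "blog", "Financial Services"),
        ],
    )
    return conn


def lookup(conn: sqlite3.Connection, url: str) -> Optional[Tuple[str, str]]:
    """Return (page_type, iab_category), or None when the URL has no label."""
    return conn.execute(
        "SELECT page_type, iab_category FROM pages WHERE url = ?", (url,)
    ).fetchone()


conn = build_demo_db()
print(lookup(conn, "https://bank.example/login"))
print(lookup(conn, "https://bank.example/unknown"))
```

A `None` result should feed the same fail-closed default as an unrecognized intent, rather than being treated as safe.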

Control Agents by What Pages Intend

Deploy page intent detection to enable fine-grained, per-page agent controls. 20+ intent types, 102 million classified domains, one-time purchase with perpetual license.
