WebsiteCategorizationAPI

How Webpage Type Detection Enables Granular Agent Controls

Knowing that a domain is classified as "Technology" is not enough. Your AI agent needs to know whether it is about to visit a documentation page, a login portal, or an admin console — and each of those page types demands a different policy response. Page-type detection adds the missing layer of granularity that turns domain-level categories into page-level agent controls.

  • 20+ page types detected
  • 102M domains classified
  • 700+ IAB categories
  • <1ms lookup latency

The Problem: Category-Only Filtering Misses Critical Page-Level Risks

A domain categorized as "Business and Finance" can serve a public investor page, a login portal, a checkout flow, and an admin dashboard — all requiring different policy actions.

Same Domain, Radically Different Risk Profiles

Domain-level categorization tells you that salesforce.com is a "Technology > CRM" domain. That classification is correct for every page on the domain. But the risk profile of salesforce.com/login is fundamentally different from salesforce.com/products or salesforce.com/blog. An AI agent navigating to the products page is performing legitimate research. An AI agent navigating to the login page is one form-fill away from a credential exposure incident. Without page-type detection, your policy engine treats both navigation events identically because the domain and category are the same.

  • Login pages: Agents encountering authentication forms can attempt credential entry, trigger account lockouts, or expose SSO tokens to the agent's context window
  • Checkout pages: Payment forms with credit card fields are high-risk surfaces where agent interaction could trigger unauthorized transactions
  • Admin consoles: Settings and administration pages allow configuration changes — an agent clicking the wrong button on an admin page can cause production outages
  • Contact and signup forms: Agents filling out contact forms can create spam entries, fake accounts, or inadvertently submit sensitive company information

The Solution: Page-Type Labels That Map Directly to Policy Actions

Our database classifies every domain with 20+ page-type labels: homepage, about, contact, pricing, careers, login, signup, checkout, settings, admin, legal, privacy policy, terms of service, blog, documentation, API reference, support, FAQ, forum, and product pages. Each page type is a discrete metadata field that your policy engine can match against, independently or in combination with IAB categories and web filtering classifications.

This creates a two-dimensional policy matrix: the category axis tells you what the domain is about, the page-type axis tells you what the specific page is designed to do. A policy rule like "allow Technology domains but block login and admin pages" requires both dimensions. A policy rule like "log all pricing page visits for competitive intelligence auditing" requires page-type awareness. Without this layer, your policy engine operates with one hand tied behind its back.
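As a rough illustration of the two axes working together, the sketch below combines a category default with a page-type override. The table contents, the action names, and the `decide` function are assumptions made for this example, not part of the database or API.

```python
# Category axis: what the domain is about (hypothetical defaults).
CATEGORY_DEFAULTS = {"Technology": "allow", "Adult": "block"}

# Page-type axis: what the page is designed to do (hypothetical overrides).
PAGE_TYPE_OVERRIDES = {"login": "block", "admin": "block", "pricing": "log"}


def decide(category: str, page_type: str) -> str:
    """Combine both axes; a category-level block is never loosened."""
    category_action = CATEGORY_DEFAULTS.get(category, "review")
    if category_action == "block":
        return "block"
    # Fall back to the category default when the page type has no override.
    return PAGE_TYPE_OVERRIDES.get(page_type, category_action)


print(decide("Technology", "documentation"))  # allow
print(decide("Technology", "login"))          # block
print(decide("Technology", "pricing"))        # log
print(decide("Adult", "pricing"))             # block
```

Neither table alone can express "allow Technology domains but block login and admin pages"; the rule only appears once both lookups feed one decision.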

Page Type Classification Scanner

Identifying 20+ page types across 102 million domains

Page Types and Their Policy Mappings

How each detected page type maps to a specific agent governance action

Login & Authentication Pages

Login pages present username/password fields, SSO redirects, and multi-factor authentication prompts. An AI agent interacting with a login page can attempt credential entry (using credentials from its context), trigger brute-force detection systems, or expose authentication tokens. Policy mapping: hard block with mandatory audit log entry. No agent should ever interact with an authentication surface unless explicitly authorized for that specific workflow.

Checkout & Payment Pages

Checkout pages contain credit card forms, billing address fields, and payment processing integrations. An agent navigating to a checkout page is one interaction away from an unauthorized purchase. Even if the agent does not have payment credentials, its interaction with the page can trigger cart holds, pricing locks, or session-based offers that affect business operations. Policy mapping: hard block for all agents except procurement-authorized workflows.

Admin & Settings Pages

Admin and settings pages expose configuration controls, user management interfaces, and system parameters. An agent clicking a "delete" button on an admin page or toggling a settings switch can cause immediate operational damage. These pages are especially dangerous because they often lack confirmation dialogs designed for human interaction. Policy mapping: hard block unless the agent is explicitly scoped for system administration tasks.

Documentation & API Reference

Documentation and API reference pages are typically read-only, public-facing, and designed for consumption by developers and machines. These are among the safest page types for agent browsing because interaction risk is low and the content is intended to be read and processed. Policy mapping: allow with standard logging. No special restrictions needed unless the documentation is behind an authentication wall.

Pricing & Product Pages

Pricing and product pages contain competitive intelligence — plan tiers, feature comparisons, per-seat costs, and contract terms. For research agents, these are high-value targets. For compliance teams, they are sensitive because the information gathered may be subject to competitive intelligence policies. Policy mapping: allow with logging for authorized research agents, with log entries flagged for competitive intelligence review.

Contact & Signup Forms

Contact forms and signup pages accept user-submitted data. An agent interacting with a contact form can submit company information, create fake accounts, or trigger outbound sales workflows at target companies. This data submission risk is distinct from navigation risk: the agent is not just reading the page, it is writing to it. Policy mapping: for contact pages, block form submission while allowing page views for information gathering. Signup pages carry the added liability of account creation, so the policy code and the page-type guide on this page hard-block them.
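The login, checkout, and admin mappings above each carve out an exception for explicitly authorized workflows. One way to model that is a scope check, sketched below. The scope names, the `Agent` class, and `check_access` are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from hard-blocked page types to the scope an
# agent must hold before the block is lifted.
REQUIRED_SCOPE = {
    "login": "auth_workflow",
    "checkout": "procurement",
    "admin": "system_administration",
    "settings": "system_administration",
}


@dataclass
class Agent:
    name: str
    scopes: set[str] = field(default_factory=set)


def check_access(agent: Agent, page_type: str, audit_log: list[dict]) -> str:
    """Block sensitive page types unless the agent holds the matching scope."""
    required = REQUIRED_SCOPE.get(page_type)
    if required is None:
        return "allow"  # not a hard-blocked page type in this sketch
    action = "allow" if required in agent.scopes else "block"
    # Every decision on a sensitive page type is logged, allowed or not.
    audit_log.append({"agent": agent.name, "page_type": page_type,
                      "required_scope": required, "action": action})
    return action


log: list[dict] = []
print(check_access(Agent("research-bot"), "login", log))                 # block
print(check_access(Agent("buyer-bot", {"procurement"}), "checkout", log))  # allow
```

Keeping the scope requirement next to the page type means an exception is granted per workflow, not per domain.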

Page Type Risk Distribution

Mapping page types to risk levels across the domain database

Page-Type Aware Agent Policy Code

Integration code that uses page-type detection for granular agent controls

Python — Page-Type Policy Matrix

import http.client
import json
from urllib.parse import urlencode


class PageTypePolicyMatrix:
    """Maps page types to granular policy actions."""

    # Page type -> (action, interaction_level)
    POLICY_MAP = {
        "login": ("block", "none"),
        "signup": ("block", "none"),
        "checkout": ("block", "none"),
        "admin": ("block", "none"),
        "settings": ("block", "none"),
        "contact": ("allow", "read_only"),
        "pricing": ("allow", "read_only"),
        "product": ("allow", "read_only"),
        "documentation": ("allow", "full"),
        "api": ("allow", "full"),
        "blog": ("allow", "full"),
        "homepage": ("allow", "read_only"),
        "about": ("allow", "full"),
        "careers": ("allow", "read_only"),
        "legal": ("allow", "full"),
        "privacy": ("allow", "full"),
        "terms": ("allow", "full"),
        "support": ("allow", "read_only"),
        "faq": ("allow", "full"),
        "forum": ("allow", "read_only"),
    }

    # Applied to page types that are missing from POLICY_MAP
    DEFAULT_ACTION = ("review", "read_only")

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection(
            "www.websitecategorizationapi.com"
        )
        self.audit_log = []

    def evaluate(self, url):
        """Classify a URL and return the policy decision for it."""
        data = self._classify(url)
        page_type = data.get("page_type", "unknown")
        action, interaction = self.POLICY_MAP.get(
            page_type, self.DEFAULT_ACTION
        )
        result = {
            "url": url,
            "page_type": page_type,
            "action": action,
            "interaction_level": interaction,
            "category": self._extract_category(data),
        }
        self.audit_log.append(result)
        return result

    def _classify(self, url):
        # urlencode escapes "&", "=", and "?" inside the target URL so
        # they cannot corrupt the form body
        payload = urlencode({
            "query": url,
            "api_key": self.api_key,
            "data_type": "url",
            "expanded_categories": "1",
        })
        headers = {
            "Content-Type": "application/x-www-form-urlencoded"
        }
        self.conn.request(
            "POST",
            "/api/iab/iab_web_content_filtering.php",
            payload,
            headers,
        )
        res = self.conn.getresponse()
        return json.loads(res.read().decode("utf-8"))

    def _extract_category(self, data):
        cats = data.get("iab_classification", [])
        if cats:
            return cats[0][0].replace("Category name: ", "")
        return "Unknown"


# Usage
matrix = PageTypePolicyMatrix(api_key="your_api_key")
result = matrix.evaluate("https://example.com/login")
print(f"Page: {result['page_type']} -> {result['action']}")
# Output when the API labels the page as "login": Page: login -> block

JavaScript — Page-Type Interaction Controller

class PageTypeController {
  constructor(apiKey) {
    this.apiKey = apiKey;
    // Risk tiers: critical (block), elevated (read-only), standard (allow)
    this.riskMap = new Map([
      ["login", { tier: "critical", action: "block" }],
      ["signup", { tier: "critical", action: "block" }],
      ["checkout", { tier: "critical", action: "block" }],
      ["admin", { tier: "critical", action: "block" }],
      ["settings", { tier: "critical", action: "block" }],
      ["contact", { tier: "elevated", action: "read_only" }],
      ["pricing", { tier: "elevated", action: "read_only" }],
      ["careers", { tier: "elevated", action: "read_only" }],
      ["homepage", { tier: "standard", action: "allow" }],
      ["blog", { tier: "standard", action: "allow" }],
      // Key matches the "documentation" label used in the Python example
      ["documentation", { tier: "standard", action: "allow" }],
      ["about", { tier: "standard", action: "allow" }],
      ["faq", { tier: "standard", action: "allow" }],
    ]);
  }

  async getPagePolicy(url) {
    const resp = await fetch(
      "https://www.websitecategorizationapi.com" +
        "/api/iab/iab_web_content_filtering.php",
      {
        method: "POST",
        headers: {
          "Content-Type": "application/x-www-form-urlencoded"
        },
        body: new URLSearchParams({
          query: url,
          api_key: this.apiKey,
          data_type: "url",
          expanded_categories: "1"
        })
      }
    );
    if (!resp.ok) {
      throw new Error(`Classification request failed: ${resp.status}`);
    }
    const data = await resp.json();
    const pageType = data.page_type || "unknown";
    // Unmapped page types fall back to manual review
    const risk = this.riskMap.get(pageType) ||
      { tier: "unknown", action: "review" };
    return {
      url,
      pageType,
      riskTier: risk.tier,
      action: risk.action,
      disableFormSubmission: risk.tier !== "standard",
      disableClicks: risk.tier === "critical",
    };
  }
}

// Usage: the agent checks before every interaction
// (top-level await requires an ES module or an async wrapper)
const ctrl = new PageTypeController("your_api_key");
const policy = await ctrl.getPagePolicy("https://example.com/settings");
if (policy.disableClicks) {
  console.log("Agent interactions disabled: critical page type");
}

Real-Time Page Type Analysis

Scanning URL patterns and content signals to detect page purpose

AI Agent Database Pricing

Purpose-built domain databases for AI agent filtering. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Database
AI Agent Domain Database 10M
$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license  |  Optional Updates: $1,599/year

  • 10M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global Popularity Rankings

AI Agent Domain Database 20M (Popular)
$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $2,999/year

  • 20M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

AI Agent Domain Database 50M (Maximum Coverage)
$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $4,999/year

  • 50M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499.

How Many Domains in Each Category?

Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same data driving your page-type-aware policy rules.

Database Analytics

Domain Distribution by Category in Our 102M Enterprise Database

How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications

Top 50 IAB v3 Categories

Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database


Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.

Page Type Classification Grid

20+ page types mapped across the entire domain database

The Complete Guide to Page-Type Detection for AI Agent Governance

Domain-level content categories became the standard for web filtering in the early 2000s, when the primary use case was preventing employees from accessing inappropriate websites at work. Categories like "Adult," "Gambling," and "Malware" served this purpose well because the risk was at the domain level — if a domain was classified as "Adult," every page on that domain was inappropriate. But AI agents do not just visit domains; they navigate to specific pages, interact with specific elements, and process specific content. The risk model has shifted from domain-level to page-level, and the governance tooling needs to shift with it.

Page-type detection fills this gap by classifying not just what a domain is about, but what each page on that domain is designed to do. A "login" page is designed to accept credentials. A "checkout" page is designed to process payments. An "admin" page is designed to modify system settings. A "documentation" page is designed to display technical information. Each of these functional types carries a distinct risk profile for autonomous agent interaction, and each demands a distinct policy response.
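To make the idea of functional types concrete, here is a deliberately simple URL-path heuristic. It is not how the database classifies pages; the patterns and the `guess_page_type` helper are assumptions for illustration, and a path-only guess will misfire on pages whose URL does not reflect their function.

```python
import re
from urllib.parse import urlparse

# Hypothetical path patterns; order matters because the first match wins,
# so the riskiest page types are checked first.
PATH_PATTERNS = [
    ("login", re.compile(r"/(login|signin|sign-in|sso)\b")),
    ("signup", re.compile(r"/(signup|sign-up|register)\b")),
    ("checkout", re.compile(r"/(checkout|cart|payment)\b")),
    ("admin", re.compile(r"/(admin|console)\b")),
    ("settings", re.compile(r"/(settings|preferences)\b")),
    ("documentation", re.compile(r"/(docs|documentation)\b")),
    ("pricing", re.compile(r"/(pricing|plans)\b")),
    ("blog", re.compile(r"/blog\b")),
]


def guess_page_type(url: str) -> str:
    """Guess a page type from the URL path alone."""
    path = urlparse(url).path.lower()
    if path in ("", "/"):
        return "homepage"
    for page_type, pattern in PATH_PATTERNS:
        if pattern.search(path):
            return page_type
    return "unknown"


print(guess_page_type("https://salesforce.com/login"))     # login
print(guess_page_type("https://salesforce.com/blog"))      # blog
print(guess_page_type("https://salesforce.com/products"))  # unknown
```

The "unknown" result for the products page shows why a path heuristic is at best a fallback: a stored page-type label does not depend on the URL naming convention.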

The 20+ Page Types in Our Classification System

Our database classifies pages into 20+ types. The ten described below each have a clear functional definition and a recommended policy action for AI agent governance:

Homepage — the landing page of a domain. Low risk for read-only access, but may contain dynamically loaded forms or chat widgets. Policy: allow with monitoring.

About — company or organization information. Safe for agent consumption, provides useful context about the entity behind the domain. Policy: allow.

Contact — pages with contact forms, email addresses, and phone numbers. Risk: agents may attempt to fill and submit contact forms. Policy: allow viewing, block form submission.

Pricing — pricing tables, plan comparisons, and cost calculators. Valuable for research agents, but may contain interactive elements. Policy: allow with competitive intelligence logging.

Careers — job listings and recruitment pages. Low risk for reading, but agents should not submit applications. Policy: allow viewing, block form submission.

Login — authentication portals with username/password fields, SSO redirects, or OAuth buttons. Critical risk: agents should never interact with login pages. Policy: hard block.

Signup — registration forms for creating new accounts. Agents creating accounts on external services is a compliance and legal liability. Policy: hard block.

Checkout — payment processing pages with credit card fields and billing forms. Financial transaction risk. Policy: hard block.

Settings — user account configuration pages. Agents modifying settings can cause unintended consequences. Policy: hard block.

Admin — administrative dashboards and management consoles. Highest-risk page type. Policy: hard block with security alert.

Building Two-Dimensional Policy Rules

The real power of page-type detection emerges when you combine it with IAB category data in two-dimensional policy rules. Instead of flat rules like "block all Adult domains" or "allow all Technology domains," you can write nuanced rules that account for both content and function. "Allow Technology > Cloud Computing domains but block their login and admin pages." "Allow Business and Finance > Banking domains for read-only research but block checkout and settings pages." "Flag all pricing page visits on competitor domains for competitive intelligence review." These two-dimensional rules are impossible with category-only classification and impossible with page-type-only detection — they require both dimensions working together.
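The three quoted rules can be written as an ordered rule list, sketched below under a few assumptions: categories arrive as "Tier 1 > Tier 2" strings, the first matching rule wins, and `competitor.example` stands in for a real competitor list. The `Rule` class and `evaluate` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str
    category_prefix: Optional[str] = None  # None matches any category
    page_types: Optional[frozenset] = None  # None matches any page type
    domains: Optional[frozenset] = None     # None matches any domain


# Hypothetical rules mirroring the three examples in the text.
# Specific block rules come before the broader allow rules.
RULES = [
    Rule("block", "Technology > Cloud Computing",
         frozenset({"login", "admin"})),
    Rule("block", "Business and Finance > Banking",
         frozenset({"checkout", "settings"})),
    Rule("flag", None, frozenset({"pricing"}),
         frozenset({"competitor.example"})),
    Rule("allow", "Technology > Cloud Computing"),
    Rule("read_only", "Business and Finance > Banking"),
]


def evaluate(domain: str, category: str, page_type: str) -> str:
    """Return the first matching rule's action, or 'review' if none match."""
    for rule in RULES:
        if rule.category_prefix and not category.startswith(rule.category_prefix):
            continue
        if rule.page_types is not None and page_type not in rule.page_types:
            continue
        if rule.domains is not None and domain not in rule.domains:
            continue
        return rule.action
    return "review"


print(evaluate("cloud.example", "Technology > Cloud Computing", "login"))
# block
print(evaluate("bank.example", "Business and Finance > Banking", "about"))
# read_only
```

Rule order carries the policy here: moving an allow rule above its matching block rule would silently open the login and admin pages.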

Interaction-Level Controls Beyond Allow/Block

Page-type detection enables a third dimension of agent controls: interaction level. Not every page visit is equal — an agent can read a page without interacting with it, interact with read-only elements (scrolling, expanding sections), or interact with write elements (filling forms, clicking buttons, making selections). For many page types, the appropriate policy is "allow viewing but disable interaction." A pricing page can be read without clicking "Buy Now." A careers page can be scanned without clicking "Apply." A contact page can be viewed for phone numbers and addresses without filling the form.

Implementing interaction-level controls requires the page-type metadata to inform the browser automation layer. When the agent's middleware detects a "contact" page type, it can configure the browser session to disable form submissions while still allowing the agent to read the page content. When it detects a "pricing" page type, it can disable click events on purchase buttons while allowing the agent to extract pricing data from the DOM. This granular interaction control is the difference between a governance layer that blocks entire pages (frustrating the agent's task) and one that allows safe data extraction while preventing dangerous interactions.
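A minimal sketch of such a gate, assuming the middleware sees every browser action before it runs. The action names, the interaction ceilings, and `is_permitted` are illustrative and not tied to any particular automation framework.

```python
from enum import IntEnum


class Interaction(IntEnum):
    NONE = 0      # page may not be used at all
    READ = 1      # read content only
    NAVIGATE = 2  # read-only interaction: scrolling, expanding sections
    WRITE = 3     # filling forms, clicking buttons, making selections


# Hypothetical: minimum level each browser action requires.
ACTION_REQUIRES = {
    "read_dom": Interaction.READ,
    "scroll": Interaction.NAVIGATE,
    "expand_section": Interaction.NAVIGATE,
    "click_button": Interaction.WRITE,
    "fill_form": Interaction.WRITE,
    "submit_form": Interaction.WRITE,
}

# Hypothetical ceiling per page type.
PAGE_TYPE_CEILING = {
    "login": Interaction.NONE,
    "contact": Interaction.NAVIGATE,
    "pricing": Interaction.NAVIGATE,
    "careers": Interaction.NAVIGATE,
    "documentation": Interaction.WRITE,
}


def is_permitted(page_type: str, action: str) -> bool:
    """Allow an action only if it fits under the page type's ceiling."""
    # Unknown page types default to read-only; unknown actions are
    # treated as writes, so both unknowns fail closed.
    ceiling = PAGE_TYPE_CEILING.get(page_type, Interaction.READ)
    required = ACTION_REQUIRES.get(action, Interaction.WRITE)
    return required <= ceiling


print(is_permitted("pricing", "read_dom"))      # True
print(is_permitted("pricing", "click_button"))  # False
print(is_permitted("contact", "submit_form"))   # False
```

The agent can still extract pricing data from the DOM, while the "Buy Now" click never reaches the browser.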

Page-Type Detection for Data Extraction Policies

Many AI agent tasks involve extracting structured data from web pages — scraping pricing information, collecting product specifications, gathering contact details. Page-type detection helps your data extraction policy engine determine what data the agent is allowed to extract from each page type. Extracting pricing data from a "pricing" page is likely within the agent's authorized scope. Extracting email addresses from a "contact" page may be subject to data protection regulations. Extracting text from a "legal" or "terms" page may involve copyrighted content. Each page type implies a different set of data extraction permissions that your policy engine should enforce.
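One possible shape for that enforcement is an allowlist of fields per page type, applied after extraction. The field names and the policy choices below are assumptions for the example; your own data protection and copyright review decides the real lists.

```python
# Hypothetical extraction policy: page type -> fields an agent may keep.
EXTRACTION_POLICY = {
    "pricing": {"plan_name", "price", "billing_period"},
    "product": {"product_name", "specifications"},
    "contact": {"postal_address", "phone"},  # email withheld in this sketch
    "legal": set(),  # no text extraction
    "terms": set(),
}


def filter_extraction(page_type: str, extracted: dict) -> tuple[dict, list[str]]:
    """Split extracted fields into (kept fields, withheld field names)."""
    # Page types without a policy entry keep nothing.
    allowed = EXTRACTION_POLICY.get(page_type, set())
    kept = {key: value for key, value in extracted.items() if key in allowed}
    withheld = sorted(key for key in extracted if key not in allowed)
    return kept, withheld


kept, withheld = filter_extraction(
    "contact", {"phone": "+1 555 0100", "email": "sales@example.com"}
)
print(kept)      # {'phone': '+1 555 0100'}
print(withheld)  # ['email']
```

Returning the withheld field names alongside the kept data gives the audit log a record of what the agent saw but was not allowed to keep.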

Handling Hybrid Pages with Multiple Types

Some pages serve multiple functions simultaneously. A product page might include a pricing section, a login prompt for existing customers, and a checkout button for new purchases. In these cases, the page-type detection system assigns the primary type based on the dominant function of the page. The policy engine should apply the most restrictive policy that applies to any detected type. If a page is classified as "product" with a login component detected, the policy should treat it with the caution of a login page rather than the permissiveness of a product page. This "most restrictive wins" approach ensures that hybrid pages do not create policy gaps.
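The "most restrictive wins" rule reduces to taking a maximum over a restrictiveness ordering. The sketch assumes the classifier reports secondary components alongside the primary type; that input shape, the ordering, and the action table are hypothetical.

```python
# Actions ordered from least to most restrictive (hypothetical scale).
RESTRICTIVENESS = {"allow": 0, "read_only": 1, "review": 2, "block": 3}

# Hypothetical per-type actions.
ACTIONS = {
    "documentation": "allow",
    "product": "read_only",
    "pricing": "read_only",
    "login": "block",
    "checkout": "block",
}


def resolve(primary_type: str, detected_components: list[str]) -> str:
    """Return the most restrictive action across all detected types."""
    types = [primary_type, *detected_components]
    # Types without a configured action fall back to manual review.
    actions = [ACTIONS.get(page_type, "review") for page_type in types]
    return max(actions, key=RESTRICTIVENESS.__getitem__)


print(resolve("product", []))         # read_only
print(resolve("product", ["login"]))  # block
```

A product page with an embedded login prompt is treated like a login page, so the hybrid case cannot slip through on its primary label.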

Implementation Patterns for Different Agent Frameworks

In LangChain, page-type metadata integrates as a tool output annotation. The browsing tool returns both the page content and its page-type classification, and downstream chain steps can condition their behavior on the page type. In CrewAI, page-type detection feeds into the task's guardrails configuration — each task can define which page types its agents are allowed to interact with. In custom agent architectures, page-type metadata flows through the middleware layer that sits between the agent's navigation intent and the browser automation framework, adding a pre-check before every page interaction.
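For the custom-architecture case, the middleware pre-check can be a thin wrapper around whatever function performs navigation. The sketch below is framework-agnostic and uses stub functions; `guarded`, `BlockedNavigation`, and the stubs are hypothetical names, and no LangChain or CrewAI API is used or implied.

```python
from typing import Callable

DEFAULT_BLOCKED = frozenset({"login", "signup", "checkout", "admin", "settings"})


class BlockedNavigation(Exception):
    """Raised when policy forbids loading a page."""


def guarded(navigate: Callable[[str], str],
            classify: Callable[[str], str],
            blocked_types: frozenset = DEFAULT_BLOCKED) -> Callable[[str], dict]:
    """Wrap a navigation function with a page-type pre-check."""
    def wrapper(url: str) -> dict:
        page_type = classify(url)
        if page_type in blocked_types:
            raise BlockedNavigation(f"{page_type} page blocked: {url}")
        # Return the content together with its page type so later
        # steps can condition their behavior on it.
        return {"url": url, "page_type": page_type, "content": navigate(url)}
    return wrapper


# Stubs standing in for a real classifier lookup and a real browser call.
def fake_classify(url: str) -> str:
    return "login" if url.endswith("/login") else "documentation"


def fake_navigate(url: str) -> str:
    return f"<html>content of {url}</html>"


browse = guarded(fake_navigate, fake_classify)
print(browse("https://example.com/docs")["page_type"])  # documentation
try:
    browse("https://example.com/login")
except BlockedNavigation as err:
    print(err)  # login page blocked: https://example.com/login
```

Because the check runs before `navigate` is called, a blocked page is never loaded, which matters for login pages where loading alone can start an SSO redirect.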

Multi-Layer Agent Security

Category + Page Type + Reputation = comprehensive agent controls

Add Page-Type Intelligence to Your Agent Controls

Move beyond domain-level categories. Deploy page-type detection with 20+ labels across 102 million domains. Block login pages, allow documentation, flag pricing — all from a single database lookup.
