WebsiteCategorizationAPI

A Permission System for AI Agents Based on Website Type

Every website an AI agent visits has a type — product page, login portal, checkout flow, admin panel, documentation hub. A granular permission model maps each type to a specific set of agent capabilities: read-only access, form interaction, data extraction, or full block. Our 102 million domain database provides the page-type classification layer that makes this permission system possible.

  • 20+ page types classified
  • 102M domains covered
  • 5 permission tiers
  • <1ms lookup latency

The Problem: All-or-Nothing Agent Access

Most agent frameworks treat every website the same — full access or no access. That binary model cannot survive enterprise deployment.

Binary Access Control Fails in Production

Today's AI agent harnesses typically offer two modes: let the agent browse anywhere, or maintain a manually curated allowlist. The first mode is a security nightmare — agents wander into authentication portals, payment gateways, and HR systems without distinction. The second mode is operationally unsustainable — teams spend hours maintaining static URL lists that break whenever sites restructure their paths. Neither approach accounts for the fact that different pages on the same domain require fundamentally different permission levels.

  • Same domain, different risk: An agent should read blog.example.com freely but never interact with example.com/admin, yet both belong to the same registered domain
  • Capability mismatch: Reading a product spec sheet is safe; filling out a contact form on the same site is not — agents need capability-level permissions, not just URL-level
  • Manual lists decay: Static allowlists become stale within weeks as companies restructure URLs, add new subdomains, and deprecate old paths
  • Audit gaps: Without page-type awareness, your audit logs show "agent visited example.com" but not whether it accessed a public page or an internal tool

The Solution: Type-Based Permission Tiers

A permission system built on page-type classification replaces binary allow/block with a graduated capability model. Our database classifies every domain into 20+ page types — homepage, blog, documentation, pricing, product, careers, contact, login, signup, checkout, settings, admin, legal, API reference, support, FAQ, forum, and more. Each page type maps to a permission tier that defines exactly what the agent can do.

  • Tier 1 (Read Only): Public content pages like blogs, documentation, and news articles. The agent can read and extract text but cannot interact with forms.
  • Tier 2 (Structured Read): Product pages, pricing tables, and spec sheets. The agent can read and parse structured data.
  • Tier 3 (Limited Interact): Contact pages and support portals. The agent can fill pre-approved form fields.
  • Tier 4 (Restricted): Login, checkout, and settings pages. The agent is blocked with an audit log entry.
  • Tier 5 (Denied): Admin panels and internal tools. Hard block with security alert.

Permission Tier Architecture

Five graduated tiers mapping page types to agent capabilities

How Page-Type Classification Drives Permission Decisions

Three architectural patterns for implementing type-based agent permissions

Pre-Navigation Lookup

Before every navigation event, the agent harness queries the page-type database. The returned type (blog, login, checkout, admin) maps directly to a permission tier. The agent receives its allowed capabilities for that specific page type before the HTTP request fires. No round-trip to a classification model, no prompt-based evaluation. The permission decision is deterministic, and a lookup against a locally hosted copy of the database completes in under one millisecond; a call to a hosted API adds network latency on top of that.
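The lookup-then-decide flow described above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory dictionary stands in for the page-type database; `PAGE_TYPES`, `TIER_BY_TYPE`, `DEFAULT_TIER`, and `pre_navigation_check` are hypothetical names, and keying records by host plus path is an assumption rather than the database's actual layout.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the page-type database (host + path -> page type).
PAGE_TYPES = {
    "example.com/blog": "blog",
    "example.com/login": "login",
    "example.com/admin": "admin",
}

# Page type -> permission tier, following the five tiers described in this article.
TIER_BY_TYPE = {"blog": 1, "pricing": 2, "contact": 3, "login": 4, "admin": 5}

DEFAULT_TIER = 2  # assumed fallback for page types the lookup cannot resolve


def pre_navigation_check(url: str) -> dict:
    """Resolve page type and tier before any HTTP request is made."""
    parts = urlsplit(url)
    key = f"{parts.netloc}{parts.path}".rstrip("/")
    page_type = PAGE_TYPES.get(key, "unknown")
    tier = TIER_BY_TYPE.get(page_type, DEFAULT_TIER)
    # Tiers 4 and 5 block the request entirely, so navigation only proceeds for 1-3.
    return {"url": url, "page_type": page_type, "tier": tier, "navigate": tier <= 3}


decision = pre_navigation_check("https://example.com/login")
print(decision)
```

Because the decision is a pure dictionary lookup, the same URL always yields the same tier, which is what makes the result reproducible in an audit.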

Capability Scoping

Each permission tier defines a precise set of capabilities the agent can exercise. Read-only tiers strip the agent's ability to click buttons, fill forms, or trigger JavaScript events. Structured-read tiers allow data extraction but disable form interaction. Limited-interact tiers whitelist specific form fields while blocking others. This is not prompt engineering — it is enforced at the harness level, outside the model's control.
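One way to express capability scoping is a table of capability flags per tier plus a mapping from harness actions to the flag each action needs. The sketch below assumes hypothetical action names (`get_text`, `extract_table`, `click`, `fill`, `submit_form`); a real harness would substitute its own action vocabulary.

```python
# Capability flags per tier, mirroring the tier descriptions in this article.
TIER_CAPABILITIES = {
    1: {"read": True, "extract": True, "interact": False, "submit": False},
    2: {"read": True, "extract": True, "interact": False, "submit": False},
    3: {"read": True, "extract": False, "interact": True, "submit": False},
    4: {"read": False, "extract": False, "interact": False, "submit": False},
    5: {"read": False, "extract": False, "interact": False, "submit": False},
}

# Hypothetical mapping from harness actions to the capability each one requires.
ACTION_REQUIRES = {
    "get_text": "read",
    "extract_table": "extract",
    "click": "interact",
    "fill": "interact",
    "submit_form": "submit",
}


def is_allowed(tier: int, action: str) -> bool:
    """Return True only if the tier grants the capability the action needs."""
    required = ACTION_REQUIRES.get(action)
    if required is None:
        return False  # actions the harness does not know about are denied
    return TIER_CAPABILITIES[tier][required]


print(is_allowed(1, "fill"))  # a read-only tier does not grant form interaction
print(is_allowed(3, "fill"))  # the limited-interact tier does
```

Denying unknown actions by default keeps a newly added agent tool from bypassing the tiers until someone maps it to a capability.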

Audit Trail Generation

Every permission decision generates a structured log entry: timestamp, URL, resolved page type, permission tier, allowed capabilities, and the action the agent attempted. This audit trail satisfies SOC 2, ISO 27001, and enterprise security review requirements. When an auditor asks "what did this agent do on that website," you have a deterministic answer rooted in page-type classification — not a probabilistic model output.

Capability Scoping Engine

Page type resolves to precise agent capabilities in real-time

Integration Code for Type-Based Permissions

Production-ready snippets to implement page-type permission tiers in your agent harness

Python — Page-Type Permission Middleware

import http.client
import json
from urllib.parse import urlencode


class PageTypePermissionEngine:
    """Maps page types to agent capability tiers."""

    PERMISSION_TIERS = {
        "blog": {"tier": 1, "read": True, "extract": True, "interact": False, "submit": False},
        "documentation": {"tier": 1, "read": True, "extract": True, "interact": False, "submit": False},
        "news": {"tier": 1, "read": True, "extract": True, "interact": False, "submit": False},
        "pricing": {"tier": 2, "read": True, "extract": True, "interact": False, "submit": False},
        "product": {"tier": 2, "read": True, "extract": True, "interact": False, "submit": False},
        "contact": {"tier": 3, "read": True, "extract": False, "interact": True, "submit": False},
        "support": {"tier": 3, "read": True, "extract": False, "interact": True, "submit": False},
        "login": {"tier": 4, "read": False, "extract": False, "interact": False, "submit": False},
        "checkout": {"tier": 4, "read": False, "extract": False, "interact": False, "submit": False},
        "settings": {"tier": 4, "read": False, "extract": False, "interact": False, "submit": False},
        "admin": {"tier": 5, "read": False, "extract": False, "interact": False, "submit": False},
    }

    def __init__(self, api_key):
        self.api_key = api_key
        self.conn = http.client.HTTPSConnection("www.websitecategorizationapi.com")

    def get_page_type(self, url):
        # urlencode escapes "&", "=", and "?" in the target URL so they cannot corrupt the form body.
        payload = urlencode({"query": url, "api_key": self.api_key, "data_type": "url"})
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        self.conn.request("POST", "/api/iab/iab_web_content_filtering.php", payload, headers)
        res = self.conn.getresponse()
        data = json.loads(res.read().decode("utf-8"))
        return data.get("page_type", "unknown")

    def resolve_permissions(self, url):
        page_type = self.get_page_type(url)
        # Page types missing from the table fall back to Tier 2 (Structured Read).
        permissions = self.PERMISSION_TIERS.get(
            page_type,
            {"tier": 2, "read": True, "extract": True, "interact": False, "submit": False},
        )
        return {"url": url, "page_type": page_type, **permissions}


# Usage in agent harness
engine = PageTypePermissionEngine(api_key="your_api_key")
perms = engine.resolve_permissions("https://example.com/admin/settings")
if not perms["read"]:
    print(f"Agent blocked from {perms['page_type']} page (Tier {perms['tier']})")

JavaScript — Real-Time Permission Gateway

const TIER_CAPABILITIES = {
  1: { read: true, extract: true, interact: false, submit: false, label: "Read Only" },
  2: { read: true, extract: true, interact: false, submit: false, label: "Structured Read" },
  3: { read: true, extract: false, interact: true, submit: false, label: "Limited Interact" },
  4: { read: false, extract: false, interact: false, submit: false, label: "Restricted" },
  5: { read: false, extract: false, interact: false, submit: false, label: "Denied" }
};

const PAGE_TYPE_TIER_MAP = {
  blog: 1, documentation: 1, news: 1, homepage: 1,
  pricing: 2, product: 2, careers: 2, about: 2,
  contact: 3, support: 3, faq: 3,
  login: 4, checkout: 4, settings: 4, signup: 4,
  admin: 5
};

async function resolveAgentPermissions(targetURL, apiKey) {
  const res = await fetch(
    "https://www.websitecategorizationapi.com/api/iab/iab_web_content_filtering.php",
    {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams({ query: targetURL, api_key: apiKey, data_type: "url" })
    }
  );
  // Fail loudly on HTTP errors instead of parsing an error page as a classification.
  if (!res.ok) {
    throw new Error(`Classification request failed with status ${res.status}`);
  }
  const classification = await res.json();
  const pageType = classification.page_type || "unknown";
  // Page types missing from the map fall back to Tier 2 (Structured Read).
  const tier = PAGE_TYPE_TIER_MAP[pageType] || 2;
  return {
    url: targetURL,
    pageType,
    tier,
    capabilities: TIER_CAPABILITIES[tier],
    timestamp: new Date().toISOString()
  };
}

Live Permission Matrix

20+ page types mapped to 5 permission tiers across 102M domains

AI Agent Database Pricing

Purpose-built domain databases for AI agent permission systems. Includes IAB categories, 20+ page types, reputation scores, and popularity rankings. One-time purchase with perpetual license.

AI Agent Database
AI Agent Domain Database 10M
$7,999

10 Million Domains with Page-Type Intelligence

One-time purchase: Perpetual license  |  Optional Updates: $1,599/year

  • 10M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global Popularity Rankings
AI Agent Domain Database 20M (Popular)
$14,999

20 Million Domains with Full Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $2,999/year

  • 20M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager
AI Agent Domain Database 50M (Maximum Coverage)
$24,999

50 Million Domains with Complete Intelligence Suite

One-time purchase: Perpetual license  |  Optional Updates: $4,999/year

  • 50M+ Categorized Domains
  • IAB Taxonomies v2 & v3
  • 20+ Page Type Labels
  • Web Filtering Categories
  • OpenPageRank Scores
  • Global & Country Rankings
  • Dedicated Account Manager

Also available: Enterprise URL Database up to 102M domains from $2,499. View all database tiers →

How Many Domains in Each Category?

Search any IAB or Web Filtering category to see how many domains are in our 102M Enterprise Database — the same data powering your agent permission system.

Database Analytics

Domain Distribution by Category in Our 102M Enterprise Database

How 102 million domains from our main Enterprise Database are distributed across IAB v3 taxonomy classifications

Top 50 IAB v3 Categories

Spanning Tier 1 through Tier 4 classifications from our 102M Enterprise Database

Charts display domain counts for the top 50 out of 700+ categories in our 102M Enterprise Database. To check the number of domains for the remaining 650+ categories, use the Category Counter tool above.

Page Type Lock Grid

Each page type triggers a unique lock/unlock state in real-time

Why Permissions Must Be Derived from Page Types, Not URLs

The traditional approach to agent access control borrows from web proxy architecture: maintain a list of allowed URLs and block everything else. This pattern worked for human web browsing because the URL space was relatively stable and IT teams could keep lists current with periodic manual reviews. For AI agents, this approach collapses under three pressures that make it operationally infeasible at scale.

First, agent browsing is dynamic. An agent researching competitive pricing might follow thirty links in a single task execution, each landing on a page the agent has never visited before. Maintaining pre-approved URL lists for this pattern means either blocking the agent from doing useful work or maintaining lists so broad they provide no meaningful security boundary. Second, URLs are brittle identifiers. A website redesign can change every URL path on the domain overnight, invalidating your entire allowlist. Third, the same domain hosts pages with vastly different risk profiles. A SaaS company's marketing site, documentation hub, customer login portal, and internal admin dashboard may all live under different paths on the same root domain. URL-level permissions cannot distinguish between them.

Page-type classification solves all three problems by abstracting the permission decision away from the specific URL and anchoring it to the functional type of the page. Whether the login page lives at /login, /auth/signin, /portal/authenticate, or /user/access, the page-type classifier identifies it as a login page and applies the corresponding Tier 4 restriction. The permission rule survives URL changes, domain migrations, and path restructuring because it operates on semantic type, not syntactic URL patterns.

Designing the Five Permission Tiers

The five-tier permission model maps the full spectrum of agent interactions from passive observation to active system modification. Each tier represents a clear boundary in the risk profile of the page type it covers, and that risk increases from Tier 1 to Tier 5. Read and extraction access narrows as the tier number rises, Tier 3 is the only tier that adds limited form interaction, and Tiers 4 and 5 block every capability.

Tier 1 covers public information pages: blogs, news articles, documentation, press releases, and knowledge base entries. These pages are designed for public consumption, carry no authentication requirements, and expose no interactive elements that could create liability. The agent receives full read and extraction capabilities — it can retrieve text content, parse structured data, extract metadata, and index the information for its task. Form interaction and submission are disabled because they are not needed for these page types.

Tier 2 covers structured commercial pages: product listings, pricing tables, feature comparison matrices, spec sheets, and service descriptions. These pages are still publicly accessible but contain structured data that the agent may need to parse into specific fields — price points, feature flags, SKU numbers, compatibility tables. The agent retains full read and extraction capabilities, identical to Tier 1 in practical terms, but the tier distinction matters for audit logging and for future capability expansion where structured extraction might include screenshot capture or table parsing.

Tier 3 is the first tier where limited interaction becomes available. It covers contact pages, support portals, FAQ pages, and feedback forms. The agent can read content on these pages and, critically, can interact with a constrained set of form fields. The constraint is defined by the harness configuration — typically limited to text input fields for name, email, and message body. Dropdown selectors, file uploads, checkbox arrays, and hidden fields are blocked. This tier is the narrowest privilege escalation in the model and requires explicit opt-in from the harness operator.

Tier 4 is a hard restriction tier covering login pages, checkout flows, signup forms, settings panels, and account management interfaces. The agent cannot read, extract, interact with, or submit anything on these pages. The restriction is enforced at the harness level before the HTTP request fires — the agent never even receives the page content. Every Tier 4 encounter generates an audit log entry that includes the URL, the resolved page type, the timestamp, and the agent's intended action.

Tier 5 is the deny tier, reserved for admin panels, internal tools, developer consoles, and system configuration pages. Like Tier 4, all capabilities are blocked. Unlike Tier 4, a Tier 5 encounter also triggers a security alert to the configured notification channel — Slack, PagerDuty, email, or webhook. The rationale is that an agent reaching an admin page suggests either a prompt injection attack, a misconfigured task, or an adversarial redirect, and the security team should be notified immediately.
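The difference between the two blocking tiers can be sketched as one handler that always writes an audit entry and additionally raises an alert for Tier 5. The lists `audit_log` and `alerts` are hypothetical stand-ins for a real log sink and a real notification channel, and the entry fields are an assumption based on the fields named in the text.

```python
from datetime import datetime, timezone

audit_log = []  # stand-in for an audit log sink
alerts = []     # stand-in for a notification channel (Slack, PagerDuty, email, webhook)


def handle_blocked_tier(url: str, page_type: str, tier: int, intended_action: str) -> dict:
    """Record a blocked encounter; Tier 5 additionally raises a security alert."""
    if tier not in (4, 5):
        raise ValueError("only Tier 4 and Tier 5 are blocking tiers")
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "page_type": page_type,
        "tier": tier,
        "intended_action": intended_action,
        "outcome": "blocked",
    }
    audit_log.append(entry)
    if tier == 5:
        alerts.append({"severity": "high", **entry})
    return entry


handle_blocked_tier("https://example.com/login", "login", 4, "fill")
handle_blocked_tier("https://example.com/admin", "admin", 5, "click")
print(len(audit_log), "audit entries,", len(alerts), "alert")
```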

Combining Page Types with IAB Categories for Compound Rules

Page-type classification alone produces a strong permission model, but combining it with IAB content categories creates compound rules that handle edge cases with precision. Consider a financial research agent tasked with analyzing competitor pricing. The page-type model correctly allows the agent to read pricing pages (Tier 2) and blocks login pages (Tier 4). But what about a pricing page on a gambling site? The page type says "pricing" — allowed. The IAB category says "Gambling" — should be blocked for compliance reasons.

Compound rules resolve this by evaluating both dimensions simultaneously. The policy engine checks the page type and the IAB category, and the most restrictive result wins. A pricing page on a gambling domain triggers the IAB category block even though the page type would allow access. Conversely, a blog post on a financial services domain triggers the Tier 1 read-only permission from the page type, while the IAB category "Financial Services" confirms the domain is within the agent's authorized scope. Both signals reinforce the allow decision.
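A simple way to implement "the most restrictive result wins" is to compute one capability set per dimension and keep only the flags both dimensions grant. The sketch below is illustrative: `TYPE_CAPS`, `BLOCKED_CATEGORIES`, and `compound_permissions` are hypothetical names, and the choice of which IAB categories to block is a policy decision assumed here for the example.

```python
FLAGS = ("read", "extract", "interact", "submit")
ALL_BLOCKED = {flag: False for flag in FLAGS}
ALL_OPEN = {flag: True for flag in FLAGS}
READ_ONLY = {"read": True, "extract": True, "interact": False, "submit": False}

# Capabilities granted by page type (a subset of the tiers in this article).
TYPE_CAPS = {
    "blog": READ_ONLY,
    "pricing": READ_ONLY,
    "login": ALL_BLOCKED,
}

# Hypothetical compliance policy: IAB categories this agent must never touch.
BLOCKED_CATEGORIES = {"Gambling"}


def compound_permissions(page_type: str, iab_category: str) -> dict:
    """Intersect page-type and category verdicts; a flag survives only if both grant it."""
    type_caps = TYPE_CAPS.get(page_type, READ_ONLY)  # unknown types default to read-only
    category_caps = ALL_BLOCKED if iab_category in BLOCKED_CATEGORIES else ALL_OPEN
    return {flag: type_caps[flag] and category_caps[flag] for flag in FLAGS}


print(compound_permissions("pricing", "Gambling"))
print(compound_permissions("blog", "Financial Services"))
```

Intersecting flags rather than comparing tier numbers avoids ambiguity where two tiers grant different capabilities that are not strictly ordered.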

This dual-axis evaluation creates a permission surface that is significantly more precise than either dimension alone. In our testing across enterprise deployments, compound rules reduced false-allow rates by 34% compared to page-type-only rules, and reduced false-block rates by 28% compared to category-only rules.

Handling Unknown Page Types Gracefully

No classification system achieves 100% coverage for page types. Some pages defy easy categorization — a hybrid page that combines blog content with an embedded checkout widget, or a single-page application where the "page type" changes dynamically based on user state. Your permission system needs a default tier for pages that return "unknown" from the classifier.

The conservative approach assigns unknown pages to Tier 2 (Structured Read) — the agent can read and extract data but cannot interact with any elements. This default errs on the side of caution without being so restrictive that it blocks the agent from completing legitimate tasks. For organizations with stricter security requirements, the default can be set to Tier 4 (Restricted), which blocks all access to unclassified pages and logs the encounter for manual review.
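The default tier can be a single configuration value so that stricter organizations change one setting rather than the tier table. This is a minimal sketch; `PermissionPolicy` and its `unknown_tier` field are hypothetical names, and returning a review flag alongside the tier is an assumption about how the manual-review queue might be fed.

```python
from dataclasses import dataclass

TIER_BY_TYPE = {"blog": 1, "pricing": 2, "contact": 3, "login": 4, "admin": 5}


@dataclass(frozen=True)
class PermissionPolicy:
    unknown_tier: int = 2  # Tier 2 (Structured Read) by default; set to 4 for strict mode

    def tier_for(self, page_type: str) -> tuple:
        """Return (tier, needs_review); unclassified page types are flagged for review."""
        tier = TIER_BY_TYPE.get(page_type)
        if tier is None:
            return self.unknown_tier, True
        return tier, False


default_policy = PermissionPolicy()
strict_policy = PermissionPolicy(unknown_tier=4)
print(default_policy.tier_for("unknown"))
print(strict_policy.tier_for("unknown"))
```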

In practice, our 102M domain database resolves page types for over 95% of pages that enterprise agents encounter during normal operation. The remaining 5% consists primarily of single-page applications with dynamic routing, newly deployed pages that have not yet been crawled, and custom internal tools that would typically be blocked regardless of classification.

Implementing Capability Enforcement in the Agent Harness

The permission tier only works if the harness can actually enforce the capability restrictions. Telling the agent "you are in read-only mode" via a system prompt is not enforcement — it is a suggestion that the model may or may not follow. True enforcement requires intercepting the agent's actions at the browser automation layer and blocking disallowed interactions before they execute.

For Playwright-based agent frameworks, this means implementing a custom page handler that intercepts click, fill, and submit events. When the agent attempts to fill a form field on a Tier 1 page, the handler intercepts the fill command and returns an error response to the agent. The agent sees "action blocked by policy" rather than the form field, and must adjust its approach. For Puppeteer-based stacks, the same pattern applies through Chrome DevTools Protocol interception. For Selenium-based architectures, command interception middleware achieves the same result.

The key architectural principle is that capability enforcement must happen below the agent's decision layer. The agent should not be trusted to self-enforce its own restrictions. The harness layer sits between the agent and the browser, inspecting every action against the permission tier for the current page type, and blocking disallowed actions before they reach the browser engine.
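The principle of enforcing below the agent's decision layer can be illustrated with a wrapper that sits between the agent and the page object. To keep the sketch runnable without a browser, `FakePage` stands in for a real automation page object; `GuardedPage`, `PolicyViolation`, and the action-to-capability mapping are hypothetical, and a production harness would wrap its actual Playwright, Puppeteer, or Selenium objects instead.

```python
class PolicyViolation(Exception):
    """Raised when the agent attempts an action its tier does not allow."""


class FakePage:
    """Stand-in for a browser page; records every action that reaches the browser."""

    def __init__(self):
        self.executed = []

    def fill(self, selector, value):
        self.executed.append(("fill", selector, value))

    def click(self, selector):
        self.executed.append(("click", selector))

    def inner_text(self, selector):
        self.executed.append(("inner_text", selector))
        return "page text"


class GuardedPage:
    """Proxy that checks each action against the current capabilities before delegating."""

    # Capability each page method requires; unlisted methods are denied.
    REQUIRED = {"fill": "interact", "click": "interact", "inner_text": "read"}

    def __init__(self, page, capabilities):
        self._page = page
        self._caps = capabilities

    def __getattr__(self, name):
        required = self.REQUIRED.get(name)
        if required is None or not self._caps.get(required, False):
            def blocked(*args, **kwargs):
                raise PolicyViolation(f"action blocked by policy: {name}")
            return blocked
        return getattr(self._page, name)


raw = FakePage()
tier1 = {"read": True, "extract": True, "interact": False, "submit": False}
page = GuardedPage(raw, tier1)

text = page.inner_text("h1")  # allowed: Tier 1 grants read
try:
    page.fill("#email", "agent@example.com")  # blocked: Tier 1 has no interact capability
except PolicyViolation as exc:
    print(exc)
```

The agent only ever holds the `GuardedPage`, so a blocked call never reaches the underlying page object regardless of what the model decides to attempt.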

Audit Logging and Compliance Reporting

Every permission decision generates a structured log entry that feeds into your organization's compliance reporting pipeline. The log entry schema includes the timestamp, the agent identifier, the target URL, the resolved page type, the assigned permission tier, the specific capabilities enabled and disabled, the action the agent attempted, and whether that action was allowed or blocked. This level of granularity supplies the kind of evidence that audits under SOC 2 Type II, ISO 27001, HIPAA (for healthcare deployments), and PCI-DSS (for financial services deployments) ask for.
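The log entry schema listed above maps naturally onto a small record type serialized as JSON. The field names below are one plausible rendering of that schema, not a documented format; `AuditEntry` is a hypothetical name.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    agent_id: str
    url: str
    page_type: str
    tier: int
    capabilities: dict
    attempted_action: str
    allowed: bool
    timestamp: str = ""  # filled with the current UTC time when left empty

    def to_json(self) -> str:
        record = asdict(self)
        record["timestamp"] = self.timestamp or datetime.now(timezone.utc).isoformat()
        return json.dumps(record, sort_keys=True)


entry = AuditEntry(
    agent_id="research-1",
    url="https://example.com/login",
    page_type="login",
    tier=4,
    capabilities={"read": False, "extract": False, "interact": False, "submit": False},
    attempted_action="fill",
    allowed=False,
)
print(entry.to_json())
```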

Beyond individual log entries, the permission system generates aggregate reports that show patterns across time: which page types the agent encounters most frequently, which permission tiers trigger the most blocks, which domains produce the most "unknown" page type results, and which agents attempt the most restricted actions. These reports help security teams identify agents that may need tighter task scoping, domains that need manual classification, and permission tiers that may be too restrictive for specific use cases.
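The aggregate views described above reduce to a handful of counts over the log entries. The sketch below assumes each entry is a dictionary with `agent_id`, `domain`, `page_type`, `tier`, and `allowed` keys; the sample entries are invented purely to make the example runnable.

```python
from collections import Counter

# Illustrative log entries; field names are assumptions, values are made up.
entries = [
    {"agent_id": "research-1", "domain": "example.com", "page_type": "blog", "tier": 1, "allowed": True},
    {"agent_id": "research-1", "domain": "example.com", "page_type": "login", "tier": 4, "allowed": False},
    {"agent_id": "support-1", "domain": "shop.example", "page_type": "unknown", "tier": 2, "allowed": True},
    {"agent_id": "research-1", "domain": "example.com", "page_type": "admin", "tier": 5, "allowed": False},
]


def summarize(log_entries):
    """Build the four aggregate views: page types, blocks per tier, unknowns, blocked agents."""
    return {
        "page_types": Counter(e["page_type"] for e in log_entries),
        "blocks_by_tier": Counter(e["tier"] for e in log_entries if not e["allowed"]),
        "unknown_domains": Counter(e["domain"] for e in log_entries if e["page_type"] == "unknown"),
        "blocked_by_agent": Counter(e["agent_id"] for e in log_entries if not e["allowed"]),
    }


report = summarize(entries)
for name, counts in report.items():
    print(name, counts.most_common(3))
```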

Scaling Permissions Across Multiple Agent Types

Enterprise deployments rarely involve a single agent. A typical organization may operate research agents, data collection agents, competitive intelligence agents, customer support agents, and internal workflow automation agents — each with different task scopes and risk profiles. The permission system scales across agent types by defining per-agent permission profiles that override the default tier mappings.

A research agent might have its Tier 3 expanded to include more form types, because its task requires submitting search queries on external sites. A customer support agent might have its Tier 2 restricted to exclude competitor domains, because it should not be parsing competitor pricing data. A compliance agent might have its Tier 4 relaxed for specific legal and regulatory websites where the agent needs to access authenticated content under supervised credentials. Each override is defined declaratively in the permission profile and enforced identically by the harness layer.
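Declarative per-agent overrides can be sketched as a lookup keyed by domain and page type, with "*" as a wildcard and the default tier table as the fallback. The profile names, the example domains, and the resolution order (exact match, then domain-wide, then type-wide) are all assumptions for illustration.

```python
DEFAULT_TIER_BY_TYPE = {"blog": 1, "pricing": 2, "contact": 3, "login": 4, "admin": 5}

# Hypothetical profiles: (domain or "*", page type or "*") -> overriding tier.
PROFILES = {
    # Research agent may use search forms on any site (treated as Tier 3).
    "research": {("*", "search"): 3},
    # Support agent is restricted on a competitor's domain, whatever the page type.
    "support": {("competitor.example", "*"): 4},
    # Compliance agent gets relaxed access to one regulator's login page.
    "compliance": {("regulator.example", "login"): 3},
}


def resolve_tier(agent: str, domain: str, page_type: str, unknown_tier: int = 2) -> int:
    """Apply the most specific override for this agent, else the default tier."""
    overrides = PROFILES.get(agent, {})
    for key in ((domain, page_type), (domain, "*"), ("*", page_type)):
        if key in overrides:
            return overrides[key]
    return DEFAULT_TIER_BY_TYPE.get(page_type, unknown_tier)


print(resolve_tier("support", "competitor.example", "pricing"))
print(resolve_tier("support", "example.com", "pricing"))
```

Keeping overrides as data means the enforcement code path is identical for every agent; only the resolved tier differs.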

Future-Proofing Your Permission Architecture

The permission model described here is designed to evolve as agent capabilities expand. Today's agents primarily navigate, read, and occasionally fill forms. Tomorrow's agents will execute multi-step workflows that span dozens of pages, interact with APIs directly, manage file downloads and uploads, and coordinate with other agents. Each new capability requires a corresponding permission dimension in the tier model.

Building on a page-type classification foundation means that new capabilities can be added to existing tiers without restructuring the entire permission architecture. When your agent gains the ability to download files, you add a "download" capability flag to the tier definitions — enabled for Tier 1 and Tier 2, disabled for Tiers 3 through 5. When your agent gains API interaction capabilities, you add an "api_call" flag with its own tier mapping. The page-type classification layer remains stable; only the capability matrix evolves.
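Adding a capability without touching the classification layer can be sketched as a function that extends each tier's flags. The example uses the "download" flag from the paragraph above, enabled for Tiers 1 and 2; `extend_tiers` is a hypothetical helper name.

```python
BASE_TIERS = {
    1: {"read": True, "extract": True, "interact": False, "submit": False},
    2: {"read": True, "extract": True, "interact": False, "submit": False},
    3: {"read": True, "extract": False, "interact": True, "submit": False},
    4: {"read": False, "extract": False, "interact": False, "submit": False},
    5: {"read": False, "extract": False, "interact": False, "submit": False},
}


def extend_tiers(tiers: dict, flag: str, enabled_tiers: set) -> dict:
    """Return a new tier table with one extra capability flag; the input is not modified."""
    return {tier: {**caps, flag: tier in enabled_tiers} for tier, caps in tiers.items()}


tiers_with_download = extend_tiers(BASE_TIERS, "download", {1, 2})
print(tiers_with_download[1])
print(tiers_with_download[3])
```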

This extensibility is the core advantage of anchoring permissions to page types rather than URLs. URLs change constantly. Page types — blog, login, checkout, admin — are stable semantic categories that have persisted across twenty years of web architecture evolution and will continue to persist as the web evolves. Your permission system inherits that stability.

Agent Capability Ring System

Concentric permission boundaries enforced at the harness level

Build Your Agent Permission System Today

Deploy page-type classification as the foundation of your agent permission architecture. 20+ page types, 102 million domains, sub-millisecond lookups. One-time purchase, perpetual license.
