Enterprise Database Integration Guide - Website Categorization API

Overview

The Enterprise Domain Database provides comprehensive categorization data for millions of domains. Each domain includes multiple classification systems:

IAB Content Taxonomy v2 - Industry standard 4-tier hierarchical categorization (698 categories)
IAB Content Taxonomy v3 - Latest IAB standard 4-tier hierarchical categorization (663 categories)
Web Filtering Category - Security and content filtering classification (43 categories)
Simplified Category - Simplified broad categorization for easy integration (33 categories)
Buyer Personas - Target audience identification for marketing applications
OpenPageRank - Domain authority score for quality assessment
Country - Geographic location/target market of the domain

Recommended Approach: For most use cases, we recommend using the Simplified Category field for primary categorization due to its simplicity, while leveraging IAB v2 or v3 categories for granular ad-tech applications.

Available File Formats

CSV Format

Comma-separated values file. Best for spreadsheet applications, database imports, and ETL pipelines.

UTF-8 encoded
Header row included
Quoted string fields
Gzip compressed option

JSON Format

JavaScript Object Notation. Best for web applications, APIs, and NoSQL databases.

UTF-8 encoded
Array of objects
NDJSON option available
Gzip compressed option

Data Dictionary

Complete field reference for the enterprise database:

Field Name	Data Type	Description	Example
Domain	String (Required)	The domain name without protocol (http/https) or www prefix	example.com
IAB v2 Tier 1 Category	String	Top-level IAB Content Taxonomy v2 category	Technology & Computing
IAB v2 Tier 2 Category	String	Second-level IAB v2 category (subcategory of Tier 1)	Computing
IAB v2 Tier 3 Category	String	Third-level IAB v2 category (may be empty)	Internet
IAB v2 Tier 4 Category	String	Fourth-level IAB v2 category (may be empty)	Cloud Computing
IAB v3 Tier 1 Category	String	Top-level IAB Content Taxonomy v3 category	Technology & Computing
IAB v3 Tier 2 Category	String	Second-level IAB v3 category (subcategory of Tier 1)	Computing
IAB v3 Tier 3 Category	String	Third-level IAB v3 category (may be empty)	Internet
IAB v3 Tier 4 Category	String	Fourth-level IAB v3 category (may be empty)	Cloud Computing
Web Filtering Category	String	Security/content filtering category for network policies	Computers & Technology
Personas	String	Target buyer personas/audience for marketing applications	Tech Enthusiasts, IT Professionals
OpenPageRank	Float	Domain authority score (0-10 scale) for quality assessment	7.85
Country	String	Geographic location or target market of the domain	United States
Simplified Category	String	Simplified broad category (33 possible values) - Additional field	Technology

Simplified Categories

The Simplified Category field uses a simplified taxonomy of 33 broad categories designed for easy integration and filtering:

AdultArts & CultureAutomotiveBusinessEducationEntertainmentEvents & AttractionsFamily & ParentingFashion & BeautyFinanceFood & DiningGamesGovernmentHealthcareHobbiesHome & GardenInternetJobs & CareersLegalMediaMoviesMusicNewsOtherPets & AnimalsReal EstateReligionScienceShoppingSocial & CommunitySportsTechnologyTravel

Tip: Use the "Other" category as a fallback for domains that don't fit well into the main categories. Consider implementing secondary categorization logic using IAB tiers for "Other" domains.

Integration Scenarios

1

Database Import (SQL/NoSQL)

Import the entire dataset into your database for high-performance lookups. Best for applications requiring sub-millisecond response times.

Recommended for: Ad-tech platforms, content filtering systems, security appliances

2

In-Memory Lookup Table

Load the dataset into memory using a hash map or dictionary for instant lookups. Requires ~2-4GB RAM depending on implementation.

Recommended for: Real-time bidding, high-frequency trading, CDN edge nodes

3

Streaming/Batch Processing

Process the file in chunks for enrichment pipelines, data warehouses, or analytics platforms.

Recommended for: ETL pipelines, data lakes, business intelligence

4

Local File Lookup

Keep the file on disk and perform lookups using binary search or indexed access. Lower memory footprint but slower lookups.

Recommended for: Desktop applications, embedded systems, low-resource environments

CSV Integration Examples

Loading and Querying CSV Data

import pandas as pd
from typing import Optional, Dict

class DomainCategorizer:
    def __init__(self, csv_path: str):
        """Load the domain database into memory."""
        # Load CSV with optimized dtypes for memory efficiency
        self.df = pd.read_csv(csv_path, dtype={
            'Domain': 'string',
            'IAB v2 Tier 1 Category': 'category',
            'IAB v2 Tier 2 Category': 'category',
            'IAB v2 Tier 3 Category': 'category',
            'IAB v2 Tier 4 Category': 'category',
            'IAB v3 Tier 1 Category': 'category',
            'IAB v3 Tier 2 Category': 'category',
            'IAB v3 Tier 3 Category': 'category',
            'IAB v3 Tier 4 Category': 'category',
            'Web Filtering Category': 'category',
            'Personas': 'string',
            'OpenPageRank': 'float64',
            'Country': 'category',
            'Simplified Category': 'category'
        })
        # Create lookup dictionary for O(1) access
        self.lookup = self.df.set_index('Domain').to_dict('index')

    def get_category(self, domain: str) -> Optional[Dict]:
        """Get all categories for a domain."""
        # Normalize domain (lowercase, no www)
        domain = domain.lower().replace('www.', '')
        return self.lookup.get(domain)

    def get_simplified_category(self, domain: str) -> Optional[str]:
        """Get simplified category."""
        result = self.get_category(domain)
        return result['Simplified Category'] if result else None

# Usage
categorizer = DomainCategorizer('enterprise_domains.csv')

# Single lookup - access all fields
result = categorizer.get_category('google.com')
print(f"IAB v2: {result['IAB v2 Tier 1 Category']}")
print(f"IAB v3: {result['IAB v3 Tier 1 Category']}")
print(f"Web Filtering: {result['Web Filtering Category']}")
print(f"Simplified: {result['Simplified Category']}")
print(f"Personas: {result['Personas']}")
print(f"PageRank: {result['OpenPageRank']}")
print(f"Country: {result['Country']}")

# Batch processing with chunks (for very large files)
def process_in_chunks(csv_path: str, chunk_size: int = 100000):
    for chunk in pd.read_csv(csv_path, chunksize=chunk_size):
        for _, row in chunk.iterrows():
            # Process each domain
            print(f"{row['Domain']}: {row['Simplified Category']}")

const fs = require('fs');
const readline = require('readline');
const { parse } = require('csv-parse');

class DomainCategorizer {
    constructor() {
        this.lookup = new Map();
    }

    async loadCSV(filePath) {
        return new Promise((resolve, reject) => {
            const parser = parse({
                columns: true,
                skip_empty_lines: true
            });

            fs.createReadStream(filePath)
                .pipe(parser)
                .on('data', (row) => {
                    const domain = row.Domain.toLowerCase().replace('www.', '');
                    this.lookup.set(domain, {
                        iabV2Tier1: row['IAB v2 Tier 1 Category'],
                        iabV2Tier2: row['IAB v2 Tier 2 Category'],
                        iabV2Tier3: row['IAB v2 Tier 3 Category'],
                        iabV2Tier4: row['IAB v2 Tier 4 Category'],
                        iabV3Tier1: row['IAB v3 Tier 1 Category'],
                        iabV3Tier2: row['IAB v3 Tier 2 Category'],
                        iabV3Tier3: row['IAB v3 Tier 3 Category'],
                        iabV3Tier4: row['IAB v3 Tier 4 Category'],
                        webFiltering: row['Web Filtering Category'],
                        personas: row['Personas'],
                        openPageRank: parseFloat(row['OpenPageRank']) || 0,
                        country: row['Country'],
                        simplified_category: row['Simplified Category']
                    });
                })
                .on('end', () => resolve())
                .on('error', reject);
        });
    }

    getCategory(domain) {
        const normalized = domain.toLowerCase().replace('www.', '');
        return this.lookup.get(normalized) || null;
    }

    getSimplifiedCategory(domain) {
        const result = this.getCategory(domain);
        return result ? result.simplified_category : null;
    }
}

// Usage
async function main() {
    const categorizer = new DomainCategorizer();
    await categorizer.loadCSV('enterprise_domains.csv');

    const result = categorizer.getCategory('google.com');
    console.log(`IAB v2: ${result.iabV2Tier1}`);
    console.log(`IAB v3: ${result.iabV3Tier1}`);
    console.log(`Simplified: ${result.simplified_category}`);
    console.log(`PageRank: ${result.openPageRank}`);
    console.log(`Country: ${result.country}`);

    // Batch lookup
    const domains = ['facebook.com', 'amazon.com', 'github.com'];
    domains.forEach(domain => {
        const cat = categorizer.getSimplifiedCategory(domain);
        console.log(`${domain}: ${cat}`);
    });
}

main().catch(console.error);

<?php

class DomainCategorizer {
    private array $lookup = [];

    public function __construct(string $csvPath) {
        $this->loadCSV($csvPath);
    }

    private function loadCSV(string $path): void {
        if (($handle = fopen($path, 'r')) !== false) {
            $headers = fgetcsv($handle);

            while (($row = fgetcsv($handle)) !== false) {
                if (count($row) === count($headers)) {
                    $data = array_combine($headers, $row);
                    $domain = strtolower(str_replace('www.', '', $data['Domain']));
                    $this->lookup[$domain] = $data;
                }
            }
            fclose($handle);
        }
    }

    public function getCategory(string $domain): ?array {
        $normalized = strtolower(str_replace('www.', '', $domain));
        return $this->lookup[$normalized] ?? null;
    }

    public function getSimplifiedCategory(string $domain): ?string {
        $result = $this->getCategory($domain);
        return $result['Simplified Category'] ?? null;
    }

    // Memory-efficient streaming for large files
    public static function processInChunks(string $path, callable $callback): void {
        if (($handle = fopen($path, 'r')) !== false) {
            $headers = fgetcsv($handle);

            while (($row = fgetcsv($handle)) !== false) {
                if (count($row) === count($headers)) {
                    $data = array_combine($headers, $row);
                    $callback($data);
                }
            }
            fclose($handle);
        }
    }
}

// Usage
$categorizer = new DomainCategorizer('enterprise_domains.csv');

$result = $categorizer->getCategory('google.com');
echo "Category: " . $result['Simplified Category'] . "\n";

// Streaming large files
DomainCategorizer::processInChunks('enterprise_domains.csv', function($row) {
    echo $row['Domain'] . ': ' . $row['Simplified Category'] . "\n";
});

import java.io.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

public class DomainCategorizer {
    private Map<String, DomainData> lookup = new ConcurrentHashMap<>();

    public static class DomainData {
        public String iabTier1, iabTier2, iabTier3, iabTier4;
        public String webFiltering, simplifiedCategory;
    }

    public void loadCSV(String filePath) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(filePath))) {
            String headerLine = reader.readLine();
            String[] headers = headerLine.split(",");

            String line;
            while ((line = reader.readLine()) != null) {
                String[] values = parseCSVLine(line);
                if (values.length >= 7) {
                    DomainData data = new DomainData();
                    data.iabTier1 = values[1];
                    data.iabTier2 = values[2];
                    data.iabTier3 = values[3];
                    data.iabTier4 = values[4];
                    data.webFiltering = values[5];
                    data.simplifiedCategory = values[6];

                    String domain = values[0].toLowerCase().replace("www.", "");
                    lookup.put(domain, data);
                }
            }
        }
    }

    public DomainData getCategory(String domain) {
        String normalized = domain.toLowerCase().replace("www.", "");
        return lookup.get(normalized);
    }

    public String getSimplifiedCategory(String domain) {
        DomainData result = getCategory(domain);
        return result != null ? result.simplifiedCategory : null;
    }

    private String[] parseCSVLine(String line) {
        // Handle quoted CSV fields
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                result.add(current.toString().trim());
                current = new StringBuilder();
            } else {
                current.append(c);
            }
        }
        result.add(current.toString().trim());
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        DomainCategorizer categorizer = new DomainCategorizer();
        categorizer.loadCSV("enterprise_domains.csv");

        DomainData result = categorizer.getCategory("google.com");
        System.out.println("Category: " + result.simplifiedCategory);
    }
}

using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public class DomainCategorizer
{
    private Dictionary<string, DomainData> _lookup = new();

    public class DomainData
    {
        public string IabTier1 { get; set; }
        public string IabTier2 { get; set; }
        public string IabTier3 { get; set; }
        public string IabTier4 { get; set; }
        public string WebFiltering { get; set; }
        public string SimplifiedCategory { get; set; }
    }

    public void LoadCSV(string filePath)
    {
        using var parser = new TextFieldParser(filePath);
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true;

        // Skip header
        string[] headers = parser.ReadFields();

        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            if (fields.Length >= 7)
            {
                var data = new DomainData
                {
                    IabTier1 = fields[1],
                    IabTier2 = fields[2],
                    IabTier3 = fields[3],
                    IabTier4 = fields[4],
                    WebFiltering = fields[5],
                    SimplifiedCategory = fields[6]
                };

                string domain = fields[0].ToLower().Replace("www.", "");
                _lookup[domain] = data;
            }
        }
    }

    public DomainData GetCategory(string domain)
    {
        string normalized = domain.ToLower().Replace("www.", "");
        return _lookup.TryGetValue(normalized, out var data) ? data : null;
    }

    public string GetSimplifiedCategory(string domain)
    {
        var result = GetCategory(domain);
        return result?.SimplifiedCategory;
    }
}

// Usage
class Program
{
    static void Main()
    {
        var categorizer = new DomainCategorizer();
        categorizer.LoadCSV("enterprise_domains.csv");

        var result = categorizer.GetCategory("google.com");
        Console.WriteLine($"Category: {result?.SimplifiedCategory}");
    }
}

package main

import (
    "encoding/csv"
    "fmt"
    "os"
    "strings"
    "sync"
)

type DomainData struct {
    IABTier1           string
    IABTier2           string
    IABTier3           string
    IABTier4           string
    WebFiltering       string
    SimplifiedCategory string
}

type DomainCategorizer struct {
    lookup map[string]DomainData
    mu     sync.RWMutex
}

func NewDomainCategorizer() *DomainCategorizer {
    return &DomainCategorizer{
        lookup: make(map[string]DomainData),
    }
}

func (dc *DomainCategorizer) LoadCSV(filePath string) error {
    file, err := os.Open(filePath)
    if err != nil {
        return err
    }
    defer file.Close()

    reader := csv.NewReader(file)

    // Skip header
    _, err = reader.Read()
    if err != nil {
        return err
    }

    for {
        record, err := reader.Read()
        if err != nil {
            break
        }

        if len(record) >= 7 {
            domain := strings.ToLower(strings.Replace(record[0], "www.", "", 1))
            dc.lookup[domain] = DomainData{
                IABTier1:           record[1],
                IABTier2:           record[2],
                IABTier3:           record[3],
                IABTier4:           record[4],
                WebFiltering:       record[5],
                SimplifiedCategory: record[6],
            }
        }
    }
    return nil
}

func (dc *DomainCategorizer) GetCategory(domain string) (DomainData, bool) {
    dc.mu.RLock()
    defer dc.mu.RUnlock()

    normalized := strings.ToLower(strings.Replace(domain, "www.", "", 1))
    data, exists := dc.lookup[normalized]
    return data, exists
}

func (dc *DomainCategorizer) GetSimplifiedCategory(domain string) string {
    if data, exists := dc.GetCategory(domain); exists {
        return data.SimplifiedCategory
    }
    return ""
}

func main() {
    categorizer := NewDomainCategorizer()
    err := categorizer.LoadCSV("enterprise_domains.csv")
    if err != nil {
        panic(err)
    }

    if data, exists := categorizer.GetCategory("google.com"); exists {
        fmt.Printf("Category: %s\n", data.SimplifiedCategory)
    }
}

require 'csv'

class DomainCategorizer
  def initialize(csv_path)
    @lookup = {}
    load_csv(csv_path)
  end

  def load_csv(path)
    CSV.foreach(path, headers: true) do |row|
      domain = row['Domain'].downcase.gsub('www.', '')
      @lookup[domain] = {
        iab_tier1: row['IAB v3 Tier 1 Category'],
        iab_tier2: row['IAB v3 Tier 2 Category'],
        iab_tier3: row['IAB v3 Tier 3 Category'],
        iab_tier4: row['IAB v3 Tier 4 Category'],
        web_filtering: row['Web Filtering Category'],
        simplified_category: row['Simplified Category']
      }
    end
  end

  def get_category(domain)
    normalized = domain.downcase.gsub('www.', '')
    @lookup[normalized]
  end

  def get_simplified_category(domain)
    result = get_category(domain)
    result&.dig(:simplified_category)
  end

  # Memory-efficient streaming
  def self.process_in_chunks(path, &block)
    CSV.foreach(path, headers: true) do |row|
      yield row.to_h
    end
  end
end

# Usage
categorizer = DomainCategorizer.new('enterprise_domains.csv')

result = categorizer.get_category('google.com')
puts "Category: #{result[:simplified_category]}"

# Streaming
DomainCategorizer.process_in_chunks('enterprise_domains.csv') do |row|
  puts "#{row['Domain']}: #{row['Simplified Category']}"
end

JSON Integration Examples

JSON Structure

[
  {
    "domain": "google.com",
    "iab_v2_tier1": "Technology & Computing",
    "iab_v2_tier2": "Computing",
    "iab_v2_tier3": "Internet",
    "iab_v2_tier4": "Search",
    "iab_v3_tier1": "Technology & Computing",
    "iab_v3_tier2": "Computing",
    "iab_v3_tier3": "Internet",
    "iab_v3_tier4": "Search",
    "web_filtering": "Search Engines & Platforms",
    "personas": "Tech Enthusiasts, IT Professionals",
    "open_page_rank": 10.0,
    "country": "United States",
    "simplified_category": "Technology"
  },
  {
    "domain": "amazon.com",
    "iab_v2_tier1": "Shopping",
    "iab_v2_tier2": "Consumer Electronics",
    "iab_v2_tier3": "",
    "iab_v2_tier4": "",
    "iab_v3_tier1": "Shopping",
    "iab_v3_tier2": "Consumer Electronics",
    "iab_v3_tier3": "",
    "iab_v3_tier4": "",
    "web_filtering": "Shopping",
    "personas": "Online Shoppers, Deal Seekers",
    "open_page_rank": 10.0,
    "country": "United States",
    "simplified_category": "Shopping"
  }
]

Loading JSON Data

import json
import ijson  # For streaming large JSON files

class DomainCategorizerJSON:
    def __init__(self, json_path: str):
        """Load JSON database into memory."""
        with open(json_path, 'r', encoding='utf-8') as f:
            data = json.load(f)

        self.lookup = {
            item['domain'].lower(): item
            for item in data
        }

    def get_category(self, domain: str):
        normalized = domain.lower().replace('www.', '')
        return self.lookup.get(normalized)

    # Streaming for large files (NDJSON)
    @staticmethod
    def stream_ndjson(path: str):
        with open(path, 'r', encoding='utf-8') as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)

    # Streaming large JSON array with ijson
    @staticmethod
    def stream_json_array(path: str):
        with open(path, 'rb') as f:
            for item in ijson.items(f, 'item'):
                yield item

# Usage
categorizer = DomainCategorizerJSON('enterprise_domains.json')
result = categorizer.get_category('google.com')
print(f"Category: {result['simplified_category']}")

# Stream processing
for domain in DomainCategorizerJSON.stream_ndjson('domains.ndjson'):
    print(f"{domain['domain']}: {domain['simplified_category']}")

const fs = require('fs');
const readline = require('readline');

class DomainCategorizerJSON {
    constructor() {
        this.lookup = new Map();
    }

    async loadJSON(filePath) {
        const data = JSON.parse(fs.readFileSync(filePath, 'utf8'));
        data.forEach(item => {
            this.lookup.set(item.domain.toLowerCase(), item);
        });
    }

    getCategory(domain) {
        const normalized = domain.toLowerCase().replace('www.', '');
        return this.lookup.get(normalized) || null;
    }

    // Stream NDJSON (newline-delimited JSON)
    async *streamNDJSON(filePath) {
        const rl = readline.createInterface({
            input: fs.createReadStream(filePath),
            crlfDelay: Infinity
        });

        for await (const line of rl) {
            if (line.trim()) {
                yield JSON.parse(line);
            }
        }
    }
}

// Usage
async function main() {
    const categorizer = new DomainCategorizerJSON();
    await categorizer.loadJSON('enterprise_domains.json');

    const result = categorizer.getCategory('google.com');
    console.log(`Category: ${result.simplified_category}`);

    // Stream processing
    for await (const domain of categorizer.streamNDJSON('domains.ndjson')) {
        console.log(`${domain.domain}: ${domain.simplified_category}`);
    }
}

main();

<?php

class DomainCategorizerJSON {
    private array $lookup = [];

    public function __construct(string $jsonPath) {
        $data = json_decode(file_get_contents($jsonPath), true);
        foreach ($data as $item) {
            $domain = strtolower($item['domain']);
            $this->lookup[$domain] = $item;
        }
    }

    public function getCategory(string $domain): ?array {
        $normalized = strtolower(str_replace('www.', '', $domain));
        return $this->lookup[$normalized] ?? null;
    }

    // Stream large JSON files
    public static function streamJSON(string $path, callable $callback): void {
        $handle = fopen($path, 'r');
        $buffer = '';
        $inObject = false;
        $depth = 0;

        while (!feof($handle)) {
            $char = fgetc($handle);

            if ($char === '{') {
                $inObject = true;
                $depth++;
            }

            if ($inObject) {
                $buffer .= $char;

                if ($char === '}') {
                    $depth--;
                    if ($depth === 0) {
                        $item = json_decode($buffer, true);
                        if ($item) $callback($item);
                        $buffer = '';
                        $inObject = false;
                    }
                }
            }
        }
        fclose($handle);
    }
}

// Usage
$categorizer = new DomainCategorizerJSON('enterprise_domains.json');
$result = $categorizer->getCategory('google.com');
echo "Category: " . $result['simplified_category'] . "\n";

# Pretty print and filter with jq
cat enterprise_domains.json | jq '.[] | select(.simplified_category == "Technology")'

# Extract specific fields (all categories)
cat enterprise_domains.json | jq -r '.[] | "\(.domain),\(.iab_v2_tier1),\(.iab_v3_tier1),\(.simplified_category)"'

# Filter by country and PageRank
cat enterprise_domains.json | jq '.[] | select(.country == "United States" and .open_page_rank > 7)'

# Convert to NDJSON for streaming
cat enterprise_domains.json | jq -c '.[]' > domains.ndjson

# Count domains per Simplified category
cat enterprise_domains.json | jq -r '.[].simplified_category' | sort | uniq -c | sort -rn

# Count domains per country
cat enterprise_domains.json | jq -r '.[].country' | sort | uniq -c | sort -rn

Database Schema Examples

MySQL / MariaDB

CREATE TABLE domain_categories (
    id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    domain VARCHAR(255) NOT NULL,
    iab_v2_tier1 VARCHAR(100),
    iab_v2_tier2 VARCHAR(100),
    iab_v2_tier3 VARCHAR(100),
    iab_v2_tier4 VARCHAR(100),
    iab_v3_tier1 VARCHAR(100),
    iab_v3_tier2 VARCHAR(100),
    iab_v3_tier3 VARCHAR(100),
    iab_v3_tier4 VARCHAR(100),
    web_filtering VARCHAR(100),
    personas TEXT,
    open_page_rank DECIMAL(4,2),
    country VARCHAR(100),
    simplified_category VARCHAR(50),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

    UNIQUE KEY idx_domain (domain),
    KEY idx_simplified_category (simplified_category),
    KEY idx_web_filtering (web_filtering),
    KEY idx_iab_v2_tier1 (iab_v2_tier1),
    KEY idx_iab_v3_tier1 (iab_v3_tier1),
    KEY idx_country (country)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Import from CSV
LOAD DATA INFILE '/path/to/enterprise_domains.csv'
INTO TABLE domain_categories
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(domain, iab_v2_tier1, iab_v2_tier2, iab_v2_tier3, iab_v2_tier4,
 iab_v3_tier1, iab_v3_tier2, iab_v3_tier3, iab_v3_tier4,
 web_filtering, personas, open_page_rank, country, simplified_category);

-- Query examples
SELECT * FROM domain_categories WHERE domain = 'google.com';
SELECT simplified_category, COUNT(*) as count FROM domain_categories GROUP BY simplified_category ORDER BY count DESC;
SELECT * FROM domain_categories WHERE country = 'United States' AND open_page_rank > 5;

PostgreSQL

CREATE TABLE domain_categories (
    id BIGSERIAL PRIMARY KEY,
    domain VARCHAR(255) NOT NULL UNIQUE,
    iab_v2_tier1 VARCHAR(100),
    iab_v2_tier2 VARCHAR(100),
    iab_v2_tier3 VARCHAR(100),
    iab_v2_tier4 VARCHAR(100),
    iab_v3_tier1 VARCHAR(100),
    iab_v3_tier2 VARCHAR(100),
    iab_v3_tier3 VARCHAR(100),
    iab_v3_tier4 VARCHAR(100),
    web_filtering VARCHAR(100),
    personas TEXT,
    open_page_rank DECIMAL(4,2),
    country VARCHAR(100),
    simplified_category VARCHAR(50),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_domain_categories_simplified ON domain_categories(simplified_category);
CREATE INDEX idx_domain_categories_filtering ON domain_categories(web_filtering);
CREATE INDEX idx_domain_categories_country ON domain_categories(country);
CREATE INDEX idx_domain_categories_pagerank ON domain_categories(open_page_rank);

-- Import from CSV
COPY domain_categories(domain, iab_v2_tier1, iab_v2_tier2, iab_v2_tier3, iab_v2_tier4,
    iab_v3_tier1, iab_v3_tier2, iab_v3_tier3, iab_v3_tier4,
    web_filtering, personas, open_page_rank, country, simplified_category)
FROM '/path/to/enterprise_domains.csv'
DELIMITER ','
CSV HEADER;

MongoDB

// Create collection with schema validation
db.createCollection("domains", {
    validator: {
        $jsonSchema: {
            bsonType: "object",
            required: ["domain"],
            properties: {
                domain: { bsonType: "string" },
                iab_v2_tier1: { bsonType: "string" },
                iab_v2_tier2: { bsonType: "string" },
                iab_v2_tier3: { bsonType: "string" },
                iab_v2_tier4: { bsonType: "string" },
                iab_v3_tier1: { bsonType: "string" },
                iab_v3_tier2: { bsonType: "string" },
                iab_v3_tier3: { bsonType: "string" },
                iab_v3_tier4: { bsonType: "string" },
                web_filtering: { bsonType: "string" },
                personas: { bsonType: "string" },
                open_page_rank: { bsonType: "double" },
                country: { bsonType: "string" },
                simplified_category: { bsonType: "string" }
            }
        }
    }
});

// Create indexes
db.domains.createIndex({ domain: 1 }, { unique: true });
db.domains.createIndex({ simplified_category: 1 });
db.domains.createIndex({ web_filtering: 1 });
db.domains.createIndex({ country: 1 });
db.domains.createIndex({ open_page_rank: -1 });

// Import using mongoimport
// mongoimport --db categorization --collection domains --type json --file enterprise_domains.json --jsonArray

// Query examples
db.domains.findOne({ domain: "google.com" });
db.domains.find({ country: "United States", open_page_rank: { $gt: 7 } });
db.domains.aggregate([
    { $group: { _id: "$simplified_category", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
]);

Redis (Key-Value)

# Python example for Redis import
import redis
import json
import csv

r = redis.Redis(host='localhost', port=6379, db=0)
pipe = r.pipeline()

# Import from CSV
with open('enterprise_domains.csv', 'r') as f:
    reader = csv.DictReader(f)
    for i, row in enumerate(reader):
        domain = row['Domain'].lower()
        data = json.dumps({
            'iab_v2_tier1': row['IAB v2 Tier 1 Category'],
            'iab_v3_tier1': row['IAB v3 Tier 1 Category'],
            'web_filtering': row['Web Filtering Category'],
            'personas': row['Personas'],
            'open_page_rank': row['OpenPageRank'],
            'country': row['Country'],
            'simplified_category': row['Simplified Category']
        })
        pipe.set(f"domain:{domain}", data)

        # Execute in batches
        if i % 10000 == 0:
            pipe.execute()
            pipe = r.pipeline()

    pipe.execute()

# Query
result = r.get("domain:google.com")
data = json.loads(result) if result else None
print(f"IAB v2: {data['iab_v2_tier1']}, Simplified: {data['simplified_category']}")

Performance Optimization

Memory Optimization

Use categorical data types (pandas) or interned strings to reduce memory usage by 60-80%. For millions of domains, expect ~2GB RAM with optimization vs ~8GB without.

Indexing Strategy

Always create indexes on the domain column (unique) and category columns. Use hash indexes for exact lookups and B-tree for range queries.

Streaming for Large Files

When processing the full dataset, use streaming/chunked reading instead of loading everything into memory. Process in batches of 100K-500K records.

Compression

Store compressed (gzip) copies for archival. Decompress on-the-fly during import. Typical compression ratio: 85-90% size reduction.

Update Strategy

For monthly updates, use diff-based updates or UPSERT operations rather than full table replacement to minimize downtime.

Database Updates

The Enterprise Database is updated monthly with:

New domains added from web crawls
Re-categorization of domains with changed content
Removal of expired/parked domains
IAB taxonomy updates (when applicable)

Update Delivery

Updates are delivered via:

Full Database: Complete replacement file
Delta File: Changes only (additions, modifications, deletions)
API Notification: Webhook notification when updates are available

Update Schedule: Database updates are published on the first Monday of each month. Subscribers receive email notification 48 hours before release.

Technical Support

Need help with integration? Our technical team is here to assist:

Email Support

[email protected]

Enterprise Support

Dedicated integration assistance available