Introduction to the Batch Processing API

High-Performance Processing: Our Batch Processing API handles thousands of URLs efficiently with asynchronous processing, real-time status tracking, and flexible result formats.

The Batch Processing API allows you to submit large collections of URLs for categorization in a single request. This is ideal for:

  • Large-scale analysis - Process thousands of URLs at once
  • Periodic batch jobs - Regular categorization of domain lists
  • Data migration - Categorizing existing URL databases
  • Research projects - Academic or commercial research requiring bulk categorization

Key Features

Flexible Input

Supports CSV, JSON, and TXT files. Each upload can be up to 50 MB and contain up to 50,000 URLs per batch.

Real-time Tracking

Monitor job progress with detailed status information, processing logs, and estimated completion times.

Multiple Output Formats

Download results in JSON or CSV format with comprehensive categorization data for each URL.

Secure & Reliable

Enterprise-grade security with encrypted file handling and automatic cleanup of processed data.

Authentication

API Key Required: All batch processing requests require a valid API key with sufficient credits.

Batch processing uses the same API key system as the regular API. Pass the key as the api_key parameter on every batch request; it must have enough credits to cover every URL in the batch.

Credit Calculation: Each URL in your batch consumes 1 credit. The system checks your available credits before processing begins.
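As a rough illustration of the credit rule above, the following sketch estimates how many credits a batch would need before it is uploaded. The function names are hypothetical, and treating blank entries as free is an assumption of this example; the server-side check remains the authoritative one.

```python
def credits_required(urls):
    """Return the credits a batch needs: one per non-empty URL entry.

    Assumption: blank entries are not billed. The API documentation only
    states that each URL consumes 1 credit.
    """
    return sum(1 for url in urls if url.strip())


def has_enough_credits(urls, available_credits):
    """Client-side pre-check; the server still verifies credits itself."""
    return credits_required(urls) <= available_credits
```

For example, `has_enough_credits(urls, 250)` returns False for a list of 1,000 URLs, which matches the situation that produces a 402 response.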

Supported File Formats

CSV Format

CSV files should contain one URL per row in the first column. Additional columns are ignored.

Example CSV content:
example.com
www.google.com
https://github.com
facebook.com
twitter.com
            

JSON Format

JSON files can be structured as an array of URLs or an object with a "domains" array.

As an object with a "domains" array:
{
  "domains": [
    "example.com",
    "www.google.com",
    "https://github.com",
    "facebook.com",
    "twitter.com"
  ]
}
            

Or as a simple array:

[
  "example.com",
  "www.google.com",
  "https://github.com",
  "facebook.com",
  "twitter.com"
]
            

TXT Format

Plain text files with one URL per line.

example.com
www.google.com
https://github.com
facebook.com
twitter.com
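The three input formats above can be produced from an in-memory list of URLs. The sketch below is one possible way to do that with the Python standard library; `serialize_urls` is a hypothetical helper name, and the JSON branch uses the object form with a "domains" array.

```python
import csv
import io
import json


def serialize_urls(urls, fmt):
    """Serialize a list of URLs as 'csv', 'json', or 'txt' text."""
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        for url in urls:
            writer.writerow([url])  # one URL per row, in the first column
        return buf.getvalue()
    if fmt == "json":
        # Object form; a bare JSON array of URLs is also accepted.
        return json.dumps({"domains": list(urls)}, indent=2)
    if fmt == "txt":
        return "".join(f"{url}\n" for url in urls)  # one URL per line
    raise ValueError(f"unsupported format: {fmt!r}")
```

The returned text can be written to a file such as urls.csv and then uploaded.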
            

Submitting Batch Jobs

Submit a batch job by uploading a file with your URLs to the batch upload endpoint.

Endpoint

POST https://www.websitecategorizationapi.com/api/batch/batch_upload.php

Parameters

Parameter   Type     Required   Description
api_key     string   Yes        Your API key
file        file     Yes        File containing URLs (CSV, JSON, or TXT)

Example Request

              
curl -X POST \
  -H 'Content-Type: multipart/form-data' \
  -F 'api_key=your_api_key_here' \
  -F 'file=@urls.csv' \
  'https://www.websitecategorizationapi.com/api/batch/batch_upload.php'
            

Example Response

Successful batch job submission (HTTP 202):
{
  "status": "success",
  "job_id": "batch_64f8a1b2c3d4e_1694123456",
  "total_domains": 1000,
  "estimated_processing_time": 2000,
  "status_url": "https://www.websitecategorizationapi.com/api/batch/status.php?job_id=batch_64f8a1b2c3d4e_1694123456&api_key=your_api_key_here",
  "message": "Batch job created successfully. Processing will begin shortly."
}
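The same upload can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the helper names `build_multipart` and `submit_batch` are hypothetical, and the file part's application/octet-stream content type is an assumption. Only the endpoint and the api_key and file field names come from the documentation above.

```python
import json
import os
import urllib.request
import uuid

UPLOAD_URL = "https://www.websitecategorizationapi.com/api/batch/batch_upload.php"


def build_multipart(api_key, filename, content, boundary=None):
    """Encode the api_key field and the file part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="api_key"\r\n'
        "\r\n"
        f"{api_key}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"  # assumed content type
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def submit_batch(api_key, path):
    """Upload the file at `path` and return the decoded JSON response."""
    with open(path, "rb") as fh:
        content = fh.read()
    body, content_type = build_multipart(api_key, os.path.basename(path), content)
    request = urllib.request.Request(
        UPLOAD_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `submit_batch("your_api_key_here", "urls.csv")["job_id"]` would then yield the job ID to use when checking status.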
            

Checking Job Status

Monitor the progress of your batch job using the status endpoint. Jobs are processed asynchronously, so you'll need to poll this endpoint to track progress.

Endpoint

GET https://www.websitecategorizationapi.com/api/batch/status.php

Parameters

Parameter   Type     Required   Description
api_key     string   Yes        Your API key
job_id      string   Yes        Job ID returned from batch upload

Example Request

              
curl 'https://www.websitecategorizationapi.com/api/batch/status.php?job_id=batch_64f8a1b2c3d4e_1694123456&api_key=your_api_key_here'
            

Job Status Values

Status       Description
queued       Job is waiting to be processed
processing   Job is currently being processed
completed    Job has completed successfully
failed       Job has failed due to an error

Example Response (Processing)

{
  "job_id": "batch_64f8a1b2c3d4e_1694123456",
  "status": "processing",
  "progress": 45.5,
  "total_domains": 1000,
  "processed_domains": 455,
  "created_at": "2023-09-08 10:30:45",
  "updated_at": "2023-09-08 10:45:20",
  "filename": "urls.csv",
  "estimated_completion": "2023-09-08 11:15:30",
  "estimated_remaining_seconds": 1810,
  "recent_logs": [
    {
      "level": "info",
      "message": "Processing batch 5 of 10",
      "created_at": "2023-09-08 10:45:20"
    },
    {
      "level": "info", 
      "message": "Processed 455 domains successfully",
      "created_at": "2023-09-08 10:45:15"
    }
  ]
}
            

Example Response (Completed)

{
  "job_id": "batch_64f8a1b2c3d4e_1694123456",
  "status": "completed",
  "progress": 100.0,
  "total_domains": 1000,
  "processed_domains": 1000,
  "created_at": "2023-09-08 10:30:45",
  "updated_at": "2023-09-08 11:15:30",
  "completed_at": "2023-09-08 11:15:30",
  "filename": "urls.csv",
  "processing_time_seconds": 2685,
  "average_time_per_domain": 2.685,
  "credits_consumed": 1000,
  "download_links": {
    "json": "https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=json&api_key=your_api_key_here",
    "csv": "https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=csv&api_key=your_api_key_here"
  },
  "results_summary": {
    "successful": 987,
    "failed": 8,
    "skipped": 5
  },
  "recent_logs": [
    {
      "level": "info",
      "message": "Batch job completed successfully",
      "created_at": "2023-09-08 11:15:30"
    }
  ]
}
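Because jobs run asynchronously, a client typically polls the status endpoint until the job reaches a terminal state. The sketch below shows one way to structure that loop. `fetch_status` and `wait_for_job` are hypothetical names, and the 30-second default interval and the cap of 720 polls (about six hours) are assumptions chosen for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

STATUS_URL = "https://www.websitecategorizationapi.com/api/batch/status.php"


def fetch_status(job_id, api_key):
    """GET the status endpoint and return the decoded JSON response."""
    query = urllib.parse.urlencode({"job_id": job_id, "api_key": api_key})
    with urllib.request.urlopen(f"{STATUS_URL}?{query}") as response:
        return json.load(response)


def wait_for_job(get_status, interval=30, max_polls=720, sleep=time.sleep):
    """Poll until the job is 'completed' or 'failed'; return the final status.

    `get_status` is any zero-argument callable returning a status dict,
    which keeps the loop testable without network access.
    """
    for _ in range(max_polls):
        status = get_status()
        if status["status"] in ("completed", "failed"):
            return status
        sleep(interval)  # still 'queued' or 'processing'
    raise TimeoutError("job did not finish within the polling budget")
```

In practice this might be called as `wait_for_job(lambda: fetch_status(job_id, api_key))`. On completion, the returned dictionary contains the `download_links` shown in the completed response.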
            

Retrieving Results

Once your batch job is completed, you can download the results in JSON or CSV format.

Endpoint

GET https://www.websitecategorizationapi.com/api/batch/download.php

Parameters

Parameter   Type     Required   Description
api_key     string   Yes        Your API key
job_id      string   Yes        Job ID returned from batch upload
format      string   No         Result format: "json" or "csv" (default: json)

Example Request (JSON)

              
curl -O 'https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=json&api_key=your_api_key_here'
            

Example Request (CSV)

              
curl -O 'https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=csv&api_key=your_api_key_here'
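The download requests above could be reproduced in Python roughly as follows. The helper names are hypothetical and the default output filename is an assumption. The endpoint and its parameters are those listed in the table above.

```python
import shutil
import urllib.parse
import urllib.request

DOWNLOAD_URL = "https://www.websitecategorizationapi.com/api/batch/download.php"


def download_url(job_id, api_key, fmt="json"):
    """Build the download URL for a finished job."""
    if fmt not in ("json", "csv"):
        raise ValueError("format must be 'json' or 'csv'")
    query = urllib.parse.urlencode(
        {"job_id": job_id, "format": fmt, "api_key": api_key}
    )
    return f"{DOWNLOAD_URL}?{query}"


def download_results(job_id, api_key, fmt="json", dest=None):
    """Stream the results file to disk and return the path written."""
    dest = dest or f"{job_id}.{fmt}"  # assumed naming scheme
    url = download_url(job_id, api_key, fmt)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```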
            

JSON Result Format

The JSON results file contains an array of objects, one for each processed URL:

[
  {
    "url": "example.com",
    "status": "success",
    "iab_classification": [
      [
        "Category name: Technology & Computing > Computing > Computer Software and Applications",
        "Confidence: 0.95"
      ]
    ],
    "filtering_taxonomy": [
      [
        "Category name: Computers & Technology",
        "Confidence: 1.0"
      ]
    ],
    "technologies": [
      {
        "name": "Google Analytics",
        "confidence": 100,
        "category": "Analytics"
      }
    ],
    "processing_time": 2.1
  },
  {
    "url": "invalid-domain.xyz",
    "status": "failed",
    "error": "Domain could not be resolved",
    "processing_time": 0.5
  }
]
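In the example above, each classification entry is a pair of strings ("Category name: ..." and "Confidence: ..."), so a client will usually want to convert them into structured values. The following sketch assumes entries always follow that two-string shape. `parse_category` and `summarize` are hypothetical helper names.

```python
def parse_category(entry):
    """Turn ['Category name: X', 'Confidence: 0.95'] into ('X', 0.95)."""
    name = entry[0].split(": ", 1)[1]
    confidence = float(entry[1].split(": ", 1)[1])
    return name, confidence


def summarize(results):
    """Split results into {url: top IAB category} and {url: error message}."""
    categories, errors = {}, {}
    for item in results:
        if item["status"] == "success":
            iab = [parse_category(e) for e in item.get("iab_classification", [])]
            # Keep the highest-confidence IAB category, if any were returned.
            categories[item["url"]] = max(iab, key=lambda c: c[1]) if iab else None
        else:
            errors[item["url"]] = item.get("error", "unknown error")
    return categories, errors
```

Passing the parsed contents of the JSON results file (for example via `json.load`) to `summarize` gives one dictionary of categorized URLs and one of failures.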
            

CSV Result Format

The CSV results file contains the following columns:

  • url - The processed URL
  • status - success/failed
  • primary_category - Top IAB category
  • confidence - Confidence score
  • all_categories - All categories (JSON encoded)
  • error_message - Error details if failed
  • processing_time - Time taken in seconds
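A CSV results file with these columns could be loaded as shown below. This sketch assumes the file starts with a header row using exactly the column names listed above, and that empty cells mark missing values. Neither detail is confirmed by the documentation, so check both against a real download.

```python
import csv
import io
import json


def read_csv_results(text):
    """Parse CSV results text into a list of dicts with typed fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Failed rows are assumed to leave confidence and categories empty.
        row["confidence"] = float(row["confidence"]) if row["confidence"] else None
        row["all_categories"] = (
            json.loads(row["all_categories"]) if row["all_categories"] else []
        )
        rows.append(row)
    return rows
```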

Error Handling

Error Responses: When a request fails, the batch API returns a JSON object containing a status of "error", a descriptive error message, and a numeric code.

Common Error Codes

Error Code   Description                               Resolution
400          Invalid file format or no file uploaded   Check file format (CSV/JSON/TXT) and ensure file is attached
401          Invalid or missing API key                Verify API key is correct and active
402          Insufficient credits                      Purchase additional credits or reduce batch size
404          Job not found                             Verify job ID is correct and job exists
413          File too large or too many URLs           Reduce file size (max 50MB) or URL count (max 50,000)
500          Server error                              Contact support if error persists

Example Error Response

{
  "status": "error",
  "error": "Insufficient credits. Required: 1000, Available: 250",
  "code": 402
}
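Client code can map an error response like the one above to the documented resolution. The sketch below is illustrative only: `describe_error` is a hypothetical helper, and treating only code 500 as worth retrying is an assumption of this example rather than documented behavior.

```python
RESOLUTIONS = {
    400: "Check file format (CSV/JSON/TXT) and ensure file is attached",
    401: "Verify API key is correct and active",
    402: "Purchase additional credits or reduce batch size",
    404: "Verify job ID is correct and job exists",
    413: "Reduce file size (max 50MB) or URL count (max 50,000)",
    500: "Contact support if error persists",
}

# Assumption: only server errors are worth retrying automatically.
RETRYABLE = {500}


def describe_error(payload):
    """Summarize an error response dict for logging or user display."""
    code = payload.get("code")
    return {
        "code": code,
        "message": payload.get("error", ""),
        "resolution": RESOLUTIONS.get(code, "Unknown error code"),
        "retryable": code in RETRYABLE,
    }
```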
            

Best Practices

Polling Frequency

Poll status every 30-60 seconds for large jobs. Avoid excessive polling to prevent rate limiting.

Data Validation

Validate URLs in your input file to minimize failed requests and optimize credit usage.
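One possible pre-upload cleaning step is sketched below. It trims whitespace, drops blank and duplicate entries, and rejects entries whose host part does not look like a dotted hostname. The hostname check is a loose heuristic chosen for this example, not the validation the API itself applies, and `clean_urls` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

# One DNS label: letters, digits, and inner hyphens, up to 63 characters.
_LABEL = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def looks_like_hostname(host):
    """Heuristic: at least two valid dot-separated labels, 253 chars max."""
    labels = host.split(".")
    return (
        len(host) <= 253
        and len(labels) >= 2
        and all(_LABEL.match(label) for label in labels)
    )


def clean_urls(raw):
    """Return (valid, rejected); duplicates and blank entries are dropped."""
    seen, valid, rejected = set(), [], []
    for entry in raw:
        entry = entry.strip()
        if not entry:
            continue
        try:
            # Prefix scheme-less entries with '//' so the host is parsed as netloc.
            target = entry if "://" in entry else f"//{entry}"
            host = urlsplit(target).hostname or ""
        except ValueError:
            host = ""
        if not looks_like_hostname(host):
            rejected.append(entry)
        elif entry.lower() not in seen:
            seen.add(entry.lower())
            valid.append(entry)
    return valid, rejected
```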

Result Storage

Download and store results promptly. Files are automatically cleaned up after 7 days.

Batch Sizing

Optimal batch sizes are 1,000-10,000 URLs. Very large batches may take several hours to process.
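A long URL list can be split into batches within the suggested range before upload. The sketch below defaults to 10,000 URLs per batch, the upper end of that range. `split_into_batches` is a hypothetical helper name.

```python
def split_into_batches(urls, batch_size=10_000):
    """Split a list of URLs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Each resulting batch can then be written to its own file and submitted as a separate job.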

Need Help?

For technical support or questions about batch processing, contact us at [email protected]