Introduction to the Batch Processing API
The Batch Processing API allows you to submit large collections of URLs for categorization in a single request. This is ideal for:
- Large-scale analysis - Process thousands of URLs at once
- Periodic batch jobs - Regular categorization of domain lists
- Data migration - Categorizing existing URL databases
- Research projects - Academic or commercial research requiring bulk categorization
Key Features
Flexible Input
Supports CSV, JSON, and TXT files. Each upload can be up to 50 MB and contain up to 50,000 URLs.
Real-time Tracking
Monitor job progress with detailed status information, processing logs, and estimated completion times.
Multiple Output Formats
Download results in JSON or CSV format with comprehensive categorization data for each URL.
Secure & Reliable
Enterprise-grade security with encrypted file handling and automatic cleanup of processed data.
Authentication
Authentication for batch processing uses the same API key system as the regular API. Your API key must have sufficient credits to cover all URLs in your batch.
Credit Calculation: Each URL in your batch consumes 1 credit. The system checks your available credits before processing begins.
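As a rough illustration of the credit rule above, the following Python sketch compares a batch's URL count with a known credit balance before uploading. The function names and the idea of skipping blank entries are assumptions made for this example; the service still performs its own check before processing.

```python
def credits_needed(urls):
    """Return the credits a batch would consume at 1 credit per URL."""
    # Blank entries are ignored here; how the service counts them is not documented.
    return sum(1 for url in urls if url.strip())


def can_submit(urls, available_credits):
    """True when the available credits cover every URL in the batch."""
    return credits_needed(urls) <= available_credits
```

Running a check like this locally avoids an upload that would be rejected for insufficient credits.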
Supported File Formats
CSV Format
CSV files should contain one URL per row in the first column. Additional columns are ignored.
example.com
www.google.com
https://github.com
facebook.com
twitter.com
JSON Format
JSON files can be structured as an array of URLs or an object with a "domains" array.
{
"domains": [
"example.com",
"www.google.com",
"https://github.com",
"facebook.com",
"twitter.com"
]
}
Or as a simple array:
[
"example.com",
"www.google.com",
"https://github.com",
"facebook.com",
"twitter.com"
]
TXT Format
Plain text files with one URL per line.
example.com
www.google.com
https://github.com
facebook.com
twitter.com
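If your URLs live in a program rather than a file, a small helper can produce any of the three formats described above. This is a minimal Python sketch using only the standard library; the function names `to_txt`, `to_csv`, and `to_json` are illustrative and not part of the API.

```python
import csv
import io
import json


def to_txt(urls):
    """One URL per line."""
    return "\n".join(urls) + "\n"


def to_csv(urls):
    """One URL per row, in the first column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for url in urls:
        writer.writerow([url])
    return buffer.getvalue()


def to_json(urls, wrap=True):
    """Either {"domains": [...]} (wrap=True) or a plain array (wrap=False)."""
    return json.dumps({"domains": urls} if wrap else urls, indent=2)
```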
Submitting Batch Jobs
Submit a batch job by uploading a file with your URLs to the batch upload endpoint.
Endpoint
POST https://www.websitecategorizationapi.com/api/batch/batch_upload.php
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
api_key | string | Yes | Your API key |
file | file | Yes | File containing URLs (CSV, JSON, or TXT) |
Example Request
curl -X POST \
-H 'Content-Type: multipart/form-data' \
-F 'api_key=your_api_key_here' \
-F 'file=@urls.csv' \
'https://www.websitecategorizationapi.com/api/batch/batch_upload.php'
Example Response
{
"status": "success",
"job_id": "batch_64f8a1b2c3d4e_1694123456",
"total_domains": 1000,
"estimated_processing_time": 2000,
"status_url": "https://www.websitecategorizationapi.com/api/batch/status.php?job_id=batch_64f8a1b2c3d4e_1694123456&api_key=your_api_key_here",
"message": "Batch job created successfully. Processing will begin shortly."
}
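The same upload can be sketched in Python with only the standard library. The endpoint and the `api_key` and `file` field names come from the parameter table above; the helper names, the `application/octet-stream` part type, and the absence of error handling are assumptions made to keep the example short.

```python
import json
import urllib.request
import uuid

UPLOAD_URL = "https://www.websitecategorizationapi.com/api/batch/batch_upload.php"


def build_multipart(api_key, filename, content):
    """Encode the api_key field and the file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="api_key"\r\n\r\n'
        f"{api_key}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    tail = f"\r\n--{boundary}--\r\n"
    body = head.encode("utf-8") + content + tail.encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def submit_batch(api_key, filename, content):
    """POST the file and return the decoded JSON response (performs a network call)."""
    body, content_type = build_multipart(api_key, filename, content)
    request = urllib.request.Request(
        UPLOAD_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `submit_batch("your_api_key_here", "urls.csv", open("urls.csv", "rb").read())` would be expected to return a dictionary containing the `job_id` shown in the example response.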
Checking Job Status
Monitor the progress of your batch job using the status endpoint. Jobs are processed asynchronously, so you'll need to poll this endpoint to track progress.
Endpoint
GET https://www.websitecategorizationapi.com/api/batch/status.php
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
api_key | string | Yes | Your API key |
job_id | string | Yes | Job ID returned from batch upload |
Example Request
curl 'https://www.websitecategorizationapi.com/api/batch/status.php?job_id=batch_64f8a1b2c3d4e_1694123456&api_key=your_api_key_here'
Job Status Values
Status | Description |
---|---|
queued | Job is waiting to be processed |
processing | Job is currently being processed |
completed | Job has completed successfully |
failed | Job has failed due to an error |
Example Response (Processing)
{
"job_id": "batch_64f8a1b2c3d4e_1694123456",
"status": "processing",
"progress": 45.5,
"total_domains": 1000,
"processed_domains": 455,
"created_at": "2023-09-08 10:30:45",
"updated_at": "2023-09-08 10:45:20",
"filename": "urls.csv",
"estimated_completion": "2023-09-08 11:15:30",
"estimated_remaining_seconds": 1810,
"recent_logs": [
{
"level": "info",
"message": "Processing batch 5 of 10",
"created_at": "2023-09-08 10:45:20"
},
{
"level": "info",
"message": "Processed 455 domains successfully",
"created_at": "2023-09-08 10:45:15"
}
]
}
Example Response (Completed)
{
"job_id": "batch_64f8a1b2c3d4e_1694123456",
"status": "completed",
"progress": 100.0,
"total_domains": 1000,
"processed_domains": 1000,
"created_at": "2023-09-08 10:30:45",
"updated_at": "2023-09-08 11:15:30",
"completed_at": "2023-09-08 11:15:30",
"filename": "urls.csv",
"processing_time_seconds": 2685,
"average_time_per_domain": 2.685,
"credits_consumed": 1000,
"download_links": {
"json": "https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=json&api_key=your_api_key_here",
"csv": "https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=csv&api_key=your_api_key_here"
},
"results_summary": {
"successful": 987,
"failed": 8,
"skipped": 5
},
"recent_logs": [
{
"level": "info",
"message": "Batch job completed successfully",
"created_at": "2023-09-08 11:15:30"
}
]
}
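Because jobs run asynchronously, a client typically loops on the status endpoint until it sees `completed` or `failed`. The Python sketch below shows one way to do that; the function names, the 30-second default interval, and the polling budget are assumptions, and the status callable is injected so the loop can be exercised without network access.

```python
import json
import time
import urllib.parse
import urllib.request

STATUS_URL = "https://www.websitecategorizationapi.com/api/batch/status.php"


def fetch_status(api_key, job_id):
    """GET the status endpoint and return the decoded JSON (performs a network call)."""
    query = urllib.parse.urlencode({"job_id": job_id, "api_key": api_key})
    with urllib.request.urlopen(f"{STATUS_URL}?{query}") as response:
        return json.load(response)


def wait_for_job(get_status, interval=30, max_polls=240, sleep=time.sleep):
    """Call get_status() until the job is completed or failed; return the last status."""
    for _ in range(max_polls):
        status = get_status()
        if status["status"] in ("completed", "failed"):
            return status
        # queued or processing: wait before asking again
        sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")
```

In real use, `get_status` would be something like `lambda: fetch_status(api_key, job_id)`, and the returned dictionary carries the `download_links` once the job has completed.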
Retrieving Results
Once your batch job is completed, you can download the results in JSON or CSV format.
Endpoint
GET https://www.websitecategorizationapi.com/api/batch/download.php
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
api_key | string | Yes | Your API key |
job_id | string | Yes | Job ID returned from batch upload |
format | string | No | Result format: "json" or "csv" (default: json) |
Example Request (JSON)
curl -O 'https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=json&api_key=your_api_key_here'
Example Request (CSV)
curl -O 'https://www.websitecategorizationapi.com/api/batch/download.php?job_id=batch_64f8a1b2c3d4e_1694123456&format=csv&api_key=your_api_key_here'
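For scripted downloads, the query string can be assembled programmatically instead of by hand. The following Python sketch builds the download URL from the three documented parameters and saves the response to disk; the helper names and the choice to validate `format` locally are assumptions for illustration.

```python
import urllib.parse
import urllib.request

DOWNLOAD_URL = "https://www.websitecategorizationapi.com/api/batch/download.php"


def build_download_url(api_key, job_id, fmt="json"):
    """Return the download URL for a job in the requested format."""
    if fmt not in ("json", "csv"):
        raise ValueError("format must be 'json' or 'csv'")
    query = urllib.parse.urlencode({"job_id": job_id, "format": fmt, "api_key": api_key})
    return f"{DOWNLOAD_URL}?{query}"


def download_results(api_key, job_id, fmt, path):
    """Save the results file to path (performs a network call)."""
    url = build_download_url(api_key, job_id, fmt)
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
```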
JSON Result Format
The JSON results file contains an array of objects, one for each processed URL:
[
{
"url": "example.com",
"status": "success",
"iab_classification": [
[
"Category name: Technology & Computing > Computing > Computer Software and Applications",
"Confidence: 0.95"
]
],
"filtering_taxonomy": [
[
"Category name: Computers & Technology",
"Confidence: 1.0"
]
],
"technologies": [
{
"name": "Google Analytics",
"confidence": 100,
"category": "Analytics"
}
],
"processing_time": 2.1
},
{
"url": "invalid-domain.xyz",
"status": "failed",
"error": "Domain could not be resolved",
"processing_time": 0.5
}
]
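In the sample above, each classification is a pair of strings of the form "Category name: ..." and "Confidence: ...", so a consumer has to split them before using the values. The Python sketch below does that for `iab_classification`, assuming every entry follows the layout of the sample; the function names are illustrative.

```python
def parse_category(pair):
    """Split a ["Category name: X", "Confidence: Y"] pair into (name, confidence)."""
    name = pair[0].split(":", 1)[1].strip()
    confidence = float(pair[1].split(":", 1)[1])
    return name, confidence


def summarize(results):
    """Map successful URLs to their IAB categories and all other URLs to their error text."""
    categories, errors = {}, {}
    for entry in results:
        if entry["status"] == "success":
            pairs = entry.get("iab_classification", [])
            categories[entry["url"]] = [parse_category(pair) for pair in pairs]
        else:
            # Any non-success status is treated as an error in this sketch.
            errors[entry["url"]] = entry.get("error", "")
    return categories, errors
```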
CSV Result Format
The CSV results file contains the following columns:
- url - The processed URL
- status - success/failed
- primary_category - Top IAB category
- confidence - Confidence score
- all_categories - All categories (JSON encoded)
- error_message - Error details if failed
- processing_time - Time taken in seconds
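A CSV results file with these columns can be loaded with Python's `csv` module. The sketch below assumes the file starts with a header row that uses exactly the column names listed above and that `all_categories` holds a JSON array; neither detail is confirmed here, so adjust the parsing to match the files you actually receive.

```python
import csv
import io
import json


def read_csv_results(text):
    """Parse CSV results text into a list of dictionaries, one per URL."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Decode the JSON-encoded category list; empty cells become an empty list.
        row["all_categories"] = json.loads(row.get("all_categories") or "[]")
        # Failed rows may have no confidence value.
        row["confidence"] = float(row["confidence"]) if row.get("confidence") else None
        rows.append(row)
    return rows
```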
Error Handling
Common Error Codes
Error Code | Description | Resolution |
---|---|---|
400 | Invalid file format or no file uploaded | Check file format (CSV/JSON/TXT) and ensure file is attached |
401 | Invalid or missing API key | Verify API key is correct and active |
402 | Insufficient credits | Purchase additional credits or reduce batch size |
404 | Job not found | Verify job ID is correct and job exists |
413 | File too large or too many URLs | Reduce file size (max 50MB) or URL count (max 50,000) |
500 | Server error | Contact support if error persists |
Example Error Response
{
"status": "error",
"error": "Insufficient credits. Required: 1000, Available: 250",
"code": 402
}
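Client code can turn an error payload like this into an exception so that callers do not have to inspect every response by hand. The following Python sketch is one possible shape; the class and function names are hypothetical, and treating only code 500 as worth retrying is an assumption based on the resolution column above.

```python
class BatchApiError(Exception):
    """Raised when a response carries status "error"."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


# Assumption: only server errors are worth retrying without changing the request.
RETRYABLE_CODES = {500}


def check_response(payload):
    """Return the payload unchanged, or raise BatchApiError for an error payload."""
    if payload.get("status") == "error":
        raise BatchApiError(payload.get("code"), payload.get("error", ""))
    return payload


def is_retryable(error):
    """True when repeating the same request might succeed."""
    return error.code in RETRYABLE_CODES
```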
Best Practices
Polling Frequency
Poll status every 30-60 seconds for large jobs. Avoid excessive polling to prevent rate limiting.
Data Validation
Validate URLs in your input file to minimize failed requests and optimize credit usage.
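One lightweight way to pre-validate input is to drop blank lines, exact duplicates, and entries without a plausible hostname before building the upload file. The Python sketch below applies a deliberately loose check (a hostname containing a dot and no spaces); the rules and function names are assumptions, not the validation the service itself performs.

```python
from urllib.parse import urlsplit


def looks_valid(entry):
    """Loose check: the entry parses and its hostname contains a dot and no spaces."""
    text = entry.strip()
    if not text:
        return False
    try:
        # Bare domains get a "//" prefix so urlsplit treats them as a host.
        host = urlsplit(text if "://" in text else "//" + text).hostname or ""
    except ValueError:
        return False
    return "." in host.strip(".") and " " not in host


def clean_urls(entries):
    """Return valid entries in their original order with exact duplicates removed."""
    seen, kept = set(), []
    for entry in entries:
        text = entry.strip()
        if looks_valid(text) and text not in seen:
            seen.add(text)
            kept.append(text)
    return kept
```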
Result Storage
Download and store results promptly. Files are automatically cleaned up after 7 days.
Batch Sizing
Optimal batch sizes are 1,000-10,000 URLs. Very large batches may take several hours to process.
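A long URL list can be split into batches within the recommended range before submission. This Python sketch uses a default of 10,000 URLs per batch, the upper end of that range; the function name is illustrative.

```python
def split_into_batches(urls, batch_size=10_000):
    """Split a list of URLs into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]
```

Each resulting batch can then be written to its own file and submitted as a separate job.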