How to start categorizing websites in 2 minutes


Step 1: Purchase Plan

After purchasing a plan and logging in, you can use the dashboard to categorize websites. If you need to categorize a large number of texts or websites, however, we recommend using our API endpoints. Your API key is shown in the dashboard, like this:

Your API key: $2y$10$CbkLX8Ba5n5nD0wSEiuYJ.l3N6K.Ivp4E39eB
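Instead of pasting the key directly into scripts, you may prefer to keep it in an environment variable. The following is a minimal sketch of that approach. The variable name WEBSITECATEGORIZATION_API_KEY is a hypothetical choice made for this example and is not something the service requires.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ,
                 var_name: str = "WEBSITECATEGORIZATION_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    var_name is a hypothetical name chosen for this example; any name works
    as long as you set it before running your script.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to the API key shown in your dashboard")
    return key
```

In a script you would then write `api_key = load_api_key()` instead of hard-coding the key, which keeps it out of files you might share or commit.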

Step 2: Prepare a list of URLs/domains to classify

Next, prepare a list of the URLs/domains you want to classify. Store them in a file named urls.csv, one URL per line, like this:
www.sportaza.com
www.geogebra.org
www.zeit.de
www.volksverpetzer.de
www.mobrog.com
www.s.cint.com
www.myfreefarm.com
www.rp-online.de
www.bing.com
www.travelex.de
www.alluc.co
www.patreon.com
www.museumspass.com
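Hand-edited lists often contain blank lines, stray whitespace, Windows line endings or repeated domains. A minimal sketch for cleaning the list before classification, assuming one entry per line as described above (the function name read_domains is hypothetical). It also drops duplicates so the same domain is not sent to the API twice:

```python
from typing import Iterable, List


def read_domains(lines: Iterable[str]) -> List[str]:
    """Return the non-empty entries in their original order, with
    surrounding whitespace (including \\r\\n) removed and duplicates dropped."""
    seen = set()
    domains = []
    for line in lines:
        entry = line.strip()
        if entry and entry not in seen:
            seen.add(entry)
            domains.append(entry)
    return domains
```

Usage: `with open('urls.csv') as f: domains = read_domains(f)`.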


Step 3: Run the script for classification

After saving the URLs, create a Python script with the following content:
import csv
import requests

# replace with your API key (shown in the dashboard after login)
api_key = 'YOUR_API_KEY'

# set this to the correct API endpoint; for the available endpoints, see https://www.websitecategorizationapi.com/api.php
url_api = "https://www.websitecategorizationapi.com/api/iab/iab_category1_url.php"

# read the URLs/domains, one per line, skipping blank lines
with open('urls.csv', 'r') as f:
    urls = [line.strip() for line in f if line.strip()]

# categorizations of the URLs will be stored in results.csv
with open('results.csv', 'w', newline='') as f_write:
    writer = csv.writer(f_write)
    for url1 in urls:
        # prepend a scheme only if the entry has neither http:// nor https://
        if url1.startswith(('http://', 'https://')):
            url = url1
        else:
            url = 'http://' + url1
        print(url)

        # passing a dict lets requests form-encode the body
        # (Content-Type: application/x-www-form-urlencoded)
        payload = {'query': url, 'api_key': api_key, 'data_type': 'url'}
        try:
            response = requests.post(url_api, data=payload, timeout=60)
            print(response.text)
            data = response.json()
            category = data['classification'][0]['category']
        except (requests.RequestException, ValueError, KeyError, IndexError, TypeError):
            # network error, non-JSON reply, or no category in the response
            category = 'url could not be loaded'
        print(category)
        writer.writerow([url1, category])
        f_write.flush()  # keep partial results if the run is interrupted


Then run it from the command line (assuming you saved it as classification.py):
python3 classification.py
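Two steps in the loop are the most likely to go wrong: adding a scheme to each entry, and extracting the category from the JSON response. The sketch below pulls them into small, separately testable functions. The names normalize_url and extract_category are hypothetical. The response shape {"classification": [{"category": ...}]} is assumed from the script's own indexing, not from separate documentation.

```python
from typing import Any


def normalize_url(entry: str) -> str:
    """Prepend http:// unless the entry already starts with http:// or https://."""
    entry = entry.strip()
    if entry.lower().startswith(("http://", "https://")):
        return entry
    return "http://" + entry


def extract_category(data: Any, fallback: str = "url could not be loaded") -> str:
    """Return the first category from a decoded response, or the fallback.

    Assumes the shape used in the script: {"classification": [{"category": ...}]}.
    """
    try:
        return str(data["classification"][0]["category"])
    except (KeyError, IndexError, TypeError):
        return fallback
```

Checking with startswith rather than a substring test avoids adding a second scheme to entries that already have one.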

The classification results are written to results.csv, one line per URL in the format URL,Category. Example output (from a longer list of URLs than the sample above):
www.klamm.de,Events and Attractions
www.geogebra.org,Education
www.myfreefarm.com,Hobbies & Interests
www.rp-online.de,News and Politics
www.alluc.co,Movies
www.patreon.com,Hobbies & Interests
www.kicker.de,Sports
www.neckermann.at,Style & Fashion
www.visitdenmark.de,Travel
www.check24.de,Personal Finance
www.lotterien.at,Shopping
www.nuvisan.de,Medical Health
www.nw.de,Business and Finance
www.arkadium.com,Hobbies & Interests
www.nitro.download,Technology & Computing
www.robinson.com,Travel
www.gymnasium-marktoberdorf.de,Education


These results were obtained from the API endpoint for IAB Tier 1 classification (iab_category1_url.php in the code above).
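For a longer list, a per-category count gives a quick overview of results.csv. A minimal sketch, assuming the URL,Category format described above (the function name count_categories is hypothetical). Rows where the URL could not be loaded are counted under their fallback label like any other category:

```python
import csv
from collections import Counter
from typing import Iterable


def count_categories(rows: Iterable[str]) -> Counter:
    """Count how many URLs fell into each category.

    Expects lines in the URL,Category format written to results.csv;
    blank or malformed lines are skipped.
    """
    counts: Counter = Counter()
    for record in csv.reader(rows):
        if len(record) >= 2:
            counts[record[1]] += 1
    return counts
```

Usage: `with open('results.csv', newline='') as f: print(count_categories(f).most_common())`.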

If you need a different type of classification, replace the url_api line in the Python code above with the corresponding endpoint from our API documentation.
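If you switch between endpoints often, it can help to pass the endpoint as a parameter instead of editing the script. The following is a minimal sketch that builds (but does not send) the same form-encoded POST request. It uses the standard library's urllib instead of requests, so it has no dependencies. It assumes that other endpoints accept the same query, api_key and data_type fields as the IAB Tier 1 endpoint. Check the documentation for the endpoint you choose. build_request is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The IAB Tier 1 endpoint from the script above; other endpoint URLs are
# listed in the API documentation and are not reproduced here.
IAB_TIER1_URL = "https://www.websitecategorizationapi.com/api/iab/iab_category1_url.php"


def build_request(endpoint: str, query: str, api_key: str,
                  data_type: str = "url") -> Request:
    """Build a form-encoded POST request without sending it.

    Assumes the endpoint takes the same fields as the IAB Tier 1 endpoint.
    """
    body = urlencode({"query": query, "api_key": api_key, "data_type": data_type})
    return Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To send it, use `urllib.request.urlopen(req, timeout=60)` and decode the reply with `json.load` on the returned response.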

If you need help with this, please send us an email and we will be happy to help.