Persian Web Dataset

Persian Language Website Database

Navigate a distinct digital ecosystem. Our database captures websites written in Farsi (فارسی), covering both Iran's largely self-contained domestic web, which relies on local alternatives to global platforms, and the vibrant international diaspora.

650K+

Active Persian Sites

Local Apps

Unique Tech Stack

Poetry

Rich Cultural Content

Inside the "Halal Net"

The Persian web is one of the most isolated yet active ecosystems in the world. Because many global services are blocked or unavailable in Iran, local developers have built robust alternatives for ride-hailing (Snapp), e-commerce (Digikala), and video (Aparat).

Our database combines right-to-left (RTL) script analysis with detection of the four letters that Persian adds to the Arabic alphabet, pe (پ), che (چ), zhe (ژ), and gāf (گ), to distinguish Persian text from Arabic. We help you understand this self-contained market and the diverse content created by millions of Persian speakers worldwide.
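As a rough illustration of the character-detection idea, here is a minimal Python sketch. It is not our production pipeline: the function names and the 1% threshold are assumptions chosen for the example. Note that Urdu also uses these four letters, so a check like this separates Persian from Arabic but not from Urdu.

```python
# Letters used in Persian that are not part of the standard Arabic alphabet.
PERSIAN_ONLY = {
    "\u067E",  # pe
    "\u0686",  # che
    "\u0698",  # zhe
    "\u06AF",  # gaf
}


def is_arabic_script(ch: str) -> bool:
    """True for code points in the main Arabic Unicode block (U+0600 to U+06FF)."""
    return "\u0600" <= ch <= "\u06FF"


def persian_letter_ratio(text: str) -> float:
    """Share of Arabic-script characters that are Persian-only letters."""
    script_chars = [ch for ch in text if is_arabic_script(ch)]
    if not script_chars:
        return 0.0
    hits = sum(1 for ch in script_chars if ch in PERSIAN_ONLY)
    return hits / len(script_chars)


def looks_persian(text: str, threshold: float = 0.01) -> bool:
    """Heuristic: Arabic-script text containing enough Persian-only letters.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return persian_letter_ratio(text) >= threshold
```

On real pages a check like this works best on the full page text rather than a single heading, since short Persian phrases may happen to contain none of the four letters.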

Local Platform Detection

Understanding the Iranian stack:

  • Shaparak Integration: We identify sites that integrate with Shaparak, the national payment network, a strong signal of a domestic business.
  • Hosting Location: We flag whether a site is hosted on domestic Iranian servers (the national intranet) or on international hosting.
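The two signals above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than our actual detection logic: the SiteRecord fields are hypothetical, the country code is assumed to come from a GeoIP lookup done elsewhere, and matching on the shaparak.ir domain is one plausible way to spot a payment integration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class SiteRecord:
    """Hypothetical record shape, used only for this example."""
    domain: str
    outbound_links: list[str]
    server_country: str  # ISO 3166-1 alpha-2 code from a GeoIP lookup


def links_to_shaparak(record: SiteRecord) -> bool:
    """True if any outbound link points at shaparak.ir or one of its subdomains."""
    for link in record.outbound_links:
        host = urlparse(link).hostname or ""
        if host == "shaparak.ir" or host.endswith(".shaparak.ir"):
            return True
    return False


def hosting_label(record: SiteRecord) -> str:
    """Label a site as domestic (served from Iran) or international."""
    return "domestic" if record.server_country.upper() == "IR" else "international"
```

Matching on the parsed hostname rather than the raw URL string avoids false positives from lookalike domains such as notshaparak.ir or from the name appearing in a query string.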

Strategic Use Cases

Diaspora Marketing

Target the affluent Persian communities in Los Angeles ("Tehrangeles"), Toronto, and London. Filter for Farsi sites hosted outside Iran.
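A filter like this might look as follows in Python. The record fields (language, server_country) are hypothetical names chosen for the example and may not match the delivered schema.

```python
def diaspora_sites(records: list[dict]) -> list[dict]:
    """Keep Farsi-language sites whose servers are located outside Iran.

    Assumes each record has a 'language' field (ISO 639-1 code) and a
    'server_country' field (ISO 3166-1 alpha-2 code); both are illustrative.
    """
    return [
        r for r in records
        if r.get("language") == "fa" and r.get("server_country") != "IR"
    ]


sample = [
    {"domain": "a.example", "language": "fa", "server_country": "US"},
    {"domain": "b.example", "language": "fa", "server_country": "IR"},
    {"domain": "c.example", "language": "en", "server_country": "CA"},
]
selected = diaspora_sites(sample)
```

Hosting location is only a proxy for audience: some diaspora sites use Iranian hosting and some domestic businesses host abroad, so treat the result as a starting list.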

Literature & Arts

Persian culture is literary. Access thousands of blogs, publishers, and cultural portals preserving classical poetry and modern art.

App Store Analytics

Cafe Bazaar is the local Android store. Identify the landing pages of top Iranian mobile apps and games.

Farsi NLP Training

Farsi is a low-resource language for many Western AI models. Use our corpus to train on the script, orthography, and grammar of Persian as it is actually written on the web.

Unlock the Persian Web

Get the most accurate, verified list of Farsi-language websites.

Get the Data