This archive generally contains structured metadata (often in RDF or CSV format) linking millions of URLs to human-categorized topics such as "Sports," "Science," or "Arts." "TDDLI" often refers to specialized subsets used in academic papers or machine learning models.
Strengths:
Unlike machine-generated lists, DMOZ data was curated by over 90,000 volunteer editors, making the classifications highly accurate for its time.
Since DMOZ officially closed in March 2017, a significant portion of the URLs in this archive may lead to dead links or parked domains.
Highly recommended for researchers looking to train text-classification models or explore the historical structure of the early-to-mid-2000s internet.
Community Perspectives
About Dataset. This is a URL classification dataset from the DMOZ directory. There are 15 classes for classification.
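For readers who want to check the class distribution of a URL classification file like the one described here, the following is a minimal sketch, not the dataset's own tooling. It assumes a CSV export with a header row and two columns named "url" and "category"; those column names and the sample rows are hypothetical placeholders, since the actual file layout is not specified here.

```python
import csv
import io
from collections import Counter

# Placeholder rows for illustration only; NOT taken from the real dataset.
# The column names "url" and "category" are assumptions about the file layout.
SAMPLE_CSV = """url,category
http://example.com/football,Sports
http://example.org/physics,Science
http://example.net/painting,Arts
http://example.com/tennis,Sports
"""


def count_categories(csv_file):
    """Return a Counter mapping each category label to its number of URLs."""
    reader = csv.DictReader(csv_file)
    return Counter(row["category"].strip() for row in reader)


counts = count_categories(io.StringIO(SAMPLE_CSV))
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

To run this against a real file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where the path and encoding are again assumptions. If the description of 15 classes holds, the resulting counter should contain 15 distinct keys.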