Download our free data sets

A number of leading organisations and educational institutions use our data sets for their predictive analysis, machine learning, and natural language processing. Our archive consists of more than 6 billion news articles from the past two decades, so we can put together any data set imaginable.

The table below lists five free data sets, each available in both XML and JSON format. Each data set contains approximately 1,000 articles from randomly selected news sites around the world.

Please reach out to us if you have any questions.

Language            JSON format      XML format
English articles    Download JSON    Download XML
Spanish articles    Download JSON    Download XML
French articles     Download JSON    Download XML
German articles     Download JSON    Download XML
Russian articles    Download JSON    Download XML

The five free data sets

Each data set consists of approximately 1,000 articles in a specific language. They were collected from random news sources, and they are all available in JSON and XML format.

  • English articles
  • Spanish articles
  • French articles
  • German articles
  • Russian articles
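As a rough illustration of working with one of the JSON downloads, the sketch below loads a file and returns its article records. The internal layout of the files is not described on this page, so the code assumes the file is either a top-level list of articles or an object that wraps a single list of articles; the function name load_articles is hypothetical and only used for this example.

```python
import json
from pathlib import Path


def load_articles(path):
    """Return the list of article records found in a JSON data set file.

    Assumption (not confirmed by the data set documentation): the file is
    either a top-level JSON list of articles, or a JSON object in which
    one of the values is the list of articles.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)

    # Case 1: the whole file is a list of article records.
    if isinstance(data, list):
        return data

    # Case 2: an object wraps the list; return the first list value found.
    if isinstance(data, dict):
        for value in data.values():
            if isinstance(value, list):
                return value

    raise ValueError(f"no list of articles found in {path}")
```

Calling load_articles on a downloaded file (for example a file saved locally as english_articles.json, a placeholder name) and taking len() of the result should give a count close to 1,000 if the assumed layout matches the actual files.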

If you have any questions, or if you are looking for different samples from our archive, please contact us.

Contact us

Do you want to know more about the benefits of our products? Please fill out the form and write a few words about what you are looking for.