Russian state institutions full-text datasets
A collection of corpora based on contents extracted from the websites of Russian state institutions
Version 1.0, published: Oct. 29, 2024. Open Access
Description
This is a collection of full-text datasets based on contents extracted from the websites of Russian state institutions.
None of the datasets include items published after 31 December 2023.
These datasets have been introduced in the following book chapter, which offers additional context:
> Comai, Giorgio (2025, forthcoming), "Text-mining on-line sources from Russia openly", in *Autocracy, Influence, War: Russian Propaganda Today*, edited by Paul Goode
The name of each corpus consists of the bare domain name, a two-letter code for the main language of the contents, and the year in which the dataset was released, separated by underscores, e.g. `kremlin.ru_ru_2024` for the Russian-language version of Kremlin.ru.
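Because domain names contain dots but no underscores, a corpus name can be split back into its three components by splitting from the right. The following is a minimal sketch under that assumption; the helper names `parse_corpus_name` and `build_corpus_name` are hypothetical and not part of the release.

```python
from typing import NamedTuple


class CorpusName(NamedTuple):
    """Components of a corpus name such as 'kremlin.ru_ru_2024'."""
    domain: str
    language: str
    year: int


def parse_corpus_name(name: str) -> CorpusName:
    # Split from the right so that the domain part is kept whole
    # (assumes the domain itself contains no underscores).
    domain, language, year = name.rsplit("_", 2)
    if len(language) != 2 or not language.isalpha():
        raise ValueError(f"expected a two-letter language code in {name!r}")
    if len(year) != 4 or not year.isdigit():
        raise ValueError(f"expected a four-digit release year in {name!r}")
    return CorpusName(domain, language, int(year))


def build_corpus_name(domain: str, language: str, year: int) -> str:
    # Inverse operation: join the three components with underscores.
    return f"{domain}_{language}_{year}"
```

For example, `parse_corpus_name("transcript.duma.gov.ru_ru_2024")` yields the domain `transcript.duma.gov.ru`, the language `ru` and the year `2024`.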
This release includes the following websites:
- Russia’s president, kremlin.ru, in English, filename: kremlin.ru_en_2024, from 1999-12-31 to 2023-12-31. Items included: 33 165
- Russia’s president, kremlin.ru, in Russian, filename: kremlin.ru_ru_2024, from 1999-12-31 to 2023-12-31. Items included: 45 538
- Russia’s MFA, mid.ru, in English, filename: mid.ru_en_2024, from 2003-01-04 to 2023-12-31. Items included: 25 943
- Russia’s MFA, mid.ru, in Russian, filename: mid.ru_ru_2024, from 2003-01-02 to 2023-12-31. Items included: 56 203
- Russia’s government, government.ru, in Russian, filename: government.ru_ru_2024, from 2006-06-22 to 2023-12-30. Items included: 17 135
- Russia’s government (archived version), archive.government.ru, in Russian, filename: archive.government.ru_ru_2024, from 2008-05-07 to 2013-05-21. Items included: 7 103
- Russia’s prime minister (archived version), archive.premier.gov.ru, in Russian, filename: archive.premier.gov.ru_ru_2024, from 2008-05-07 to 2012-05-07. Items included: 3 323
- Russia’s Duma, duma.gov.ru, in Russian, filename: duma.gov.ru_ru_2024, from 2006-04-05 to 2023-12-30. Items included: 29 094
- Russia’s Duma (transcripts), transcript.duma.gov.ru, in Russian, filename: transcript.duma.gov.ru_ru_2024, from 1994-01-11 to 2023-12-15. Items included: 6 032
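The per-corpus counts listed above can be aggregated by language with a few lines of Python. This minimal sketch simply copies the listed figures into a dictionary and reads the language code from each filename; the name `items_per_language` is illustrative only. Run as a script, it prints `164428 59108 223536`, i.e. the number of Russian-language items, English-language items, and the total across all nine corpora.

```python
from collections import Counter

# Item counts copied from the list of websites included in this release.
ITEMS = {
    "kremlin.ru_en_2024": 33165,
    "kremlin.ru_ru_2024": 45538,
    "mid.ru_en_2024": 25943,
    "mid.ru_ru_2024": 56203,
    "government.ru_ru_2024": 17135,
    "archive.government.ru_ru_2024": 7103,
    "archive.premier.gov.ru_ru_2024": 3323,
    "duma.gov.ru_ru_2024": 29094,
    "transcript.duma.gov.ru_ru_2024": 6032,
}


def items_per_language(items: dict[str, int]) -> Counter:
    """Sum item counts by the language code embedded in each corpus name."""
    totals: Counter = Counter()
    for name, count in items.items():
        # The language code is the middle component of '<domain>_<lang>_<year>'.
        language = name.rsplit("_", 2)[1]
        totals[language] += count
    return totals


if __name__ == "__main__":
    totals = items_per_language(ITEMS)
    print(totals["ru"], totals["en"], sum(totals.values()))
```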
File formats: gzip-compressed CSV files (.csv.gz); OpenDocument Spreadsheets (.ods)
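The compressed CSV files can be read with Python's standard library alone, without decompressing them first. The sketch below assumes UTF-8 encoding, a header row, and an ISO-formatted date column named `date`; these are assumptions to be checked against the online documentation, and the helpers `read_corpus` and `filter_by_date` are hypothetical. Rows are streamed one at a time, so even the larger corpora need not be loaded into memory at once.

```python
import csv
import gzip
from datetime import date
from typing import Iterable, Iterator

# Full-text fields (e.g. long transcripts) may exceed the csv module's
# default field size limit of 131072 characters, so raise it.
csv.field_size_limit(2**31 - 1)


def read_corpus(path: str) -> Iterator[dict[str, str]]:
    """Stream rows from a gzip-compressed CSV file as dictionaries."""
    # "rt" opens the compressed file in text mode; newline="" is what the
    # csv module expects, so quoted fields containing newlines stay intact.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)


def filter_by_date(
    rows: Iterable[dict[str, str]], start: date, end: date, column: str = "date"
) -> Iterator[dict[str, str]]:
    """Keep rows whose date column falls within [start, end], inclusive.

    The default column name 'date' is an assumption, not a confirmed field.
    """
    for row in rows:
        # Parse only the first ten characters (YYYY-MM-DD), so values that
        # also carry a time component are accepted as well.
        day = date.fromisoformat(row[column][:10])
        if start <= day <= end:
            yield row
```

A streaming reader was chosen to keep the example dependency-free; `pandas.read_csv` would work equally well, since it infers gzip compression from the `.csv.gz` extension by default.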
A web version of the documentation accompanying this release is available online:
https://tadadit.xyz/datasets/2024/russian_institutions_2024/
Explore the datasets through a basic web interface:
https://explore.tadadit.xyz/2024/ru_institutions_2024/
Keywords
Russian President, Text Mining, Russian Institutions, Parliament, Government
Disciplines
Communication Studies, Political Science