Data & Code

We have curated and released datasets from our research studies, all publicly available for academic and non-commercial use. Each dataset complies with the respective platform’s data-sharing policies; for example, X datasets adhere to X’s official data access and redistribution guidelines. Our accompanying code repositories ensure transparency and reproducibility across all analyses. If you would like additional information or access details, please feel free to contact us via email.


Datasets

1. Rabobank Dataset

The dataset contains anonymized records of bank accounts and transactions within Rabobank over an 11-year period from 2010 to 2020. For every pair of accounts with at least one transaction, the data includes the number of transactions and the total amount of money transferred between them. In total, the dataset covers 1,624,030 bank accounts and 4,127,043 transactions, represented as (from account, to account) pairs.

Two weighted network representations are provided:
(i) GT Network: Edge weight represents the total money transferred between two accounts.
(ii) GN Network: Edge weight represents the total number of transactions between two accounts.

The dataset is available at: https://github.com/akratiiet/RaboBank_Dataset
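
As an illustration of how the two weighted representations relate to the underlying records, the following is a minimal sketch only. It assumes each record is a tuple of (from account, to account, number of transactions, total amount) and that edges are directed. The actual file layout in the repository may differ, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def build_weighted_networks(records):
    """Aggregate pair records into two edge-weight maps.

    records: iterable of (from_account, to_account, n_transactions, total_amount).
    This tuple layout is an assumption, not the documented file format.
    """
    gt = defaultdict(float)  # GT: edge -> total money transferred
    gn = defaultdict(int)    # GN: edge -> total number of transactions
    for src, dst, count, amount in records:
        gt[(src, dst)] += amount
        gn[(src, dst)] += count
    return dict(gt), dict(gn)


# Hypothetical sample records, for illustration only.
sample = [
    ("A", "B", 3, 150.0),
    ("B", "C", 1, 20.5),
    ("A", "B", 2, 50.0),
]
gt_edges, gn_edges = build_weighted_networks(sample)
```

Both maps share the same edge set and differ only in the weight attached to each edge, so either one can be passed to a graph library of your choice.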

2. Soccer X Dataset

The dataset comprises soccer-related tweets collected from Twitter over a three-month period (March–June 2022) using a comprehensive set of keywords and hashtags related to global and regional football events. User gender was inferred using Genderize.io and Namepedia, and tweets from users whose gender could not be determined were excluded.

For English (en) tweets, the dataset contains 6,957,598 tweets: 5,767,122 from male users and 1,190,476 from female users. For Portuguese (pt) tweets, the dataset includes 2,572,247 tweets, comprising 2,011,286 from males and 560,961 from females, contributed by 365,045 male and 148,539 female unique users.

Across both languages, male users form the majority in both total tweet volume and user participation throughout the collection period. The dataset is available at: https://github.com/akratiiet/Soccer-Twitter-Dataset

3. Academic Twitter Dataset

This dataset was collected from Twitter for research purposes and used in the paper "Academic Twitter: Gender-based Differences in Content, Emotion, and Public Response". It contains information about academic users, their posts, and replies. The data can be used for studies involving academic communication, social media activity patterns, network analysis, or sentiment dynamics in academic communities.

The dataset is available at: https://github.com/akratiiet/Academic_Twitter_Dataset

4. Dutch Politics

This dataset was collected from Bluesky and contains posts related to political discussions in Dutch and English. Data collection focused on 467 political hashtags, identified through an iterative expansion process that began with the hashtags of Dutch governing parties; the expanded set was then filtered to retain only hashtags related to politics. Posts containing these hashtags were gathered over a three-month period (March 17 – June 17, 2025). Each record includes both the post content and the corresponding user metadata. The dataset comprises 38,824 posts from 7,229 unique users. User gender was inferred using NameAPI’s genderize function based on display names (or handles when display names were unavailable). The dataset supports analysis of political discourse and user participation patterns in Dutch social media spaces. The dataset is available upon request via email.

5. EU Politician X Dataset

This dataset consists of tweets from 160 highly followed politicians across France, Germany, Italy, and the United Kingdom, collected to study populist communication patterns. Politicians were selected based on party affiliation following Norris (2020) and updated classifications of populist parties in recent literature. For each country, the sample includes 10 left-wing males, 10 left-wing females, 10 right-wing males, and 10 right-wing females. Tweets were retrieved using the X (Twitter) API for the period February 1, 2022 – February 24, 2023, covering the timeline of major geopolitical events. Gender was inferred from Twitter profile metadata, including pronouns and self-descriptions. The dataset is available upon request via email due to X guidelines.

6. Educational Websites Web Tracking Dataset

This dataset comprises approximately 50,000 educational websites, with tracking data collected annually from 2013 to 2025. The final version includes only those websites for which tracker information is consistently available across all years, enabling longitudinal analysis of web tracking trends in the education sector. Each website was manually labelled as homepage, online learning platform, institutional site, language learning, or essay writing service. The dataset is available upon request via email.
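
The consistency filter described above can be sketched as follows. This is a hedged illustration, not the procedure actually used: it assumes tracker observations are available as (website, year) pairs, and the function name and demo values are hypothetical.

```python
# Years covered by the dataset, 2013 through 2025 inclusive.
YEARS = range(2013, 2026)


def sites_with_complete_coverage(observations, years=YEARS):
    """Return websites that have tracker data for every year in `years`.

    observations: iterable of (website, year) pairs for which tracker
    information exists. This input layout is an assumption.
    """
    seen = {}
    for site, year in observations:
        seen.setdefault(site, set()).add(year)
    required = set(years)
    # Keep a site only if its observed years include all required years.
    return sorted(site for site, ys in seen.items() if required <= ys)


# Hypothetical demo over a shortened three-year window.
demo_years = range(2013, 2016)
demo = [
    ("a.edu", 2013), ("a.edu", 2014), ("a.edu", 2015),
    ("b.edu", 2013), ("b.edu", 2015),  # missing 2014, so excluded
]
complete = sites_with_complete_coverage(demo, demo_years)
```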

7. Netherlands WebTracking Dataset

The dataset includes information on all 2.4 million websites from the Netherlands and the trackers embedded within them for the year 2025. It provides detailed insights into the tracking technologies used across Dutch web domains. In addition, the complete website text has been collected to support deeper content and tracking analysis. Access to the full-text dataset can be provided upon request via email.