Workshop Proceedings of the 18th International AAAI Conference on Web and Social Media

Workshop: ICWSM 2024 Data Challenge: Research Data in a Post-API, Decentralized, and Walled Garden Web

DOI: 10.36190/2024.71

Published: 2024-06-01

Crowdsourcing and Snowball Sampling: Addressing Challenges in Data Collection from Social Media and News Platforms
Iain Cruickshank, Ian Kloo

The social media landscape is becoming increasingly fragmented and restrictive for researchers. Specifically, fewer and fewer social media sites allow programmatic or API access to their data, even for research purposes. In this paper, we propose a two-stage methodology to overcome challenges in accessing data from social media and news websites for computational social science research. First, we use crowdsourcing to gather relevant links from users who frequently share links to posts or websites, including those within walled gardens on social media platforms. Second, we apply snowball sampling to expand the dataset from the collected links, identifying valuable discussions within social media platforms. Despite technical hurdles, such as scraping data from sources that lack APIs and the difficulty of finding reliable seed links, our approach offers a systematic means of gathering pertinent data for computational social science research. We hope that this extended abstract, which details past ways of collecting data, can serve as a discussion point for envisioning new methods of collecting data for research.
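
The second stage, snowball expansion from crowdsourced seed links, can be illustrated with a minimal sketch. The abstract does not specify the expansion procedure, so what follows is a generic breadth-first snowball under stated assumptions, not the authors' implementation. The names `snowball_sample`, `get_outlinks`, `max_waves`, and `max_links` are hypothetical, and the toy link graph stands in for whatever scraping step would extract links from a real post or article.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def snowball_sample(
    seed_links: Iterable[str],
    get_outlinks: Callable[[str], Iterable[str]],
    max_waves: int = 2,
    max_links: int = 1000,
) -> dict[str, int]:
    """Breadth-first snowball expansion from seed links (illustrative only).

    Returns a mapping from each collected link to the wave in which it was
    first found; seeds are wave 0. `max_links` caps growth during expansion.
    """
    collected: dict[str, int] = {}
    queue: deque[tuple[str, int]] = deque()

    # Stage 1 output: crowdsourced seed links (assumed already gathered).
    for link in seed_links:
        if link not in collected:
            collected[link] = 0
            queue.append((link, 0))

    # Stage 2: follow links outward, one wave at a time.
    while queue:
        link, wave = queue.popleft()
        if wave >= max_waves:
            continue  # do not expand beyond the final wave
        for out in get_outlinks(link):
            if len(collected) >= max_links:
                return collected
            if out not in collected:
                collected[out] = wave + 1
                queue.append((out, wave + 1))
    return collected


# Hypothetical in-memory link graph; a real collector would scrape each page.
TOY_LINK_GRAPH = {
    "https://example.org/post/1": [
        "https://example.org/post/2",
        "https://example.org/news/a",
    ],
    "https://example.org/post/2": ["https://example.org/post/3"],
    "https://example.org/news/a": ["https://example.org/post/1"],
    "https://example.org/post/3": ["https://example.org/post/4"],
}


def toy_outlinks(link: str) -> list[str]:
    """Return the links found on a page (toy lookup, no network access)."""
    return TOY_LINK_GRAPH.get(link, [])


sample = snowball_sample(["https://example.org/post/1"], toy_outlinks, max_waves=2)
```

Passing the link extractor as a function keeps the sampling logic independent of any one platform, which matters when each source without an API needs its own scraper. Limiting the number of waves is one simple way to keep the sample close to the topic of the seed links.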