Workshop Proceedings of the 18th International AAAI Conference on Web and Social Media

Workshop: ICWSM 2024 Data Challenge: Research Data in a Post-API, Decentralized, and Walled Garden Web

DOI: 10.36190/2024.72

Published: 2024-06-01

Overcoming Social Media API Restrictions: Building an Effective Web Scraper
Nicholas Harrell, Iain Cruickshank, Alexander Master

As social media platforms make their application programming interfaces (APIs) more restrictive and costly to use, there is a considerable risk that researchers will be unable to address research problems related to online discourse. This paper presents a detailed examination of an effective approach to scraping data from X (formerly Twitter), using the Selenium WebDriver to automate interaction with web pages. The technique works around the obstacles posed by X's dynamic content generation and JavaScript-dependent interface, providing a robust alternative to traditional API-based data retrieval. By emulating human navigation patterns, the method shows how to extract real-time social media data, including tweets, likes, and retweets, which are crucial for a range of analytical applications.
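The general idea of driving a browser to load dynamically generated content can be illustrated with a minimal Python sketch. It is not the authors' implementation. The helper names (`parse_count`, `scroll_and_collect`, `extract_posts`), the `article` CSS selector, the randomized pause range, and the abbreviated counter format (such as "1.2K") are all assumptions made for illustration. Only `find_elements`, `By.CSS_SELECTOR`, and `execute_script` are actual Selenium calls.

```python
import random
import time


def parse_count(text):
    """Convert an abbreviated counter such as '1.2K' or '3M' to an int.

    The abbreviated format is an assumption about how counts are displayed.
    """
    text = text.strip().replace(",", "")
    if not text:
        return 0
    multipliers = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    suffix = text[-1].upper()
    if suffix in multipliers:
        return int(round(float(text[:-1]) * multipliers[suffix]))
    return int(text)


def scroll_and_collect(driver, extract, max_scrolls=5,
                       pause_range=(1.0, 3.0), sleep=time.sleep):
    """Scroll a dynamically loaded page and gather items after each scroll.

    `extract(driver)` must return dicts with an "id" key; items are
    de-duplicated by that key because a feed re-renders posts while scrolling.
    """
    seen = {}
    for _ in range(max_scrolls):
        for item in extract(driver):
            seen.setdefault(item["id"], item)
        # Ask the browser to jump to the bottom so the page loads more content.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Randomized pause: a crude stand-in for human-like pacing.
        sleep(random.uniform(*pause_range))
    return list(seen.values())


def extract_posts(driver):
    """Hypothetical extractor: treats every <article> element as one post."""
    # Imported lazily so the helpers above can be used without Selenium.
    from selenium.webdriver.common.by import By

    posts = []
    for element in driver.find_elements(By.CSS_SELECTOR, "article"):
        text = element.text
        # hash(text) is a placeholder identifier, not a real post ID.
        posts.append({"id": hash(text), "text": text})
    return posts
```

With a real browser session, one would create a driver (for example `selenium.webdriver.Chrome()`), open a page with `driver.get(...)`, and call `scroll_and_collect(driver, extract_posts)`. The extractor is passed in as a parameter so that the selector logic, which is the part most likely to break when the page markup changes, stays separate from the scrolling loop.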