Workshop Proceedings of the 18th International AAAI Conference on Web and Social Media
Workshop: ICWSM 2024 Data Challenge: Research Data in a Post-API, Decentralized, and Walled Garden Web
DOI: 10.36190/2024.72

As social media platform application programming interfaces (APIs) become more restrictive and costly to use, there is a considerable risk that researchers will be unable to address research problems related to online discourse. This paper presents a detailed examination of an approach to scraping data from X (formerly Twitter) that uses the Selenium WebDriver to automate interaction with web pages. Because Selenium drives a real browser, the technique copes with X's dynamically generated content and JavaScript-dependent interface, providing an alternative to traditional API-based data retrieval. By emulating human navigation patterns, the method supports the extraction of real-time social media data, including tweets, likes, and retweets, which are crucial for various analytical applications.
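To make the general idea concrete, the following is a minimal Python sketch of a Selenium-style scroll-and-collect loop. It is not the implementation described in this paper. The function names `make_driver` and `collect_posts`, the CSS selector for post containers, and the scroll count and pause length are all illustrative assumptions; X's markup changes frequently, so any selector would need to be checked against the live page. The sketch also omits login handling.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"
# Assumed selector for post containers; verify against the live page.
TWEET_SELECTOR = 'article[data-testid="tweet"]'


def make_driver():
    """Start a headless Chrome session (requires selenium and Chrome)."""
    from selenium import webdriver  # imported lazily so the loop below stays testable

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)


def collect_posts(driver, max_scrolls=5, pause=2.0):
    """Scroll the current page and return the unique post texts seen, in order."""
    seen = []
    for _ in range(max_scrolls):
        # Read whatever posts are currently rendered in the DOM.
        for element in driver.find_elements(CSS, TWEET_SELECTOR):
            text = element.text.strip()
            if text and text not in seen:
                seen.append(text)
        # Scroll to the bottom so the page's JavaScript loads more posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Pause to let new content render, roughly as a human reader would.
        time.sleep(pause)
    return seen


# Hypothetical usage (the URL is a placeholder, not taken from the paper):
#   driver = make_driver()
#   driver.get("https://x.com/some_public_account")
#   posts = collect_posts(driver, max_scrolls=10)
#   driver.quit()
```

Collecting before each scroll matters because infinite-scroll pages often remove earlier posts from the DOM as new ones load. Deduplicating on text is a simplification; a fuller scraper would more plausibly key on a post identifier or permalink.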