Workshop Proceedings of the 18th International AAAI Conference on Web and Social Media

Workshop: CySoc 2024: 5th International Workshop on Cyber Social Threats

DOI: 10.36190/2024.07

Published: 2024-06-01

A Dataset of Podcasts from Rumble Spanning 2020 to 2022
Utkucan Balci, Jay Patel, Berkan Balci, Jeremy Blackburn

Rumble has emerged as a prominent platform hosting controversial figures facing restrictions on YouTube. Despite this, the academic community's engagement with Rumble has been minimal. To help researchers address this gap, we introduce a comprehensive dataset of about 6.7K podcast videos from August 2020 to December 2022, amounting to over 5.6K hours of content. In addition to the metadata of these podcast videos, we provide speech-to-text transcriptions for future analysis. We also provide speaker diarization information, a collection of 168K unique representative images from podcast videos, and face embeddings of more than 400K extracted faces. With the growing influence of podcasts and populist figures, this dataset provides a rich resource to identify challenges in cyber social threats in a relatively underexplored space.