Workshop Proceedings of the 16th International AAAI Conference on Web and Social Media
Despite the potential for antisocial and counter-productive social media behavior, particularly in the context of humanitarian assistance/disaster response (HA/DR), there is a paucity of automated methods to address it. Current methods focus primarily on detecting hate speech and banning problematic content. We propose an alternative strategy of using automated counter speech, focusing not just on moderating uncivil behavior but also on promoting civil discourse. In this paper, we propose a novel framework that employs pre-trained language models to alleviate the bottlenecks in the adoption of such counter speech, namely a lack of understanding of the dynamics of counter speech and a scarcity of well-curated datasets, both of which are compounded in HA/DR settings. We utilize GPT language models to create a conversational testbed that simulates online conversations in which various counter speech approaches and other content moderation methods can be evaluated. Additionally, we leverage BERT-based models to detect hate speech, combined with network and syntactic features, to suggest the optimal strategy to employ. We also present empirical results from our experiments that provide a proof of concept for the framework.