Workshop Proceedings of the 16th International AAAI Conference on Web and Social Media
The COVID-19 pandemic has caused an infodemic of unlabeled misinformation across social media. Because manually inspecting and labeling the texts and sources of posts requires substantial human effort, developers of misinformation detection models must be selective about which misinformation keywords, topics, and themes they collect data on. We aim to reduce the resource cost of this process by introducing a weakly supervised, iterative, graph-based approach for detecting keywords, topics, and themes related to COVID-19 misinformation. Starting from general misinformation-related seed words and only a small number of seed texts, our approach detects specific misinformation topics. It uses a BERT-based Word Graph Search (BWGS) algorithm that builds on context-based embeddings to retrieve misinformation-related tweets and articles. We then apply latent Dirichlet allocation (LDA) topic modeling to obtain misinformation-related themes from the texts returned by BWGS. Furthermore, we propose the BERT-based Multi-directional Word Graph Search (BMDWGS) algorithm, which uses more of the starting context information for misinformation extraction. In addition to a qualitative analysis of our approach, our quantitative experiments show that BWGS and BMDWGS are effective at extracting misinformation-related content compared with common baselines in low-resource data settings. Extracting such content is useful for gauging which specific misconceptions are prevalent among the public and for facilitating future campaigns to correct these false beliefs.