DOI: 10.36190/2021.16

Published: 2021-06-01
Toxicity-Associated News Classification: The Impact of Metadata and Content Features
Paula Fortuna, Luís B. Cruz, Rodrigo Maia, Vanessa Cortez, Sérgio Nunes

In this work, we study toxicity-associated news, which we define as news with a high percentage of toxic comments. Little research exists on this topic. We address two open questions: (i) Is the toxicity of the comments related to the textual content of the news pieces? and (ii) Are there other contextual factors of the news (i.e., metadata) that affect the presence of toxicity in its comments? To answer these questions, we annotate 1,995,560 Twitter messages for toxicity; these messages are replies to 29,726 news pieces from 25 newspapers in the UK and USA. We experiment with content and metadata features and use both classical machine learning and BERT classifiers. We find that metadata features perform best when used to train a GBDT classifier (F1 = 0.723). This holds even in comparison with BERT. Additionally, we contribute to future studies of toxicity-associated news by providing an annotated dataset to the community. With this resource, it is possible to further investigate the effect of other content and metadata-based features for identifying toxicity-associated news.
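
The definition of toxicity-associated news above (news with a high percentage of toxic comments) can be illustrated with a minimal sketch. The abstract does not state the cutoff for "high", so the `threshold` value of 0.5, the function names, and the toy reply data below are all assumptions made for illustration and are not the authors' procedure.

```python
from collections import defaultdict


def toxic_fraction_per_news(replies):
    """Return the fraction of toxic replies for each news piece.

    `replies` is an iterable of (news_id, is_toxic) pairs, one per reply.
    """
    totals = defaultdict(int)
    toxic = defaultdict(int)
    for news_id, is_toxic in replies:
        totals[news_id] += 1
        if is_toxic:
            toxic[news_id] += 1
    return {news_id: toxic[news_id] / totals[news_id] for news_id in totals}


def label_toxicity_associated(replies, threshold=0.5):
    """Label a news piece as toxicity-associated when its toxic fraction
    reaches `threshold`. The default of 0.5 is a hypothetical cutoff."""
    fractions = toxic_fraction_per_news(replies)
    return {news_id: frac >= threshold for news_id, frac in fractions.items()}


# Toy data (invented): news "n1" has 2 of 3 toxic replies, "n2" has 1 of 4.
replies = [
    ("n1", True), ("n1", True), ("n1", False),
    ("n2", False), ("n2", False), ("n2", True), ("n2", False),
]
fractions = toxic_fraction_per_news(replies)
labels = label_toxicity_associated(replies, threshold=0.5)
```

With this toy input, `n1` is labeled toxicity-associated and `n2` is not. The resulting binary labels are the kind of target a content-based or metadata-based classifier would then be trained to predict.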