Workshop Proceedings of the 17th International AAAI Conference on Web and Social Media
Workshop: Novel Evaluation Approaches for Text Classification Systems (NEATCLasS)
DOI: 10.36190/2023.55
Toxic language is difficult to define: it is not monolithic, and perceptions of toxicity vary widely. Detecting toxic language is made harder still by the highly contextual and subjective nature of its interpretation, which can degrade the reliability of datasets and negatively affect detection model performance. To address this challenge, this paper introduces a toxicity inspector framework that incorporates a human-in-the-loop pipeline with the aim of enhancing the reliability of toxicity benchmark datasets by centering the evaluator's attention through an iterative feedback cycle. This feedback cycle is the centerpiece of the framework. It is guided by two types of metrics (hard and soft) that give evaluators and dataset creators the insight needed to balance the tradeoff between performance gains and toxicity avoidance.