Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media
Workshop: #SMM4H-HeaRD 2025: Joint 10th Social Media Mining for Health and Health Real-World Data Workshop and Shared Tasks
DOI: 10.36190/2025.56

This paper outlines our contributions to Task 1 and Task 6 of the #SMM4H-HeaRD 2025 Workshop, both focused on the binary classification of social media text reporting adverse medical events. Task 1 involves detecting adverse drug events (ADEs) in multilingual texts written in German, French, Russian, and English. Task 6 targets the identification of adverse reactions to herpes zoster vaccines in Reddit posts. We employed transfer learning using the encoder-based RoBERTa model as our baseline, alongside the large language model (LLM) Mistral. Our focus was on enhancing LLM performance by introducing an iterative error-driven data augmentation strategy, in which false predictions were paraphrased using the LLM, added back to the training set, and the model retrained from scratch. This refinement loop was repeated twice for both tasks. Our augmentation strategy enabled the Mistral model to outperform the RoBERTa baseline, with the best-performing Mistral models achieving an F1 score of 0.96 in Task 6 and a macro-averaged F1 score of 0.709 across languages in Task 1, resulting in the highest performance among all participants in the Task 1 competition. To support third-party use, we have released our best-performing models for both Task 1 and Task 6 on Hugging Face.
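The refinement loop described above can be summarized as a short sketch. The code below is illustrative only and does not reproduce the authors' implementation: the callables `train_fn`, `predict_fn`, and `paraphrase_fn` are placeholders for the actual fine-tuning, inference, and LLM paraphrasing steps, and the assumption that errors are collected on a held-out split is ours.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, binary label)

def error_driven_augmentation(
    train_fn: Callable[[List[Example]], object],            # trains a model from scratch
    predict_fn: Callable[[object, List[Example]], List[int]],
    paraphrase_fn: Callable[[str], str],                     # LLM paraphrase of one text
    train_set: List[Example],
    eval_set: List[Example],
    rounds: int = 2,                                         # two refinement iterations, as in the paper
) -> object:
    """Augment the training set with paraphrases of misclassified examples, retraining each round."""
    model = train_fn(train_set)
    for _ in range(rounds):
        preds = predict_fn(model, eval_set)
        # Collect examples the current model classifies incorrectly.
        errors = [ex for ex, p in zip(eval_set, preds) if p != ex[1]]
        if not errors:
            break
        # Paraphrase each error with the LLM, keep the gold label, and add it to the training data.
        augmented = [(paraphrase_fn(text), label) for text, label in errors]
        train_set = train_set + augmented
        # Retrain the model from scratch on the enlarged training set.
        model = train_fn(train_set)
    return model
```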