Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media

Workshop: #SMM4H-HeaRD 2025: Joint 10th Social Media Mining for Health and Health Real-World Data Workshop and Shared Tasks

DOI: 10.36190/2025.69

Published: 2025-06-05
HSE NLP Team at #SMM4H-HeaRD 2025: Hybrid LLM and Multilingual BERT Ensemble for Adverse Drug Event Detection
Airat Valiev

We present a hybrid system for multilingual adverse drug event (ADE) detection in social media, developed for the #SMM4H-HeaRD 2025 shared task. Our approach combines large language models (LLMs) with domain-adapted BERT ensembles, addressing the challenges of extreme class imbalance and linguistic diversity across German, French, Russian, and English user-generated texts. To improve recall in low-resource languages, we generated synthetic ADE-positive samples using LLM-based data augmentation informed by biomedical NER and UMLS knowledge. Our pipeline dynamically integrates medication-specific few-shot prompts, language-specific BERT checkpoints, and ensemble decision strategies, including the use of BERT expert predictions as hints for LLMs. On the official test set, our best submission (GPT-4o few-shot with EuroBERT ensemble hints) achieved an F1-score of 0.5669 for the positive class, outperforming most baseline and ensemble configurations. These results demonstrate that fusing LLM reasoning, biomedical entity linking, and targeted augmentation can substantially improve ADE detection in multilingual, imbalanced social media datasets.
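
As an illustration of the "BERT expert predictions as hints" idea and of the positive-class F1 metric reported above, the following is a minimal sketch and not the system's actual implementation. The function names (`ensemble_hint`, `build_prompt`, `f1_positive`), the mean-probability aggregation, the 0.5 threshold, and the prompt wording are all assumptions made for this example; the abstract does not specify them. No LLM call is made here; the sketch only assembles the prompt text.

```python
from statistics import mean


def ensemble_hint(probs, threshold=0.5):
    """Turn per-model ADE probabilities into a short textual hint.

    Assumption: the ensemble is aggregated by a simple mean; the real
    system may use a different decision strategy.
    """
    avg = mean(probs)
    label = "ADE" if avg >= threshold else "no ADE"
    return f"Classifier ensemble suggests: {label} (mean probability {avg:.2f})"


def build_prompt(text, few_shot, hint):
    """Assemble a few-shot prompt that includes the ensemble hint.

    `few_shot` is a list of (post, label) pairs; the wording is hypothetical.
    """
    lines = [
        "Decide whether the post reports an adverse drug event. Answer 1 or 0.",
        "",
    ]
    for ex_text, ex_label in few_shot:
        lines += [f"Post: {ex_text}", f"Answer: {ex_label}", ""]
    lines += [hint, f"Post: {text}", "Answer:"]
    return "\n".join(lines)


def f1_positive(gold, pred):
    """F1-score for the positive class (label 1) only."""
    tp = sum(g == 1 and p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    hint = ensemble_hint([0.9, 0.7, 0.2])
    shots = [("Took drug X and got a severe rash.", 1),
             ("Picked up my prescription today.", 0)]
    print(build_prompt("Headache ever since I started drug Y.", shots, hint))
    print(f1_positive([1, 0, 1, 0], [1, 1, 0, 0]))
```

Scoring only the positive class matters under extreme class imbalance, since a classifier that always predicts "no ADE" would reach high accuracy while scoring 0 on this metric.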