Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media
Workshop: #SMM4H-HeaRD 2025: Joint 10th Social Media Mining for Health and Health Real-World Data Workshop and Shared Tasks
DOI: 10.36190/2025.54

Post-marketing drug surveillance often suffers from under-reporting and delays in identifying adverse drug reactions (ADRs). Benchmark datasets and shared tasks, particularly those from SMM4H, have advanced ADR detection methods, but they primarily focus on general populations and known ADRs. This study complements those efforts by focusing on epilepsy and targeting the discovery of unknown ADRs. We collected user-generated posts from r/Epilepsy and the Epilepsy Foundation of America (EFA) forums, curated a drug-symptom dictionary, and developed a classification pipeline that combines sentiment analysis with relation classification. Sentiment polarity serves as a putatively interpretable characterization of patient experience, while relation classification determines whether a co-mentioned drug and symptom reflect an ADR or a drug indication. Labels from the SIDER database were used for distant supervision, enabling scalable, domain-adaptable automation without manual annotation. Identifying unknown ADRs remains particularly challenging because they are rarely annotated or included in available databases. By leveraging patterns in real-world discourse, our classifier generalizes well to such cases. When evaluated on the 2025 SMM4H Shared Task 1, it achieved a high precision of 0.80. High precision is desirable because false positives can lead to misleading surveillance hypotheses and costly, unnecessary follow-up efforts. Manual validation on Reddit and EFA posts further demonstrates the classifier's ability to identify self-reports of unknown ADRs. Overall, our work shows that community-focused social media mining, informed by sentiment analysis, can enrich pharmacovigilance pipelines and make automated, low-cost drug safety warnings more interpretable.
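The distant-supervision step can be illustrated with a minimal sketch. It assumes plain token lookup against a curated dictionary and treats every drug-symptom pair found in the same post as a co-mention. The dictionary entries, the tables `KNOWN_SIDE_EFFECTS` and `KNOWN_INDICATIONS`, and the helpers `co_mentions` and `distant_label` are all hypothetical. The two tables are toy stand-ins for SIDER, not actual SIDER rows. The sketch omits the sentiment model and the trained relation classifier.

```python
import re
from itertools import product
from typing import Optional

# Hypothetical toy dictionary mapping surface forms to normalized concepts.
# The study's curated drug-symptom dictionary is not reproduced here.
DRUG_TERMS = {
    "keppra": "levetiracetam",
    "levetiracetam": "levetiracetam",
    "lamictal": "lamotrigine",
    "lamotrigine": "lamotrigine",
}
SYMPTOM_TERMS = {
    "irritable": "irritability",
    "irritability": "irritability",
    "rage": "aggression",
    "rash": "rash",
    "seizure": "seizure",
    "seizures": "seizure",
}

# Illustrative stand-ins for SIDER side-effect and indication tables
# (hypothetical entries, not actual SIDER rows).
KNOWN_SIDE_EFFECTS = {("levetiracetam", "irritability"), ("lamotrigine", "rash")}
KNOWN_INDICATIONS = {("levetiracetam", "seizure"), ("lamotrigine", "seizure")}

TOKEN = re.compile(r"[a-z]+")


def co_mentions(post: str) -> list[tuple[str, str]]:
    """Return sorted (drug, symptom) concept pairs co-mentioned in one post."""
    tokens = TOKEN.findall(post.lower())
    drugs = {DRUG_TERMS[t] for t in tokens if t in DRUG_TERMS}
    symptoms = {SYMPTOM_TERMS[t] for t in tokens if t in SYMPTOM_TERMS}
    return sorted(product(drugs, symptoms))


def distant_label(pair: tuple[str, str]) -> Optional[str]:
    """Weak label from the lookup tables; None means unlabeled or conflicting."""
    is_adr = pair in KNOWN_SIDE_EFFECTS
    is_indication = pair in KNOWN_INDICATIONS
    if is_adr and not is_indication:
        return "ADR"
    if is_indication and not is_adr:
        return "indication"
    return None  # left for the trained relation classifier to decide


post = "Keppra stopped my seizures but I feel irritable and fly into a rage."
for pair in co_mentions(post):
    print(pair, distant_label(pair))
```

In this toy setting, the levetiracetam-irritability pair receives the weak label "ADR" and levetiracetam-seizure receives "indication". The levetiracetam-aggression pair stays unlabeled because it is absent from the toy tables. A trained relation classifier would have to decide such pairs, and they are where candidate unknown ADRs would surface.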