Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media

Workshop: #SMM4H-HeaRD 2025: Joint 10th Social Media Mining for Health and Health Real-World Data Workshop and Shared Tasks

DOI: 10.36190/2025.57

Published: 2025-06-05
LLM Pros at SMM4H-HeaRD 2025: Data Extraction Using Prompt Engineering and Structured Outputs (Tasks 1, 2, 3, 4, 5, 6)
Aatish Pradhan, Brian M. Habersberger, James H. Wade, Denver Dsouza, Nihal Paul

Six tasks released for the SMM4H-HeaRD 2025 workshop were addressed with a unified large-language-model (LLM) pipeline that relies on prompt engineering, strictly enforced JSON schemas, and lightweight rule sets. The pipeline uses no task-specific fine-tuning and can be applied, with minor modifications, across a variety of data. The goal of this study was to demonstrate that general-purpose, widely available LLMs are capable of understanding and extracting crucial health information. The systems achieved the highest-ranked submission scores on the official leaderboard for Task 2 (non-medical substance use), all three subtasks of Task 4 (insomnia), and Subtask 2 of Task 5 (foodborne outbreak entity extraction). A detailed workflow for insomnia (Task 4) illustrates how the sleep-difficulty rules, daytime-impairment rules, and medication lists interact; shorter descriptions are provided for the remaining five tasks. On the test data sets, the systems obtained F1-scores of 0.4234 for Task 2, 0.9670 for Task 4 (Subtask 1), 0.9064 for Task 4 (Subtask 2a), and 0.6822 for Task 4 (Subtask 2b), and an average score of 0.576 for Task 5 (Subtask 2).
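The abstract's combination of a prompt, a strictly enforced JSON schema, and lightweight rules could be sketched as follows. This is a minimal illustration, not the authors' actual system: the schema fields, prompt wording, and decision rule are hypothetical assumptions, and the hard-coded reply stands in for a real LLM call.

```python
import json

# Illustrative "schema": required keys and expected types (an assumption,
# not the paper's actual schema for the insomnia task).
SCHEMA = {"sleep_difficulty": bool, "daytime_impairment": bool, "medications": list}

# Hypothetical prompt template instructing the model to emit schema-conformant JSON.
PROMPT = (
    "Extract insomnia evidence from the clinical note below. Respond ONLY with "
    "JSON containing sleep_difficulty (bool), daytime_impairment (bool), "
    "medications (list of strings).\n\n{note}"
)

def validate(payload: str) -> dict:
    """Strictly enforce the schema on the model's raw reply."""
    data = json.loads(payload)  # raises on malformed JSON
    for key, typ in SCHEMA.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} missing or not of type {typ.__name__}")
    return data

def label_insomnia(record: dict) -> str:
    """Toy lightweight rule set: sleep difficulty plus either daytime
    impairment or a sleep medication yields a positive label."""
    positive = record["sleep_difficulty"] and (
        record["daytime_impairment"] or bool(record["medications"])
    )
    return "yes" if positive else "no"

# Stand-in for a real LLM reply; a real pipeline would call an LLM with
# PROMPT and retry until validate() succeeds.
reply = '{"sleep_difficulty": true, "daytime_impairment": false, "medications": ["zolpidem"]}'
print(label_insomnia(validate(reply)))  # prints "yes" under this toy rule
```

The key design point implied by the abstract is that schema validation and rules live outside the model, so the same scaffold can be reused across tasks by swapping the schema, prompt, and rule function.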