Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media

Workshop: #SMM4H-HeaRD 2025: Joint 10th Social Media Mining for Health and Health Real-World Data Workshop and Shared Tasks

DOI: 10.36190/2025.52

Published: 2025-06-05

Female Autism in Natural Language – A Corpus Paper
Nadine Probol, Margot Mieskes

Autism Spectrum Disorder (ASD) is a condition that has been diagnosed more frequently in recent years. This increase in the number of people on the autism spectrum raises the need for more, and more efficient, indicators of the developmental disorder. The aim of this work is to show how a reproducible data set of speech and transcribed data from women and men on the spectrum can be created. The data is collected from YouTube, Instagram, and TikTok with the explicit consent of the people donating their recordings. The recordings cover problems due to ASD, tips and tricks to improve life, and help with understanding neurotypical (NT) reactions to the speakers' behaviour. This data set should not be used in any way to replace a professional diagnosis, but rather to find linguistic indicators of ASD that might help support a diagnosis.