Workshop Proceedings of the 20th International
AAAI Conference on Web and Social Media
Workshop: Digital Minds 2026: Assessing the interplay of social media on mental health
DOI: 10.36190/2026.23

Large language models are increasingly integrated into online information environments, including contexts where people seek information about sensitive and potentially harmful topics related to mental health. As these systems mediate access to such content, their safety mechanisms become part of the digital conditions shaping user risk and protection. In this paper, we use a diagnostic audit to examine how multilingual safety operates within this mediated environment. Using a two-stage, poem-based obfuscation method, we analyze responses from three frontier language models to harm-related queries in English, Spanish, and Hindi across three domains, including self-harm. The audit suggests that safety outcomes do not follow a simple hierarchy in which lower-resourced languages are consistently less protected. Instead, relative levels of protection vary by both language and model, with patterns that shift across systems and harm categories. This indicates that multilingual safety reflects model-specific alignment and design choices rather than inherent linguistic resource differences. From a digital mental health perspective, this finding has important implications for equity. Users seeking information during vulnerable moments may encounter different levels of protection depending not only on the language they use, but also on which AI system mediates their access. We argue that understanding mental health risk in online environments requires examining LLM safeguards as part of a broader ecosystem of digital mediation, moderation, and risk governance.