Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media

Workshop: R2CASS 2025: Social Science Meets Web Data: Reproducible and Reusable Computational Approaches

DOI: 10.36190/2025.51

Published: 2025-06-05

Fake News Detection in Urdu
Muhammad Zeeshan Nazar, Shakir Ullah Shah, Muhammad Taimoor Khan

Fake news presents misleading information as legitimate news to influence public opinion and deceive readers. Fake news detection techniques distinguish fake news from real news that carries credible information. These techniques analyze linguistic patterns in the text, contextual inconsistencies in user responses, and propagation behavior on social networks. Unlike high-resource languages, Urdu has few basic language-processing tools, which restricts the application of state-of-the-art machine learning models to Urdu-based tasks. As a result, the available approaches for fake news detection in Urdu do not perform well on benchmark datasets. Bag-of-words approaches, which represent features as n-grams in frequency-based sparse vectors, are often used but are inadequate for detecting linguistic indicators of legitimacy in news. In this paper, we propose a methodology that uses Urduhack text preprocessing techniques to prepare the data, Urdu embeddings to represent the news text as dense vectors, and finally a long short-term memory (LSTM) based deep sequence model to classify fake news. The proposed methodology outperforms traditional machine learning approaches in identifying linguistic characteristics and using them for decision-making, achieving considerable performance gains, with accuracies of 85% and 83% on the Bend the Truth (BET) and Urdu fake news (UFN) datasets, respectively.
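The three stages named in the abstract (preprocessing, dense embeddings, LSTM classification) can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: the `normalize` function is a hypothetical stand-in for the Urduhack preprocessing step, the embedding table and all weights are random rather than pretrained Urdu embeddings, and the class name `TinyLSTMClassifier`, the dimensions, and the example sentence are assumptions made for illustration only.

```python
import math
import random
import unicodedata


def normalize(text):
    # Hypothetical stand-in for a real Urdu normalizer:
    # Unicode NFC normalization plus whitespace cleanup.
    return " ".join(unicodedata.normalize("NFC", text).split())


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyLSTMClassifier:
    """Embedding lookup -> single LSTM layer -> sigmoid output (random weights)."""

    GATES = ("i", "f", "g", "o")  # input, forget, cell candidate, output

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        # Index 0 is reserved for out-of-vocabulary tokens.
        self.index = {normalize(tok): n + 1 for n, tok in enumerate(vocab)}
        # Dense vectors, one per token (random here, pretrained in practice).
        self.embed = rand_matrix(len(vocab) + 1, embed_dim, rng)
        self.W = {k: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng)
                  for k in self.GATES}
        self.b = {k: [0.0] * hidden_dim for k in self.GATES}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.hidden_dim = hidden_dim

    def predict_proba(self, text):
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # cell state
        for tok in normalize(text).split():
            x = self.embed[self.index.get(tok, 0)]
            z = x + h  # concatenate the input vector with the previous hidden state
            pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
                   for k in self.GATES}
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            g = [math.tanh(v) for v in pre["g"]]
            o = [sigmoid(v) for v in pre["o"]]
            # Standard LSTM update: forget part of the old cell, add gated candidate.
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        # Score in (0, 1); meaningless until the weights are trained.
        return sigmoid(sum(w * hv for w, hv in zip(self.w_out, h)))


# Illustrative usage with a made-up four-word vocabulary.
model = TinyLSTMClassifier(["یہ", "خبر", "جھوٹی", "ہے"])
print(model.predict_proba("یہ  خبر جھوٹی ہے"))
```

Unlike a bag-of-words vector, the recurrent state here depends on token order, which is the property a sequence model relies on. A practical system would replace the random weights with trained parameters in a deep learning framework and the stand-in normalizer with a dedicated Urdu preprocessing library.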