Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media
Workshop: R2CASS 2025: Social Science Meets Web Data: Reproducible and Reusable Computational Approaches
DOI: 10.36190/2025.51

Fake news presents misleading information as legitimate news to influence public opinion and deceive readers. Fake news detection techniques distinguish fake news from real news, which carries credible information. These techniques analyze linguistic patterns in the text, contextual inconsistencies in user responses, and propagation behavior on social networks. Unlike high-resource languages, Urdu has few basic language-processing tools, which restricts the application of state-of-the-art machine learning models to Urdu-based challenges. As a result, the available approaches for fake news detection in Urdu do not perform well on benchmark datasets. Bag-of-words approaches, which represent n-gram features as frequency-based sparse vectors, are often used but are inadequate for detecting linguistic indicators of legitimacy in news. In this paper, we propose a methodology that uses Urdu-hack text preprocessing techniques to prepare the data, Urdu embeddings to represent the news text as dense vectors, and finally a long short-term memory (LSTM) based deep sequence model to classify fake news. The proposed methodology outperforms traditional machine learning approaches in identifying linguistic characteristics and using them for decision-making, achieving accuracies of 85% and 83% on the Bend the Truth (BET) and Urdu fake news (UFN) datasets, respectively.
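
The data flow described above (tokens, then dense embeddings, then an LSTM, then a fake/real decision) can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TinyLSTMClassifier`, the dimensions, the random untrained weights, the four-word sample vocabulary, and the whitespace tokenizer are all assumptions made for illustration. In particular, whitespace splitting stands in for the Urdu-specific preprocessing, and random vectors stand in for pretrained Urdu embeddings.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    # Small random weights; a real model would learn these from labeled news.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyLSTMClassifier:
    """Untrained single-layer LSTM with a sigmoid output (illustration only)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, vocab, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.index = {token: i for i, token in enumerate(vocab)}
        # Hypothetical random embeddings; the extra last row serves unknown tokens.
        self.embeddings = random_matrix(len(vocab) + 1, embed_dim, rng)
        # One weight matrix per gate, applied to the concatenation [x; h].
        # Biases are omitted (treated as zero) to keep the sketch short.
        self.weights = {
            name: random_matrix(hidden_dim, embed_dim + hidden_dim, rng)
            for name in self.GATES
        }
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]

    def step(self, x, h, c):
        z = x + h  # list concatenation of input vector and previous hidden state
        pre = {name: matvec(self.weights[name], z) for name in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        # Cell state keeps part of the old memory and adds gated new content.
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, text):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        unknown = len(self.index)
        for token in text.split():  # placeholder for real Urdu tokenization
            x = self.embeddings[self.index.get(token, unknown)]
            h, c = self.step(x, h, c)
        # The final hidden state summarizes the sequence; map it to P(fake).
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)))


# Hypothetical sample sentence, roughly "this news is false".
vocab = ["یہ", "خبر", "جھوٹی", "ہے"]
model = TinyLSTMClassifier(vocab)
prob_fake = model.predict_proba("یہ خبر جھوٹی ہے")
print(f"P(fake) from untrained weights: {prob_fake:.3f}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how a dense vector per token feeds a recurrent state that is read out once at the end of the sequence, in contrast to a single sparse n-gram count vector per document.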