Workshop Proceedings of the 19th International AAAI Conference on Web and Social Media
Workshop: NLPSI 2025: First Workshop on Integrating NLP and Psychology to Study Social Interactions
DOI: 10.36190/2025.35

Pragmatic inference remains a persistent challenge even for state-of-the-art large language models (LLMs). While prior efforts have focused on explicit tuning or training, this paper suggests that the models' limitations may stem not from a lack of inherent capability but from the absence of a multi-agent setup, since pragmatics, by nature, emerges in interaction. This study therefore proposes an adversarial multi-agent LLM framework, modeled after courtroom dynamics, in which a Semantic Agent and a Pragmatic Agent debate scalar implicatures (an essential test of pragmatic competence) and a neutral Judge Agent adjudicates, revealing whether LLMs can derive pragmatic inferences or still default to literal semantics. Experimental results show a clear difference between the single-agent and multi-agent conditions. In the former, the Judge Agent received the same prompt but without access to the agents' reasoning, and it defaulted to a semantic interpretation, consistent with prior findings. In the latter, the same model successfully derived the implicature, closely aligning with human scalar implicature patterns. This contrast suggests that effective pragmatic reasoning in LLMs arises not from additional tuning but from contextualized interaction, specifically the kind enabled by a multi-agent framework, which allows latent pragmatic abilities to surface through exposure to competing interpretations. This pragmatic inference, however, is not indiscriminate: it is modulated by discourse structure, with the models computing scalar implicatures when the implicature-bearing term is foregrounded but defaulting to semantic interpretations when it is less salient. This study also contributes to the ongoing debate on whether LLMs genuinely engage in pragmatic reasoning or merely simulate it statistically, raising broader questions about their validity as cognitive models of human linguistic behavior.
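
To make the two experimental conditions concrete, the following is a minimal sketch of how a courtroom-style debate and a single-agent baseline might be wired together. It is not the authors' implementation. The role prompts, the names `run_trial`, `judge_prompt`, and `stub_llm`, and the example sentence are all hypothetical, and the `LLM` type stands in for whatever model call is actually used. The only point illustrated is the structural contrast described above: the Judge sees the same prompt in both conditions, with the agents' arguments attached only in the multi-agent one.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: any function that maps a prompt to a reply.
LLM = Callable[[str], str]


@dataclass
class Turn:
    speaker: str
    text: str


# Illustrative role prompts (assumptions, not taken from the paper).
SEMANTIC_ROLE = "Argue for the literal reading: 'some' is compatible with 'all'."
PRAGMATIC_ROLE = "Argue for the implicature: 'some' conveys 'not all'."
JUDGE_ROLE = "You are a neutral judge. Answer 'semantic' or 'pragmatic'."


def format_transcript(turns: List[Turn]) -> str:
    """Render the debate so far as one 'Speaker: text' line per turn."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in turns)


def judge_prompt(sentence: str, turns: List[Turn]) -> str:
    """Build the Judge prompt; arguments are attached only if any exist."""
    parts = [JUDGE_ROLE, f"Sentence: {sentence}"]
    if turns:
        parts.append("Arguments:\n" + format_transcript(turns))
    return "\n".join(parts)


def run_trial(
    llm: LLM, sentence: str, rounds: int = 1, multi_agent: bool = True
) -> Tuple[str, List[Turn]]:
    """Run one trial and return the Judge's verdict plus the debate turns."""
    turns: List[Turn] = []
    if multi_agent:
        for _ in range(rounds):
            # The two advocates alternate, each seeing the transcript so far.
            for speaker, role in (
                ("Semantic", SEMANTIC_ROLE),
                ("Pragmatic", PRAGMATIC_ROLE),
            ):
                prompt = "\n".join(
                    [role, f"Sentence: {sentence}", format_transcript(turns)]
                )
                turns.append(Turn(speaker, llm(prompt)))
    # Single-agent condition: turns is empty, so the Judge gets no arguments.
    verdict = llm(judge_prompt(sentence, turns))
    return verdict, turns


def stub_llm(prompt: str) -> str:
    """Placeholder model: echoes the prompt length, performs no reasoning."""
    return f"[reply to {len(prompt)}-char prompt]"


if __name__ == "__main__":
    sentence = "Some of the students passed the exam."  # illustrative item
    _, debate = run_trial(stub_llm, sentence, rounds=2)
    print(format_transcript(debate))
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider, so the stub can be swapped for a real model call without changing the debate logic.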