LLM-Powered Automatic Translation and Urgency in Crisis Scenarios
DOI:
https://doi.org/10.59297/vtqcsb52
Keywords:
AI-mediated communication, crisis translation, large language models
Abstract
Large language models (LLMs) are increasingly proposed for crisis preparedness and response, particularly for multilingual communication. However, their suitability for high-stakes crisis contexts remains insufficiently evaluated. This work examines the performance of state-of-the-art LLMs and machine translation systems in crisis-domain translation, with a focus on preserving urgency, a critical property for effective crisis communication and triage. Using multilingual crisis data (TICO-19, 30 languages) and a newly introduced urgency-annotated dataset of 100 scenarios translated into 29 languages, we show that dedicated translation models and LLMs exhibit substantial quality degradation, particularly for low-resource languages. Beyond translation quality, we conduct a human annotation study revealing a striking asymmetry: human assessors maintain consistent urgency judgments regardless of prompt language, while LLM-based urgency classifications vary widely across languages for identical scenarios, at times spanning the full range from Not Urgent to Critical. These findings highlight significant risks in deploying general-purpose language technologies for crisis triage and underscore the need for multilingual, human-centered evaluation frameworks.
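The cross-language asymmetry described above can be made concrete with a small sketch. The code below is illustrative only and is not the study's evaluation code: it assumes an ordinal urgency scale whose endpoints (Not Urgent, Critical) come from the abstract, while the intermediate levels, the function names, and the toy labels are hypothetical. It measures how far apart the urgency labels for one scenario fall when the same scenario is presented in different languages.

```python
# Ordinal urgency scale. Only the endpoints ("Not Urgent", "Critical") come
# from the abstract; the intermediate levels are hypothetical placeholders.
URGENCY_SCALE = ["Not Urgent", "Low", "Moderate", "High", "Critical"]
RANK = {label: i for i, label in enumerate(URGENCY_SCALE)}


def urgency_spread(labels_by_language):
    """Return the gap (in scale steps) between the highest and lowest
    urgency label assigned to one scenario across languages."""
    ranks = [RANK[label] for label in labels_by_language.values()]
    return max(ranks) - min(ranks)


def spans_full_range(labels_by_language):
    """True if the labels for one scenario cover both ends of the scale."""
    return urgency_spread(labels_by_language) == len(URGENCY_SCALE) - 1


# Toy, made-up labels for a single scenario (not data from the study).
example = {"en": "Critical", "fr": "High", "sw": "Not Urgent"}
print(urgency_spread(example))      # 4
print(spans_full_range(example))    # True
```

Under these assumptions, a spread of 0 means the classification is stable across languages, which is the behavior the abstract reports for human assessors; a spread equal to the full scale width corresponds to the worst case it reports for LLM-based classification.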