When Social Media Images Need Words: Measuring Context Gap and Fusion Tax in Crisis Image Captioning

Amanda Hughes; Yuhao Bao

doi:10.59297/4bar0c32

Authors

Amanda Hughes Brigham Young University https://orcid.org/0000-0002-7506-3343
Yuhao Bao Brigham Young University

DOI:

https://doi.org/10.59297/4bar0c32

Keywords:

Crisis Informatics, Situational Awareness, Image Captioning, Multimodal Large Language Models, Social Media

Abstract

Crisis images on social media can be difficult to interpret at scale, and many carry meaning embedded in text or symbols (e.g., radar screenshots, evacuation notices). This limits vision-only captioning for situational awareness. We quantify a central trade-off in multimodal captioning: adding post text can reduce omission-driven ambiguity (the Context Gap), but it can also introduce text-driven errors (the Fusion Tax). Using 204 high-priority image–post pairs from CrisisFACTS, we compare Vision-only and Vision + Text captioning across Gemini 2.0 Flash, Qwen2.5-VL, and BLIP. We find that post text improves accuracy for Gemini and Qwen largely by reducing misidentification and scene-type errors, while sometimes amplifying hallucinated (unsupported) details. BLIP, however, does not reliably fuse modalities in our setup. When post text is provided, it often collapses into simple text echoing rather than producing image-grounded captions. We discuss implications for multimodal fusion in crisis informatics and outline next steps for image-type evaluation and routing.
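The trade-off described in the abstract can be illustrated with a small bookkeeping sketch. The abstract does not say how the Context Gap and Fusion Tax are operationalized, so everything below is an assumption made for illustration: the `CaptionJudgment` record, the error label strings, and the per-image definitions are hypothetical and are not the authors' method or data.

```python
from dataclasses import dataclass


@dataclass
class CaptionJudgment:
    """Hypothetical per-image annotation for one model (not the paper's schema)."""
    vision_only_errors: set   # error labels assigned to the Vision-only caption
    vision_text_errors: set   # error labels assigned to the Vision + Text caption


def context_gap_and_fusion_tax(judgments):
    """Return (context_gap, fusion_tax) as fractions of images.

    Assumed definitions, for illustration only:
      context_gap: share of images where at least one Vision-only error
                   disappears once post text is added.
      fusion_tax:  share of images where the Vision + Text caption carries
                   at least one error the Vision-only caption did not have.
    """
    n = len(judgments)
    if n == 0:
        raise ValueError("at least one judgment is required")
    fixed = sum(1 for j in judgments
                if j.vision_only_errors - j.vision_text_errors)
    introduced = sum(1 for j in judgments
                     if j.vision_text_errors - j.vision_only_errors)
    return fixed / n, introduced / n


# Toy annotations, invented for this example (not results from the paper).
toy = [
    CaptionJudgment({"misidentification"}, set()),
    CaptionJudgment({"scene_type"}, {"hallucination"}),
    CaptionJudgment(set(), {"hallucination"}),
    CaptionJudgment(set(), set()),
]

gap, tax = context_gap_and_fusion_tax(toy)
print(f"context gap closed: {gap:.2f}, fusion tax: {tax:.2f}")
```

Under these assumed definitions a single image can count toward both quantities, which matches the abstract's observation that post text can correct misidentification while also adding unsupported details.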


Author Biography

Amanda Hughes, Brigham Young University

Amanda L. Hughes is an Associate Professor of Computer Science at Brigham Young University. A recognized research leader in Crisis Informatics, she investigates the use of information and communication technology during crises and mass emergencies, with particular attention to how social media and AI affect emergency response organizations. The goal of her research is to design, implement, and deploy software systems that improve crisis communications, grounded in a deep understanding of the social context in which those systems operate. She has published more than 80 peer-reviewed papers in the areas of human-computer interaction, computer-supported cooperative work, and Crisis Informatics, and is highly cited in her field. Her research is funded by grants from NSF, the Knight Foundation, and NASA SERVIR. Amanda received a bachelor's degree in Computer Science from Brigham Young University and master's and PhD degrees in Computer Science from the University of Colorado Boulder.
