LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification

Authors

  • Jacob Ativo, California State University, East Bay
  • Bharaneeshwar Balasubramaniyam, Kansas State University, https://orcid.org/0000-0001-9219-7821
  • Anh Tran, Independent Researcher
  • Khushboo Gupta, University of Illinois at Chicago
  • Hongmin Li, California State University, East Bay
  • Doina Caragea, Kansas State University
  • Cornelia Caragea, University of Illinois at Chicago

DOI:

https://doi.org/10.59297/bx082870

Keywords:

Large language models, Social media, Crisis data, Model calibration, Disaster response

Abstract

Semi-supervised learning approaches have been investigated as a means to enhance the analysis of social media data in disaster management contexts. In this work, we present the first empirical evaluation of large language model (LLM) guided semi-supervised learning for crisis-related tweet classification. We compare two recent LLM-assisted semi-supervised methods, VerifyMatch and LLM-guided Co-Training (LG-CoTrain), against established semi-supervised baselines. Our results show that LG-CoTrain significantly outperforms classical semi-supervised approaches in low-resource settings with 5, 10, and 25 labeled examples per class, achieving the highest Macro F1 averaged across events. VerifyMatch achieves competitive performance while also demonstrating strong calibration properties. As the number of labeled examples increases, the performance gap narrows and Self Training emerges as a strong baseline. We further observe that compact semi-supervised models can, in some cases, outperform very large LLMs operating in zero-shot settings. This finding highlights the potential of transferring knowledge from LLMs into smaller, more deployable models through LLM-guided semi-supervised learning, offering a practical pathway for real-world disaster response applications. Our project repository is available on GitHub.
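
The headline metric in the abstract, Macro F1 averaged across events, can be read as the unweighted mean of per-class F1 within each event, followed by a mean over events. The following is a minimal sketch in plain Python under that assumed aggregation; the function names, event names, class labels, and predictions are hypothetical placeholders, not code or data from the paper.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def averaged_macro_f1(per_event):
    """Mean of per-event Macro F1; per_event maps event name -> (y_true, y_pred)."""
    return sum(macro_f1(t, p) for t, p in per_event.values()) / len(per_event)


# Hypothetical toy data for illustration only.
toy = {
    "event_a": (["help", "info", "help", "other"], ["help", "info", "info", "other"]),
    "event_b": (["info", "info", "other"], ["info", "info", "other"]),
}
print(round(averaged_macro_f1(toy), 3))
```

Because every class contributes equally regardless of how many tweets it contains, this metric rewards models that handle rare crisis categories well, which matters when only a handful of labeled examples per class are available.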

Published

2026-05-22

Section

ISCRAM Proceedings

How to Cite

Ativo, J., Balasubramaniyam, B., Tran, A., Gupta, K., Li, H., Caragea, D., & Caragea, C. (2026). LLM-guided Semi-Supervised Approaches for Social Media Crisis Data Classification. Proceedings of the International ISCRAM Conference, 23. https://doi.org/10.59297/bx082870
