Image-text crisis tweet categorization: a caption-based approach



Deep Learning, Multimodal data, text/image fusion, Crisis Data


The growth of social media usage over the last decade has made available a massive and valuable volume of multimedia data. However, the lack of large annotated multimodal datasets, along with the inherent noise and the diversity of multimodal relations in this type of data, presents challenges for machine learning methods. Unlike classic multimodal data, social media data exhibits a wide diversity of relations between image and text, making the interaction between the two modalities more difficult to model.
Previous research has concentrated on fusion strategies with separate encoders for each modality. This paper introduces CMB (Caption-based Multimodal BERT), a method for classifying crisis-related social media posts that exploits information from both images and text. CMB translates the image modality into a text-compatible space, facilitating intermodal interaction.
Furthermore, CMB introduces training strategies that enhance the model's robustness to missing modalities. Experimental results show that CMB is competitive with well-established, costly, manually crafted multimodal models.
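The caption-based idea described above can be illustrated with a minimal sketch. The captioning function and input format below are illustrative assumptions, not details taken from the paper: the point is only that once the image is rendered as a caption, a single text encoder such as BERT can consume both modalities, and a missing modality simply leaves a shorter but still valid text input.

```python
def caption_image(image_path: str) -> str:
    """Stand-in for an image-captioning model (e.g., a generic captioner).
    In a real pipeline this would return a caption generated from the image;
    the string here is a placeholder assumption."""
    return "a flooded street with stranded cars"


def build_bert_input(tweet_text: str, caption: str, sep_token: str = "[SEP]") -> str:
    """Map the image modality into the text space: join the tweet and the
    generated caption into one sequence for a single BERT-style encoder.
    If either modality is missing (empty), the other alone still forms a
    valid input, sketching the robustness to missing modalities mentioned
    in the abstract."""
    parts = [p for p in (tweet_text, caption) if p]
    return f" {sep_token} ".join(parts)


tweet = "Roads underwater downtown, avoid 5th avenue #flood"
text_input = build_bert_input(tweet, caption_image("tweet_image.jpg"))
# text_input can now be tokenized and fed to a standard BERT classifier.
```

This is only a structural sketch; the actual fusion, captioning model, and training procedure are defined in the paper itself.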






How to Cite

Farah, B., Cleuziou, G., Gracianne, C., Hafiane, A., Halftermeyer, A., & Canals, R. (2024). Image-text crisis tweet categorization: a caption-based approach. ISCRAM Proceedings, 21.
