Image-text crisis tweet categorization: a caption-based approach

Authors

DOI:

https://doi.org/10.59297/9j4kjp22

Keywords:

Deep Learning, Multimodal data, text/image fusion, Crisis Data

Abstract

The growth of social media usage over the last decade has made a massive and valuable volume of multimedia data available. However, the lack of large annotated multimodal datasets, together with the inherent noise and the diversity of multimodal relations in this type of data, presents challenges for machine learning methods. Unlike classic multimodal data, social media data exhibits a wide variety of relations between image and text, which makes the interaction between the two modalities more difficult to model.
Previous research has concentrated on fusion strategies with separate encoders for each modality. This paper introduces CMB (Caption-based Multimodal BERT), a method for classifying crisis-related social media posts using information from both images and text. CMB translates the image modality into a text-compatible space, facilitating intermodal interaction.
Furthermore, CMB offers training opportunities that enhance the model's robustness to missing modalities. Experimental results show that CMB is competitive with well-established, costly, and manually crafted multimodal models.
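
The caption-based idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how captions are produced, how the two texts are combined, or how missing modalities are handled during training. The sketch assumes that a caption for the image has already been generated by some captioning model (not shown), that the tweet text and caption are joined with a BERT-style `[SEP]` marker, and that robustness to missing modalities is encouraged by randomly hiding one modality during training. The names `build_input`, `drop_modality`, and `p_drop` are hypothetical.

```python
import random
from typing import Optional, Tuple

# BERT-style separator, assumed here purely for illustration.
SEP = " [SEP] "


def build_input(tweet_text: Optional[str], caption: Optional[str]) -> str:
    """Join tweet text and image caption into a single text sequence.

    Either part may be missing (None or blank); the other is then used alone.
    """
    parts = [p.strip() for p in (tweet_text, caption) if p and p.strip()]
    return SEP.join(parts)


def drop_modality(
    tweet_text: Optional[str],
    caption: Optional[str],
    p_drop: float = 0.3,
    rng: Optional[random.Random] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical training-time augmentation (an assumption, not from the paper).

    With probability p_drop, hide one of the two modalities, chosen at random,
    so that a classifier also sees single-modality examples. Examples that
    already lack a modality are returned unchanged.
    """
    rng = rng or random.Random()
    if tweet_text and caption and rng.random() < p_drop:
        if rng.random() < 0.5:
            return None, caption
        return tweet_text, None
    return tweet_text, caption


if __name__ == "__main__":
    text, cap = drop_modality("Flood in town", "a street under water")
    print(build_input(text, cap))
```

The resulting string would then be tokenized and passed to a text classifier such as BERT. Because both modalities live in the same text space, a post with no image reduces to plain text classification rather than requiring a separate code path.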

Published

2024-05-17

How to Cite

Farah, B., Cleuziou, G., Gracianne, C., Hafiane, A., Halftermeyer, A., & Canals, R. (2024). Image-text crisis tweet categorization: a caption-based approach. Proceedings of the International ISCRAM Conference. https://doi.org/10.59297/9j4kjp22
