M-CATNAT: A Multimodal dataset to analyze French tweets during natural disasters
DOI: https://doi.org/10.59297/yhq8bb90
Keywords: Deep Learning, French multimodal data, Crisis management, CrisisMMD dataset
Abstract
The proliferation of social media, especially platforms like X (formerly Twitter), has made large volumes of real-time data available across diverse fields. During natural disasters, such data can support humanitarian efforts by providing crucial insights. However, processing this volume of data requires automated systems, which are typically trained on annotated datasets. Although supervised learning dominates this area, multilingual and multimodal annotated datasets remain scarce. This study addresses that gap by introducing M-CATNAT, a multimodal dataset of French tweets about natural disasters. Unlike previous datasets, M-CATNAT provides annotations for the text, the image, and their multimodal combination. Following the CrisisMMD annotation guidelines, this work in progress aims to annotate 1,430 tweets, yielding over 4,500 labels. M-CATNAT thus extends annotated resources to non-English languages and supports multimodal analysis by providing three levels of annotation for each tweet: one per modality (text and image) plus one for the tweet as a whole.
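To make the three-level annotation scheme concrete, the following is a minimal sketch of how a single annotated tweet could be represented. It is not the dataset's actual file format: the class name `AnnotatedTweet`, its field names, and the `count_labels` helper are hypothetical, and the label set is assumed to be the binary informativeness labels of the CrisisMMD informativeness task, whereas the real M-CATNAT label inventory may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical label set, modeled on the CrisisMMD informativeness task;
# the actual M-CATNAT labels may differ.
LABELS = {"informative", "not_informative"}


@dataclass
class AnnotatedTweet:
    """One tweet carrying three annotation levels: text, image, and whole tweet."""

    tweet_id: str
    text: str
    image_path: str
    text_label: str   # label assigned to the text modality alone
    image_label: str  # label assigned to the image modality alone
    tweet_label: str  # label assigned to the multimodal (text + image) combination

    def __post_init__(self) -> None:
        # Reject any label outside the assumed label set.
        for label in (self.text_label, self.image_label, self.tweet_label):
            if label not in LABELS:
                raise ValueError(f"unknown label: {label!r}")

    def labels(self) -> List[str]:
        """Return the three labels in a fixed order: text, image, tweet."""
        return [self.text_label, self.image_label, self.tweet_label]


def count_labels(tweets: Iterable[AnnotatedTweet]) -> int:
    """Total number of labels across a collection of annotated tweets."""
    return sum(len(t.labels()) for t in tweets)
```

For example, `AnnotatedTweet("1", "Inondations à Nice", "img/1.jpg", "informative", "not_informative", "informative")` records a tweet whose text is judged informative while its image alone is not. Keeping the three labels as separate fields, rather than a single combined label, mirrors the abstract's point that each modality and the whole tweet are annotated independently.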