Enlighten Publications

CrisisViT: A Robust Vision Transformer for Crisis Image Classification

Long, Z., Mccreadie, R. and Imran, M. (2023) CrisisViT: A Robust Vision Transformer for Crisis Image Classification. In: 20th International Conference on Information Systems for Crisis Response and Management (ISCRAM 2023), Omaha, NE, USA, 28-31 May 2023, pp. 309-319. ISBN 9798218217495 (doi: 10.59297/SDSM9194)

Text
295808.pdf - Accepted Version
328kB

Abstract

In times of emergency, crisis response agencies need to quickly and accurately assess the situation on the ground in order to deploy relevant services and resources. However, authorities often have to make decisions based on limited information, as data on affected regions can be scarce until local response services can provide first-hand reports. Fortunately, the widespread availability of smartphones with high-quality cameras has made citizen journalism through social media a valuable source of information for crisis responders. However, analyzing the large volume of images posted by citizens requires more time and effort than is typically available. To address this issue, this paper proposes the use of state-of-the-art deep neural models for automatic image classification/tagging, specifically by adapting transformer-based architectures for crisis image classification (CrisisViT). We leverage the new Incidents1M crisis image dataset to develop a range of new transformer-based image classification models. Through experimentation over the standard Crisis image benchmark dataset, we demonstrate that the CrisisViT models significantly outperform previous approaches in emergency type, image relevance, humanitarian category, and damage severity classification. Additionally, we show that the new Incidents1M dataset can further augment the CrisisViT models, resulting in an additional 1.25% absolute accuracy gain.

Item Type:	Conference Proceedings
Keywords:	Social media classification, crisis management, deep learning, vision transformers, supervised learning.
Status:	Published
Refereed:	Yes
Glasgow Author(s) Enlighten ID:	Mccreadie, Dr Richard and LONG, ZIJUN
Authors:	Long, Z., Mccreadie, R., and Imran, M.
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School:	College of Science and Engineering > School of Computing Science
Research Centre:	College of Science and Engineering > School of Computing Science > IDA Section > GPU Cluster
Research Group:	Information Retrieval
ISSN:	2411-3387
ISBN:	9798218217495

Deposit and Record Details

ID Code:	295808
Depositing User:	Dr Richard Mccreadie
Datestamp:	11 Apr 2023 08:57
Last Modified:	23 Jan 2024 16:14
Date of acceptance:	1 March 2023
Date of first online publication:	15 May 2023
Date Deposited:	11 April 2023
Data Availability Statement:	No