Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization

Fu, Z., Su, Y., Meng, Z. and Collier, N. (2023) Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization. In: 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), Singapore, 06-10 Dec 2023, pp. 14621-14635. (doi: 10.18653/v1/2023.emnlp-main.903)

Text
307884.pdf - Published Version
Available under License Creative Commons Attribution.



Biomedical named entity recognition is one of the core tasks in biomedical natural language processing (BioNLP). To tackle this task, numerous supervised/distantly supervised approaches have been proposed. Despite their remarkable success, these approaches inescapably demand laborious human effort. To reduce the need for human effort, dictionary-based approaches have been proposed to extract named entities simply based on a given dictionary. However, one downside of existing dictionary-based approaches is that they struggle to identify concept synonyms that are not listed in the given dictionary, which we refer to as the synonym generalization problem. In this study, we propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions. In particular, SynGen introduces two regularization terms, namely, (1) a synonym distance regularizer; and (2) a noise perturbation regularizer, to minimize the synonym generalization error. To demonstrate the effectiveness of our approach, we provide a theoretical analysis of the bound on the synonym generalization error. We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins. Lastly, we provide a detailed analysis to further reveal the merits and inner workings of our approach.
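The synonym generalization problem described in the abstract can be illustrated with a minimal sketch of a purely exact-match dictionary tagger. This is not the SynGen method; it only shows the baseline limitation that SynGen is said to address. The function names, the toy dictionary, and the concept ID `C001` are hypothetical and chosen for illustration only.

```python
from typing import Iterator


def enumerate_spans(tokens: list[str], max_len: int = 3) -> Iterator[tuple[int, int, str]]:
    """Yield (start, end, text) for every contiguous span of up to max_len tokens."""
    for start in range(len(tokens)):
        last_end = min(start + max_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            yield start, end, " ".join(tokens[start:end])


def exact_match_ner(text: str, dictionary: dict[str, str], max_len: int = 3) -> list[tuple[str, str]]:
    """Return (span_text, concept_id) for spans that appear verbatim in the dictionary."""
    tokens = text.lower().split()
    return [
        (span, dictionary[span])
        for _, _, span in enumerate_spans(tokens, max_len)
        if span in dictionary
    ]


# Hypothetical toy dictionary: two listed synonyms of one made-up concept ID.
TOY_DICTIONARY = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
}

# A listed synonym is found by exact lookup.
listed = exact_match_ner("patient had a heart attack yesterday", TOY_DICTIONARY)

# An unlisted synonym of the same concept is missed entirely.
unlisted = exact_match_ner("patient had a cardiac infarction yesterday", TOY_DICTIONARY)
```

In this sketch the second sentence yields no entities because "cardiac infarction" is absent from the dictionary, which is the kind of gap the abstract calls the synonym generalization problem.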

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Meng, Dr Zaiqiao
Authors: Fu, Z., Su, Y., Meng, Z., and Collier, N.
College/School: College of Science and Engineering > School of Computing Science
Copyright Holders: Copyright © 2023 Association for Computational Linguistics
First Published: First published in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Publisher Policy: Reproduced under a Creative Commons license
