Extracting Methodology Components from AI Research Papers: A Data-driven Factored Sequence Labeling Approach

Ghosh, M., Ganguly, D., Basuchowdhuri, P. and Kumar Naskar, S. (2023) Extracting Methodology Components from AI Research Papers: A Data-driven Factored Sequence Labeling Approach. In: 32nd ACM International Conference on Information and Knowledge Management (CIKM 2023), Birmingham, UK, 21-25 Oct 2023, pp. 3897-3901. ISBN 9798400701245 (doi: 10.1145/3583780.3615258)

Text: 304934.pdf - Accepted Version (1MB)

Abstract

Extraction of methodology component names from scientific articles is a challenging task due to the diverse contexts in which these entities occur, and the different levels of granularity and containment relationships that they exhibit. We hypothesize that standard sequence labeling approaches may not adequately model the dependence of methodology name mentions on their contexts, because the vocabulary of such mentions is large, fast-evolving, and domain-specific. As a solution, we propose a factored approach, where the mention-context dependencies are represented in a more fine-grained manner, thus allowing the model parameters to better adjust to the different characteristic patterns inherent within the data. In particular, we experiment with two variants of this factored approach: one that uses the per-entity category information derived from an ontology, and the other that makes use of the topology of the sentence embedding space to infer a category for each entity constituting that sentence. We demonstrate that both these factored variants of SciBERT outperform their non-factored counterpart, a state-of-the-art model for scientific concept extraction.
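
The abstract does not give implementation details, so the following is only a minimal sketch of one way the second variant could be read, not the authors' method. It assumes that a sentence vector is assigned to the nearest of a few precomputed cluster centroids and that the resulting category is attached to each non-O tag as an extra factor. The function names, the `|cat` tag format, the centroids, and the toy vectors are all hypothetical.

```python
from math import sqrt


def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (Euclidean distance)."""
    def dist(a, b):
        return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(range(len(centroids)), key=lambda i: dist(vec, centroids[i]))


def factor_tags(bio_tags, category):
    """Attach the inferred category to every non-O BIO tag (hypothetical format)."""
    return [t if t == "O" else f"{t}|cat{category}" for t in bio_tags]


def split_factored(tag):
    """Recover (bio_tag, category) from a factored tag produced by factor_tags."""
    if tag == "O":
        return "O", None
    bio, cat = tag.split("|cat")
    return bio, int(cat)


# Toy inputs: two made-up centroids and one made-up 2-d sentence vector.
# In practice the vectors would come from a sentence encoder and the
# centroids from clustering the training sentences (both assumed here).
centroids = [[0.0, 0.0], [1.0, 1.0]]
sentence_vec = [0.9, 0.8]
bio_tags = ["O", "B-METHOD", "I-METHOD", "O"]

category = nearest_centroid(sentence_vec, centroids)
factored = factor_tags(bio_tags, category)
print(category, factored)
```

In this reading, a tagger trained on the factored tags would have separate output labels per category, which is one simple way to let parameters specialise to different context patterns. The first variant would replace `nearest_centroid` with a lookup of each entity's category in an ontology.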

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Ganguly, Dr Debasis
Authors: Ghosh, M., Ganguly, D., Basuchowdhuri, P., and Kumar Naskar, S.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9798400701245
Copyright Holders: Copyright © 2023 The Authors
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher