Text-mining clinically relevant cancer biomarkers for curation into the CIViC database

Lever, J., Jones, M. R., Danos, A. M., Krysiak, K., Bonakdar, M., Grewal, J. K., Culibrk, L., Griffith, O. L., Griffith, M. and Jones, S. J.M. (2019) Text-mining clinically relevant cancer biomarkers for curation into the CIViC database. Genome Medicine, 11, 78. (doi: 10.1186/s13073-019-0686-y) (PMID:31796060) (PMCID:PMC6891984)

242650.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

Background: Precision oncology involves analysis of individual cancer samples to understand the genes and pathways involved in the development and progression of a cancer. To improve patient care, knowledge of diagnostic, prognostic, predisposing, and drug response markers is essential. Several knowledgebases have been created by different groups to collate evidence for these associations. These include the open-access Clinical Interpretation of Variants in Cancer (CIViC) knowledgebase. These databases rely on time-consuming manual curation from skilled experts who read and interpret the relevant biomedical literature.

Methods: To aid in this curation and provide the greatest coverage for these databases, particularly CIViC, we propose the use of text mining approaches to extract these clinically relevant biomarkers from all available published literature. To this end, a group of cancer genomics experts annotated sentences that discussed biomarkers with their clinical associations and achieved good inter-annotator agreement. We then used a supervised learning approach to construct the CIViCmine knowledgebase.

Results: We extracted 121,589 relevant sentences from PubMed abstracts and PubMed Central Open Access full-text papers. CIViCmine contains over 87,412 biomarkers associated with 8035 genes, 337 drugs, and 572 cancer types, representing 25,818 abstracts and 39,795 full-text publications.

Conclusions: Through integration with CIViC, we provide a prioritized list of curatable clinically relevant cancer biomarkers as well as a resource that is valuable to other knowledgebases and precision cancer analysts in general. All data is publicly available and distributed with a Creative Commons Zero license. The CIViCmine knowledgebase is available at http://bionlp.bcgsc.ca/civicmine/.

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Lever, Dr Jake
Authors: Lever, J., Jones, M. R., Danos, A. M., Krysiak, K., Bonakdar, M., Grewal, J. K., Culibrk, L., Griffith, O. L., Griffith, M., and Jones, S. J.M.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Genome Medicine
Publisher: BioMed Central
ISSN: 1756-994X
ISSN (Online): 1756-994X
Copyright Holders: Copyright © 2019 The Authors
First Published: First published in Genome Medicine 11: 78
Publisher Policy: Reproduced under a Creative Commons License
Data DOI: 10.5281/zenodo.1472826
