FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows

Mitchell, S. N. et al. (2022) FAIR Data Pipeline: provenance-driven data management for traceable scientific workflows. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 380(2233), 20210300. (doi: 10.1098/rsta.2021.0300) (PMID:35965468) (PMCID:PMC9376726)

273243.pdf - Published Version
Available under License Creative Commons Attribution.

934kB

Abstract

Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.

Item Type:Articles
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Mohr, Dr Sibylle and Bessell, Dr Paul and Archibald, Dr Blair and Harris, Miss Claire and Field, Mr Ryan and Reeve, Professor Richard and Brett, Mrs Alys and Mellor, Professor Dominic and Boden, Dr Lisa and Mitchell, Dr Sonia and Dundas, Professor Ruth and Enright, Dr Jessica and Matthews, Professor Louise and McMonagle, Ciaran and Turner, Dr Robert and Marion, Professor Glenn and McKendrick, Dr Iain
Authors: Mitchell, S. N., Lahiff, A., Cummings, N., Hollocombe, J., Boskamp, B., Field, R., Reddyhoff, D., Zarębski, K., Wilson, A., Viola, B., Burke, M., Archibald, B., Bessell, P., Blackwell, R., Boden, L. A., Brett, A., Brett, S., Dundas, R., Enright, J., Gonzalez-Beltran, A., Harris, C., Hinder, I., Hughes, C. D., Knight, M., Mano, V., McMonagle, C., Mellor, D., Mohr, S., Marion, G., Matthews, L., McKendrick, I. J., Pooley, C. M., Porphyre, T., Reeves, A., Townsend, E., Turner, R., Walton, J., and Reeve, R.
College/School:College of Medical Veterinary and Life Sciences > School of Biodiversity, One Health & Veterinary Medicine
College of Medical Veterinary and Life Sciences > School of Health & Wellbeing > MRC/CSO SPHSU
College of Medical Veterinary and Life Sciences > School of Health & Wellbeing > Public Health
College of Science and Engineering > School of Computing Science
Journal Name:Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Publisher:Royal Society
ISSN:1364-503X
ISSN (Online):1471-2962
Published Online:15 August 2022
Copyright Holders:Copyright © 2022 The Authors
First Published:First published in Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 380(2233): 20210300
Publisher Policy:Reproduced under a Creative Commons License


Project Code / Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
311856 | Open Epidemiology for COVID-19: a transparent, traceable, open source pipeline for reproducible science | Richard Reeve | Science and Technology Facilities Council (STFC) | ST/V006126/1 | Institute of Biodiversity, Animal Health and Comparative Medicine
190824 | The BUG consortium Building Upon the Genome: using H. contortus genomic resources to develop novel interventions to control endemic GI parasites | Eileen Devaney | Biotechnology and Biological Sciences Research Council (BBSRC) | BB/M003949/1 | Institute of Biodiversity, Animal Health and Comparative Medicine
302761 | Tracking isometamidium resistance in livestock trypanosomes | Michael Barrett | Biotechnology and Biological Sciences Research Council (BBSRC) | BB/S001034/1 | SII - Parasitology
305944 | Multilayer Algorithmics to Leverage Graph Structure | Kitty Meeks | Engineering and Physical Sciences Research Council (EPSRC) | EP/T004878/1 | M&S - Statistics
3048230021 | Inequalities in health | Alastair Leyland | Medical Research Council (MRC) | MC_UU_00022/2 | HW - MRC/CSO Social and Public Health Sciences Unit
307517 | Landscape Decisions: Towards a new framework for using land assets programme | Richard Reeve | Natural Environment Research Council (NERC) | NE/T004193/1 | M&S - Mathematics
309089 | Simulating UK plant biodiversity under climate change to aid landscape decision making | Richard Reeve | Natural Environment Research Council (NERC) | NE/T010355/1 | M&S - Mathematics
3048230071 | Inequalities in health | Alastair Leyland | Office of the Chief Scientific Adviser (CSO) | SPHSU17 | HW - MRC/CSO Social and Public Health Sciences Unit