Biases Arising from Using Linked Administrative Data for Research: A Conceptual Framework from Registration to Analysis.

Shaw, R., Harron, K., Pescarini, J., Júnior, E., Siroky, A., Campbell, D., Dundas, R., Ichihara, M. Y., Barreto, M. and Katikireddi, V. (2022) Biases Arising from Using Linked Administrative Data for Research: A Conceptual Framework from Registration to Analysis. In: 2022 International Population Data Linkage, Edinburgh, UK, 7-9 Sept 2022, (doi: 10.23889/ijpds.v7i3.1800)

278726.pdf - Published Version
Available under License Creative Commons Attribution.



Objectives: Administrative data are primarily collected for operational purposes, and the processes that generate them can introduce biases that researchers may not adequately consider. We provide a framework to help understand how biases might arise from using linked administrative data, and to aid the design of future studies.

Approach: We developed the conceptual framework based on the team’s experiences with the 100 Million Brazilian Cohort (100MCohort), which contains records of more than 131 million people whose families applied for social assistance between 2001 and 2018, linked to other administrative data sources. We provide examples from the 100MCohort of where and how different forms of bias could arise in the linkage process. We make recommendations on how biases might be addressed using commonly available external data.

Results: The conceptual framework covers the whole data generating process, from people and events occurring in the population through to deriving variables for analysis. The framework comprises three distinct stages: 1) recording and registration of events in administrative systems such as Brazil’s Mortality Information System (SIM) and the Hospital Information System (SIH); 2) linkage of different data sources, for example using exact matching via the Social Identification Number (NIS) in Brazil’s CadÚnico database, or using linkage algorithms; 3) cleaning and coding of data used both for analysis and for linkage. The biases arising from linkage can be better understood by applying theory and making additional metadata available.

Conclusion: Maximising the potential of administrative data for research requires a better understanding of how biases arise. This is best achieved by considering the entire data generating process, and by better communication among all those involved in the data collection and linkage processes.

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Katikireddi, Professor Vittal and Campbell, Dr Desmond and Shaw, Dr Richard and Dundas, Professor Ruth
Authors: Shaw, R., Harron, K., Pescarini, J., Júnior, E., Siroky, A., Campbell, D., Dundas, R., Ichihara, M. Y., Barreto, M., and Katikireddi, V.
College/School: College of Medical Veterinary and Life Sciences > School of Health & Wellbeing > MRC/CSO SPHSU
Journal Name: International Journal of Population Data Science
Copyright Holders: Copyright © 2022 The Authors
First Published: First published in International Journal of Population Data Science 7(3):29
Publisher Policy: Reproduced under a Creative Commons License

Project Code: 300390
Award No: -
Project Name: Strengthening data linkage to reduce health inequalities in low and middle income countries: building on the Brazilian 100 million cohort
Principal Investigator: Alastair Leyland
Funder's Name: National Institute for Health Research (NIHR)
Funder Ref: 16/137/99
Lead Dept: HW - MRC/CSO Social and Public Health Sciences Unit