The Library Catalogue as Dataset: Exploring Data Science Approaches to Analyse Collections at Scale

Havens, L., Gooding, P., Lingstadt, K., Forrest, A., MacDonald, A. and Terras, M. (2022) The Library Catalogue as Dataset: Exploring Data Science Approaches to Analyse Collections at Scale. Digital Humanities Congress 2022, Sheffield, UK, 08-11 Sep 2022.

Full text not currently available from Enlighten.

How can we use data science to understand academic library holdings at scale? Can we use library catalogues to understand the historical growth of collections, acquisition practices, subject-level specialisms, biases, or how a collection reflects the library’s stated acquisition strategy? We present the Edinburgh University Library Metadata Visualizations Project, which used MARC (Machine Readable Cataloging) metadata, the international standard for the dissemination and searching of bibliographic data (Schudel 2006, Library of Congress 2019), as a rich source for understanding holdings. Library catalogue data is an example of a Humanities dataset that is complex, challenging, heterogeneous, fragmentary, multilingual, and ambiguous (Lazer et al. 2009, Kitchin 2014, Guiliano and Ridge 2016, Underwood 2018, Alex et al. 2019). Most data processing of MARC focuses on improving the records themselves, although previous work has used MARC to understand biases (Diao and Cao 2016, Lavoie 2018) and for library analytics (Harper 2016). The University Library’s MARC data for its 1,297,311 print books was downloaded from OCLC. Restricting the study to this “physical collection” avoided the complexities of syndication to electronic sources. The data was converted to CSV and cleaned using Python and Pandas scripts. Visualisations were created from samples of the data using Python, Microsoft Excel and Adobe InDesign. Our code is available on GitHub.
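The abstract does not detail the cleaning steps. The sketch below illustrates one step that questions about historical growth would need: it extracts a four-digit publication year from each flattened record and tallies holdings by decade.

It makes several assumptions:

- The CSV layout is hypothetical, with one row per record.
- The column `f008` holds the MARC 008 fixed field and `f260c` holds the 260 $c date subfield.
- The column names, sample rows and regular expression are illustrative. They are not the project's code, which used Pandas.

The sketch prefers 008 positions 07-10 (Date 1) and falls back to the free-text date. Publication year is only a proxy: it records when a book was published, not when the library acquired it.

```python
import csv
import io
import re
from collections import Counter
from typing import Optional

# Hypothetical CSV export (illustrative rows, not project data): one row per
# record, with the MARC 008 fixed field and the 260 $c date flattened into columns.
# Records catalogued under RDA may carry the date in 264 $c instead.
SAMPLE_CSV = """record_id,f008,f260c
1,850412s1885    enk           000 0 eng d,"1885."
2,981102s1998    nyu           000 0 eng d,"c1998."
3,000000,"[1923?]"
4,,"1961-1963."
5,,"n.d."
"""

# A four-digit year from 1500 to 2099 not embedded in a longer number.
# Lookarounds are used instead of \b so that "c1998." still matches.
YEAR_RE = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def extract_year(f008: str, f260c: str) -> Optional[int]:
    """Return a publication year, or None if no plausible year is found.

    Prefers 008 positions 07-10 (Date 1) when they are all digits
    (e.g. not "19uu"), then falls back to the first year in 260 $c.
    """
    date1 = (f008 or "")[7:11]
    if len(date1) == 4 and date1.isdigit():
        return int(date1)
    match = YEAR_RE.search(f260c or "")
    return int(match.group(1)) if match else None


def holdings_by_decade(csv_text: str) -> Counter:
    """Count records per decade; records with no usable year are skipped."""
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = extract_year(row["f008"], row["f260c"])
        if year is not None:
            counts[year // 10 * 10] += 1
    return counts


if __name__ == "__main__":
    for decade, n in sorted(holdings_by_decade(SAMPLE_CSV).items()):
        print(f"{decade}s: {n}")
```

Run on the sample, this prints one line per decade (1880s, 1920s, 1960s and 1990s, one record each). The undated "n.d." record is dropped rather than guessed. At the scale of the full catalogue, a Pandas version would apply the same per-row logic to a DataFrame.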

Item Type: Conference or Workshop Item
Glasgow Author(s) Enlighten ID: Gooding, Professor Paul
Authors: Havens, L., Gooding, P., Lingstadt, K., Forrest, A., MacDonald, A., and Terras, M.
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
College/School: College of Arts & Humanities > School of Humanities > Information Studies