Digital forensics formats: seeking a digital preservation storage format for web archiving

Kim, Y. and Ross, S. (2011) Digital forensics formats: seeking a digital preservation storage format for web archiving. In: International Digital Curation Conference (IDCC 2011), Bristol, UK, 5-7 December 2011.




In this paper we discuss archival storage formats from the point of view of digital curation and preservation. Taking established approaches to data management as our starting point, we selected seven format attributes that are core to the long-term accessibility of digital materials. We have labeled these core preservation attributes. The attributes are then used as evaluation criteria to compare file formats belonging to five common categories: formats for archiving selected content (e.g. tar, WARC), disk image formats that capture data for recovery or installation (partimage, dd raw image), these two types combined with a selected compression algorithm (e.g. tar+gzip), formats that combine packing and compression (e.g. 7-zip), and forensic file formats for data analysis in criminal investigations (e.g. aff, Advanced Forensic File format). We present a general discussion of the file format landscape in terms of these attributes, and make a direct comparison between the three most promising archival formats: tar, WARC, and aff. We conclude by suggesting the next steps to take the research forward and to validate the observations we have made.

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Kim, Dr Yunhyong and Ross, Professor Seamus
Authors: Kim, Y., and Ross, S.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
College/School: College of Arts > School of Humanities > Information Studies
Copyright Holders: Copyright © 2012 The Authors
Publisher Policy: Reproduced with permission of the authors


Project Code: 559231
Award No:
Project Name: BlogForever
Principal Investigator: Seamus Ross
Funder's Name: European Commission (EC)
Funder Ref: BlogForever
Lead Dept: HU - ARTS AND MEDIA INFORMATICS (HATII)