How the accuracy and confidence of sensitivity classification affects digital sensitivity review

McDonald, G., Macdonald, C. and Ounis, I. (2021) How the accuracy and confidence of sensitivity classification affects digital sensitivity review. ACM Transactions on Information Systems, 39(1), 4. (doi: 10.1145/3417334)

Text: 221945.pdf - Accepted Version (1MB)

Abstract

Government documents must be manually reviewed to identify any sensitive information, e.g., confidential information, before being publicly archived. However, human-only sensitivity review is not practical for born-digital documents due to, for example, the volume of documents that are to be reviewed. In this work, we conduct a user study to evaluate the effectiveness of sensitivity classification for assisting human sensitivity reviewers. We evaluate how the accuracy and confidence levels of sensitivity classification affect the number of documents that are correctly judged as being sensitive (reviewer accuracy) and the time that it takes to sensitivity review a document (reviewing speed). In our within-subject study, the participants review government documents to identify real sensitivities while being assisted by three sensitivity classification treatments, namely None (no classification predictions), Medium (sensitivity predictions from a simulated classifier with a balanced accuracy (BAC) of 0.7), and Perfect (sensitivity predictions from a classifier with an accuracy of 1.0). Our results show that sensitivity classification leads to significant improvements (ANOVA, p < 0.05) in reviewer accuracy in terms of BAC (+37.9% Medium, +60.0% Perfect) and also in terms of F2 (+40.8% Medium, +44.9% Perfect). Moreover, we show that assisting reviewers with sensitivity classification predictions leads to significantly increased (ANOVA, p < 0.05) mean reviewing speeds (+72.2% Medium, +61.6% Perfect). We find that reviewers do not agree with the classifier significantly more as the classifier's confidence increases. However, reviewing speed is significantly increased when the reviewers agree with the classifier (ANOVA, p < 0.05). Our in-depth analysis shows that when the reviewers are not assisted with sensitivity predictions, mean reviewing speeds are 40.5% slower for sensitive judgements compared to not-sensitive judgements. However, when the reviewers are assisted with sensitivity predictions, the difference in reviewing speeds between sensitive and not-sensitive judgements is reduced by ~10%, from 40.5% to 30.8%. We also find that, for sensitive judgements, sensitivity classification predictions significantly increase mean reviewing speeds by 37.7% when the reviewers agree with the classifier's predictions (t-test, p < 0.05). Overall, our findings demonstrate that sensitivity classification is a viable technology for assisting human reviewers with the sensitivity review of digital documents.
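
For readers unfamiliar with the two reviewer-accuracy measures named in the abstract, the following is a minimal sketch of their standard definitions: balanced accuracy (BAC) as the mean of the true-positive and true-negative rates, and F2 as the F-beta score with beta = 2, which weights recall more heavily than precision. This record does not include the authors' evaluation code, so the function names and the toy labels below are hypothetical and purely illustrative.

```python
# Illustrative sketch only: standard definitions of BAC and F2.
# Function names and example labels are assumptions, not the authors' code.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating 1 as 'sensitive' and 0 as 'not sensitive'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of the true-positive rate and the true-negative rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score; beta=2 gives F2, which favours recall over precision."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


# Hypothetical judgements: 3 sensitive and 5 not-sensitive documents.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
judged = [1, 1, 0, 1, 0, 0, 0, 0]
bac = balanced_accuracy(truth, judged)
f2 = f_beta(truth, judged, beta=2.0)
```

BAC is a natural choice when sensitive documents are much rarer than not-sensitive ones, since plain accuracy would reward judging everything as not sensitive. F2 reflects that missing a sensitive document is usually costlier than over-flagging one.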

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and McDonald, Dr Graham and Ounis, Professor Iadh
Authors: McDonald, G., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: ACM Transactions on Information Systems
Publisher: Association for Computing Machinery
ISSN: 1046-8188
ISSN (Online): 1558-2868
Published Online: 12 October 2020
Copyright Holders: Copyright © 2020 held by the owner/author(s)
First Published: First published in ACM Transactions on Information Systems 39(1): 4
Publisher Policy: Reproduced in accordance with the publisher copyright policy
