Active Learning Strategies for Technology Assisted Sensitivity Review

McDonald, G., Macdonald, C. and Ounis, I. (2018) Active Learning Strategies for Technology Assisted Sensitivity Review. In: 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France, 25-29 Mar 2018, pp. 439-453. ISBN 9783319769400 (doi:10.1007/978-3-319-76941-7_33)


Full text: 154193.pdf (Accepted Version, 500kB)

Abstract

Government documents must be reviewed to identify and protect any sensitive information, such as personal information, before the documents can be released to the public. However, in the era of digital government documents, such as e-mail, traditional sensitivity review procedures are no longer practical, for example due to the volume of documents to be reviewed. Therefore, there is a need for new technology assisted review protocols that integrate automatic sensitivity classification into the sensitivity review process. To effectively assist sensitivity review, such assistive technologies must also incorporate reviewer feedback, enabling sensitivity classifiers to quickly learn and adapt to the sensitivities within a collection when the types of sensitivity are not known a priori. In this work, we present a thorough evaluation of active learning strategies for sensitivity review. We also present an active learning strategy that integrates reviewer feedback, in the form of sensitive text annotations, to identify features of sensitivity that enable us to learn an effective sensitivity classifier (0.7 Balanced Accuracy) using significantly less reviewer effort, according to the sign test (p < 0.01). Furthermore, this approach results in a 51% reduction in the number of documents that must be reviewed to achieve the same level of classification accuracy, compared to when the approach is deployed without annotation features.
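The abstract reports classifier effectiveness as Balanced Accuracy and assesses significance with the sign test. The sketch below shows the standard textbook definitions of both, assuming binary labels (1 = sensitive, 0 = not sensitive) and paired scores from two systems. It is not the authors' evaluation code, and the paired scores in the usage section are hypothetical illustration values, not results from the paper.

```python
from math import comb
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true positive rate (recall on sensitive, label 1)
    and the true negative rate (recall on not-sensitive, label 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        # Assumption: both classes must be present in y_true.
        raise ValueError("y_true must contain both classes")
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def sign_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided sign test on paired scores; ties are discarded."""
    wins = sum(1 for x, y in zip(a, b) if x > y)
    losses = sum(1 for x, y in zip(a, b) if x < y)
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of k or fewer successes under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical labels and predictions for ten documents.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print("Balanced accuracy:", balanced_accuracy(y_true, y_pred))

    # Hypothetical paired scores at ten review checkpoints.
    with_annotations = [0.62, 0.65, 0.66, 0.68, 0.69, 0.70, 0.70, 0.71, 0.72, 0.72]
    without_annotations = [0.55, 0.58, 0.60, 0.61, 0.63, 0.64, 0.66, 0.67, 0.68, 0.69]
    print("Sign test p-value:", sign_test(with_annotations, without_annotations))
```

Balanced Accuracy is a natural choice when sensitive documents are a minority class, because it weights recall on each class equally rather than rewarding a classifier that predicts "not sensitive" for everything. The sign test only uses the direction of each paired difference, so it makes no assumption about the distribution of the score differences.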

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Dr Craig and Ounis, Professor Iadh and McDonald, Mr Graham
Authors: McDonald, G., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
ISSN: 0302-9743
ISBN: 9783319769400
Published Online: 01 March 2018
Copyright Holders: Copyright © 2018 Springer International Publishing AG, part of Springer Nature
First Published: First published in Advances in Information Retrieval. ECIR 2018: 439-453
Publisher Policy: Reproduced in accordance with the publisher copyright policy
