Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review

McDonald, G., Macdonald, C. and Ounis, I. (2020) Active Learning Stopping Strategies for Technology-Assisted Sensitivity Review. In: 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), Xi'an, China, 25-30 Jul 2020, pp. 2053-2056. ISBN 9781450380164 (doi: 10.1145/3397271.3401267)

Text: 215380.pdf - Accepted Version (829kB)

Abstract

Active learning strategies are often deployed in technology-assisted review tasks, such as e-discovery and sensitivity review, to learn a classifier that can assist the reviewers with their task. In particular, an active learning strategy selects the documents that are expected to be the most useful for learning an effective classifier, so that these documents can be reviewed before the less useful ones. However, when reviewing for sensitivity, the order in which the documents are reviewed can impact on the reviewers' ability to perform the review. Therefore, when deploying active learning in technology-assisted sensitivity review, we want to know when a sufficiently effective classifier has been learned, such that the active learning can stop and the reviewing order of the documents can be selected by the reviewer instead of the classifier. In this work, we propose two active learning stopping strategies for technology-assisted sensitivity review. We evaluate the effectiveness of our proposed approaches in comparison with three state-of-the-art stopping strategies from the literature. We show that our best performing approach results in a significantly more effective sensitivity classifier (+6.6% F2) than the best performing stopping strategy from the literature (McNemar's test, p<0.05).
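The ideas in the abstract can be illustrated with a minimal sketch. It is not the stopping strategies proposed in the paper, which the abstract does not describe. It assumes uncertainty sampling as the selection strategy and a generic prediction-agreement check as the stopping rule; the function names, the 0.5 decision boundary and the 0.99 agreement threshold are all hypothetical. The last function computes F2, the F-measure with beta = 2 that weights recall more heavily than precision, which is the measure reported in the abstract.

```python
def most_uncertain(probabilities, k):
    """Uncertainty sampling (an assumed selection strategy): return the ids of
    the k documents whose predicted probability of being sensitive is
    closest to 0.5."""
    ranked = sorted(probabilities, key=lambda doc_id: abs(probabilities[doc_id] - 0.5))
    return ranked[:k]


def predictions_stable(previous, current, min_agreement=0.99):
    """Generic stopping check (hypothetical, not the paper's method): True when
    two consecutive classifiers give the same label to at least
    min_agreement of the documents they both labelled."""
    shared = previous.keys() & current.keys()
    if not shared:
        return False
    agreeing = sum(previous[doc_id] == current[doc_id] for doc_id in shared)
    return agreeing / len(shared) >= min_agreement


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from true positives, false positives and false negatives.
    With beta = 2 this is F2."""
    b2 = beta * beta
    denominator = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denominator if denominator else 0.0
```

In a review loop under these assumptions, the reviewer would label the documents returned by most_uncertain, the classifier would be retrained, and active learning would stop once predictions_stable returns True, at which point the reviewer chooses the order of the remaining documents.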

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and McDonald, Dr Graham and Ounis, Professor Iadh
Authors: McDonald, G., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781450380164
Published Online: 25 July 2020
Copyright Holders: Copyright © 2020 Association for Computing Machinery
First Published: First published in SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval: 2053-2056
Publisher Policy: Reproduced in accordance with the publisher copyright policy