A Study of SVM Kernel Functions for Sensitivity Classification Ensembles with POS Sequences

McDonald, G., García-Pedrajas, N., Macdonald, C. and Ounis, I. (2017) A Study of SVM Kernel Functions for Sensitivity Classification Ensembles with POS Sequences. In: SIGIR 2017: The 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7-11 Aug 2017, pp. 1097-1100. ISBN 9781450350228 (doi:10.1145/3077136.3080731)


Text: 142960.pdf - Accepted Version (792kB)

Abstract

Freedom of Information (FOI) laws legislate that government documents should be opened to the public. However, many government documents contain sensitive information, such as confidential information, that is exempt from release. Therefore, government documents must be sensitivity reviewed prior to release, to identify and close any sensitive information. With the adoption of born-digital documents, such as email, there is a need for automatic sensitivity classification to assist digital sensitivity review. SVM classifiers and Part-of-Speech sequences have separately been shown to be promising for sensitivity classification. However, sequence classification methodologies, and specifically SVM kernel functions, have not been fully investigated for sensitivity classification. Therefore, in this work, we present an evaluation of five SVM kernel functions for sensitivity classification using POS sequences. Moreover, we show that an ensemble classifier that combines POS sequence classification with text classification can significantly improve sensitivity classification effectiveness (+6.09% F2) compared with a text classification baseline, according to McNemar's test of significance.

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Dr Craig and Ounis, Professor Iadh and McDonald, Mr Graham
Authors: McDonald, G., García-Pedrajas, N., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781450350228
Copyright Holders: Copyright © 2017 ACM
First Published: First published in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval: 1097-1100
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
