An ensemble filter-based heuristic approach for cancerous gene expression classification

Uzma, and Halim, Z. (2021) An ensemble filter-based heuristic approach for cancerous gene expression classification. Knowledge-Based Systems, 234, 107560. (doi: 10.1016/j.knosys.2021.107560)

Text
306710.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

3MB

Abstract

Cancer gene expression data have a very large feature set, which makes their categorization a challenge for existing classification methods. The data also contain redundant, noisy, and irrelevant genes. Feature selection/reduction therefore plays a crucial role in the classification of such gene expression datasets. This work presents an ensemble of three filter methods, namely Symmetrical Uncertainty (SU), chi-square (χ²), and Relief, to reduce the feature dimensions by eliminating redundant and noisy genes. It also designs a novel heuristic called Local Search-based Feature Selection (LSFS) that further reduces the noise generated by the ensemble method. The selected features are then optimized using a genetic algorithm. Afterwards, the optimal set of features is classified using three models, Support Vector Machine (SVM), k-nearest neighbor (k-NN), and Random Forest (RF), to find cancer-relevant genes. Experiments are conducted using six benchmark datasets. The obtained results are compared with five state-of-the-art algorithms based on accuracy, sensitivity, specificity, F-measure, entropy, and precision. Additional experiments are carried out by manipulating the SVM kernel as a fitness value, as well as by using multiple distance measures and various values of k for k-NN. The prediction accuracy of the proposed system on the six benchmark datasets is 99%, 90%, 98%, 94%, 98%, and 99%, respectively. Significant outcomes from the experimental analysis indicate that the proposed approach improves the classification of cancerous gene expression data and can be used as a practical tool for the analysis of gene expression data.
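To make the filter step concrete, the following is a minimal sketch of one of the three filters named in the abstract, Symmetrical Uncertainty, using its standard definition SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). It is not the authors' implementation. It assumes expression values have already been discretized, and the gene names, sample labels, and the helper `rank_genes` are hypothetical. The abstract does not state how the SU, chi-square, and Relief scores are combined, so no ensemble step is shown.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # both variables are constant, so nothing is shared
    h_xy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy    # mutual information I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)


def rank_genes(expression, labels):
    """Hypothetical helper: sort genes by decreasing SU with the class labels."""
    scores = {g: symmetrical_uncertainty(v, labels) for g, v in expression.items()}
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: discretized expression levels for six samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
expression = {
    "gene_a": ["high", "high", "high", "low", "low", "low"],  # tracks the class
    "gene_b": ["high", "low", "high", "low", "high", "low"],  # mostly noise
    "gene_c": ["low", "low", "low", "low", "low", "low"],     # constant
}

ranking, scores = rank_genes(expression, labels)
print(ranking)
```

In this toy example the gene that perfectly tracks the class label scores highest and the constant gene scores zero. A filter of this kind would keep only the top-ranked genes before any later search or classification stage.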

Item Type: Articles
Additional Information: This work was sponsored by the GIK Institute graduate research fund, Pakistan under GA-F scheme.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Uzma, Dr Uzma
Authors: Uzma, and Halim, Z.
College/School: College of Science and Engineering > School of Engineering > Infrastructure and Environment
Journal Name: Knowledge-Based Systems
Publisher: Elsevier
ISSN: 0950-7051
ISSN (Online): 1872-7409
Published Online: 06 October 2021
Copyright Holders: © 2021 Elsevier B.V.
First Published: First published in Knowledge-Based Systems 234:107560
Publisher Policy: Reproduced in accordance with the publisher copyright policy
