Uzma, Manzoor, U. and Halim, Z. (2023) Protein encoder: an autoencoder-based ensemble feature selection scheme to predict protein secondary structure. Expert Systems with Applications, 213(Part B), 119081. (doi: 10.1016/j.eswa.2022.119081)
Text: 306717.pdf - Accepted Version. Available under License Creative Commons Attribution Non-commercial No Derivatives. 5MB
Abstract
Proteins play a vital role in the human body, where they perform important metabolic tasks. Experimental identification of protein structure is expensive and time-consuming. Predicting protein secondary structure is an important step toward identifying a protein's tertiary structure and its folds. Selecting a feature subset from the high-dimensional protein primary sequence is key to improving the accuracy of Protein Secondary Structure Prediction (PSSP). This work presents a novel method for the PSSP problem based on a two-phase feature selection technique. The first phase uses an unsupervised autoencoder for feature extraction. The second phase is an ensemble of three feature selection methods, namely generic univariate select, recursive feature elimination, and Pearson's correlation; it combines the resulting feature subsets using mutual information to select the optimum feature subset. The selected feature subsets are then passed to different classifiers: random forest, decision tree, and multilayer perceptron. Two sets of experiments are performed on five datasets to assess the proposed work. The proposed solution is compared with three state-of-the-art methods on Q3 accuracy, Q8 accuracy, and segment overlap score. The results show that the proposed framework outperforms these earlier methods in the majority of cases. It achieves Q8 accuracies of 82%, 80%, 79%, 73%, and 74% and Q3 accuracies of 90%, 90%, 92%, 79%, and 74% on the CB6133, CB6133-filtered, CB513, CASP10, and CASP11 datasets, respectively.
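The Q8 and Q3 accuracies reported in the abstract are per-residue scores: the fraction of residues whose predicted secondary-structure state matches the reference, over eight states or three states respectively. The following is a minimal sketch of how such scores can be computed, not the authors' evaluation code. The helper names (`per_residue_accuracy`, `reduce_to_q3`), the toy sequences, and the 8-to-3 state reduction are assumptions made for illustration; the reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one common convention, and the paper may use a different one.

```python
# Sketch only: helper names, toy data, and the 8-to-3 reduction are
# illustrative assumptions, not taken from the paper.

# One common reduction from the eight DSSP-style states to three classes.
Q8_TO_Q3 = {
    "H": "H", "G": "H", "I": "H",  # helix types -> helix
    "E": "E", "B": "E",            # strand and bridge -> strand
    "T": "C", "S": "C", "L": "C",  # turn, bend, loop -> coil
}


def per_residue_accuracy(true_states, predicted_states):
    """Return the fraction of positions where the two label strings agree."""
    if len(true_states) != len(predicted_states):
        raise ValueError("label sequences must have the same length")
    if not true_states:
        raise ValueError("label sequences must not be empty")
    matches = sum(t == p for t, p in zip(true_states, predicted_states))
    return matches / len(true_states)


def reduce_to_q3(states):
    """Map a string of eight-state labels to three-state labels."""
    return "".join(Q8_TO_Q3[s] for s in states)


# Toy ten-residue example (made up for illustration).
true_q8 = "HHHGTEEBLS"
pred_q8 = "HHGGTEEELL"

q8 = per_residue_accuracy(true_q8, pred_q8)
q3 = per_residue_accuracy(reduce_to_q3(true_q8), reduce_to_q3(pred_q8))
print(f"Q8 = {q8:.2f}, Q3 = {q3:.2f}")
```

In this toy example the three Q8 mismatches all fall within the same three-state class, so the Q3 score is higher than the Q8 score. This is why Q3 accuracy is generally at least as high as Q8 accuracy when both are derived from the same eight-state prediction.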
Item Type: | Articles |
---|---|
Additional Information: | This work was supported by the GIK Institute graduate program research fund under GA-1 scheme. |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Uzma, Dr Uzma |
Authors: | Uzma, Manzoor, U., and Halim, Z. |
College/School: | College of Science and Engineering > School of Engineering > Infrastructure and Environment |
Journal Name: | Expert Systems with Applications |
Publisher: | Elsevier |
ISSN: | 0957-4174 |
ISSN (Online): | 0957-4174 |
Published Online: | 20 October 2022 |
Copyright Holders: | Copyright © 2022 Elsevier Ltd. |
First Published: | First published in Expert Systems with Applications 213(Part B): 119081 |
Publisher Policy: | Reproduced in accordance with the publisher copyright policy |