Genome signatures, self-organizing maps and higher order phylogenies: a parametric analysis

Gatherer, D. (2007) Genome signatures, self-organizing maps and higher order phylogenies: a parametric analysis. Evolutionary Bioinformatics, 2007(3), pp. 211-236.

Publisher's URL: http://www.la-press.com/article.php?article_id=365

Abstract

Genome signatures are data vectors derived from the compositional statistics of DNA. The self-organizing map (SOM) is a neural network method for the conceptualisation of relationships within complex data, such as genome signatures. The various parameters of the SOM training phase are investigated for their effect on the accuracy of the resulting output map. It is concluded that larger SOMs, as well as taking longer to train, are less sensitive in phylogenetic classification of unknown DNA sequences. However, where a classification can be made, a larger SOM is more accurate. Increasing the number of iterations in the training phase of the SOM only slightly increases accuracy, without improving sensitivity. The optimal length of the DNA sequence k-mer from which the genome signature should be derived is 4 or 5, but shorter values are almost as effective. Overall, these results indicate that small, rapidly trained SOMs are generally as good as larger, longer-trained ones for the analysis of genome signatures. These results may also be more generally applicable to the use of SOMs for other complex data sets, such as microarray data.
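
As a rough illustration of what a genome signature looks like as a data vector, the sketch below computes relative k-mer frequencies for a DNA sequence. It is a minimal example under stated assumptions, not the method used in the article: the function name genome_signature, the single-strand sliding window, and the plain frequency normalisation are all choices made here for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import product


def genome_signature(sequence, k=4):
    """Return relative k-mer frequencies of a DNA sequence as a list.

    Hypothetical helper for illustration only. Assumptions: one strand,
    overlapping windows, plain relative frequencies, and windows that
    contain characters other than A, C, G, T are ignored.
    """
    seq = sequence.upper()
    # All 4**k possible k-mers in a fixed (lexicographic) order, so that
    # vectors from different sequences are directly comparable.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made purely of A, C, G, T contribute to the total.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        raise ValueError("sequence has no valid k-mers of length %d" % k)
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    signature = genome_signature("ACGTACGTAC", k=2)
    print(len(signature))  # 4**2 = 16 components
```

With k = 4 the vector has 256 components and with k = 5 it has 1024; vectors of this kind are the sort of input on which a SOM would be trained.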

Item Type: Articles
Keywords: Genome Signature; Self-Organizing Map; Viruses; Phylogeny; Jack-Knife Method; Microarray; Metagenomics; Herpesvirus
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Gatherer, Dr Derek
Authors: Gatherer, D.
Subjects: Q Science > QH Natural history > QH301 Biology
College/School: College of Medical Veterinary and Life Sciences
Research Group: MRC Virology Unit, Bioinformatics
Journal Name: Evolutionary Bioinformatics
Publisher: Libertas Academica
ISSN: 1176-9343
First Published: First published in Evolutionary Bioinformatics 2007(3):211-236
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher.
