Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab)

Oulas, A. et al. (2016) Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab). Biodiversity Data Journal, 4, e8357. (doi:10.3897/BDJ.4.e8357) (PMID:27932907) (PMCID:PMC5136650)

130609.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Background: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists and microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving the performance of commonly used R scripts becomes increasingly difficult, especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages; (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user.

New information: In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data – Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated, offering connections to data-frame-like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources. The RvLab is running on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/
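
The SPMD idea mentioned in the abstract can be illustrated with a minimal base-R sketch. It is not the RvLab implementation: it does not call pbdMPI or vegan, and the helper names (`bray_curtis`, `rows_for_rank`, `local_block`), the round-robin row assignment, and the toy community matrix are all assumptions made for illustration. Every "rank" runs the same function on its own share of the rows of a Bray-Curtis dissimilarity matrix, and the blocks are then gathered. In a real MPI job the rank and size would typically come from pbdMPI's `comm.rank()` and `comm.size()` rather than from a loop.

```r
# Bray-Curtis dissimilarity between two abundance vectors.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Hypothetical helper: round-robin assignment of matrix rows to a rank.
# Ranks are 0-based, as in MPI.
rows_for_rank <- function(rank, size, n) {
  idx <- seq_len(n)
  idx[(idx - 1) %% size == rank]
}

# The "single program": each rank computes only its own rows of the
# n x n dissimilarity matrix, using the full community matrix as input.
local_block <- function(comm, rank, size) {
  rows <- rows_for_rank(rank, size, nrow(comm))
  block <- matrix(0, nrow = length(rows), ncol = nrow(comm))
  for (i in seq_along(rows)) {
    for (j in seq_len(nrow(comm))) {
      block[i, j] <- bray_curtis(comm[rows[i], ], comm[j, ])
    }
  }
  list(rows = rows, block = block)
}

# Toy community matrix (assumed data): 6 sites x 4 species.
comm <- matrix(c(5, 0, 3, 2,
                 1, 4, 0, 5,
                 2, 2, 2, 2,
                 0, 7, 1, 0,
                 3, 3, 0, 1,
                 6, 1, 4, 0),
               nrow = 6, byrow = TRUE)

# Simulate 3 ranks sequentially; under MPI these calls would run
# concurrently, one per process.
size  <- 3
parts <- lapply(0:(size - 1), function(r) local_block(comm, r, size))

# "Gather" step: place each rank's block into the full matrix.
d <- matrix(0, nrow(comm), nrow(comm))
for (p in parts) d[p$rows, ] <- p$block
```

The result `d` is the full site-by-site dissimilarity matrix, using the same index that vegan's `vegdist` computes by default (`method = "bray"`). The round-robin split is only one possible way to balance the work across ranks.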

Item Type: Articles
Additional Information: This work was supported by the LifeWatchGreece infrastructure (MIS 384676), funded by the Greek Government under the General Secretariat of Research and Technology (GSRT), ESFRI Projects, National Strategic Reference Framework (NSRF).
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Ijaz, Dr Umer Zeeshan
Authors: Oulas, A., Varsos, C., Patkos, T., Pavloudi, C., Gougousis, A., Ijaz, U. Z., Filiopoulou, I., Pattakos, N., Vanden Berghe, E., Fernández-Guerra, A., Faulwetter, S., Chatzinikolaou, E., Pafilis, E., Bekiari, C., Doerr, M., and Arvanitidis, C.
College/School: College of Science and Engineering > School of Engineering > Infrastructure and Environment
Journal Name: Biodiversity Data Journal
Publisher: Pensoft
ISSN: 1314-2836
ISSN (Online): 1314-2828
Copyright Holders: Copyright © 2016 Varsos, C et al.
First Published: First published in Biodiversity Data Journal 4: e8357
Publisher Policy: Reproduced under a Creative Commons License
