Oulas, A. et al. (2016) Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab). Biodiversity Data Journal, 4, e8357. (doi: 10.3897/BDJ.4.e8357) (PMID:27932907) (PMCID:PMC5136650)
Text: 130609.pdf - Published Version. Available under License Creative Commons Attribution. 1MB
Abstract
Background: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists and microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving the performance of commonly used R scripts becomes increasingly difficult, especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages, (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user.
New information: In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data – Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated, offering connections to dataframe-like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources.
The RvLab is running on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/
Item Type: | Articles |
---|---|
Additional Information: | This work was supported by the LifeWatchGreece infrastructure (MIS 384676), funded by the Greek Government under the General Secretariat of Research and Technology (GSRT), ESFRI Projects, National Strategic Reference Framework (NSRF). |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Ijaz, Dr Umer |
Authors: | Oulas, A., Varsos, C., Patkos, T., Pavloudi, C., Gougousis, A., Ijaz, U. Z., Filiopoulou, I., Pattakos, N., Vanden Berghe, E., Fernández-Guerra, A., Faulwetter, S., Chatzinikolaou, E., Pafilis, E., Bekiari, C., Doerr, M., and Arvanitidis, C. |
College/School: | College of Science and Engineering > School of Engineering > Infrastructure and Environment |
Journal Name: | Biodiversity Data Journal |
Publisher: | Pensoft |
ISSN: | 1314-2836 |
ISSN (Online): | 1314-2828 |
Copyright Holders: | Copyright © 2016 Varsos, C et al. |
First Published: | First published in Biodiversity Data Journal 4: e8357 |
Publisher Policy: | Reproduced under a Creative Commons License |