R

From FarmShare


Revision as of 12:45, 10 July 2014

Looking at installed packages

You can list the installed R packages by calling library() from within R:

library();

Many packages are already installed. You can ask us to install more, or quickly install them yourself into your home directory.

Which R are you using?

To see which R executable is on your PATH, run

 which R

To see its version, run

 R --version
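
The two checks above can also be placed at the top of a job script, so that the job's output file records which R actually ran. This is only a sketch; the function name report_r is made up for illustration.

```shell
#!/bin/bash
# report_r is a hypothetical helper name, not part of R or FarmShare.
# It prints the path of the R executable found on PATH and the first
# line of its version banner, or a message if no R is found.
report_r() {
    r_path="$(command -v R 2>/dev/null || true)"
    if [ -z "$r_path" ]; then
        echo "R not found on PATH"
        return 1
    fi
    echo "R executable: $r_path"
    # The first line of `R --version` names the release.
    "$r_path" --version | head -n 1
}

report_r || echo "fix PATH or load an R module before running R"
```

If the path printed is not the R you expected, check your PATH and any modules you have loaded before looking anywhere else.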

Installing CRAN Packages

Most CRAN packages can be installed per-user by running install.packages() in an interactive session:

install.packages("package_name", dependencies = TRUE)

R first attempts to install to /usr/local/lib/R. When that fails, it prompts you to create a library subdirectory under ~/R (if necessary) and installs there instead. If your package requires dependencies that are available from the standard Ubuntu repositories, you can submit a HelpSU ticket requesting installation. We can install from the Debian/Ubuntu package repositories or into the shared FarmShare filesystem.

You can, of course, install R libraries into any arbitrary path and add that path to your R environment (for example with .libPaths()). That will probably break the next time R is upgraded to a new version, since your packages were built with the older version.
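
One way to soften the upgrade problem is to keep a separate personal library directory per R version and point the R_LIBS_USER environment variable at it. The sketch below is an assumption, not a FarmShare convention: the helper name make_r_lib_dir and the library-<version> directory layout are both made up for illustration.

```shell
#!/bin/bash
# make_r_lib_dir is a hypothetical helper: it creates a per-version
# personal library directory under a base directory and prints its path.
#   $1 = base directory (for example "$HOME/R")
#   $2 = R version string (for example the one shown by `R --version`)
make_r_lib_dir() {
    base_dir="$1"
    r_version="$2"
    lib_dir="$base_dir/library-$r_version"
    mkdir -p "$lib_dir"
    echo "$lib_dir"
}
```

For example, running export R_LIBS_USER="$(make_r_lib_dir "$HOME/R" 3.1.1)" before starting R should put that directory at the front of .libPaths() (as long as R_LIBS is not also set), so install.packages() would install there. After an R upgrade you would pass the new version string and reinstall your packages into the fresh directory.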

NOTE: a package that you install on corn will also be available to you on barley.

R Sample Job

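Here's an example R file that builds a large array of 2^30 integers, then sleeps for 5 minutes. (The second line calls gaussian(), which returns a model family object rather than random numbers, so it does not fill the array.) This happens to use almost exactly 8 GB of RAM.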

Save this as 8GB.R:

x <- array(1:1073741824, dim=c(1024,1024,1024)) 
x <- gaussian()
Sys.sleep(300)

Here's an example SGE submit script that runs that R file. Save it as r_test.script:

#!/bin/bash

# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

R --vanilla --no-save < 8GB.R

You can submit it with just

 qsub r_test.script

The job writes two output files, one for stdout and one for stderr. Here is the stdout file that I get:

$ cat r_test.script.o497 
R version 2.12.1 (2010-12-16)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.


> x <- array(1:1073741824, dim=c(1024,1024,1024)) 
> x <- gaussian()
> Sys.sleep(300)
>


In the mail that you get when the job ends, the maxvmem number is incorrect. This is a known bug in this version of SGE. The R script on this page actually uses 8 GB of vmem.

Another R Sample Job

R script, let's call it R-jags.R

print("Hello World")
library(rjags)
#this just loaded some settings from that library
print("Finished")

Job script, let's call it R-jags.submit.script

#!/bin/bash

# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

R --vanilla --no-save < R-jags.R

Submit it to the test queue with a small memory requirement:

 qsub -l mem_free=200M -l testq=1 R-jags.submit.script


Looking at the output files, you will see that the job errored out because R can't find the package rjags. You have two alternatives:

  • include the R library from /mnt/glusterfs/software
  • use modules to specify the full R install from /mnt/glusterfs/software

For the first alternative, add this line to your R script, before the library(rjags) call:

 .libPaths(c("/mnt/glusterfs/software/free/R-2.15.0/lib/R/library", "/usr/lib/R/library"))

For the second alternative, your job script will look like this:

$ cat R-jags.submit.script
#!/bin/bash

# use the current directory
#$ -cwd
#$ -S /bin/bash

# mail this address
#$ -M chekh@stanford.edu
# send mail on begin, end, suspend
#$ -m bes

eval `tclsh /mnt/glusterfs/software/free/modules/tcl/modulecmd.tcl sh autoinit`
module load R-2.15.0
R --vanilla --no-save < R-jags.R

Links

Some other departments have some other more detailed examples:

building our local R

Here's how I usually do it.

2014-07-10

R 3.1.1 was released today. I compiled it as chekh on corn40 (Ubuntu 13.10). The configure summary:

R is now configured for x86_64-unknown-linux-gnu

  Source directory:          .
  Installation directory:    /usr/local

  C compiler:                gcc -std=gnu99  -g -O2
  Fortran 77 compiler:       gfortran  -g -O2

  C++ compiler:              g++  -g -O2
  C++ 11 compiler:           g++  -std=c++11 -g -O2
  Fortran 90/95 compiler:    gfortran -g -O2
  Obj-C compiler:            gcc -g -O2 -fobjc-exceptions

  Interfaces supported:      X11, tcltk
  External libraries:        readline, ICU, lzma
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           shared R library, shared BLAS, R profiling

  Recommended packages:      yes
  • make
  • write /farmshare/software/mf/saucy/r/3.1.1.lua

lapack issues

If you see messages like:

  unable to load shared object '/usr/lib/R/modules//lapack.so':

most likely you're mixing R versions and libraries.

Double-check that you are not setting the R library path to point to directories with older libraries.
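
A quick way to do that check from the shell is to print the library-path environment variables R reads (R_LIBS, R_LIBS_USER, R_LIBS_SITE) and to search your R startup files for library path settings. This is a rough sketch; the function name show_r_lib_settings is made up, and it only looks at ~/.Renviron and ~/.Rprofile, so settings made elsewhere (for example inside a job script) would not show up.

```shell
#!/bin/bash
# show_r_lib_settings is a hypothetical helper name.
show_r_lib_settings() {
    # Environment variables that R consults for library paths.
    for var in R_LIBS R_LIBS_USER R_LIBS_SITE; do
        val="$(printenv "$var" || true)"
        echo "$var=${val:-<unset>}"
    done
    # Startup files in the home directory that may set library paths.
    for f in "$HOME/.Renviron" "$HOME/.Rprofile"; do
        if [ -f "$f" ]; then
            grep -H -n -E 'R_LIBS|libPaths' "$f" || echo "$f: no library path settings"
        fi
    done
}

show_r_lib_settings
```

Any path in the output that belongs to a different R version than the one you are running is a likely culprit.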

This test script should run fine if you have everything set correctly:

$ cat lapack.r 
data(iris)
zz = lm(Sepal.Length ~., data = iris) 
summary(zz)

$ R --no-save < lapack.r 