Download the ipython notebook.
These are just some of the things I find useful. Feel free to search around for others.
R is a fully functional programming language so one can define functions in it. For example, you might get tired of always typing http://stats191.stanford.edu/data. You could make a small function
useful_function = function(dataname) {
return(paste("http://stats191.stanford.edu/data/", dataname, sep = ""))
}
useful_function("heights.table")
## [1] "http://stats191.stanford.edu/data/heights.table"
Let’s load the heights data with less code
h.table = read.table(useful_function("heights.table"), header = T, sep = ",")
When working on a particular project or assignment, it is often easiest to type commands in a text editor and rerun them several times. The command source is an easy way to do this, and it takes either the name of a file or a URL as argument. Suppose we have a webpage http://stats191.stanford.edu/R/mycode.R with the function dataurl above:
dataurl = function(dataname) {
return(paste("http://stats191.stanford.edu/data/", dataname, sep=''))
}
Then, we can execute this as follows
source("http://stats191.stanford.edu/R/mycode.R")
dataurl("heights.table")
## [1] "http://stats191.stanford.edu/data/ heights.table"
As you go through the course, you might add some other useful functions to this file.
The function c, concatenation, is used often in R, as are rep and seq
X = 3
Y = 4
c(X, Y)
## [1] 3 4
rep(1, 4)
## [1] 1 1 1 1
rep(2, 3)
## [1] 2 2 2
c(rep(1, 4), rep(2, 3))
## [1] 1 1 1 1 2 2 2
seq(0, 10, length = 11)
## [1] 0 1 2 3 4 5 6 7 8 9 10
seq(0, 10, by = 2)
## [1] 0 2 4 6 8 10
You can sort and order sequences
X = c(4, 6, 2, 9)
sort(X)
## [1] 2 4 6 9
Use an ordering of X to sort a list of Y in the same order
Y = c(1, 2, 3, 4)
o = order(X)
X[o]
## [1] 2 4 6 9
Y[o]
## [1] 3 1 2 4
R has a very rich plotting library. Most of our plots will be fairly straightforward, “scatter plots”.
X = c(1:40)
Y = 2 + 3 * X + rnorm(40) * 10
plot(X, Y)
The plots can be made nicer by adding colors and using different symbols. See the help for function par.
plot(X, Y, pch = 21, bg = "red")
plot(X, Y, pch = 23, bg = "red")
You can add titles, as well as change the axis labels.
plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor",
ylab = "Outcome")
Lines are added with abline. We’ll add some lines to our previous plot: a yellow line with intercept 2, slope 3, width 3, type 2, as well as a vertical line at x=20 and horizontal line at y=60.
plot(X, Y, pch = 23, bg = "red", main = "A simulated data set", xlab = "Predictor",
ylab = "Outcome")
abline(2, 3, lwd = 3, lty = 2, col = "yellow")
abline(h = 60)
abline(v = 20)
You can add points and lines to existing plots.
plot(X[1:20], Y[1:20], pch = 21, bg = "red", xlim = c(min(X), max(X)), ylim = c(min(Y),
max(Y)))
points(X[21:40], Y[21:40], pch = 21, bg = "blue")
lines(X[21:40], Y[21:40], lwd = 2, lty = 3, col = "orange")
You can put more than one plot on each device. Here we create a 2-by-1 grid of plots
par(mfrow = c(2, 1))
plot(X, Y, pch = 21, bg = "red")
plot(Y, X, pch = 23, bg = "blue")
par(mfrow = c(1, 1))
Plots can be saved as pdf, png, jpg among other formats. Let’s save a plot in a file called “myplot.jpg”
jpeg("myplot.jpg")
plot(X, Y, pch=21, bg='red')
dev.off()
Several plots can be saved using pdf files. This example has two plots in it.
pdf("myplots.pdf")
# make whatever plot you want
# first page
plot(X, Y, pch=21, bg='red')
# a new call to plot will make a new page
plot(Y, X, pch=23, bg='blue')
# close the current "device" which is this pdf file
dev.off()
This should cover a lot of our plotting needs.
It is easy to use for loops in R
for (i in 1:10) {
print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25
## [1] 36
## [1] 49
## [1] 64
## [1] 81
## [1] 100
for (w in c("red", "blue", "green")) {
print(w)
}
## [1] "red"
## [1] "blue"
## [1] "green"
Note that big loops can get really slow, a drawback of many high-level languages.
R has a builtin help system, which can be accessed and searched as follows
> help(t.test)
> help.search('t.test')
Many objects also have examples that show you their usage
> example(lm)
In practice, we will often be using the distribution (CDF), quantile (inverse CDF) of standard random variables like the T, F, chi-squared and normal.
The standard 1.96 (about 2) standard deviation rule for
: (note that 1-0.05/2=0.975)
qnorm(0.975)
## [1] 1.96
We might want the
upper quantile for an F with 2,40
degrees of freedom:
qf(0.95, 2, 40)
## [1] 3.232
So, any observed F greater than 3.23 will get rejected at the
level. Alternatively, we might have observed an F of
5 with 2, 40 degrees of freedom, and want the p-value
1 - pf(5, 2, 40)
## [1] 0.01153
Let’s compare this p-value with a chi-squared with 2 degrees of freedom, which is like an F with infinite degrees of freedom in the denominator (send 40 to infinity). We also should multiply the 5 by 2 because it’s divided by 2 (numerator degrees of freedom) in the F.
1 - pchisq(5 * 2, 2)
## [1] 0.006738
1 - pf(5, 2, 400)
## [1] 0.007165
Other common distributions used in applied statistics are norm, t.