Stockholm University, Department of Statistics
Financial Statistics
Instructions for R
Nicklas Pettersson

1 Extra instructions for R

One of the advantages of R is that it is free, so anyone can download it and use it on any computer without questions about copyright and the like arising. The program is also much more flexible than many other statistics programs. The program's own website, with downloads, various manuals, a newsgroup for asking questions (registration required), add-on packages, etc., is: http://www.r-project.org/ and a link to the latest Windows version (as of 2008-12-10) is: http://ftp.sunet.se/pub/lang/cran/bin/windows/base/r-2.8.0-win32.exe

If you want to use an editor other than the one built into R, Tinn-R is recommended. (It is not installed in the computer lab at present, but may be later.) Code files are created in R by choosing File/New script. All or part of the code is run by selecting it and pressing Ctrl + R, or by choosing the command Edit/Run from the menu.

1.1 Instructions and manuals

A wealth of books and other literature has been written about the program, but to solve the hand-in assignment it is sufficient to follow along in the computer exercises. The page http://gauss.stat.su.se/f/r/ comes from a study circle that the Department of Statistics held in 2008. Parts of slides 1 and 2 in particular may be of interest to you (perhaps also the slides that deal with graphics). The text recommended for all beginners is "An introduction to R", which is reached by clicking on "Manuals" on the program's website (or via this link http://cran.r-project.org/doc/manuals/r-intro.pdf ). Various other manuals are available there as well.
A site that I find good (especially for those who have previously used e.g. SAS or SPSS) is http://www.statmethods.net/index.html. Among other things, it points to the manual http://oit.utk.edu/scc/rforsas&spssusers.pdf. The Department of Economics has the following manual with a focus on econometrics: http://www.ne.su.se/education/grundniva/nekii_iii/ht08/empirical1/r_intro.pdf Uppsala University has the following page on "STATISTICAL PROGRAMMING with R": http://www.dis.uu.se/kurser/kurs.php?cid=380.

The program's own built-in help functions are also often good. Simply type "?command" if you want help with a specific command. A lot can also be found by searching the web, which often leads to the newsgroup.

1.2 General commands

# anything  # everything to the right of the sign # is ignored by R (in this case the word anything), so # can be used for making comments
?command  # or help(command): get help with the command
example(command)  # get an example of command
rm(anything)  # remove the object anything
rm(list=ls())  # remove (almost) everything in workspace, use with caution!!
ls()  # list all objects in workspace

1.3 Commands for calculations and how to deal with vectors

1+2  # one plus two
x <- 1+2  # x = 1+2, can also be written 1+2 -> x
y <- 2*x  # two multiplied with x
z <- 1:4  # same as z <- c(1,2,3,4), z <- seq(1,4,1) or z <- seq(to=4,from=1,by=1)
z*x  # z multiplied with x
z*x-1  # z multiplied with x, then one subtracted from every element
z*(x-1)  # z multiplied with (x minus one)
sqrt(z)  # same as z^0.5, calculates the square root of z
rep(z,each=3)  # repeat each element in z three times
rep(z,times=2)  # repeat z two times
rep(z,times=2,each=3)  # repeat each element in z three times, and repeat that two times
z[1]  # first element in z
z[2]  # second element in z
z[1:2]  # same as z[c(1,2)], both first and second element in z
z[-3]  # all elements in z except the third
z<3  # which elements in z are <3
z[z>3]  # all elements where z>3
z[z>3 | z<2]  # all elements where z>3 or z<2
z[z>3 & z<2]  # all elements where z>3 and z<2 (must be none)
length(z)  # gives length of z
is.na(z)  # which values in z are missing values?
!is.na(z)  # which values in z are not missing values?
is.na(z[1:5])  # are the five first elements in z missing values? (z has only four elements, so the fifth is NA)
is.nan(z)  # which values in z are NaN (not a number)?
y <- c(y,1,NA,342)  # set y to include elements y (the old one), 1, a missing value and 342
z[is.na(y)==TRUE]  # same as z[is.na(y)==T], show only elements in z for which y is a missing value

1.4 Matrices and datasets

rbind(z,y)  # set z and y to become rows in a matrix
cbind(z,y)  # set z and y to be columns in a matrix
ZY <- cbind(z,y)  # put the vectors in a matrix named ZY
# Note that capital and small letters are important, so that ZY, Zy, zY and zy are four different objects in R.
dim(ZY)  # dimensions of ZY (unless object is a simple vector, then NULL)
dim(ZY) <- c(1,8)  # change dimension of ZY to one row, eight columns
dim(ZY) <- c(4,2)  # change dimension of ZY to four rows, two columns
ZY*x  # multiplication with x
ZY*1:4  # multiplication with 1:4
t(ZY)*1:4  # transpose ZY and then multiply with 1:4
t(t(ZY)*1:4)  # same as above but all transposed (back)
ZY[is.na(ZY)==T] <- 0  # set all missing values in ZY to 0
ZY[3,2] <- NA  # put back the missing value on the third row in the second column
fix(ZY)  # get a window to manipulate ZY, this is similar to "all" other statistical programs
ZY <- as.data.frame(ZY)  # turn the matrix ZY into a data frame

Functions in R are often generic, which means that they may treat objects differently depending on their class (vector, matrix, data frame, etc).
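A minimal sketch of the class-dependent behaviour mentioned at the end of section 1.4. It assumes z and y hold the values they get from the commands in section 1.3; they are redefined here so that the block runs on its own. The name ZYdf is made up for this illustration only.

```r
z <- 1:4
y <- c(6, 1, NA, 342)        # 6 is the old y (2*x with x = 3)
ZY <- cbind(z, y)            # a numeric matrix with columns z and y
ZYdf <- as.data.frame(ZY)    # the same values as a data frame (ZYdf is a hypothetical name)

is.matrix(ZY)                # TRUE
is.data.frame(ZYdf)          # TRUE
length(ZY)                   # 8: for a matrix, length counts every cell
length(ZYdf)                 # 2: for a data frame, length counts the columns
ZYdf$y                       # columns of a data frame can be picked with $
ZY[, "y"]                    # for a matrix, use [row, column] indexing instead
colMeans(ZY, na.rm = TRUE)   # column means, skipping the missing value
```

The same function (here length) gives different answers for the two objects, which is worth keeping in mind when a command behaves unexpectedly after a conversion with as.data.frame.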
1.5 Basic statistical commands

a <- runif(100)  # set a to be 100 random uniform numbers
sum(a)  # sum of a
mean(a)  # mean of a
median(a)  # median of a
quantile(a)  # usual quantiles of a
quantile(a,c(0,0.1,0.6,0.9))  # quantiles 0, 0.1, 0.6 and 0.9
max(a)  # max of a
min(a)  # min of a
var(a)  # variance of a
sd(a)  # standard deviation of a
cor(a)  # correlation of a (why doesn't this work?)
b <- rnorm(100)  # set b to be 100 random normal numbers
ab <- cbind(a,b)  # set a and b to be columns in a matrix ab
colMeans(ab)  # column means of ab
colSums(ab)  # column sums of ab
rowMeans(ab)  # row means of ab
rowSums(ab)  # row sums of ab

What happens if we use the basic statistical commands on ab? Should we coerce ab into a data.frame?

1.6 Tables

tabledata <- data.frame(rep(1:10,each=10),sample(1:10,100,replace=TRUE))  # first column is (1 to 10, 10 times each), second column is (sample from 1 to 10, with replacement)
names(tabledata) <- c("firstcol","secondcol")  # give names to the columns
table(tabledata)  # cross tabulation of the data

1.7 Plots

pie(c(1,4,3,2),labels=c("a","b","c","d"))  # pie chart
barplot(c(1,4,3,2),names.arg=c("a","b","c","d"))  # barplot
a2 <- rep(5:1,each=20)+a^2  # manipulate a and put it in a2
b2 <- 1:100+b  # manipulate b and put it in b2
hist(a2)  # histogram of a2
plot(a2)  # plot of a2
plot(a2,b2)  # scatterplot of a2 and b2
plot(a2,type="l",lty=1)  # plot a2 as a line with linetype 1 in a graph
lines(b2,lty=2)  # add b2 (default is line) with linetype 2 to the graph
plot(a2,type="l",lty=1,ylim=c(min(b2),max(b2))); lines(b2,lty=2)  # plot a2 and b2 in the same graph with adjusted y-limits
legend(10,100,c("a2line","b2line"),lty=c(1,2))  # add a legend at x position 10 and y position 100, names and linetype as specified
boxplot(a2)  # make a boxplot of a2
boxplot(data.frame(a2,b2))  # make boxplots of both a2 and b2 in the same graph

# Here is how to take the rownames (i.e. dates) from variable,
# put them in datum and use them in a plot. I also put labels
# on the x and y axis. This example is for group 36, FondA.
datum <- format.Date(rownames(variable))
datum <- as.Date(datum)
plot(datum,variable[,2,36],xlab="2008",ylab="FondA",type="l",lty=1)
# To add a line for FondB, write
lines(datum,variable[,3,36],lty=2)
# If the line doesn't show up, this is probably because it is outside the range of the y-limits. Then change the y-limits
plot(datum,variable[,2,36],xlab="2008",ylab="FondA",type="l",lty=1,ylim=c(0,3000))
lines(datum,variable[,3,36],lty=2)

1.8 Some statistical models

1.8.1 Linear model

MYlmMODELL <- lm(dep ~ indep)  # linear model, where dep is a vector of length=n and indep can be a vector (length=n) or a matrix (with n rows)
MYlmMODELL$fitted.values  # get fitted values from MYlmMODELL

1.8.2 Holt-Winters exponential smoothing

HWmodel <- HoltWinters(vector,alpha,beta,gamma)  # Holt-Winters exponential smoothing with parameters alpha, beta and gamma, saved in HWmodel
HWmodel <- HoltWinters(vector,0.2,0.4,0)  # Holt-Winters with alpha=0.2, beta=0.4 and non-seasonal model saved in HWmodel (later R versions request the non-seasonal model with gamma=FALSE instead of 0)
predict(HWmodel,nrahead,prediction.interval=T)  # predicts HWmodel nrahead steps ahead, and if prediction.interval is T (=TRUE) prediction intervals are given

Thus, to predict HWmodel one step ahead with a 95% prediction interval you could write:

predict(HWmodel,1,T)
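The Holt-Winters commands above can be tried on simulated data. The following is only a sketch under stated assumptions: the series is made up (a random walk around 50), the names series and pred are hypothetical, and gamma = FALSE is used because that is how current R versions ask for the non-seasonal model.

```r
set.seed(1)                                   # make the simulated data reproducible
series <- ts(50 + cumsum(rnorm(60)))          # hypothetical series, not course data

# Non-seasonal Holt-Winters with fixed smoothing parameters
HWmodel <- HoltWinters(series, alpha = 0.2, beta = 0.4, gamma = FALSE)

# One step ahead with a prediction interval (the default level is 95%)
pred <- predict(HWmodel, n.ahead = 1, prediction.interval = TRUE)
pred                                          # columns fit, upr and lwr

plot(HWmodel)                                 # observed series together with the fitted values
```

Leaving out alpha and beta lets HoltWinters estimate them from the data, which can be a useful comparison with hand-picked values.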
1.8.3 ARIMA

arima(vector,c(p,d,q))  # ARIMA model where (p, d, q) are the AR order, the degree of differencing, and the MA order
arima(vector,c(1,0,0))  # this is an autoregressive model of first order
acf(vector)  # autocorrelation function for vector
pacf(vector)  # partial autocorrelation function for vector

1.9 Save and load your data and code

The best thing to do is to save your code in a file. But if you want to load or save a whole workspace, you can do it from within R in the following way. Click on File/Save workspace and save your workspace as filename.RData. The workspace includes all defined variables. Or click on File/Load workspace and select filename.RData to load a workspace into R.

1.10 Packages

There are a lot of add-on packages that can be downloaded from http://www.r-project.org/. Click on Packages/Load package and select the package, if it is already installed. Otherwise click on Packages/Install package(s) and select the package. (If the package file is already on the computer you can choose Packages/Install package(s) from local zip files.) Some packages require other packages, but these are usually installed automatically. One example is the rgl package, which makes it possible to plot in 3D.

library(rgl)  # load the package first (same as Packages/Load package)
demo(rgl)  # a demo of rgl, plots can be rotated

Since no one except the administrator is allowed to install programs, you unfortunately can't use packages in the computer labs.
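The menu choices in sections 1.9 and 1.10 also have typed equivalents, which can be handy on your own computer. This is a sketch under stated assumptions: the objects a and ab and the name wsfile are only for illustration, and a temporary file is used so that nothing is left behind.

```r
a <- runif(100)                         # two example objects to store
ab <- cbind(a, b = rnorm(100))

wsfile <- tempfile(fileext = ".RData")  # hypothetical file name, use your own path in practice
save(a, ab, file = wsfile)              # save selected objects; save.image(wsfile) saves the whole workspace
rm(a, ab)                               # remove them from the workspace
load(wsfile)                            # bring them back
dim(ab)                                 # 100 rows and 2 columns again

# Typed equivalents of the Packages menu (commented out because they need
# install rights and internet access, which the computer labs do not give):
# install.packages("rgl")               # download and install a package
# library(rgl)                          # load an installed package
```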