Start with R
Useful web sites for R:
- Quick-R
- R-bloggers
- r-tutor
- Kickstarting R
- cran.r-project.org
- r books
- Andrew Field, YouTube videos
- Simply Statistics
- idre stat
- programiz
- shiny
- learn graph
- library: lattice, ggvis, ggplot2
About Package
- install.packages("ggplot2"); you can also install from the RStudio menu: Tools -> Install Packages…
- library(ggplot2)
About work path:
- getwd(): get the work path
- setwd("xxx"): set the work path
- list.files(): list all the files in work path
- ls(): list the objects in the current workspace
- dir(): list files
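The work-path functions above can be tried safely in a throwaway directory. This is a minimal sketch; the directory prefix `workdir_demo_` and the file name `notes.txt` are made up for illustration.

```r
# Hypothetical scratch directory: tempfile() only generates a unique path.
scratch <- tempfile("workdir_demo_")
dir.create(scratch)

old <- getwd()               # remember the current work path
setwd(scratch)               # switch to the scratch directory
file.create("notes.txt")     # create an empty file so there is something to list
files_here <- list.files()   # dir() returns the same listing
setwd(old)                   # restore the original work path

print(files_here)
```

Restoring the old path at the end keeps the rest of the session unaffected.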
About help:
- help(sum): get the help page of a function ("sum" is an example function name)
- example(sum): run the examples of a function ("sum" is an example function name)
load R script:
- xxx.R: standard file extension for R scripts
- source("xxx.R"): load the R script
Read data:
- read.csv("xxx")
- read.table("xxx", …)
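A small round trip may help show how the two readers relate. The sketch below writes a made-up three-line CSV to a temporary file (the column names `name` and `score` are illustrative only) and reads it back both ways.

```r
# Write a tiny, made-up CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "ann,90", "bob,85"), csv_path)

# read.csv() assumes a header row and comma separators.
scores <- read.csv(csv_path)

# read.table() needs both stated explicitly to give the same result here.
same <- read.table(csv_path, header = TRUE, sep = ",")

str(scores)
```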
Get default databases:
- >library(datasets)
>data(mtcars,iris)
type of data:
- class(xxx)
- typeof(xxx)
create vector:
- x <- c(a,b)
- x <- vector("numeric", length=10)
- as.numeric(x); as.logical(x); as.character(x) : coercion
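Coercion is easier to see on concrete values. A short sketch with arbitrary example numbers:

```r
x <- c(1.5, 0, 3)
as.logical(x)      # zero becomes FALSE, any other number TRUE
as.character(x)    # numbers become strings

v <- vector("numeric", length = 10)   # ten zeros

# Mixing types in c() coerces implicitly to the most general type (character here).
mixed <- c(1, "a", TRUE)
class(mixed)

# Values that cannot be converted become NA (R also emits a warning).
bad <- suppressWarnings(as.numeric(c("1", "two")))
```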
create List:
- x <- list(1, "a", T, 1+4i)
- in contrast to a vector, the members of a list can have different types
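To check that each member keeps its own type, one option is to ask for the class of every element:

```r
x <- list(1, "a", TRUE, 1+4i)

# Each member keeps its own type; nothing is coerced.
member_classes <- sapply(x, class)
print(member_classes)

# [[ ]] extracts a single member, [ ] returns a sub-list.
second <- x[[2]]
```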
create matrices:
- m <- matrix(1:6, nrow = 2, ncol = 3)
- dim(m) : dimension
- attributes(m) : result: $dim 2 3
- m <- 1:10
dim(m) <- c(2,5)
- x <- 1:3
y <- 10:12
cbind(x,y)
rbind(x,y)
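The three ways of building a matrix listed above, as a runnable sketch. The variable names `v`, `cb` and `rb` are chosen here for illustration.

```r
# matrix() fills column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)

# Setting the dim attribute turns a plain vector into a matrix.
v <- 1:10
dim(v) <- c(2, 5)

# cbind() uses the vectors as columns, rbind() uses them as rows.
x <- 1:3
y <- 10:12
cb <- cbind(x, y)   # 3 rows, 2 columns
rb <- rbind(x, y)   # 2 rows, 3 columns
```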
Factors
- x <- factor( c("yes","yes","no","yes","no") )
- table(x): result: no 2, yes 3
- unclass(x) : result: 2 2 1 2 1
- attr(,"levels") : result: "no", "yes"
- x <- factor(c("yes","yes","no","yes","no"), levels = c("yes", "no") ) : "yes" will be the first level, and "no" will be the second level
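The effect of the levels argument shows up in the integer codes behind the factor. A short sketch (the name `x2` is introduced here for the reordered factor):

```r
# Default: levels are sorted alphabetically, so "no" is level 1.
x <- factor(c("yes", "yes", "no", "yes", "no"))
levels(x)
table(x)
codes_default <- as.integer(x)

# Explicit levels: "yes" becomes level 1, "no" level 2.
x2 <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
codes_explicit <- as.integer(x2)
```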
Data Frames
- > df <- data.frame(vector1, vector2,…)
- > x <- data.frame(foo=1:4, bar = c(T,T,F,F))
- > data.matrix(x) # convert to a matrix; coercion forces all values to a single type
- > nrow(x); ncol(x)
- > subset(mtcars, mtcars$cyl==8) ##Get a subset of data set “mtcars”, where column “cyl” (number of cylinders) is 8, for more information about mtcars, please use help(mtcars)
- > mtcars[mtcars$cyl==8, ]
- > head(mtcars, n=5L ) # get the first n rows of the dataset
- > tail(mtcars, n=5L) # get the last n rows of the dataset
- > str(mtcars) # get an overview of the data set
- > summary(mtcars) # get the min, 1st quartile, median, mean, 3rd quartile, and max of each column
- > nrow(na.omit(mtcars)) # count the rows that remain after discarding rows with missing values
- > dim(mtcars) # get the dimension of the data set
- > attributes(mtcars)
- more about subset data frame
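The two subsetting styles above should select the same rows. A minimal sketch on the built-in mtcars data (the names `eight_subset` and `eight_bracket` are illustrative):

```r
library(datasets)   # provides mtcars

# Two ways to keep only the 8-cylinder cars.
eight_subset  <- subset(mtcars, cyl == 8)     # column names can be used directly
eight_bracket <- mtcars[mtcars$cyl == 8, ]    # logical row index, all columns

all.equal(eight_subset, eight_bracket)

head(eight_subset, n = 3L)      # first three matching rows
dim(mtcars)                     # rows and columns of the full data set
complete_rows <- nrow(na.omit(mtcars))   # mtcars has no missing values
```

Inside subset() the condition is evaluated within the data frame, which is why `cyl` works without the `mtcars$` prefix.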
Missing Values:
- > is.na(airquality) # return a logical matrix marking the missing values (a logical vector for vector input)
- > complete.cases(vector1, vector2,…) # return a logical vector that is TRUE at each position where none of the inputs is missing
- > good <- complete.cases(airquality) # return a logical vector that is TRUE for rows without missing values
> airquality[good, 1:6 ] # keep only the complete rows
- sapply(airquality, function(x) sum(is.na(x))) # an elegant way to count the missing values in each column
- apply(is.na(airquality),2,sum) # the same count per column
- length(which(is.na(airquality[1]))) or length(which(is.na(airquality[1])==T)) # number of missing values in the first column
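Put together, the checks above might look like the sketch below on the built-in airquality data. `colSums()` is used as one more way to get the per-column counts, and the names `na_per_col` and `clean` are illustrative.

```r
library(datasets)   # provides airquality

# Count the missing values in each column.
na_per_col <- sapply(airquality, function(col) sum(is.na(col)))
na_per_col2 <- colSums(is.na(airquality))   # same counts from the logical matrix

# Keep only the rows that have no missing value in any column.
good  <- complete.cases(airquality)
clean <- airquality[good, ]

print(na_per_col)
nrow(clean)
```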
Names Attribute:
- names to a vector: x <- 1:3
names(x) <- c("foo", "bar", "norf")
names(x) : "foo" "bar" "norf"
- names to a list: x <- list(a=1, b=2, c=3)
- names to a matrix:
m <- matrix(1:4, nrow=2, ncol=2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
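Once names are attached they can be used for indexing, which is the main reason to set them. A short sketch (the list is called `lst` here only to keep it apart from the vector):

```r
# Named vector: index by name instead of position.
x <- 1:3
names(x) <- c("foo", "bar", "norf")
bar_value <- x[["bar"]]

# Named list: members are reachable with $.
lst <- list(a = 1, b = 2, c = 3)
b_value <- lst$b

# Matrix with row and column names.
m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))
cell <- m["a", "d"]   # row "a", column "d"
```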
Reading Data:
- read.table(), read.csv() <-> write.table()
- readLines() <-> writeLines()
- source(): for reading in R code files <-> dump()
- dget(): for reading in R objects that have been deparsed into text files <-> dput()
- load(): for reading in saved workspaces <-> save()
- unserialize() <-> serialize()
Textual Formats:
- dput(y)
- dput(y, file = "y.R")
- dget("y.R")
- dump() and source(): deparse multiple objects and read them back:
x <- "foo"
y <- data.frame(a=1, b="a")
dump(c("x","y"), file = "data.R")
rm(x,y)
source("data.R")
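Both round trips can be tried without leaving files in the work path. This sketch uses temporary files instead of "y.R" and "data.R", and passes `local = TRUE` to source() so the objects are restored in the calling environment; both choices are assumptions of the sketch, not part of the notes above.

```r
y <- data.frame(a = 1, b = "a")

# dput()/dget(): one object per file.
obj_path <- tempfile(fileext = ".R")
dput(y, file = obj_path)
y_copy <- dget(obj_path)

# dump()/source(): several objects, restored under their original names.
x <- "foo"
dump_path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_path)
rm(x, y)                          # remove the originals
source(dump_path, local = TRUE)   # x and y exist again

print(x)
str(y)
```

dget() returns the object, so the caller picks the name; source() recreates the names that were dumped.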
Connections:
- file(), gzfile(), bzfile(), url()
- open mode: "r" read only; "w" writing (and initializing a new file); "a" appending; "rb", "wb", "ab" reading, writing, or appending in binary mode (Windows)
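As an illustration of the open modes, the sketch below writes two made-up lines through a gzfile() connection opened with "w" and reads them back through one opened with "r". The temporary file path is generated for the example.

```r
gz_path <- tempfile(fileext = ".gz")

# Open for writing: creates (or truncates) the compressed file.
con <- gzfile(gz_path, "w")
writeLines(c("first line", "second line"), con)
close(con)

# Open read-only and read the text back.
con <- gzfile(gz_path, "r")
lines_back <- readLines(con)
close(con)

print(lines_back)
```

Closing each connection explicitly flushes the data and releases the file.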
Matrix vectorized operation:
- x <- matrix(1:4, 2,2); y <- matrix(rep(10,4),2,2)
x*y : element-wise multiplication
x/y
x %*% y : true matrix multiplication
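Worked through on the two matrices above, the difference between the operators looks like this (the result names are illustrative):

```r
x <- matrix(1:4, 2, 2)            # columns: (1, 2) and (3, 4)
y <- matrix(rep(10, 4), 2, 2)     # every entry is 10

elementwise <- x * y     # each entry multiplied by its counterpart
ratio       <- x / y     # each entry divided by its counterpart
product     <- x %*% y   # rows of x times columns of y

# Entry [1, 1] of the matrix product is 1*10 + 3*10 = 40.
print(elementwise)
print(product)
```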