Start with R

Useful web sites for R:

About packages:

  • install.packages("ggplot2"): install a package; in RStudio you can also use the menu Tools -> Install Packages…
  • library(ggplot2): load an installed package into the current session

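About the working directory:

  • getwd(): get the working directory
  • setwd("xxx"): set the working directory
  • list.files(): list all the files in the working directory
  • ls(): list the objects in the current workspace (list() creates a list; it does not list anything)
  • dir(): list files (an alias for list.files())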
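
A minimal sketch of the working-directory functions, assuming it is fine to create a scratch folder under tempdir(). The folder name "wd_demo", the file name "a.txt", and the variable names are made up for illustration.

```r
# Remember the current working directory so it can be restored afterwards.
old <- getwd()

# Work inside a scratch directory under tempdir() ("wd_demo" is a made-up name).
scratch <- file.path(tempdir(), "wd_demo")
dir.create(scratch, showWarnings = FALSE)
setwd(scratch)

invisible(file.create("a.txt"))  # create an empty file so there is something to list
files <- list.files()            # dir() would return the same thing

value <- 42
objs <- ls()                     # names of the objects defined so far

setwd(old)                       # restore the original working directory
```

Saving the old path first and restoring it at the end keeps the rest of the session unaffected.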

About help:

  • help(sum): show the help page for a function ("sum" is an example function name)
  • example(sum): run the examples from a function's help page ("sum" is an example function name)

Load R script:

  • xxx.R: .R is the standard file extension for R scripts
  • source("xxx.R"): load and run the R script

Read data:

  • read.csv("xxx")
  • read.table("xxx", …)
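
A small sketch of both readers on the same file, assuming a comma-separated file with a header row. The data frame "scores" and its columns are invented for the example, and the file is written to a temporary path first so the snippet is self-contained.

```r
# Write a tiny data frame to a temporary CSV file (made-up data).
path <- tempfile(fileext = ".csv")
scores <- data.frame(name = c("ann", "bob"), score = c(90, 85))
write.csv(scores, path, row.names = FALSE)

# read.csv() assumes a header row and comma-separated fields.
a <- read.csv(path)

# read.table() needs those two choices spelled out.
b <- read.table(path, header = TRUE, sep = ",")

str(a)
```

read.csv() is essentially read.table() with different defaults, so for a simple file like this one both calls should give the same data frame.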

Get default datasets:

  • > library(datasets)
    > data(mtcars, iris)

Type of data:

  • class(xxx): the class of an object
  • typeof(xxx): the internal storage type of an object

Create vector:

  • x <- c(a, b)
  • x <- vector("numeric", length = 10)
  • as.numeric(x); as.logical(x); as.character(x): explicit coercion
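
A short sketch of vector creation and coercion. The variable names and values are only illustrative.

```r
x <- c(0.5, 1.5)                      # numeric vector
z <- vector("numeric", length = 10)   # ten zeros

# Mixing types in c() coerces everything to the most general type.
mixed <- c(1.7, "a")                  # becomes a character vector

# Explicit coercion.
flags <- as.logical(0:2)              # FALSE TRUE TRUE
# Values that cannot be converted become NA (R also warns; suppressed here).
nums <- suppressWarnings(as.numeric(c("1", "2", "x")))
```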

Create list:

  • x <- list(1, "a", T, 1+4i)
  • in contrast to a vector, the members of a list can have different types
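
To see the difference from a vector, the same four values can be put into a list and into c(). This is only a sketch; the names "types" and "v" are made up.

```r
x <- list(1, "a", TRUE, 1+4i)

# Each list member keeps its own type.
types <- sapply(x, class)

# The same values in c() are all coerced to character.
v <- c(1, "a", TRUE, 1+4i)
class(v)
```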

Create matrices:

  • m <- matrix(1:6, nrow = 2, ncol = 3)
  • dim(m) : dimension
  • attributes(m)
    • $dim
      2 3
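  • m <- 1:10
    dim(m) <- c(2,5) : turn a vector into a 2 x 5 matrix, filled column by column
  • x <- 1:3
    y <- 10:12
    cbind(x,y) : bind the vectors as columns (3 x 2 matrix)
    rbind(x,y) : bind the vectors as rows (2 x 3 matrix)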
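
A sketch that checks how the matrices above are laid out. R fills matrices column by column, which is easy to forget; the variable names are illustrative.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
m[1, 2]                      # first row, second column: 3, because of column-wise fill

# Setting dim() on a vector gives the same layout as matrix().
v <- 1:10
dim(v) <- c(2, 5)
same <- all(v == matrix(1:10, nrow = 2))

x <- 1:3
y <- 10:12
by_col <- cbind(x, y)        # 3 rows, 2 columns named x and y
by_row <- rbind(x, y)        # 2 rows, 3 columns
```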

Factors

  • x <- factor(c("yes", "yes", "no", "yes", "no"))
  • table(x): result: no 2, yes 3
  • unclass(x): result: 2 2 1 2 1 (the underlying integer codes)
  • attr(x, "levels") or levels(x): result: "no" "yes"
  • x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no")): "yes" will be the first level, and "no" will be the second level
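
The effect of the levels argument can be sketched by comparing the integer codes of the two factors. The variable names "answers", "x", and "y" are made up.

```r
answers <- c("yes", "yes", "no", "yes", "no")

x <- factor(answers)                           # levels sorted alphabetically
levels(x)                                      # "no" "yes"
as.integer(x)                                  # 2 2 1 2 1

y <- factor(answers, levels = c("yes", "no"))  # explicit level order
levels(y)                                      # "yes" "no"
as.integer(y)                                  # 1 1 2 1 2

counts <- table(y)                             # counts follow the level order
```

The level order matters in modelling functions, where the first level is usually treated as the baseline.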

Data Frames

  • > df <- data.frame(vector1, vector2, …)
  • > x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
  • > data.matrix(x) # convert to a matrix; all columns are coerced to a common numeric type
  • > nrow(x); ncol(x) # number of rows and number of columns
  • > subset(mtcars, mtcars$cyl == 8) # get the subset of the data set "mtcars" where column "cyl" (number of cylinders) is 8; for more information about mtcars, use help(mtcars)
  • > mtcars[mtcars$cyl == 8, ] # the same subset, using logical indexing
  • > head(mtcars, n = 5L) # get the first n rows of the data set
  • > tail(mtcars, n = 5L) # get the last n rows of the data set
  • > str(mtcars) # get an overview of the data set
  • > summary(mtcars) # get the min, 1st quartile, median, mean, 3rd quartile, and max for each column
  • > nrow(na.omit(mtcars)) # count the rows left after discarding rows with missing values
  • > dim(mtcars) # get the dimensions of the data set
  • > attributes(mtcars)
  • more about subset data frame
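
A runnable sketch of the data frame calls above, using the built-in mtcars data set. The variable names are made up for the example.

```r
library(datasets)

x <- data.frame(foo = 1:4, bar = c(TRUE, TRUE, FALSE, FALSE))
dm <- data.matrix(x)          # logical column bar becomes 1/0

# Two equivalent ways to keep only the 8-cylinder cars.
v8_subset <- subset(mtcars, cyl == 8)
v8_index  <- mtcars[mtcars$cyl == 8, ]

first5 <- head(mtcars, n = 5L)
last5  <- tail(mtcars, n = 5L)

# mtcars has no missing values, so na.omit() drops nothing.
n_complete <- nrow(na.omit(mtcars))

dim(mtcars)
str(mtcars)
```

Inside subset() the column can be written as plain cyl, because the condition is evaluated within the data frame.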

Missing Values:

  • > is.na(airquality) # return a logical vector or matrix marking the missing values
  • > complete.cases(vector1, vector2, …) # return a logical vector that is TRUE at each position where none of the inputs is missing
  • > good <- complete.cases(airquality) # return a logical vector that is TRUE for the rows without missing values
    > airquality[good, 1:6]
  • sapply(airquality, function(x) sum(is.na(x))) # an elegant way to count the missing values in each column
  • apply(is.na(airquality), 2, sum) # the same counts, applying sum over the columns
  • length(which(is.na(airquality[1]))) # count the missing values in the first column; writing is.na(airquality[1]) == T gives the same result, so the comparison is redundant
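
The different ways of counting missing values can be checked against each other on the built-in airquality data set. This is a sketch; the variable names are made up.

```r
library(datasets)

# Rows with no missing value in any column.
good <- complete.cases(airquality)
clean <- airquality[good, ]

# Missing values per column, computed two ways.
na_sapply <- sapply(airquality, function(x) sum(is.na(x)))
na_apply  <- apply(is.na(airquality), 2, sum)

# Missing values in the first column only.
na_first <- length(which(is.na(airquality[1])))

na_sapply
```

complete.cases() followed by row indexing keeps the same rows as na.omit(airquality).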

Names Attribute:

  • names to a vector: x <- 1:3
    names(x) <- c("foo", "bar", "norf")
    names(x) : "foo" "bar" "norf"
  • names to a list: x <- list(a=1, b=2, c=3)
  • names to a matrix:
    m <- matrix(1:4, nrow=2, ncol=2)
    dimnames(m) <- list(c("a", "b"), c("c", "d"))
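
Once names are set they can be used for indexing. A small sketch with the same objects as above; the list is called "l" here only to keep it apart from the vector.

```r
x <- 1:3
names(x) <- c("foo", "bar", "norf")
x["bar"]                       # element named "bar": 2

l <- list(a = 1, b = 2, c = 3)
l$b                            # list member named "b": 2

m <- matrix(1:4, nrow = 2, ncol = 2)
dimnames(m) <- list(c("a", "b"), c("c", "d"))   # row names first, then column names
m["a", "d"]                    # row "a", column "d": 3
```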

Reading Data:

  • read.table(), read.csv() <-> write.table()
  • readLines() <-> writeLines()
  • source(): for reading in R code files <-> dump()
  • dget(): for reading in R objects that have been deparsed into text files <-> dput()
  • load(): for reading in saved workspaces <-> save()
  • unserialize() <-> serialize()

Textual Formats:

  • dput(y)
  • dput(y, file = "y.R")
  • dget("y.R")
  • dump() and source(): deparse multiple objects and read them back:
    x <- "foo"
    y <- data.frame(a = 1, b = "a")
    dump(c("x", "y"), file = "data.R")
    rm(x, y)
    source("data.R")
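
A round-trip sketch of both pairs, written to temporary files instead of the working directory. Reading the dump back into a separate environment (here called "restored", a made-up name) is one assumed way to avoid overwriting objects in the workspace.

```r
x <- "foo"
y <- data.frame(a = 1, b = "a")

# dput()/dget(): one object per file.
obj_path <- tempfile(fileext = ".R")
dput(y, file = obj_path)
y_copy <- dget(obj_path)

# dump()/source(): several objects, stored together with their names.
dump_path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = dump_path)

restored <- new.env()
source(dump_path, local = restored)   # recreates x and y inside 'restored'
ls(restored)
```

dget() returns the object, so the caller chooses its name, whereas source() on a dump file recreates the objects under their original names.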

Connections:

  • file(), gzfile(), bzfile(), url(): open a connection to a file, a gzip-compressed file, a bzip2-compressed file, or a web page
  • open mode: "r" read only; "w" writing (and initializing a new file); "a" appending; "rb", "wb", "ab" reading, writing, or appending in binary mode (Windows)
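
A minimal sketch of connections with explicit open modes, using temporary files. The file contents and variable names are made up.

```r
# Write two lines through a gzip connection opened for writing.
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines(c("first", "second"), con)
close(con)

# Reopen it read-only and read just the first line.
con <- gzfile(gz_path, "r")
first_line <- readLines(con, n = 1)
close(con)

# Mode "a" appends, so repeated writes accumulate in the file.
log_path <- tempfile(fileext = ".txt")
for (msg in c("one", "two")) {
  con <- file(log_path, "a")
  writeLines(msg, con)
  close(con)
}
log_lines <- readLines(log_path)
```

Each connection that is opened explicitly should also be closed with close().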

Vectorized matrix operations:

  • x <- matrix(1:4, 2, 2); y <- matrix(rep(10, 4), 2, 2)
    x * y : element-wise multiplication
    x / y : element-wise division
    x %*% y : true matrix multiplication
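
The difference between the two products can be worked out by hand for these small matrices. The result names below are made up.

```r
x <- matrix(1:4, 2, 2)          # columns are (1, 2) and (3, 4)
y <- matrix(rep(10, 4), 2, 2)   # every entry is 10

elementwise <- x * y            # each entry of x times 10
ratio       <- x / y            # each entry of x divided by 10
product     <- x %*% y          # rows of x times columns of y

# Row 1 of x is (1, 3): 1*10 + 3*10 = 40
# Row 2 of x is (2, 4): 2*10 + 4*10 = 60
product
```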
