group <- c(rep("A", times = 50), rep("B", times = 50))
var1 <- rnorm(n = 100, mean = 0, sd = 1)
var2 <- rnorm(n = 100, mean = 25, sd = 5)
dat <- data.frame(group, var1, var2)
Make a new data frame comprised of column 1 (group) and column 3 (var2):
newdat <- dat[,c(1,3)]
Make a new data frame comprised of column 1 (group) and column 3 (var2) and rows 1 to 50:
newdat <- dat[1:50,c(1,3)]
Subsets of data frames can be made using the subset function:
sdat <- subset(dat, subset = (var1 != 0 & var2 <= 25))
Alternative:
id <- dat$var1 != 0 & dat$var2 <= 25
sdat <- data.frame(dat[id,])
Generate a sequence of numbers from 0 to 10:
seq(from = 0, to = 10, by = 1)
Generate a sequence of numbers from 10 to 0:
seq(from = 10, to = 0, by = -1)
Generate a sequence of 6 numbers from 0 to 10:
seq(from = 0, to = 10, length = 6)
Generate a sequence of numbers from 0 to 10 by 2.5:
seq(from = 0, to = 10, by = 2.5)
x = 1:4
rep(x, times = 3)
rep(x, times = c(2,3,3,2))
rep(x, times = 3)
rep(x, times = rep(3, length(x)))
x = "abc"
y = rep(x, times = 3)
x = 10 * (1:10)
y = c(rep(1, times = 5), rep(2, times = 5))
x; y
Select those elements of x where y equals 1:
x[y == 1]
Select those elements of x where y equals 2:
x[y == 2]
Subsetting is often used as follows in data analysis:
mean(weight[sex == "male" & age < 30])
This can be read as: return the mean weight for males under age 30.
Vectors are the most common objects encountered in R, but sometimes the requirement that all elements must have the same type is too restrictive. There is another structure called a list which can be:
L = list(a = 1:3, b = "hello"); L
Matrices:
A = matrix(1:6, nrow = 3, ncol = 2); A
By default, matrices are created with their values running down successive columns. It is also possible to specify that the matrix be filled by rows.
B = matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE); B
The number of rows and columns in a matrix can be obtained with the functions nrow and ncol. Both values can be obtained simultaneously with the function dim:
A = matrix(1:6, nrow = 3, ncol = 2)
nrow(A); ncol(A); dim(A)
Matrices can be made rather more useful by using row and column labels. A matrix can be labelled as follows:
A = matrix(1:6, nrow = 3)
dimnames(A) = list(c("sex", "drugs", "rock&roll"), c("this", "that"))
Rotate matrix clockwise by 90°:
dat <- matrix(c(1,2,3,4,5,6,7,8), ncol = 2); dat
nrow <- dim(dat)[1]
dat <- t(dat[nrow:1,])
dat
Sort data frame dat in order of variable peid:
dat <- dat[sort.list(dat$peid),]
Multiple sorting:
dat <- dat[order(dat$peid, dat$id),]
Here we used the cut function. Note that six cut points results in five levels:
time.lev <- factor(cut(dat$time, breaks = c(0, 182, 364, 547, 729, 1500), labels = c("1","3","5","7","9")))
dat <- data.frame(dat, time.lev)
Now set the contrasts for the factor you've just created. In this example the second level (i.e. time.lev == 3) is set as the baseline (reference) category:
dat$time.lev <- factor(dat$time.lev, levels = c(3,1,5,7,9), labels = c("3","1","5","7","9"))
You have a column of data called V7 in which some rows contain the text:
day=2
You want to strip the day= so that just the numeric value remains. Use function gsub:
new <- gsub(pattern = "day=", replacement = "", x = V7)
Create a sequence of dates and re-format from 2001-03-01 to 01-Mar. Calculate the number of days between each element of the generated date sequence and 18 Feb 2001.
date <- seq(from = as.Date("1/3/2001", format = "%d/%m/%Y"), to = as.Date("1/1/2002", format = "%d/%m/%Y"), by = "1 month")
date.labels <- format(date, format = "%d-%b")
days <- as.integer(date - as.Date("18/2/01", format = "%d/%m/%y"))
The objects days and date.labels can be passed to the function axis:
par(cex.axis = 0.80, cex.lab = 0.80)
plot(c(0, 302), c(0, 2500), type = "n", xaxt = "n", xlab = "Date", ylab = "Cumulative number of infected premises")
axis(side = 1, at = days, labels = date.labels, las = 1, srt = 0, tick = TRUE)
legend(x = 180, y = 750, legend = c("Actual", "Predicted"), lwd = c(3,1), col = c(8,1), bty = "n", cex = 0.80)
Calculate age as at today's date and convert to years:
date.today <- Sys.Date()
age <- (date.today - bdays.julian)
age <- trunc(as.numeric(age) / 365.25)
data.frame(birthday = bdays, standard = bdays.julian, julian = as.numeric(bdays.julian), age = age)
You have a series of unique values listed with an identifier in data frame A. In data frame B you have a list of identifiers (with duplicates) and you wish to record unique value along side the identifier:
names(parasite)
[1] "infection" "age" "weight" sex"
Return the position in the vector called age where age == 1 first appears:
match(1, parasite$age)
[1] 29
Return the sex of the first individual with age == 1:
parasite$sex[match(1, parasite$age)]
[1] male
You have a list of animal identifiers comprised of an irregular sequence of numbers. You want to re-assign these animals with identifiers numbered 1 to n. Make a list of the current identifiers and create a data frame listing the old and the new identifiers:
old <- c(dat$Srcid, dat$Desid)
old <- sort(unique(old))
fix <- as.data.frame(cbind(new = 1:length(old), old))
Re-assign the identifers:
dat$Srcid. <- fix$new[match(dat$Srcid, fix$old)]
dat$Desid. <- fix$new[match(dat$Desid, fix$old)]
grp <- rep(c("A","B","C"), times = 5)
grp[2:3] <- NA
table(factor(grp, exclude = NA))
table(factor(grp, exclude = NaN))
Create a data frame listing key, xcoord, and ycoord. The variable key may appears multiple times in this data set and we want to remove duplicates:
geo <- dat[,c(1,3,4)]
id <- match(unique(geo$key), geo$key)
geo <- geo[id, 1:3]
Compare two vectors:
farms <- 1:95
sample <- c(1,5,10)
id <- farms %in% sample
Extract data from a ZIP file where the files of interest are within a folder inside the ZIP file. Set the working folder and make a list of files in the working directory:
setwd("D:\\TEMP")
ifname <- list.files(path = getwd())
Process each ZIP file in the working directory:
for(i in 1:length(ifname)){
# Define the stem of the file do we want to extract:
stem <- substr(ifname[i], 1, 3)
ofile <- paste(stem, "/", sep = "")
# Open zip file and extract contents to working directory:
unzip(ifname[i], junkpaths = FALSE)
}
Say, for example, we have a list of folders and within each folder there are files we want to copy to a working directory:
setwd("D:\\TEMP")
ifname <- list.files(path = getwd())
Process each folder in the working directory:
for(i in 1:length(ifname)){
dfrom <- paste("D:\\TEMP\\DEM\\", ifname[i], "\\", sep = "")
flist <- list.files(path = paste("D:\\TEMP\\DEM\\", ifname[i], sep = ""))
ffrom <- paste(dfrom, flist, sep = "")
dto <- "D:\\TEMP\\DEM\\"
file.copy(from = ffrom, to = dto)
}
Data frame customer:
| custid | name | age |
| 1 | Joe | 23 |
| 2 | Mika | 45 |
| 3 | Lin | 34 |
| 4 | Sarah | 56 |
| 5 | Susan | 18 |
customer <- data.frame(custid = 1:5, name = c("Joe", "Mika", "Lin", "Sarah", "Susan"), age = c(23,45,34,56,18))
Data frame purchase:
| purchid | custid | purchase |
| 1 | 3 | fiction |
| 2 | 1 | biography |
| 3 | 1 | fiction |
| 4 | 2 | biography |
| 5 | 3 | fiction |
| 6 | 4 | fiction |
purchase <- data.frame(purchid = 1:6, custid = c(3,1,1,2,3,4), purchase = c("fiction", "biography", "fiction", "biography", "fiction", "fiction"))
Sort by one column, preserving row integrity:
customer[order(customer[,2]),]
Returns:
| custid | name | age |
| 1 | Joe | 23 |
| 3 | Lin | 34 |
| 2 | Mika | 45 |
| 4 | Sarah | 56 |
| 5 | Susan | 18 |
subset(customer, custid == 5)
Returns:
| custid | name | age |
| 5 | Susan | 18 |
merge(x = purchase, y = customer, by.x = "custid", by.y = "custid", all = FALSE)
Returns:
| custid | purchid | purchase | name | age |
| 1 | 2 | biography | Joe | 23 |
| 1 | 3 | fiction | Joe | 23 |
| 2 | 4 | biography | Mika | 45 |
| 3 | 1 | fiction | Lin | 34 |
| 3 | 5 | fiction | Lin | 34 |
| 4 | 5 | fiction | Sarah | 56 |
merge(x = purchase, y = customer, by.x = "custid" , by.y = "custid", all = TRUE)
Returns:
| custid | purchid | purchase | name | age |
| 1 | 2 | biography | Joe | 23 |
| 1 | 3 | fiction | Joe | 23 |
| 2 | 4 | biography | Mika | 45 |
| 3 | 1 | fiction | Lin | 34 |
| 3 | 5 | fiction | Lin | 34 |
| 4 | 5 | fiction | Sarah | 56 |
| 5 | NA | NA | Sarah | 56 |
custsum <- merge(x = purchase, y = customer, by.x = "custid", by.y = "custid", all = FALSE)
xtabs(~name, custsum)
Returns:
| Joe | Lin | Mika | Sarah | Susan |
| 2 | 2 | 1 | 1 | 0 |
customer[1,]$age <- 99
customer
Returns:
| custid | name | age |
| 1 | Joe | 99 |
| 2 | Mika | 45 |
| 3 | Lin | 34 |
| 4 | Sarah | 56 |
| 5 | Susan | 18 |
newcust <- data.frame(custid = 6, name = "Terry", age = 50)
rbind(newcust, customer)
Returns:
| custid | name | age |
| 6 | Terry | 50 |
| 1 | Joe | 99 |
| 2 | Mika | 45 |
| 3 | Lin | 34 |
| 4 | Sarah | 56 |
| 5 | Susan | 18 |
subset(customer, custid != 1)
Returns:
| custid | name | age |
| 2 | Mika | 45 |
| 3 | Lin | 34 |
| 4 | Sarah | 56 |
| 5 | Susan | 18 |