Download the script here.

Solutions can be downloaded here - but try to solve everything without the solutions first!


1 Objects in R: Basic Calculations

We will most often want to store results of calculations to reuse them later. For this, we can work with basic objects. An object has a name and a content. We can freely choose the name of an object and give certain rules - they have to start with a letter and include only letters, numbers and some special characters (“.”, “_“,”-“). R is case sensitive so “x” and “X” are different object names.

The content of an object is assigned using “<-” or “=” (for tidy readability, you should always use the former, though).

In order to assign the value of 5 to an object with name x do

x <- 5
# Which corresponds to
x = 5

If you already had an object named x before and want to change it, you can simply overwrite the old version.

Task Assign the value of 10 to the object x

x

Similarly, you can use an object to create other objects.

Task Multiply x with 5 and save it as b

b <- 

Task Print b by simply typing its name in the script

In R Studio all the object names are also shown in the “Workspace” window on the top right side.

If we want to delete certain objects we can do so with the function rm() (remove). rm(list=ls()) removes all objects currently saved in the list environment.

# Change and Delete objects:
rm(x) # Deletes an object
rm(list=ls()) # all objects are removed

2 Everything in R Is an Object

Coming from excel or stata, you might think that objects are objects storing data. However, everything in R is an object: functions, symbols, and even R expressions: functions, symbols, and even R expressions (see R in a Nutshell).

For example, function names in R are really symbol objects that point to function objects. (That relationship is, in turn, stored in an environment object.) You can assign a symbol to refer to a numeric object and then change the symbol to refer to a function:

Task Try to understand the different meanigs of x!

# Basic object creation
x <- 1
x

# Not working why
x(2)

# Making it work
x <- function(i) i^2

x

x(2)

3 Vectors

For statistical calculations, we obviously need to work with data sets including many numbers of instead of scalars. The simplest way to collect many numbers (or other types of information) is called a vector in R terminology (you have already been familiarized with vectors on the page before).

To define a vector, we can collect different values using c(value1, value2,...).

# Examples
a <- c(1, 2, 3, 4)

Task Do you remember a different (easier) way to build the vector a?

a <- 

Task What happens if we add 1 to a?

b <- a + 1
b
## [1] 2 3 4 5

R is object-oriented programming –> accordingly, it follows standard math rules.

c <- sqrt(a + b * 2)
c
## [1] 2.236068 2.828427 3.316625 3.741657

Important R functions for vectors:

Task Do you understand all of these functions?

# Important R functions for vectors:
# Basic functions:
sort(a)
length(a)
min(a)
max(a)
sum(a)
prod(a)

# Creating special vectors:
rep(1, 20)
seq(50)
5:15
seq(4, 20, 2)

3.1 Logical Operators and Logical Vectors

Task Do you understand all of these operators?

x <- 10
y <- 9 

x == y # x is equal to y
x > y  # x is bigger then y
x  y # HOW DO YOU TYPE: x is smaller or equal to y
x != y # x is NOT equal to y   

The contents of R vectors do not need to be numeric. A simple example of a different type are character vectors. For handling them, the contents simply need to be enclosed in quotation marks:

cities = c("Friedrichshafen", "Paris", "Tokio", "Tettnang", "Mailand")
cities
## [1] "Friedrichshafen" "Paris"           "Tokio"           "Tettnang"       
## [5] "Mailand"

Another useful type are logical vectors. Each element can only take one of two values: “TRUE” or “FALSE”. Internally, “FALSE” corresponds to 0 and “TRUE” to 1.

a <- TRUE; b <- FALSE # note the functionality of the semicolon

a | b  # Either a or b is TRUE
## [1] TRUE
a & b  # Both a and b are TRUE
## [1] FALSE
a <- c(7, 5, 2, 6, 9, 4, 1, 3)
b <- a < 3 | a >= 6
b
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE  TRUE FALSE

Summary: Vectors can have three different classes: “numeric”, “logical”, “character”:

x <- c(1050, 35, 2.3, 2 ^ 2, -10, sqrt(81)) # numeric vector
y <- c(TRUE, FALSE, TRUE, FALSE) # logical vector
z <- c("zero", "four", "nine") # character vector

One minor (seldomly important) addendum: Within numeric, there is a difference between 1 and 1.0. The former is a pure integer, the latter is an object called double.

# create a string of double-precision values
dbl_var <- c(1, 2.5, 4.5)  
typeof(dbl_var)
## [1] "double"
# placing an L after the values creates a string of integers
int_var <- c(1L, 6L, 10L)
typeof(int_var)
## [1] "integer"

3.2 Naming and Indexing Vectors

The elements of a vector can be named which can increase the readability of the output. Given a vector vec and a string vector namevec of the same length, the names are attached to the vector elements using names(vec) <- namevec.

If we want to access a single element or a subset form a vector, we can work with indices. They are written in square brackets next to the vector name. For example `myvector[4] returns the 4th element of myvector and myvector[6] <- 8 changes the 6th element to take the value of 8. If the vector elements have names, we can also use those as indices like in myvector["elementname"]

Task Complete the empty spaces!

# Create a vector "kicker.skills":
kicker.skills <- c(0.2, 4, 3, 10, 5,-2)

# Create a string vector of names:
players <- c("Katja", "Hans", "Theresa", "Sabine", "Katrin", "Marco")

# Assign names to vector and display vector:
names(kicker.skills) <- players
kicker.skills

# Indices by number:
kicker.skills[_] # GIVE second entry
kicker.skills[_:4] # SELECT 2nd to 4th entry

# Indices by name:
kicker.skills["Marco"] # WHAT is the level of Katja's kicker skills? 

# Logical indices:
# SUBSET for an all-star team, only kicker players with skill level above 2
kicker.skills[kicker.skills __ _ ] 

# Who is the worst kicker player? 
kicker.skills[_ == min(_)] # DO you understand every part of this line? 

4 Data Frames

A data frame is an object that collects several variables and can be thought of as a rectangular shape with the rows representing the observational units and the columns representing the variables. As such, it is similar to a matrix (see below). For us, the most important difference to a matrix is that a data frame can contain variables of different types (like numerical, logical, string and factor), whereas matrices can only contain values of one type.

Unlike a matrix, the columns always contain names which represent the variables. We can define a data frame from scratch by using the command data.frame.

# Define one x vector for the days:
days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# When are Eugen, Julia, and Lena in the office?
Eugen <-
  c(0, 0, 1, 1, 0)
Julia <- c(1, 1, 0, 0, 1)
Lena <- c(0, 0, 0, 1, 1)

presence <- data.frame(days, Eugen, Julia, Lena)

If you check the Environment pane, you see that presence is described as 5 obs. of 4 variables. Again, every column is an individual variable.

To index certain entries in a data frame use [] behind the name of a data frame object. Specifically: data.frame.name[ROWnumber(s) , COLUMN-number/-name(s)]. All two dimensional objects in R follow the logic: Row , Column.

Task Complete the empty spaces!

# Show the entry in the 4th row and the second column of presence
presence[__ , __]

# Show all entries in the third row 
presence[__ , __]

# Show all entries of the third column
presence[__ , __]

# Show all entries of the variable Julia
presence[__ , __]

# Show all entries of the variables days, Julia and Lena
presence[__ , __________]

We can address a single variable var of a data frame df using the matrix-like syntax df[, "var"] or by stating df$var. This can be used for extracting the values of a variable but also for creating new variables. Sometimes, it is convenient not to have to type the name of the data frame several times within a command. The function with(df, some expression using vars of df can help.

Task Complete the empty spaces!

# Accessing a single variable:
presence$Eugen

# GENERATE a new variable total.presence which sums up the total of people who are present on each day:
presence$total.presence <- presence$Eugen + ______ + ______ 
presence

# The same but using "with":
presence$total.presence2 <- with(presence, Eugen+Julia+Lena)

Sometimes, we do not want to work with a whole data set but only with a subset. This can be easily achieved with the command `subset(df, criterion), where criterion is a logical expression which evaluetes to TRUE for the rows which are to be selected.

Task Complete the empty spaces!

# Subset: all days in which Lena is present
subset(presence, __==___) 

5 Other object types

The following three object types are of secondary importance for our tutorial but nevertheless you should become familiar with them over time.

5.1 Factors

Seemingly similar to vectors but behaving (sometimes) differently: factors. Factors are very useful to work with binary variables.

Task What is the difference between xf1 and xf2?

# Costumer Ratings
x <- c(3, 2, 2, 3, 1, 2, 3, 2, 1, 2)

xf1 <- factor(x, labels = c("bad", "okay", "good"), ordered = TRUE)
xf2 <- factor(x, labels = c("bad", "okay", "good"))

If they are ordered, factors can be seen as an ordinal variable (e.g. in the latter case, values are ranked). Under the hood, factors behave like integers.

Task Practice more

# variable gender with 20 "male" entries and 30 "female" entries
gender <- c(rep("male", _ ), rep(_, 30)_  # COMPLETE MISSING PARTS
gender <- _    # CREATE gender as factor
  

# R now treats gender as a nominal variable
summary(gender) 

5.2 Matrices

Matrices are important tools for data analysis. R has a powerful matrix algebra system. Most often, matrices will be generated from an existing data set. But you can also build the from scratch with matrix(vec, nrow=m) (takes the numbers stored in vector vec and puts them into a matrix with m rows). Other options include: rbind(r1,r2) and cbind(c1,c2) in binding several vectors (which obviously need to have the same length) by row or column.

Task Complete the empty spaces!

# GENERATE matrix A from one vector with two rows
v <- c("AG.5", "G2", "AG.5", "G1", "AG.5", "G3")
A <- matrix(v, _ = 2)

# GENERATE matrix A.row in binding vec1 and vec2 AND generate matrix A.col in binding column wise:
vec1 <- c("AG.5", "AG.5", "AG.5")
vec2 <- c("G2", "G1", "G3")

A.row <- _____(vec1, vec2) 
A.col <- ______

# Giving names to rows and columns:
colnames(A.col) <- c("Katrin", "Hans")
rownames(A.col) <- c("Monday", "Wednesday", "Friday")
A.col

Core rule for indexing all two dimensional objects in R: [row , column]

Task Complete the empty spaces!

# EXTRACT value of first column and 2nd row 
A.col[_ , _]

# GIVE all meal choices from Hans:
A.col[ , ]

# WHAT does Katrin have for lunch on Monday? call Katrin and Monday by name
A.col[_ , _]


# WHAT does Katrin have on Monday and Friday?
A.col[c(_ , _) , _]

Some quick matrix algebra:

# Element wise multiplication (not matrix multiplication but multiplying elements at same place)
A <- matrix(c(2, -4, -1, 5, 7, 0), nrow = 2)
B <- matrix(c(2, 1, 0, 3, -1, 5), nrow = 2)
A * B

# Matrix multiplication:
(D <- A %*% C)

# Transpose:
(C <- t(B))

# Inverse:
solve(D)

5.3 Lists

A list is a generic collection of objects. Unlike vectors, components can be of different types.

# Generate a list object:
mylist <- list(A = seq(8, 36, 4),
               product = "Fruechtekorb",
               idm = diag(3))

# Print whole list:
mylist
## $A
## [1]  8 12 16 20 24 28 32 36
## 
## $product
## [1] "Fruechtekorb"
## 
## $idm
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
# Vector of names:
names(mylist)
## [1] "A"       "product" "idm"
# Print component "A":
mylist$A
## [1]  8 12 16 20 24 28 32 36