- R for Applied Economists
- What is R?
- Why learn R?
- About this course:
- R as a language
- Advice
- First Steps
- First Steps: Code
- First Steps: Exercises
- More Vectors
- More Vectors: Code
- More Vectors: Exercises
- Subsets and Functions
- Subsets and Functions: Code
- Subsets and Functions: Exercices
- Matrices
- Matrices: Code
- Matrices: Exercices

- Who? Mark Westcott mark.westcott@econ.lmu.de

- Free, open-source environment for statistical computing
- Relatively small core functionality
- Huge number of user-created packages
- Somewhat different to other programming languages (DSL)
- “Quirky, flawed, and an enormous success”

- Packages provide functions which aren’t available in Stata
- Write efficient code for data ‘wrangling’
- Popularity in industry (Microsoft, Google, Facebook, BoA, etc.)
- But don’t expect a Stata clone!

- Four sessions, hands on
- First two sessions quite general (and harder)
- Second section has a heavier focus on doing econometrics
- You can access these slides at http://git.io/vm53c
- Aim is to make you self-sufficient users of R.

- Lots of easy-to use statistical functions
- Tables easy to manipulate
- Vector-based
- Can write very compact code

- “Understand the value of frustration”
- Use help files, Google, stackoverflow.com
- You learn by doing; I’m here to make that happen more quickly
- Recommended resources:
- The Art of R programming (Matloff 2009)
- Advanced R (Wickham, online version at http://adv-r.had.co.nz)
- R Programming for Data Science (Peng, online at https://leanpub.com/rprogramming)
- Using R for Introductory Econometrics (http://www.urfie.net/)

- using the interactive console
- assigning variables
- calling functions
- loading help
- strings
- using the
`c()`

,`seq()`

,`rep()`

and`paste()`

functions

Type commands into RStudio’s console. Anything that starts with `##`

is the output that R will return.

In the slides, anything on a gray background are commands to be entered into R, anything on a blue background is R’s output.

`1`

`## [1] 1`

`8`

`## [1] 8`

`1+8`

`## [1] 9`

Assign variables using `<-`

(`=`

works too, but `<-`

is prefered)

```
x <- 1
y <- 8
x + y
```

`## [1] 9`

```
z <- x + y
z
```

`## [1] 9`

You can load the help for any function by using the `?`

command. Let’s find the square root of z:

```
?sqrt
sqrt(x = 9)
```

`## [1] 3`

`sqrt(z)`

`## [1] 3`

`sqrt(sin(2) + cos(4))`

`## [1] 0.5056222`

Combine elements into a vector (R’s one dimensional data structure) using the `c()`

function

```
?c
c(4,9)
```

`## [1] 4 9`

Vectors are always flat. These are equivalent

```
a <- c(1,2,3,4)
a <- c(c(1,2),c(3,4))
```

Some other ways of creating vectors, the range operator (colon) and `seq()`

function:

```
a <- 1:6
a
```

`## [1] 1 2 3 4 5 6`

```
?seq
b <- seq(from = 1, to = 16, by = 3)
b
```

`## [1] 1 4 7 10 13 16`

Base operations work on vectors in an intuitive way:

`a * 2`

`## [1] 2 4 6 8 10 12`

`a + b`

`## [1] 2 6 10 14 18 22`

`a * 2 + ( b - (4*a))`

`## [1] -1 0 1 2 3 4`

`sqrt(c(4,9))`

`## [1] 2 3`

`sum(1:20)`

`## [1] 210`

Use the `rep`

function to replicate elements of vectors:

```
?rep
rep(1, times = 10)
```

`## [1] 1 1 1 1 1 1 1 1 1 1`

`rep(4:7, each = 10)`

```
## [1] 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7
## [36] 7 7 7 7 7
```

`rep(4:7, times = 10)`

```
## [1] 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6 7 4 5 6
## [36] 7 4 5 6 7
```

Create string variables using single or double quotes.

```
h <- "hello" #you can use single or double quotes
w <- "world"
h + w
```

`## Error in h + w: non-numeric argument to binary operator`

```
?paste
paste(h,w)
```

`## [1] "hello world"`

`paste(h,w,sep = ",")`

`## [1] "hello,world"`

`as.character(3)`

`## [1] "3"`

Create the vector: (1,2,3,…,19,20,19,18,17…,2,1) (hint: join two vectors using the

`c()`

function). What is its sum? (use the`sum()`

function)Use the

`rep()`

function to create the vector: (4,6,3,4,6,3,4,6,3,4,6,3,…. 4,6,3) (i.e. 4,6,3 repeated 10 times)Use the

`rep()`

function with the`times`

parameter to create the vector (7,7,7,6,6,6,6,6,1,1) (three 7s, four 6s, two 1s)Calculate \[\sum_{i=10}^{100} (i^3 + 4i^2)\]

Optional: explore the

`ceiling()`

,`floor()`

,`round()`

,`log()`

,`max()`

,`min()`

,`prod()`

,`mean()`

,`median()`

,`cor()`

,`sd()`

, and`var()`

functions

- indexing vector elements
- NAs
- logical types
- coercion
- recycling

Clear up the environment:

`ls()`

`## [1] "a" "b" "h" "w" "x" "y" "z"`

```
?rm
rm(a)
rm(list = ls())
```

Everything is a vector! Compare this:

```
y <- -3:3
is.vector(y)
```

`## [1] TRUE`

`length(y)`

`## [1] 7`

with this:

```
x <- 1
is.vector(x)
```

`## [1] TRUE`

`length(x)`

`## [1] 1`

Different ways of indexing vectors:

`y[]`

`## [1] -3 -2 -1 0 1 2 3`

`y[3]`

`## [1] -1`

`y[c(1,3)]`

`## [1] -3 -1`

`y[2:4]`

`## [1] -2 -1 0`

`y[-3]`

`## [1] -3 -2 0 1 2 3`

Assinging vector elements:

```
y[5] <- 0
y
```

`## [1] -3 -2 -1 0 0 2 3`

`y[c(1,3)] <- 10`

Vectors can contain empty elements, NAs

`x`

`## [1] 1`

`x[1]`

`## [1] 1`

```
x[4] <- 3
x
```

`## [1] 1 NA NA 3`

`is.na(x[1])`

`## [1] FALSE`

`is.na(x[2])`

`## [1] TRUE`

Vectors can only hold one type. Variables are coerced to most flexible type. Types from least to most flexible are:

- logical (TRUE/FALSE)
- integer (1, 2, 3, 4..)
- double (real numbers)
- character.

`x`

`## [1] 1 NA NA 3`

```
x[1] <- "Test"
x
```

`## [1] "Test" NA NA "3"`

```
y <- c(TRUE, FALSE, FALSE, TRUE)
y
```

`## [1] TRUE FALSE FALSE TRUE`

```
y[1] <- 3
y
```

`## [1] 3 0 0 1`

You can test if a vector is of a certain type using the `is`

functions

`is.logical(TRUE)`

`## [1] TRUE`

`is.double(3.5)`

`## [1] TRUE`

`is.numeric(5) #is.numeric(x) TRUE if x is integer or a double`

`## [1] TRUE`

`is.character("Test")`

`## [1] TRUE`

You can control cooercion with the `as`

functions:

`as.numeric(T)`

`## [1] 1`

`as.numeric(F)`

`## [1] 0`

`as.double("5.3")`

`## [1] 5.3`

`as.double("five")`

`## Warning: NAs introduced by coercion`

`## [1] NA`

`as.numeric(5)`

`## [1] 5`

`as.character(3.5)`

`## [1] "3.5"`

Vector operations do not require all vectors to be the same length. If they are not, the shorter vector will be *recycled*:

`1:4 + 1`

`## [1] 2 3 4 5`

`1:4 + c(1,2)`

`## [1] 2 4 4 6`

`1:4 + c(1,2,3)`

```
## Warning in 1:4 + c(1, 2, 3): longer object length is not a multiple of
## shorter object length
```

`## [1] 2 4 6 5`

Sometimes this recycling behaviour makes things a lot easier, sometimes it leads to unexpected results, so you should try and keep it in the mind.

Create the vector p with

`p <- c(1.8, 3.2, -3.3, 5.4, 8.8)`

.- Extract the first three elements of
`p`

- Extract all but the third element of
`p`

- What does
`order(p)`

do? How can you use it to create a sorted vector?

- Extract the first three elements of
Without using the computer, what is the result of

`c(10, 20, 30) + 1:9`

?- Without using the computer, what is the output of
`sum(1, FALSE)`

?`sum("1", 4)`

?

Use the

`paste()`

function to create the vector

\[ \begin{pmatrix} "Stage 4" \\ "Stage 5" \\ .. \\ "Stage 9" \\ "Stage 10" \end{pmatrix} \]

Use the

`sample()`

function to sample 200 random integers from the range 1 to 100. You’ll need to pass`"replace = TRUE"`

to the sample function. How many of the drawn numbers are bigger than 50? (hint - you can test if elements of a vector`x`

are bigger than 50 with`x > 50`

. What happens when logical types are coerced to numeric?).Repeat the above 5000 times, storing the result of each resampling in a vector, by using the

`replicate`

command,`replicate(n, expression_to_replicate)`

. Store the results in a vector named`mc`

.What is the mean of

`mc`

? Variance? (R has built in functions for these, search online if you can’t find them). Calculate the standard error of the mean, i.e. \[sqrt(Var(x) / n)\]Install the ‘ggplot2’ package by running

`install.packages("ggplot2")`

in the console. Load it using`library(ggplot2)`

. Then use the`qplot()`

function to plot a histogram of the monte carlo simulation:`qplot(mc)`

. Set the binwidth to 1 when drawing the histogram using`binwidth=1`

and set titles using the`xlab`

,`ylab`

and`main`

arguments.

- Vector subsetting
- Writing a function
- Controlling flow in a function

Logical tests of a vector element

```
y <- c(1:5*2, rev(1:4*2))
y
```

`## [1] 2 4 6 8 10 8 6 4 2`

`y[1] == 2`

`## [1] TRUE`

`y[1] == 4`

`## [1] FALSE`

`y[3] > 5`

`## [1] TRUE`

Testing all elements:

`y > 4`

`## [1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE`

`y == 8`

`## [1] FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE`

`which(y > 4)`

`## [1] 3 4 5 6 7`

`any(y > 4)`

`## [1] TRUE`

`any(y > 10)`

`## [1] FALSE`

Combine conditions with `|`

(or) and `&`

(and)

`any(y == 2 | y == 3)`

`## [1] TRUE`

`any(y > 2 & y < 4)`

`## [1] FALSE`

Select elements which match a condition:

`y[y > 4]`

`## [1] 6 8 10 8 6`

`y[y > 2 & y < 6]`

`## [1] 4 4`

A function is a piece of code that performs a specific task. They take ‘arguments’ and normally ‘return’ a value. Here’s a function that dermines if a number is divisble by 7, using the modulu operator `%%`

:

```
isDivisibleBySeven <- function(number) {
(number %% 7) == 0
}
```

`isDivisibleBySeven(21)`

`## [1] TRUE`

`isDivisibleBySeven(20)`

`## [1] FALSE`

The return value of the function is the last expression evaluated. You can be explicit about this by adding ‘return()’ around the last line, but this is usually not necessary.

Here’s a function to determine if a number is a square number, again using the modulo operator `%%`

:

```
isSquareNumber <- function(number) {
(sqrt(number) %% 1) == 0
}
```

`isSquareNumber(4)`

`## [1] TRUE`

`isSquareNumber(5)`

`## [1] FALSE`

Note that the function only operates on its arguments, and so has no *side effects*. This is good style and avoid bugs, compare to a function without arguments:

```
isSquareNumber <- function() {
(sqrt(number) %% 1) == 0
}
number <- 3
isSquareNumber()
```

`## [1] FALSE`

Our function doesn’t deal with negative values very well:

`isSquareNumber(-4)`

`## Error in isSquareNumber(-4): unused argument (-4)`

Let’s improve things using the if/else control stucture:

```
isSquareNumber <- function(x) {
if(x > 0) {
(sqrt(x) %% 1) == 0
} else {
NA
}
}
```

You can access a vector containing the letters a-z using the `letters`

command. Find two ways of listing every third letter in the alphabet.

What does this function do? How does it work?

```
myFunction <- function(n) {
sum((1:n)[isDivisibleBySeven(1:n)])
}
```

Here is a version of the `isSquareNumber`

function that works nicely with vectors:

```
isSquareNumber <- function(x) {
x <- ifelse(x > 0, x, NA)
(sqrt(x) %% 1) == 0
}
```

- Make sure you understand how this works – look at the help documentation for
`ifelse`

. - Use the function to find out if there any square numbers in the range 18354 to 18796. If so, what are they? (use the
`any()`

and`which()`

functions)

Here is an (inefficient) algorithm for determining if the number x is prime:

is x = 2? If so, x is prime. End.

is x divisible by any number in the range 2 to x-1? If so, x is not prime. End.

otherwise, x is prime

Write a function to determine if a number is prime. You will need to use the if/else if/else control structure:

```
if(condition) {
#do something
} else if(condition) {
#do something else
} else {
#do another thing
}
```

The only hard part is working out if ‘x is divisible by any number in the range 2 to x-1’? Figure out how to do this first, then add it into your function.

- You can use the modulo function
`%%`

again to test divisibility - Think about recycling! Like all R base operations,
`%%`

will operate on a vector

- creating matrices
- setting/accessing matrix elements
- inverting matrices, testing for matrix equality

First prepare some vectors:

```
a <- 1:6
b <- 2:7
c <- 3:8
```

You can create a matrix using `rbind`

(row bind) or `cbind`

(column bind)

```
m <- rbind(a,b,c)
m
```

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## a 1 2 3 4 5 6
## b 2 3 4 5 6 7
## c 3 4 5 6 7 8
```

```
l <- cbind(a,b,c)
l
```

```
## a b c
## [1,] 1 2 3
## [2,] 2 3 4
## [3,] 3 4 5
## [4,] 4 5 6
## [5,] 5 6 7
## [6,] 6 7 8
```

or by passing a vector to the `matrix`

function, specifying the desired number or rows and columns:

`matrix(1:24,nrow=6)`

```
## [,1] [,2] [,3] [,4]
## [1,] 1 7 13 19
## [2,] 2 8 14 20
## [3,] 3 9 15 21
## [4,] 4 10 16 22
## [5,] 5 11 17 23
## [6,] 6 12 18 24
```

`matrix(1:24,ncol=6)`

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 5 9 13 17 21
## [2,] 2 6 10 14 18 22
## [3,] 3 7 11 15 19 23
## [4,] 4 8 12 16 20 24
```

Note that matrix is filled by columns. To have it filled by rows:

`matrix(1:24,nrow=6, byrow=T)`

```
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
## [4,] 13 14 15 16
## [5,] 17 18 19 20
## [6,] 21 22 23 24
```

`matrix(1:24,ncol=6, byrow=T)`

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1 2 3 4 5 6
## [2,] 7 8 9 10 11 12
## [3,] 13 14 15 16 17 18
## [4,] 19 20 21 22 23 24
```

Find the dimensions of a matrix

`nrow(m)`

`## [1] 3`

`ncol(m)`

`## [1] 6`

Transpose a matrix:

`mt <- t(m)`

Test for matrix equality using the `identical`

function

```
?identical
identical(l,mt)
```

`## [1] TRUE`

Matrices have row and column names:

`colnames(mt)`

`## [1] "a" "b" "c"`

`rownames(mt)`

`## NULL`

```
rownames(mt) <- LETTERS[1:nrow(mt)]
mt
```

```
## a b c
## A 1 2 3
## B 2 3 4
## C 3 4 5
## D 4 5 6
## E 5 6 7
## F 6 7 8
```

Access elements or rows/colums:

`mt[2,2]`

`## [1] 3`

`mt[2,]`

```
## a b c
## 2 3 4
```

`mt[,3]`

```
## A B C D E F
## 3 4 5 6 7 8
```

or by names:

`mt["A",]`

```
## a b c
## 1 2 3
```

As you should be able to guess by now, you can access multiple elements as follows:

`mt[c("A","B") , ]`

```
## a b c
## A 1 2 3
## B 2 3 4
```

You can add additional rows/columns to a matrix using rbind/cbind

`m`

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## a 1 2 3 4 5 6
## b 2 3 4 5 6 7
## c 3 4 5 6 7 8
```

```
d <- 4:9
rbind(d,m)
```

```
## [,1] [,2] [,3] [,4] [,5] [,6]
## d 4 5 6 7 8 9
## a 1 2 3 4 5 6
## b 2 3 4 5 6 7
## c 3 4 5 6 7 8
```

Write a function that computes the hilbert matrix of a given size:

```
hilbert <- function(n) {
#your code here
}
```

The hilbert matrix is a square matrix with each entry equal to:

\[H_{ij} = \frac{1}{i + j - 1}\]

where \(i\) indicates row and \(j\) column. Hint: begin by figuring out how to make a matrix \(i\) where each entry in the matrix is equal to the row index, e.g.:

\[ \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 \\ 3 & 3 & 3 & 3 & 3 \\ 4 & 4 & 4 & 4 & 4 \\ 5 & 5 & 5 & 5 & 5 \end{pmatrix} \]

Once you have completed the `hilbert()`

function, find the inverse of the 6x6 hilbert matrix using the `solve()`

function.

In this exercise you will program the OLS estimator. R of course has built in methods for running OLS regressions, so this is just an exercise. We use data from Grilichies (1976).

The following command will download the dataset from my website and load it directly into a matrix called `d`

:

```
d <- as.matrix(
read.table(
"http://goo.gl/dxjUME",header=T))
```

The matrix includes four columns:

`head(d)`

```
## motherEduc IQ KWW educ expr wage
## [1,] 8 93 35 12 0.462 365.0375
## [2,] 14 119 41 16 0.000 229.9818
## [3,] 14 108 46 14 0.423 301.8711
## [4,] 12 96 32 12 0.333 240.0867
## [5,] 6 74 27 9 9.013 375.0277
## [6,] 8 91 24 9 0.333 121.9974
```

We want to estimate the following equation: \[ln(wage_i) = \beta_0 + \beta_1 educ_i + \beta_2 IQ_i + \beta_3 expr_i+ \epsilon_i\]

Compute regression coefficients using the OLS formula: \(\hat{\beta} = (X'X)^{-1} X'y\). You will need to use

`%*%`

to multiply matricies (i.e. you multiply matrices`a`

and`b`

with the command`a %*% b`

) and the`solve`

function to invert a matrix.Compute residuals and use these to estimate the variance of the error term

\[\hat{\sigma}^2 = \frac{SSR}{n - k}\]

and the estimated variance covariance matrix:

\[ \hat{\sigma}^2 (X'X)^{-1}\]

Compute standard errors (you might find the

`diag()`

function helpful), t-statistics (\(\hat{\beta} / \hat{se}\)) and p-values (you can use the`pt`

function)(Optional) Program a two stage least squares estimator. Instrument \(educ_i\) by \(motherEduc_i\) and \(IQ_i\):

\[educ_i = \alpha_0 + \alpha_1 IQ_i + \alpha_2 motherEduc_i + v_i\]

If you have Stata installed on your laptop, verify your results.