Functions & Debugging

1 / 64

Functions

Data (and objects more generally) are one of the building blocks of R. The other is functions.

We've already used a handful of functions, including seq(), arithmetic functions (+, *, etc.), c(), list(), data.frame(), str(), etc.


Functions take some form of an input, perform some operation, and then return some object(s) as output.

Functions receive their inputs through arguments.

5 / 64

Functions

Let's take another look at the help documentation for seq()...

?seq

You can see it has the arguments from, to, by, length.out, and along.with.

You might also notice that each of the arguments has a value after the = in the documentation.

These values are the defaults; they are what the arguments will be set to if you don't specify them.


In fact, since all of the arguments have defaults, we don't have to specify any to run seq() as we saw earlier.

seq()
## [1] 1
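
To see the defaults being overridden, here is a small sketch that sets some of seq()'s arguments by name. The specific values are arbitrary and chosen only for illustration.

```r
# With no arguments, every default is used
seq()
## [1] 1

# Override from, to, and by
evens <- seq(from = 2, to = 10, by = 2)
evens
## [1]  2  4  6  8 10

# Ask for a given number of evenly spaced values instead of a step size
quarters <- seq(from = 0, to = 1, length.out = 5)
quarters
## [1] 0.00 0.25 0.50 0.75 1.00
```

Any argument left out of the call keeps its default.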
10 / 64

Functions

Let's take a look at a new function, mean()...

?mean
11 / 64

Functions

Image from Kieran Healy

12 / 64

Functions

What happens if we run mean() without any arguments?

mean()
## Error in mean.default(): argument "x" is missing, with no default

We get an error telling us that the argument "x" is missing and has no default.

Whenever you see this error, it means you are missing a required argument (i.e., an argument without a default).

If we look at the help documentation, you can see x is the data from which to calculate a mean.
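
A quick way to check which arguments are required, without opening the help page, is to print a function's signature with args(). This is only a sketch; arguments shown without an = have no default and must be supplied.

```r
# Print the signature of mean(): x has no default, so it is required
args(mean)
## function (x, ...) 
## NULL

# Compare with sd(): x is required, na.rm defaults to FALSE
args(sd)
## function (x, na.rm = FALSE) 
## NULL

# The argument names can also be pulled out programmatically
sd_args <- names(formals(sd))
sd_args
## [1] "x"     "na.rm"
```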

16 / 64

Functions

Let's create some data to calculate the mean of.

vec <- c(1, 2, 3, 4, 5, 6, 2, 4)

Now let's take the mean of vec.

mean(x = vec)
## [1] 3.375

Note that mean() has two more optional arguments listed:

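  • trim, the fraction (from 0 to 0.5) of observations to drop from each end of the sorted data before the mean is computed, giving a trimmed mean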

  • na.rm, which takes a logical value indicating if it should remove missing values or not before it calculates the mean (FALSE by default).
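
As a rough illustration of trim, the sketch below adds one extreme value to the data and compares the ordinary mean with a trimmed mean. The vector vec_outlier and the choice of trim = 0.2 are made up for this example.

```r
# Same values as vec, plus one extreme value (hypothetical data)
vec_outlier <- c(1, 2, 3, 4, 5, 6, 2, 4, 100)

# The ordinary mean is pulled upward by the outlier
plain_mean <- mean(vec_outlier)
plain_mean
## [1] 14.11111

# trim = 0.2 drops floor(9 * 0.2) = 1 value from each end (here 1 and 100)
trimmed_mean <- mean(vec_outlier, trim = 0.2)
trimmed_mean
## [1] 3.714286
```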

19 / 64

Functions

What happens if we don't remove NAs before calculating the mean? Let's check it out...

vec_na <- c(1, 2, 3, 4, 5, 6, NA, 2, 4)
mean(vec_na)
## [1] NA

It returns NA. NAs are contagious! A single NA in a vector will cause many functions to return NA (unless they remove them by default).

This makes sense: the mean of vec_na as a whole is unknown, since we don't know what the missing value is. To calculate the mean of the values we do have, remove the NAs by setting na.rm = TRUE.

mean(vec_na, na.rm = TRUE)
## [1] 3.375
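
The same behavior shows up in other summary functions. The sketch below counts the missing values first and then shows sum() and max() with and without na.rm; it uses only base R.

```r
vec_na <- c(1, 2, 3, 4, 5, 6, NA, 2, 4)

# Count the missing values before deciding how to handle them
n_missing <- sum(is.na(vec_na))
n_missing
## [1] 1

# sum() also returns NA unless told to drop missing values
sum(vec_na)
## [1] NA
total <- sum(vec_na, na.rm = TRUE)
total
## [1] 27

# max() works the same way
largest <- max(vec_na, na.rm = TRUE)
largest
## [1] 6
```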
25 / 64

Your turn 1

02:00
  1. Look up the help documentation for the function sd() (type directly in the RStudio console)

  2. Calculate the standard deviation of vec_na. Be sure to remove missing values first.

vec_na <- c(1, 2, 3, 4, 5, 6, NA, 2, 4)
26 / 64

Solution

?sd
sd(vec_na, na.rm = TRUE)
## [1] 1.685018
27 / 64

Functions

You can get the length of many objects with length()

length(vec_na)
## [1] 9

nrow() and ncol() can be used to get the number of rows or columns in a matrix or data frame. Let's look at the data frame df below.

##   a b c d
## 1 1 3 5 7
## 2 2 4 6 8
nrow(df)
## [1] 2
ncol(df)
## [1] 4

The length of a data frame is the same as the number of columns.

length(df)
## [1] 4
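
The slide shows df only as printed output, not how it was created. One construction that would produce the same printout is sketched below; the data.frame() call is an assumption, not necessarily what was used.

```r
# One possible construction of df (an assumption; only the printed result is shown above)
df <- data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
df
##   a b c d
## 1 1 3 5 7
## 2 2 4 6 8

# dim() returns the number of rows and columns together
dim(df)
## [1] 2 4
```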
29 / 64

Functions

Take another look at the help documentation for sd() 👀.

Notice that there are two arguments and they are in order, x followed by na.rm = FALSE.


You can set arguments explicitly by name

sd(x = vec_na, na.rm = TRUE)
## [1] 1.685018

You can also set them positionally and drop the argument names

sd(vec_na, TRUE)
## [1] 1.685018
32 / 64

Functions

When using arguments positionally (without their names), make sure the arguments are in the right order.

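Otherwise you can end up with weird errors or warnings. The output below comes from an older version of R; since R 4.2.0, an if() condition of length greater than one is an error, so the same call now stops instead of warning.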

sd(TRUE, vec_na)
## Warning in if (na.rm) "na.or.complete" else "everything": the condition has
## length > 1 and only the first element will be used
## [1] NA

However, if you explicitly name the arguments, you can actually put them in a different order. This isn't recommended unless there is a good reason though...

sd(na.rm = TRUE, x = vec_na)
## [1] 1.685018
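
A common middle ground is to pass the data positionally and name the options. The sketch below puts the four calling styles side by side and checks that they agree; the variable names are only for illustration.

```r
vec_na <- c(1, 2, 3, 4, 5, 6, NA, 2, 4)

by_name     <- sd(x = vec_na, na.rm = TRUE)  # everything named
by_position <- sd(vec_na, TRUE)              # everything positional
mixed       <- sd(vec_na, na.rm = TRUE)      # data positional, option named
reordered   <- sd(na.rm = TRUE, x = vec_na)  # named, so order does not matter

# All four calls give the same result
all_same <- identical(by_name, by_position) &&
  identical(by_name, mixed) &&
  identical(by_name, reordered)
all_same
## [1] TRUE
```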
35 / 64

Packages

So far, we've been working with functions that are already installed and loaded when we open R.

However, many of the functions we want to use are not part of the basic R install. They come in packages that other R users create and share.

Most packages can be accessed from CRAN - the Comprehensive R Archive Network.

38 / 64

Packages

The most common way to get a package is to download it from CRAN using install.packages("package_name") -- notice the quotes.


For example, one package we're going to use tomorrow is rio, which has really easy functions for importing and exporting data.

If we wanted to install the rio package, we would use

install.packages("rio")

A couple of notes here.

1) You will sometimes see package names written inside {}, e.g. {rio}.

2) To make things easier in our online format, I have pre-installed all the packages we will be needing on RStudio Cloud.

However, in order to access the functions from these packages, we still need to load them...

43 / 64

Packages

Installing a package puts a copy of it into our personal library that R has access to. In general, we only need to install a package once.
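
Because installation only needs to happen once, it can be handy to check whether a package is already in your library before installing it. The sketch below uses a hypothetical helper, is_installed(), built on base R's requireNamespace(); it prints a reminder rather than installing anything.

```r
# Hypothetical helper: TRUE if a package is already in the library.
# requireNamespace() loads the package's namespace but does not attach it.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# stats ships with R, so it is always installed
is_installed("stats")
## [1] TRUE

# Only suggest installing rio when it is missing
if (!is_installed("rio")) {
  message('rio is not installed; run install.packages("rio") once')
}
```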


However, whenever we want to use a package, we need to load the package in our working session in RStudio.

We load packages with the library() function -- we do this once per session.


Loading a package basically makes the contents of that package searchable by R.

In other words, after loading a package, R is able to find the functions included in that package.

You can see which packages are currently loaded, and therefore searchable, by running the search() function.
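
Here is a small sketch of that before-and-after effect. It uses splines, a package that ships with R, so nothing needs to be installed; the choice of package is only for illustration.

```r
# Is splines on the search path yet?
before <- "package:splines" %in% search()

# Load it for this session
library(splines)

# Now it appears in search(), and its functions can be found by name
after <- "package:splines" %in% search()
after
## [1] TRUE
exists("bs")
## [1] TRUE
```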

46 / 64

Your turn 2

03:00
  1. In your RStudio console, look up the help documentation for import() by typing ?import. What do you see?

  2. Run search() in the console. Is the rio package included in this list?

  3. Again in the console, load the rio package using the library() function.

  4. Now look again at the help documentation for import(). What do you see this time?

  5. Run search() again. What is different this time?

47 / 64

Solution

48 / 64

Packages

Another package we're going to use a lot going forward is tidyverse.

tidyverse is actually a "meta-package", meaning it contains many individual packages inside of it that are all bundled together.


When we load tidyverse we get quite a bit of info.


50 / 64

Packages

Conflicts occur when the same name is used for different things.

For example, the dplyr package and the stats package (preloaded) both have a function called filter().

When we call filter(), R will only call one of those functions and it might not be the one we want.


Which one will R choose? R has an order in which it searches...

It starts with the Global Environment, then searches the loaded packages in reverse order of loading, so the most recently loaded package is searched first.


You can tell R explicitly that you want a function from a particular package using the notation package::function_name. When in doubt, it's better to use the double colon operator to be specific about which function you want.
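
To see the search order and the double colon in action without needing dplyr, the sketch below defines a made-up filter() in the Global Environment. That definition is purely hypothetical and is removed at the end.

```r
# A hypothetical filter() defined in the Global Environment (for illustration only)
filter <- function(x) "my own filter"

# A bare call finds the Global Environment definition first
bare_call <- filter(1:5)
bare_call
## [1] "my own filter"

# The double colon still reaches the stats version: a 3-point moving average
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
as.numeric(moving_avg)
## [1] NA  2  3  4 NA

# Remove the illustrative definition so it no longer masks anything
rm(filter)
```

The same pattern applies once dplyr is loaded: dplyr::filter() and stats::filter() each name exactly one function, whatever the loading order.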

56 / 64

Your turn 3

01:00
  1. Look up the help documentation for filter() from the stats package.

  2. Now look up the help documentation for filter() from the dplyr package.

57 / 64

Solution

?stats::filter
?dplyr::filter
58 / 64

Debugging

Before we wrap up, let's talk about error messages.

You will run into them constantly, even when using functions you've used many times before -- and especially when using functions/packages that are new to you.

Artwork by @allison_horst

60 / 64

Debugging

We're not going to go into details of debugging, because that could (and should) be a whole course on its own.

But there are a few general things to be aware of...


Artwork by @allison_horst

62 / 64

Q & A

05:00
63 / 64

Next up...

Introduction to the Tidyverse

64 / 64
