R has different types of data, and an object’s type affects how it interacts with functions and other objects.
So far, we’ve just been working with numeric data, but there are several other types to be aware of...
Type | Definition | Example |
---|---|---|
Integer | whole numbers, positive or negative (written with an L suffix) | 1L , -2L |
Double | real numbers, including fractions & decimals | -7 , 0.2 , -5/2 |
Character | quoted strings of letters, numbers, and allowed symbols | "1" , "one" , "o-n-e" , "o.n.e" |
Logical | logical constants of true or false | TRUE , FALSE |
Factor | categorical variable with labelled levels, optionally ordered | variable for year in college labelled "Freshman" , "Sophomore" , etc. |
You can use typeof() to find out the type of a value or object.
typeof(1)
## [1] "double"
typeof(TRUE)
## [1] "logical"
typeof(1L)
## [1] "integer"
typeof("one")
## [1] "character"
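Since the table lists factors next to the basic types, it may help to see what typeof() reports for one. This is a minimal sketch using a made-up `year` vector (not from the slides): a factor is stored as integer codes, so typeof() returns "integer", while class() reports "factor".

```r
# hypothetical example data, not from the slides
year <- factor(c("Freshman", "Sophomore", "Freshman"))

typeof(year)  # how the values are stored internally
## [1] "integer"
class(year)   # how R treats the object
## [1] "factor"
levels(year)  # the labels attached to the integer codes
## [1] "Freshman"  "Sophomore"
```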
There are a few special values worth knowing about too:
Value | Definition |
---|---|
NA | Missing value ("not available") |
NaN | Not a Number (e.g. 0/0) |
Inf | Positive infinity |
-Inf | Negative infinity |
NULL | An object that exists but is completely empty |
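A short sketch of where these special values come from. The is.na() helper is not covered in the slides; it is the base R function for testing whether a value is missing.

```r
0 / 0                # undefined arithmetic gives NaN
## [1] NaN
1 / 0                # dividing a positive number by zero gives Inf
## [1] Inf
is.na(c(1, NA, 3))   # test for missing values, element by element
## [1] FALSE  TRUE FALSE
length(NULL)         # NULL has no elements at all
## [1] 0
```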
Often, we’re not working with individual values, but with a group of multiple related values -- or a vector of values.
We can create a vector of ordered numbers using the form starting_number:ending_number.
For example, we could make x a vector with the numbers from 1 to 5.
x <- 1:5
x
## [1] 1 2 3 4 5
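A few more things the colon operator does, as a small sketch. The object name `down` is made up for illustration.

```r
down <- 5:1    # the colon operator also counts downward
down
## [1] 5 4 3 2 1
typeof(1:5)    # sequences made with : are stored as integers
## [1] "integer"
length(1:5)    # number of elements in the vector
## [1] 5
```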
Let's look at the Environment pane in RStudio...
Since x is a vector, the pane tells us what type of vector it is and its length in addition to its contents (which can be abbreviated if the object is larger).
We can create a vector of any numbers we want using c(), which is a function. You can think of c() as short for "combine".
You use c() by putting numbers separated by commas within the parentheses.
# combine values into a vector and assign to an object named 'x'
x <- c(2, 8.5, 1, 9)
# print x
x
## [1] 2.0 8.5 1.0 9.0
We can also create a vector of numbers using seq().
seq() is a function that creates a sequence of numbers.
To learn how any R function works, you can access the help documentation by typing ?function_name.
Let's take a look at how seq() works...
?seq
What happens if we run seq() with no arguments?
seq()
## [1] 1
The seq() function has arguments with default values.
The first two arguments are from and to, which specify the starting and end values of the sequence. By default, from = 1 and to = 1.
This means that typing seq() is equivalent to typing seq(from = 1, to = 1), which generates a sequence with just one value: 1.
We will talk more about how functions work in the next slide deck.
To make a sequence from 1 to 5 with this function, we have to set the arguments accordingly: from = 1 and to = 5.
seq(from = 1, to = 5)
## [1] 1 2 3 4 5
We can also set one or more of the other arguments...
The by argument allows us to change the increment of the sequence. For example, to get every other number between 1 and 5, we would set by = 2.
seq(from = 1, to = 5, by = 2)
## [1] 1 3 5
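Two further ways to call seq(), sketched here as an illustration. The length.out argument is part of base R's seq() but is not covered in the slides; it asks for a fixed number of evenly spaced values instead of a fixed increment.

```r
# five evenly spaced values from 0 to 1
seq(from = 0, to = 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00

# arguments can also be matched by position,
# so this is the same as seq(from = 1, to = 5)
seq(1, 5)
## [1] 1 2 3 4 5
```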
Vectors are just 1-dimensional sequences of a single type of data.
Note that vectors can also include strings or character values. (Naming the object letters masks R's built-in letters constant for the rest of the session.)
letters <- c("a", "b", "c", "d")
letters
## [1] "a" "b" "c" "d"
The general rule R uses is to set the vector to be the most "permissive" type necessary.
For example, what happens if we combine the vectors x (from earlier) and letters together?
mixed_vec <- c(x, letters)
mixed_vec
## [1] "2" "8.5" "1" "9" "a" "b" "c" "d"
Notice the quotes? R turned all of our numbers into strings, since strings are more "permissive" than numbers.
typeof(mixed_vec)
## [1] "character"
This is called coercion. R coerces a vector into whichever type will accommodate all of the values. We can coerce mixed_vec to be numeric using as.numeric(), but notice what happens to the character values 👀
as.numeric(mixed_vec)
## Warning: NAs introduced by coercion
## [1] 2.0 8.5 1.0 9.0 NA NA NA NA
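The "most permissive type" rule can be sketched as an ordering: logical, then integer, then double, then character. The values below are made up for illustration.

```r
typeof(c(TRUE, 2L))    # logical + integer  -> integer
## [1] "integer"
typeof(c(2L, 8.5))     # integer + double   -> double
## [1] "double"
typeof(c(8.5, "a"))    # double + character -> character
## [1] "character"
c(TRUE, 2L, 8.5)       # TRUE becomes 1 when coerced to a number
## [1] 1.0 2.0 8.5
```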
Create an object called x that is assigned the number 8.
Create an object called y that is a sequence of numbers from 2 to 16, by 2.
Add x and y. What happens?
# Q1.
x <- 8
# Q2.
y <- seq(from = 2, to = 16, by = 2)
y
## [1] 2 4 6 8 10 12 14 16
# Q3.
x + y
## [1] 10 12 14 16 18 20 22 24
This is an example of vector recycling.
When applying an operation to two vectors that requires them to be the same length, R automatically recycles, or repeats, the shorter one, until it is long enough to match the longer one.
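Recycling also applies when the shorter vector has more than one element. A small sketch, with the made-up names `short` and `long`:

```r
short <- c(10, 20)
long  <- c(1, 2, 3, 4)

# short is repeated as 10, 20, 10, 20 to match the length of long
long + short
## [1] 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign that the lengths were not what you intended.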
Create an object called a that is just the letter "a" and an object x that is assigned the number 8. Add a to x. What happens?
Create a vector called b that is just the number 8 in quotes. Add b to x (from above). What happens?
Find some way to add b to x. (Hint: Don't forget about coercion.)
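The slides do not include answers for this exercise, so the following is only one possible sketch. It uses tryCatch() (not covered in the slides) to capture the error message so the script keeps running.

```r
a <- "a"
x <- 8
b <- "8"

# Adding a character to a number is an error; capture the message
# instead of letting it stop the script.
msg <- tryCatch(a + x, error = function(e) conditionMessage(e))
msg

# b + x fails for the same reason, because "8" is still a character.
# Coercing b to numeric first makes the addition work.
as.numeric(b) + x
## [1] 16
```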
How do we extract elements out of vectors?
This is called indexing, and it is frequently quite useful.
There are a number of methods for indexing that are good to be familiar with.
Vectors can be indexed numerically, starting with 1 (not 0). We can extract specific elements from a vector by putting the index of their position inside brackets [].
Let's take a new vector z as an example:
z <- 6:10
Let's get just the first element of z:
z[1]
## [1] 6
Get the first and third element by passing those indexes as a vector using c().
z[c(1, 3)]
## [1] 6 8
z
## [1] 6 7 8 9 10
We could also say which elements not to give us using the minus sign (-).
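Negative indexing can be sketched with the same z vector as above (redefined here so the block runs on its own):

```r
z <- 6:10

z[-1]          # everything except the first element
## [1]  7  8  9 10
z[-c(1, 3)]    # everything except the first and third elements
## [1]  7  9 10
```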
Finally, if the elements in the vector have names, we can refer to them by name instead of their numerical index. You can see the names of a vector using names().
names(z)
## NULL
Looks like the elements in z have no names. We can change that by assigning them names using a vector of character values.
names(z) <- c("first", "second", "third", "fourth", "fifth")
z
##  first second  third fourth  fifth
##      6      7      8      9     10
Now we can use the names of the elements in z for subsetting, using quotes.
z["first"]
## first
##     6
You can use indexing to change elements within a vector.
For example, we could change the first element of z to missing, or NA.
z[1] <- NA
z
##  first second  third fourth  fifth
##     NA      7      8      9     10
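Once a vector contains NA, the missing value carries through most calculations. A brief sketch, rebuilding z as it looks at this point; sum() and its na.rm argument are base R but not covered in the slides.

```r
z <- c(first = NA, second = 7, third = 8, fourth = 9, fifth = 10)

sum(z)                  # NA propagates through arithmetic
## [1] NA
sum(z, na.rm = TRUE)    # drop missing values before summing
## [1] 34
```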
Create a vector called named that includes the numbers 1 to 5. Name the values "a", "b", "c", "d", and "e" (in order).
Print the first element using numerical indexing and the last element using name indexing.
Change the third element of named to the value 21 and then show your results.
# Q1.
named <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
named
## a b c d e
## 1 2 3 4 5
# this works too
named <- c(1, 2, 3, 4, 5)
names(named) <- c("a", "b", "c", "d", "e")
named
## a b c d e
## 1 2 3 4 5
# Q2.
named[1]
## a
## 1
named["e"]
## e
## 5
# Q3.
named[3] <- named[3] * 7
named
##  a  b  c  d  e
##  1  2 21  4  5
# this works too
named[3] <- 21
Vectors are great for storing a single type of data, but what if we have a variety of different kinds of data we want to store together?
For example, let's say I have some information about Jane Doe that I want to store together in a single object: her name, her height, and whether she is right-handed.
A vector won't work -- every element is coerced to a character (notice the quotes).
c("Jane Doe", 5.5, TRUE)
## [1] "Jane Doe" "5.5" "TRUE"
Instead, we can put them in a list. Lists are very flexible -- they can contain different types of data and preserve those types.
We can create a list with the list() function.
jane_doe <- list("Jane Doe", 5.5, TRUE)
jane_doe
## [[1]]
## [1] "Jane Doe"
##
## [[2]]
## [1] 5.5
##
## [[3]]
## [1] TRUE
And, we can give each element of the list a name to make it easier to keep track of them.
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE)
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
Notice that [[1]], [[2]], and [[3]], the element indices, have been replaced by the names name, height and right_handed 👀
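To check that a list really preserves each element's type, one option is to apply typeof() to every element. This sketch uses sapply(), which is base R but not covered in the slides; `types` is a made-up name.

```r
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE)

# apply typeof() to each element; the result is a named character vector
types <- sapply(jane_doe, typeof)
types
```

Each element keeps its own type (character, double, logical) rather than being coerced to character as in a vector.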
You can also see the names of a list by running names() on it.
names(jane_doe)
## [1] "name" "height" "right_handed"
Lists are even more flexible than we've seen so far. In addition to holding heterogeneous types, the elements of a list can have different lengths.
Let's add another element to the list about Jane that contains her favorite types of ice cream (she can't choose just one!).
Notice the use of c() to create the element ice_cream 👀
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE, ice_cream = c("mint chip", "pistachio"))
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
##
## $ice_cream
## [1] "mint chip" "pistachio"
Like vectors, lists can be indexed by their name or their position (numerically).
For example, if we wanted the height element, we could get it out using its position as the second element of the list:
jane_doe[2]
## $height
## [1] 5.5
Now let's say we want to know Jane's height in inches. Let's see if we can get that by multiplying the height element by 12.
jane_doe[2] * 12
## Error in jane_doe[2] * 12: non-numeric argument to binary operator
R is telling us that we supplied a non-numeric argument, i.e. jane_doe[2].
This happened because single bracket indexing on a list returns a list -- but what we need is the contents of the list (in this case, just the number 5.5).
If we want the actual object stored at a given position instead of a list containing that object, we have to use double-bracket indexing list[[i]]:
jane_doe[[2]]
## [1] 5.5
Notice it no longer has the $height.
In general, a $label is a hint that you're looking at a list (the container) and not just the object stored at that position (the contents).
Now let's see Jane's height in inches.
jane_doe[[2]] * 12
## [1] 66
The same applies to name indexing. With lists, you can get a list containing the indexed object with single brackets [].
jane_doe["height"]
## $height
## [1] 5.5
And double brackets [[]] can be used to get the contents -- the object stored with that name.
jane_doe[["height"]]
## [1] 5.5
You can also use list$name to get the object stored with a particular name. It works like double brackets, but you don't need quotes.
jane_doe$height
## [1] 5.5
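The container-versus-contents distinction can be checked directly with typeof(). A minimal sketch, rebuilding a shortened jane_doe so the block runs on its own:

```r
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE)

typeof(jane_doe[2])      # single brackets: still a list (the container)
## [1] "list"
typeof(jane_doe[[2]])    # double brackets: the number itself (the contents)
## [1] "double"
```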
Just like vectors, we can change or add elements to our list using indexing.
Let's save the inches transformation of the height element as height_in.
jane_doe$height_in <- jane_doe$height * 12
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
##
## $ice_cream
## [1] "mint chip" "pistachio"
##
## $height_in
## [1] 66
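Elements can be removed the same way they are added. The slides do not cover removal; as a small sketch, assigning NULL to an element deletes it from the list.

```r
jane_doe <- list(name = "Jane Doe", height = 5.5)

jane_doe$height_in <- jane_doe$height * 12   # add a new element
jane_doe$height_in <- NULL                   # assigning NULL removes it again
names(jane_doe)
## [1] "name"   "height"
```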
Create a list like jane_doe that is made up of name, height, right_handed, and ice_cream, but corresponds to information about you.
Index your list to print only your name.
# Q1. (Answers will vary)
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE, ice_cream = c("mint chip", "pistachio"))
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
##
## $ice_cream
## [1] "mint chip" "pistachio"
# Q2.
jane_doe$name
## [1] "Jane Doe"
jane_doe[["name"]]
## [1] "Jane Doe"
As we saw with the object ice_cream stored in the list jane_doe, objects within lists can have different dimensions and lengths.
What if we wanted just one element of an object in a list, such as just the second element of ice_cream?
We can use indexing on the ice_cream vector stored within the jane_doe list by chaining indexes.
We could do that with numerical indexing...
jane_doe[[4]][2]
## [1] "pistachio"
...or with name indexing...
jane_doe[["ice_cream"]][2]
## [1] "pistachio"
...or with dollar sign ($) indexing:
jane_doe$ice_cream[2]
## [1] "pistachio"
A data frame is a common way of representing rectangular data -- collections of values that are each associated with a variable (column) and an observation (row). In other words, it has 2 dimensions.
A data frame is technically a special kind of list -- it can contain different kinds of data in different columns, but each column must be the same length.
We can create a data frame very similarly to how we made a list, but replacing list() with data.frame().
df_1 <- data.frame(c1 = c(1, 3), c2 = c(2, 4), c3 = c("a", "b"))
df_1
##   c1 c2 c3
## 1  1  2  a
## 2  3  4  b
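The "special kind of list" point can be checked with a few base R helpers that the slides do not cover (is.list(), dim()). A minimal sketch:

```r
df_1 <- data.frame(c1 = c(1, 3), c2 = c(2, 4), c3 = c("a", "b"))

is.list(df_1)    # a data frame is built on a list of columns
## [1] TRUE
dim(df_1)        # number of rows, then number of columns
## [1] 2 3
names(df_1)      # column names, just like names() on a list
## [1] "c1" "c2" "c3"
```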
Indexing data frames is similar to how we index vectors, except we have two dimensions, which we use like so: [row, column]
Let's get the value in the first row and third column of df_1 using numerical indexing.
df_1[1, 3]
## [1] "a"
You can also get an entire row or column by leaving an index blank. Let's get all rows but only column 2:
df_1[, 2]
## [1] 2 4
We can also index by name.
df_1[, "c2"]
## [1] 2 4
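A few more [row, column] patterns, sketched with the same df_1. The drop argument is part of base R's data frame indexing but is not covered in the slides; it controls whether a single column is simplified to a vector.

```r
df_1 <- data.frame(c1 = c(1, 3), c2 = c(2, 4), c3 = c("a", "b"))

df_1[2, ]                         # second row, all columns
##   c1 c2 c3
## 2  3  4  b
df_1[, c("c1", "c3")]             # several columns by name
##   c1 c3
## 1  1  a
## 2  3  b
class(df_1[, 2])                  # a single column is simplified to a vector
## [1] "numeric"
class(df_1[, 2, drop = FALSE])    # drop = FALSE keeps it as a data frame
## [1] "data.frame"
```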
As with lists, we can use the $ operator in the form dataframe$column_name (similar to list$object).
Let's get the first column.
df_1$c1
## [1] 1 3
We can also index a column using vector indexing, since a single column is just a 1-dimensional vector.
df_1$c1[1] # get the first value in column 1
## [1] 1
Just like lists and vectors, you can modify a data frame and add new elements or change existing elements by referencing indexes.
We could create c4 as the sum of c1 and c2:
df_1$c4 <- df_1$c1 + df_1$c2
df_1
##   c1 c2 c3 c4
## 1  1  2  a  3
## 2  3  4  b  7
Or we could replace an element using indexing too. Let's replace c1 with c1^2:
df_1$c1 <- df_1$c1^2
df_1
##   c1 c2 c3 c4
## 1  1  2  a  3
## 2  9  4  b  7
We can use the str() function to get the structure of the data. This tells us the type of each column.
str(df_1)
## 'data.frame': 2 obs. of 4 variables:
##  $ c1: num 1 9
##  $ c2: num 2 4
##  $ c3: chr "a" "b"
##  $ c4: num 3 7
Create a data frame, called df_2, that has 3 columns as shown below. After you create it, check the structure with str().
##   c1 c2 c3
## 1  1  2  a
## 2  2  4  b
## 3  3  6  c
Add a new column, called c4, which is the first and second columns multiplied together.
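The slides do not include answers for this exercise, so here is one possible sketch. It assumes c1 and c2 are numeric and c3 is character, which is what the printed table suggests.

```r
# Q1. build the data frame column by column
df_2 <- data.frame(c1 = c(1, 2, 3), c2 = c(2, 4, 6), c3 = c("a", "b", "c"))
str(df_2)

# Q2. multiply the first two columns element by element
df_2$c4 <- df_2$c1 * df_2$c2
df_2
##   c1 c2 c3 c4
## 1  1  2  a  2
## 2  2  4  b  8
## 3  3  6  c 18
```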
We just learned about different types of data (numeric, character, logical, factor, etc.) and some different ways they can be structured -- including vectors, lists and data frames.
Here's a quick table that summarizes data structures.
Dimensions | Homogeneous data | Heterogeneous data |
---|---|---|
1-Dimensional | Atomic Vector | List |
2-Dimensional | Matrix * | Data frame |
* We didn't talk about matrices today, but if you take PSY611, you will learn more about them in the context of the General Linear Model.
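As a rough sketch of the four structures in the table, one object of each kind is created below. The object names are made up, and matrix() appears only to fill in the table; it was not covered in the slides.

```r
v <- c(1, 2, 3)                                 # atomic vector: 1-D, one type
l <- list(1, "a", TRUE)                         # list: 1-D, mixed types
m <- matrix(1:6, nrow = 2)                      # matrix: 2-D, one type
d <- data.frame(n = c(1, 2), s = c("a", "b"))   # data frame: 2-D, types can differ by column

dim(m)    # rows, then columns
## [1] 2 3
dim(d)
## [1] 2 2
```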
05:00
10:00
R has different types of data, and an object’s type affects how it interacts with functions and other objects.
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
o | Tile View: Overview of Slides |
Esc | Back to slideshow |
R has different types of data, and an object’s type affects how it interacts with functions and other objects.
R has different types of data, and an object’s type affects how it interacts with functions and other objects.
So far, we’ve just been working with numeric data, but there are several other types to be aware of...
R has different types of data, and an object’s type affects how it interacts with functions and other objects.
So far, we’ve just been working with numeric data, but there are several other types to be aware of...
Type | Definition | Example |
---|---|---|
Integer | whole numbers from -Inf to +Inf | 1L , -2L |
Double | numbers, fractions & decimals | -7 , 0.2 , -5/2 |
Character | quoted strings of letters, numbers, and allowed symbols | "1" , "one" , "o-n-e" , "o.n.e" |
Logical | logical constants of true or false | TRUE , FALSE |
Factor | ordered, labelled variable | variable for year in college labelled "Freshman" , "Sophomore" , etc. |
You can use typeof()
to find out the type of a value or object
You can use typeof()
to find out the type of a value or object
typeof(1)
## [1] "double"
typeof(TRUE)
## [1] "logical"
typeof(1L)
## [1] "integer"
typeof("one")
## [1] "character"
There are a few special values worth knowing about too
Value | Definition |
---|---|
NA |
Missing value ("not available") |
NaN |
Not a Number (e.g. 0/0) |
Inf |
Positive infinity |
-Inf |
Negative infinity |
NULL |
An object that exists but is completely empty |
Often, we’re not working with individual values, but with a group of multiple related values -- or a vector of values.
Often, we’re not working with individual values, but with a group of multiple related values -- or a vector of values.
We can create a vector of ordered numbers using the form
starting_number
: ending_number
.
Often, we’re not working with individual values, but with a group of multiple related values -- or a vector of values.
We can create a vector of ordered numbers using the form
starting_number
: ending_number
.
For example, we could make x
a vector with the numbers between 1 and 5.
x <- 1:5x
## [1] 1 2 3 4 5
Often, we’re not working with individual values, but with a group of multiple related values -- or a vector of values.
We can create a vector of ordered numbers using the form
starting_number
: ending_number
.
For example, we could make x
a vector with the numbers between 1 and 5.
x <- 1:5x
## [1] 1 2 3 4 5
Let's look at the Environment pane in RStudio...
Since x
is a vector, it tells us what type of vector it is and its length in addition to its contents (which can be abbreviated if the object is larger).
We can create a vector of any numbers we want using c()
, which is a function. You can think of c()
as short for "combine".
We can create a vector of any numbers we want using c()
, which is a function. You can think of c()
as short for "combine".
You use c()
by putting numbers separated by a comma within the parentheses.
# combine values into a vector and assign to an object names 'x'x <- c(2, 8.5, 1, 9)# print xx
## [1] 2.0 8.5 1.0 9.0
We can create a vector of any numbers we want using c()
, which is a function. You can think of c()
as short for "combine".
You use c()
by putting numbers separated by a comma within the parentheses.
# combine values into a vector and assign to an object names 'x'x <- c(2, 8.5, 1, 9)# print xx
## [1] 2.0 8.5 1.0 9.0
We can also create a vector of numbers using seq()
.
seq()
is a function that creates a sequence of numbers.
To learn how any R function works, you can access the help documentation by typing ?function_name
.
To learn how any R function works, you can access the help documentation by typing ?function_name
.
Let's take a look at how seq()
works...
?seq
What happens if we run seq()
with no arguments?
seq()
## [1] 1
What happens if we run seq()
with no arguments?
seq()
## [1] 1
The seq()
function has arguments with default values.
The first two arguments are from
and to
, which specify the starting and end values of the sequence. By default from = 1
and to = 1
.
This means that typing seq()
is equivalent to typing seq(from = 1, to = 1)
, which generates a sequence with just one value: 1
.
We will talk more about how functions work in the next slide deck.
To make a sequence from 1 to 5 with this function, we have to set the arguments accordingly: from = 1
and to = 5
seq(from = 1, to = 5)
## [1] 1 2 3 4 5
To make a sequence from 1 to 5 with this function, we have to set the arguments accordingly: from = 1
and to = 5
seq(from = 1, to = 5)
## [1] 1 2 3 4 5
We can also set one or more of the other arguments...
To make a sequence from 1 to 5 with this function, we have to set the arguments accordingly: from = 1
and to = 5
seq(from = 1, to = 5)
## [1] 1 2 3 4 5
We can also set one or more of the other arguments...
The by
argument allows us to change the increment of the sequence. For example, to get every other number between 1 and 5, we would set by = 2
seq(from = 1, to = 5, by = 2)
## [1] 1 3 5
Vectors are just 1-dimensional sequences of a single type of data.
Vectors are just 1-dimensional sequences of a single type of data.
Note that vectors can also include strings or character values.
letters <- c("a", "b", "c", "d")letters
## [1] "a" "b" "c" "d"
Vectors are just 1-dimensional sequences of a single type of data.
Note that vectors can also include strings or character values.
letters <- c("a", "b", "c", "d")letters
## [1] "a" "b" "c" "d"
The general rule R uses is to set the vector to be the most "permissive" type necessary.
Vectors are just 1-dimensional sequences of a single type of data.
Note that vectors can also include strings or character values.
letters <- c("a", "b", "c", "d")letters
## [1] "a" "b" "c" "d"
The general rule R uses is to set the vector to be the most "permissive" type necessary.
For example, what happens if we combine the vectors x
(from earlier) and letters
together?
mixed_vec <- c(x, letters)mixed_vec
## [1] "2" "8.5" "1" "9" "a" "b" "c" "d"
mixed_vec
## [1] "2" "8.5" "1" "9" "a" "b" "c" "d"
Notice the quotes? R turned all of our numbers into strings, since strings are more "permissive" than numbers.
mixed_vec
## [1] "2" "8.5" "1" "9" "a" "b" "c" "d"
Notice the quotes? R turned all of our numbers into strings, since strings are more "permissive" than numbers.
typeof(mixed_vec)
## [1] "character"
mixed_vec
## [1] "2" "8.5" "1" "9" "a" "b" "c" "d"
Notice the quotes? R turned all of our numbers into strings, since strings are more "permissive" than numbers.
typeof(mixed_vec)
## [1] "character"
This is called coercion. R coerces a vector into whichever type will accommodate all of the values. We can coerce mixed_vec
to be numeric using as.numeric()
, but notice what happens to the character values 👀
as.numeric(mixed_vec)
## Warning: NAs introduced by coercion
## [1] 2.0 8.5 1.0 9.0 NA NA NA NA
01:30
Create an object called x
that is assigned the number 8.
Create an object called y
that is a sequence of numbers from 2 to 16, by 2.
Add x
and y
. What happens?
# Q1.x <- 8
# Q2.y <- seq(from = 2, to = 16, by = 2)y
## [1] 2 4 6 8 10 12 14 16
# Q3x + y
## [1] 10 12 14 16 18 20 22 24
This is an example of vector recycling.
When applying an operation to two vectors that requires them to be the same length, R automatically recycles, or repeats, the shorter one, until it is long enough to match the longer one.
03:00
Create an object called a
that is just the letter "a" and an object x
that is assigned the number 8. Add a
to x
. What happens?
Create a vector called b
that is just the number 8 in quotes. Add b
to x
(from above). What happens?
Find some way to add b
to x
. (Hint: Don't forget about coercion.)
How do we extract elements out of vectors?
How do we extract elements out of vectors?
This is called indexing, and it is frequently quite useful
How do we extract elements out of vectors?
This is called indexing, and it is frequently quite useful
There are a number of methods for indexing that are good to be familiar with
Vectors can be indexed numerically, starting with 1 (not 0). We can extract specific elements from a vector by putting the index of their position inside brackets []
.
Vectors can be indexed numerically, starting with 1 (not 0). We can extract specific elements from a vector by putting the index of their position inside brackets []
.
Let's take a new vector z
as an example:
z <- 6:10
Vectors can be indexed numerically, starting with 1 (not 0). We can extract specific elements from a vector by putting the index of their position inside brackets []
.
Let's take a new vector z
as an example:
z <- 6:10
Let's get just the first element of z
:
z[1]
## [1] 6
Get the first and third element by passing those indexes as a vector using c()
.
z[c(1, 3)]
## [1] 6 8
z
## [1] 6 7 8 9 10
We could also say which elements not to give us using the minus sign (-
).
Finally, if the elements in the vector have names, we can refer to them by name instead of their numerical index. You can see the names of a vector using names()
.
names(z)
## NULL
Finally, if the elements in the vector have names, we can refer to them by name instead of their numerical index. You can see the names of a vector using names()
.
names(z)
## NULL
Looks like the elements in z
have no names. We can change that by assigning them names using a vector of character values.
names(z) <- c("first", "second", "third", "fourth", "fifth")z
## first second third fourth fifth ## 6 7 8 9 10
z
## first second third fourth fifth ## 6 7 8 9 10
Now we can use the names of the elements in z
for subsetting, using quotes
z["first"]
## first ## 6
You can use indexing to change elements within a vector.
For example, we could change the first element of z
to missing, or NA
.
z[1] <- NAz
## first second third fourth fifth ## NA 7 8 9 10
03:00
Create a vector called named
that includes the numbers 1 to 5. Name the values "a", "b", "c", "d", and "e" (in order).
Print the first element using numerical indexing and the last element using name indexing.
Change the third element of named
to the value 21 and then show your results.
# Q1.
named <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
named
## a b c d e
## 1 2 3 4 5
# this works too
named <- c(1, 2, 3, 4, 5)
names(named) <- c("a", "b", "c", "d", "e")
named
## a b c d e
## 1 2 3 4 5
named[1]
## a
## 1
named["e"]
## e
## 5
# Q2.
named[3] <- named[3] * 7
named
##  a  b  c  d  e
##  1  2 21  4  5
# this works too
named[3] <- 21
Vectors are great for storing a single type of data, but what if we have a variety of different kinds of data we want to store together?
For example, let's say I have some information about Jane Doe that I want to store together in a single object:
A vector won't work -- every element is coerced to a character (notice the quotes).
c("Jane Doe", 5.5, TRUE)
## [1] "Jane Doe" "5.5" "TRUE"
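We can confirm the coercion with typeof(). The sketch below uses a hypothetical object called mixed (not part of the slides) and also shows that, without a character value in the mix, logicals are coerced to numbers instead.

```r
# 'mixed' is an illustrative name, not an object from the slides
mixed <- c("Jane Doe", 5.5, TRUE)
typeof(mixed)
## [1] "character"

# The height is now text, so it has to be converted back before doing math
as.numeric(mixed[2]) * 12
## [1] 66

# With no character value present, TRUE is coerced to the number 1
c(5.5, TRUE)
## [1] 5.5 1.0
typeof(c(5.5, TRUE))
## [1] "double"
```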
Instead, we can put them in a list. Lists are very flexible -- they can contain different types of data and preserve those types.
We can create a list with the list() function.
jane_doe <- list("Jane Doe", 5.5, TRUE)
jane_doe
## [[1]]
## [1] "Jane Doe"
##
## [[2]]
## [1] 5.5
##
## [[3]]
## [1] TRUE
And, we can give each element of the list a name to make it easier to keep track of them.
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE)
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
Notice that [[1]], [[2]], and [[3]], the element indices, have been replaced by the names name, height and right_handed 👀
You can also see the names of a list by running names() on it.
names(jane_doe)
## [1] "name" "height" "right_handed"
Lists are even more flexible than we've seen so far. In addition to holding heterogeneous types, the elements of a list can have different lengths.
Let's add another element to the list about Jane that contains her favorite types of ice cream (she can't choose just one!)
Notice the use of c() to create the element ice_cream 👀
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE, ice_cream = c("mint chip", "pistachio"))
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
##
## $ice_cream
## [1] "mint chip" "pistachio"
Like vectors, lists can be indexed by their name or their position (numerically).
For example, if we wanted the height element, we could get it out using its position as the second element of the list:
jane_doe[2]
## $height
## [1] 5.5
Now let's say we want to know Jane's height in inches. Let's see if we can get that by multiplying the height element by 12.
jane_doe[2] * 12
## Error in jane_doe[2] * 12: non-numeric argument to binary operator
R is telling us that we supplied a non-numeric argument, i.e. jane_doe[2].
This happened because single bracket indexing on a list returns a list -- but what we need is the contents of the list (in this case, just the number 5.5).
If we want the actual object stored at a given position instead of a list containing that object, we have to use double-bracket indexing, list[[i]]:
jane_doe[[2]]
## [1] 5.5
Notice it no longer has the $height.
In general, a $label is a hint that you're looking at a list (the container) and not just the object stored at that position (the contents).
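If the printed output is hard to read, we can also check directly what each kind of indexing returns. A quick sketch using is.list() and typeof(), with jane_doe recreated so the block runs on its own:

```r
# Recreate the list from the slides
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE,
                 ice_cream = c("mint chip", "pistachio"))

# Single brackets: still a list (the container), of length 1
is.list(jane_doe[2])
## [1] TRUE
typeof(jane_doe[2])
## [1] "list"

# Double brackets: the number itself (the contents)
is.list(jane_doe[[2]])
## [1] FALSE
typeof(jane_doe[[2]])
## [1] "double"
```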
Now let's see Jane's height in inches.
jane_doe[[2]] * 12
## [1] 66
The same applies to name indexing. With lists, you can get a list containing the indexed object with single brackets [].
jane_doe["height"]
## $height
## [1] 5.5
And double brackets [[]] can be used to get the contents -- the object stored with that name.
jane_doe[["height"]]
## [1] 5.5
You can also use list$name to get the object stored under a particular name. It works like double brackets with a name, but you don't need quotes.
jane_doe$height
## [1] 5.5
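One practical difference between the two forms shows up when the name is stored in a variable. In this sketch, key is a hypothetical variable introduced only for illustration: double brackets use its value, while $ looks for an element literally called "key".

```r
# Recreate the list from the slides
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE,
                 ice_cream = c("mint chip", "pistachio"))

# 'key' is an illustrative variable holding an element name
key <- "height"

# Double brackets evaluate the variable, so this returns the height
jane_doe[[key]]
## [1] 5.5

# $ takes the name literally; there is no element called "key"
jane_doe$key
## NULL
```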
Just like vectors, we can change or add elements to our list using indexing.
Let's save the inches transformation of the height element as height_in.
jane_doe$height_in <- jane_doe$height * 12
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
##
## $ice_cream
## [1] "mint chip" "pistachio"
##
## $height_in
## [1] 66
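The same indexing can change an existing element, and assigning NULL removes one. The sketch below works on a copy, jane_copy (a hypothetical name), so the jane_doe object from the slides stays as it is.

```r
# Recreate the list as it stands at this point in the slides
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE,
                 ice_cream = c("mint chip", "pistachio"), height_in = 66)

# Work on a copy so the original is untouched
jane_copy <- jane_doe

# Change an existing element
jane_copy[["height"]] <- 5.6

# Remove an element by assigning NULL to it
jane_copy$height_in <- NULL

length(jane_copy)
## [1] 4
"height_in" %in% names(jane_copy)
## [1] FALSE
```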
Create a list like jane_doe that is made up of name, height, right_handed, and ice_cream, but corresponds to information about you.
Index your list to print only your name.
# Q1. (Answers will vary)
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE, ice_cream = c("mint chip", "pistachio"))
jane_doe
## $name
## [1] "Jane Doe"
##
## $height
## [1] 5.5
##
## $right_handed
## [1] TRUE
##
## $ice_cream
## [1] "mint chip" "pistachio"
# Q2.
jane_doe$name
## [1] "Jane Doe"
jane_doe[["name"]]
## [1] "Jane Doe"
As we saw with the object ice_cream stored in the list jane_doe, objects within lists can have different dimensions and lengths.
What if we wanted just one element of an object in a list, such as just the second element of ice_cream?
We can use indexing on the ice_cream vector stored within the jane_doe list by chaining indexes.
We could do that with numerical indexing...
jane_doe[[4]][2]
## [1] "pistachio"
...or with name indexing
jane_doe[["ice_cream"]][2]
## [1] "pistachio"
...or with dollar sign ($) indexing:
jane_doe$ice_cream[2]
## [1] "pistachio"
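Chained indexes also work on the left side of an assignment, and the first index needs to reach the contents (double brackets or $) for the chain to behave as expected. A sketch under those assumptions, again using a hypothetical copy called jane_copy:

```r
# Recreate the list from the slides
jane_doe <- list(name = "Jane Doe", height = 5.5, right_handed = TRUE,
                 ice_cream = c("mint chip", "pistachio"))

# Replace the second flavor in a copy
jane_copy <- jane_doe
jane_copy$ice_cream[2] <- "vanilla"
jane_copy$ice_cream
## [1] "mint chip" "vanilla"

# With single brackets first, [2] indexes the one-element list, not the vector,
# so there is nothing stored at that position
is.null(jane_doe[4][2][[1]])
## [1] TRUE
```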
A data frame is a common way of representing rectangular data -- collections of values that are each associated with a variable (column) and an observation (row). In other words, it has 2 dimensions.
A data frame is technically a special kind of list -- it can contain different kinds of data in different columns, but each column must be the same length.
We can create a data frame very similarly to how we made a list, but replacing list() with data.frame().
df_1 <- data.frame(c1 = c(1, 3), c2 = c(2, 4), c3 = c("a", "b"))
df_1
##   c1 c2 c3
## 1  1  2  a
## 2  3  4  b
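Because a data frame is a special kind of list, the list tools we already know still apply. The sketch below checks that, and uses try() to show what happens when columns have lengths that don't fit together; bad is a hypothetical name used only here.

```r
# Recreate the data frame from the slides
df_1 <- data.frame(c1 = c(1, 3), c2 = c(2, 4), c3 = c("a", "b"))

# A data frame is a list whose elements are the columns
is.list(df_1)
## [1] TRUE
length(df_1)   # number of columns
## [1] 3
dim(df_1)      # rows, then columns
## [1] 2 3

# Columns of length 3 and 2 can't be lined up, so data.frame() errors;
# try() captures the error instead of stopping the script
bad <- try(data.frame(c1 = 1:3, c2 = 1:2), silent = TRUE)
inherits(bad, "try-error")
## [1] TRUE
```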
Indexing data frames is similar to how we index vectors, except we have two dimensions, which we use like so: [row, column]
Let's get the value in the first row and third column of df_1 using numerical indexing:
df_1[1, 3]
## [1] "a"
You can also get an entire row or column by leaving an index blank. Let's get all rows but only column 2:
df_1[, 2]
## [1] 2 4
We can also index by name
df_1[, "c2"]
## [1] 2 4
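Leaving the column index blank works the same way for rows. One thing to be aware of, sketched below: selecting a single column with [row, column] returns a plain vector by default, and base R's drop = FALSE argument keeps it as a data frame.

```r
# Recreate the data frame from the slides
df_1 <- data.frame(c1 = c(1, 3), c2 = c(2, 4), c3 = c("a", "b"))

# First row, all columns: the result is still a data frame
df_1[1, ]
##   c1 c2 c3
## 1  1  2  a

# A single column comes back as a vector...
is.data.frame(df_1[, "c2"])
## [1] FALSE

# ...unless we ask R not to drop the second dimension
is.data.frame(df_1[, "c2", drop = FALSE])
## [1] TRUE
```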
As with lists, we can use the $ operator in the form dataframe$column_name (similar to list$object).
Let's get the first column
df_1$c1
## [1] 1 3
We can also index a column using vector indexing, since a single column is just a 1-dimensional vector.
df_1$c1[1] # get the first value in column 1
## [1] 1
Just like lists and vectors, you can modify a data frame and add new elements or change existing elements by referencing indexes.
We could create c4 as the sum of c1 and c2:
df_1$c4 <- df_1$c1 + df_1$c2
df_1
##   c1 c2 c3 c4
## 1  1  2  a  3
## 2  3  4  b  7
Or we could replace an element using indexing too. Let's replace c1 with c1^2:
df_1$c1 <- df_1$c1^2
df_1
##   c1 c2 c3 c4
## 1  1  2  a  3
## 2  9  4  b  7
We can use the str() function to get the structure of the data. This tells us the type of each column.
str(df_1)
## 'data.frame':	2 obs. of  4 variables:
##  $ c1: num  1 9
##  $ c2: num  2 4
##  $ c3: chr  "a" "b"
##  $ c4: num  3 7
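str() is meant for reading. If we want the column types as an object we can index, one option is to apply typeof() to every column with sapply(), a base R function not covered in these slides. A sketch, assuming R 4.0 or later (where character columns stay character); col_types is a hypothetical name.

```r
# Recreate df_1 as it stands at this point in the slides
df_1 <- data.frame(c1 = c(1, 9), c2 = c(2, 4), c3 = c("a", "b"), c4 = c(3, 7))

# Apply typeof() to each column; the result is a named character vector
col_types <- sapply(df_1, typeof)

# Name indexing works on it like on any other named vector
col_types[["c1"]]
## [1] "double"
col_types[["c3"]]
## [1] "character"
```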
Create a data frame called df_2 that has 3 columns as shown below. After you create it, check the structure with str().
##   c1 c2 c3
## 1  1  2  a
## 2  2  4  b
## 3  3  6  c
Add a new column, c4, which is the first and second columns multiplied together.
We just learned about different types of data (numeric, character, logical, factor, etc.) and some different ways they can be structured -- including vectors, lists and data frames.
Here's a quick table that summarizes data structures.
 | Homogeneous data | Heterogeneous data |
---|---|---|
1-Dimensional | Atomic Vector | List |
2-Dimensional | Matrix * | Data frame |
* We didn't talk about matrices today, but if you take PSY611, you will learn more about them in the context of the General Linear Model.