Learn about the basics in R.
To get help on a particular function, type “?” followed by the name of the function in your console and press Enter.
?mean
To search the help files for a word or phrase, call help.search() in your console, with the words you want in quotes inside the parentheses.
help.search("weighted mean")
To find help for a package, call help(package = " ") in your console, with the name of the package in quotes.
help(package = "dplyr")
The “Help” tab in the bottom right panel will open and show the information you asked for.
To get a summary of an object’s structure, call str() with the object you want in parentheses.
str(iris)
To find the class an object belongs to, call class() with the object you want in parentheses.
class(iris)
To find the current working directory (where inputs are found and outputs are sent), call getwd() in your console. It returns the path of that directory.
getwd()
To change the current working directory, call setwd() in your console, with the path of the directory you want to use in quotes inside the parentheses.
For Windows:
For Mac OSX:
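The path examples for each system are not shown above. As a minimal sketch, the commented lines below use placeholder folder names (C:/Users/yourname/Documents and /Users/yourname/Documents are assumptions, not paths from this course). The runnable part switches to a temporary directory instead, so that it works on any machine:

```r
# Hypothetical paths, left as comments because they will not exist on your machine:
# setwd("C:/Users/yourname/Documents")   # Windows: use forward slashes or doubled backslashes
# setwd("/Users/yourname/Documents")     # macOS

old_dir <- getwd()     # remember where we started
setwd(tempdir())       # switch to a directory that is guaranteed to exist
new_dir <- getwd()     # the working directory is now the temporary folder
setwd(old_dir)         # switch back to the original directory
```

Saving the old path before calling setwd() makes it easy to return to where you started.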
The simplest way to store a number is through an assignment. For example, here we assign the number 3 to a variable called “three”.
three <- 3
The “<-” tells R to take the number to the right of the symbol (here: 3) and store it in a variable whose name is given on the left (here: three). You can also use the “=” symbol instead of “<-”.
When you make an assignment, R does not print out any information. If you want to see what value a variable has, just type the name of the variable in the console and press Enter (or, on a line of a script, press cmd ⌘ + enter).
three
[1] 3
If you want to store a list of numbers, the simplest way to do it is through an assignment (<-) using the c() command.
The idea is that a list of numbers is stored under a given name, and the name is used to refer to the data. As an example, we can create a new variable, called “randomNumbers” which will contain the numbers 4, 2, 8, and 5:
randomNumbers <- c(4,2,8,5)
When you enter this command, the vector “randomNumbers” appears in the top right panel, under the Environment tab.
To see what numbers are included in randomNumbers type “randomNumbers” and press the enter key:
randomNumbers
[1] 4 2 8 5
If you wish to see one number in particular you can get access to it using the variable and then square brackets indicating which number:
randomNumbers[3]
[1] 8
randomNumbers[1]
[1] 4
Notice that:
randomNumbers[3] returns the number “8”, the 3rd element of “randomNumbers”
randomNumbers[1] returns the number “4”, the 1st element of “randomNumbers”
You can also store strings (words) using either single or double quotes. Notice that you can also use the c command with strings to store multiple words in a vector.
a
[1] "apple"
b
[1] "banana"
ab
[1] "apple" "banana"
ba
[1] "banana" "apple"
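The assignments that produce the output above are not shown. One plausible version, offered as a sketch (the exact quoting style and order of the original commands are assumptions), is:

```r
a <- "apple"      # double quotes
b <- 'banana'     # single quotes work the same way
ab <- c(a, b)     # combine both words into one vector
ba <- c(b, a)     # same words, reversed order
ab
ba
```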
To read a data set from a “comma separated values” (csv) file, you can use the read.csv() function. In the csv format, each line contains a row of values, which can be numbers or letters, and each value is separated by a comma.
df <- read.csv(file="nameOfTheFile.csv", header=TRUE, sep=",")
There are three options inside read.csv(). Let’s go through them:
Option | Meaning |
---|---|
file="nameOfTheFile.csv" | The name of your file. |
header=TRUE | Specifies whether the first line of your file holds the column names. If TRUE, the first line is used as the column names; if FALSE, the function creates generic column names. |
sep="," | The separator between values. csv stands for “comma separated values”, but sometimes the values are separated by semicolons. In that case, write sep=";". |
To write a csv file, you can use the write.csv() function:
write.csv(df, "file.csv")
There are two arguments in the write.csv() call. Let’s see what they mean:
Argument | Meaning |
---|---|
df | The name of the object holding your data. |
"file.csv" | The name of the csv file in which your data will be stored. |
To read and write a delimited text file:
df <- read.table("file.txt")
write.table(df, "file.txt")
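To try the csv functions without an existing file, here is a small round trip. It is only a sketch: the data frame values are made up, and the files are written to a temporary location chosen by tempfile().

```r
# Small example data frame (made-up values).
df <- data.frame(id = c(1, 2, 3), fruit = c("apple", "banana", "cherry"))

csv_file <- tempfile(fileext = ".csv")       # file in the session's temporary directory
write.csv(df, csv_file, row.names = FALSE)   # row.names = FALSE avoids an extra index column
df_back <- read.csv(csv_file, header = TRUE, sep = ",")

# The same data written with semicolons as separators, then read with sep = ";".
csv2_file <- tempfile(fileext = ".csv")
write.table(df, csv2_file, sep = ";", row.names = FALSE)
df_semicolon <- read.csv(csv2_file, header = TRUE, sep = ";")
```

Both df_back and df_semicolon should hold the same three rows as df.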
To read and write an R data file, a file type special for R:
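The commands for this step are not shown here. As a sketch using base R functions (the file names and the example data frame are assumptions), save() and load() handle .RData files, while saveRDS() and readRDS() handle a single object:

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))   # made-up data

rdata_file <- tempfile(fileext = ".RData")
save(df, file = rdata_file)    # save() stores the object together with its name
rm(df)                         # remove it to show that load() brings it back
load(rdata_file)               # recreates the object df in the environment

rds_file <- tempfile(fileext = ".rds")
saveRDS(df, rds_file)          # saveRDS() stores one object without its name
df_copy <- readRDS(rds_file)   # so you choose the name when reading it back
```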
As we saw earlier, we can store a number or a list of numbers in a variable.
x <- 4
x
[1] 4
You can do all sorts of basic operations and save the numbers:
y <- sqrt(x*x+3)
y
[1] 4.358899
If you want to get a list of the variables that you have defined in a particular session you can list them all using the ls command:
ls()
[1] "a" "ab" "b" "ba"
[5] "randomNumbers" "three" "x" "y"
As you now know, you are not limited to storing a single number. You can create a list of numbers (called a “vector” in R) using the c command:
x <- c(1,2,3,4,5)
x
[1] 1 2 3 4 5
Notice that we stored the series of numbers 1,2,3,4,5 in the variable x, which already held the number 4. This is not a problem: you can assign to the same variable name as many times as you want, and each new assignment overwrites the previous value. So x now holds the series of numbers (1,2,3,4,5) and no longer the number 4.
Another way to write a series of numbers like 1,2,3,4,5 is as follows:
y <- 1:5
y
[1] 1 2 3 4 5
We can do some maths on this vector labeled x.
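The operations themselves are not shown here. As a short sketch of what arithmetic on a vector looks like (the particular operations are chosen for illustration), note that R applies each operation to every element:

```r
x <- c(1, 2, 3, 4, 5)
x * 2                # multiply every element by 2: 2 4 6 8 10
x + 10               # add 10 to every element: 11 12 13 14 15
x^2                  # square every element: 1 4 9 16 25
x + x                # add two vectors element by element: 2 4 6 8 10
sum(x) / length(x)   # the mean computed by hand: 3
```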
As we saw above, you can get access to particular entries in the vector in the following manner:
x[1]
[1] 1
x[3]
[1] 3
x[6]
[1] NA
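The last result is NA because x has only five elements, so there is no sixth entry to return. A few more indexing forms, sketched on the same vector:

```r
x <- c(1, 2, 3, 4, 5)
x[c(1, 3)]    # first and third elements: 1 3
x[2:4]        # second to fourth elements: 2 3 4
x[-1]         # everything except the first element: 2 3 4 5
x[x > 3]      # elements greater than 3: 4 5
length(x)     # number of elements: 5
is.na(x[6])   # TRUE, because index 6 is beyond the end of x
```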
If you wish to determine the data type of a variable:
typeof(a)
[1] "character"
As you know, you are not limited to just storing numbers. You can also store strings. A string is specified by using quotes. Both single and double quotes will work.
The name of the type given to strings is character. For example, for the string variable b defined earlier:
typeof(b)
[1] "character"
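For comparison, here is a sketch of what typeof() returns for a few other kinds of values (the literal values are chosen for illustration):

```r
typeof("apple")       # "character"
typeof(c(1, 2, 3))    # "double": numbers are stored as doubles by default
typeof(1:5)           # "integer": the colon operator creates integers
typeof(TRUE)          # "logical"
typeof(factor("A"))   # "integer": factors are stored as integer codes
```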
Another important way R can store data is as a factor. R ships with some built-in datasets, and “iris” is one of them. We are going to look at a specific column of the iris data called “Species”.
summary(iris$Species)
    setosa versicolor  virginica
        50         50         50
levels(iris$Species)
[1] "setosa" "versicolor" "virginica"
The summary() function tells us that there are three levels (setosa, versicolor, virginica), each containing 50 rows. The levels() function lists the three levels but says nothing about the number of rows.
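To see how a factor is built from scratch, here is a minimal sketch. The vector sizes and its values are hypothetical and are not part of the iris data:

```r
# Hypothetical example data, not taken from iris.
sizes <- factor(c("small", "large", "small", "medium", "small"))
levels(sizes)       # the distinct categories, in alphabetical order
nlevels(sizes)      # how many categories there are: 3
table(sizes)        # how many values fall in each category
as.integer(sizes)   # the integer codes R stores internally: 3 1 3 2 3
```

The integer codes refer to positions in levels(sizes), which is why “small” (the third level alphabetically) is stored as 3.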
Another way that information is stored is in data frames. A data frame takes several vectors of the same length, possibly of different types, and stores them as columns of a single variable. For example, one column may hold factors, another strings, and another numbers.
There are different ways to create and manipulate data frames. One example of how to create a data frame is given below:
a <- c(2,4,6,8)
b <- c(4,8,12,16)
levels <- factor(c("A","B","A","B"))
bubba <- data.frame(first=a, second=b, f=levels)
bubba
  first second f
1     2      4 A
2     4      8 B
3     6     12 A
4     8     16 B
summary(bubba)
     first         second    f
 Min.   :2.0   Min.   : 4   A:2
 1st Qu.:3.5   1st Qu.: 7   B:2
 Median :5.0   Median :10
 Mean   :5.0   Mean   :10
 3rd Qu.:6.5   3rd Qu.:13
 Max.   :8.0   Max.   :16
bubba$first
[1] 2 4 6 8
bubba$second
[1] 4 8 12 16
bubba$f
[1] A B A B
Levels: A B
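Beyond the $ operator, rows and columns can be selected with square brackets. The following sketch rebuilds the same data frame and shows a few common manipulations (the new column name total is an assumption made for illustration):

```r
a <- c(2, 4, 6, 8)
b <- c(4, 8, 12, 16)
levels <- factor(c("A", "B", "A", "B"))
bubba <- data.frame(first = a, second = b, f = levels)

bubba[2, ]                # second row, all columns
bubba[bubba$f == "A", ]   # only the rows where f is "A"
bubba$total <- bubba$first + bubba$second   # add a new column
nrow(bubba)               # number of rows: 4
names(bubba)              # column names, now including "total"
```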
Another important data type is the logical type. It has two predefined values, TRUE and FALSE.
The standard logical operators can be used:
Logical operators | Meaning |
---|---|
< | less than |
> | greater than |
<= | less than or equal |
>= | greater than or equal |
== | equal to |
!= | not equal to |
\| | element-wise or |
\|\| | or (for single values) |
! | not |
& | element-wise and |
&& | and (for single values) |
xor(a,b) | exclusive or |
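The operators in the table can be tried on a small vector. This is a sketch with made-up values:

```r
x <- c(1, 5, 10)
x > 3              # compares every element: FALSE TRUE TRUE
x > 3 & x < 8      # element-wise and: FALSE TRUE FALSE
x < 3 | x > 8      # element-wise or: TRUE FALSE TRUE
!(x > 3)           # not: TRUE FALSE FALSE
TRUE && FALSE      # single-value and: FALSE
TRUE || FALSE      # single-value or: TRUE
xor(TRUE, FALSE)   # exclusive or: TRUE
```

The single-value forms && and || are the ones normally used inside if() conditions, while & and | work across whole vectors.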
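Here are some maths functions. The outputs below correspond to a vector x holding the numbers 1, 2, 3 and 4; the first output is the smallest element, as returned by min(x).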
[1] 1
max(x) # Largest element.
[1] 4
median(x) # Median.
[1] 2.5
mean(x) # Mean.
[1] 2.5
quantile(x) # Percentage quantiles.
0% 25% 50% 75% 100%
1.00 1.75 2.50 3.25 4.00
rank(x) # Rank of elements.
[1] 1 2 3 4
log(x) # Natural log.
[1] 0.0000000 0.6931472 1.0986123 1.3862944
sum(x) # Sum.
[1] 10
exp(x) # Exponential
[1] 2.718282 7.389056 20.085537 54.598150
var(x) # The variance.
[1] 1.666667
sd(x) # The standard deviation.
[1] 1.290994
[1] 1
[1] 1.4 2.5 3.9 4.2
signif(x, 1) # Round to n significant figures.
[1] 1 3 4 4
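The difference between rounding to decimal places and rounding to significant figures is easier to see side by side. The sketch below uses a hypothetical vector called values, not the x used above:

```r
values <- c(1.234, 5.678, 123.456)   # hypothetical numbers for illustration
round(values, 1)    # one decimal place: 1.2, 5.7, 123.5
signif(values, 2)   # two significant figures: 1.2, 5.7, 120
```

round() counts digits after the decimal point, whereas signif() counts digits from the first non-zero digit, which is why 123.456 becomes 120.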
The functions you have seen so far are included in base R. You can also install additional packages from CRAN (the Comprehensive R Archive Network).
To download and install a package from CRAN:
install.packages("package")
To load the package into the session, making all its functions available to use:
library(package)
To use a particular function from a package:
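The general form is package::function(). As a sketch that runs without installing anything, the example below uses the stats and utils packages, which ship with R (the choice of functions is an assumption made for illustration):

```r
x <- c(1, 2, 3, 4)
stats::median(x)          # median() from the stats package: 2.5
utils::head(letters, 3)   # head() from the utils package: "a" "b" "c"
```

The same syntax works for any installed package, and it does not require calling library() first.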
As you remember, we used a built-in dataset called “iris”. Other datasets are built into R, such as “mtcars”. To load a built-in dataset into the environment:
Sometimes you want to clean up your Environment. Remember that the Environment tab is in the top right panel, next to the History tab. As you can see, it now contains several objects.
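The command for this step is not shown here. A minimal sketch using the base R data() function would be:

```r
data(mtcars)      # copy the built-in dataset into the environment
head(mtcars, 3)   # first three rows
dim(mtcars)       # number of rows and columns: 32 11
```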
To get a list of the variables in the environment, use the ls() function.
ls()
[1] "a" "ab" "b" "ba"
[5] "bubba" "iris" "levels" "mtcars"
[9] "randomNumbers" "three" "x" "y"
To remove a specific variable from the environment, use the rm() function. Here we remove the x variable:
rm(x)
To remove all variables from the environment:
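The command for this step is not shown here. A common idiom, sketched below with two throwaway variables (tmp1 and tmp2 are hypothetical names), passes the result of ls() to rm():

```r
tmp1 <- 1
tmp2 <- "two"
rm(list = ls())   # ls() returns every variable name; rm() removes them all
ls()              # character(0): nothing is left
```

Be careful with this command: it removes everything listed by ls(), not only the variables you created most recently.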
This course is based on Base R Cheat Sheet and the R Tutorial.
For attribution, please cite this work as
Warin (2019, May 4). Thierry Warin, PhD: [R Course] R Basics. Retrieved from https://warin.ca/posts/rcourse-rbasics/
BibTeX citation
@misc{warin2019[r, author = {Warin, Thierry}, title = {Thierry Warin, PhD: [R Course] R Basics}, url = {https://warin.ca/posts/rcourse-rbasics/}, year = {2019} }