[R Course] R Basics

Learn the basics of R.

Thierry Warin (https://warin.ca/aboutme.html), HEC Montréal and CIRANO (Canada) (https://www.hec.ca/en/profs/thierry.warin.html)
05-04-2019

Getting Help

Access the help files

To get help on a particular function, type “?” followed by the name of the function in your console and press Enter.

?mean

To search the help files for a word or phrase, call help.search() in your console with the phrase in quotes between the parentheses.

help.search("weighted mean") 

To find help for a package, call help(package = " ") in your console with the name of the package in quotes.

help(package = "dplyr")

In RStudio, the “Help” tab in the bottom right panel opens and shows the documentation you asked for.

More about an object

To get a summary of an object’s structure, call str() with the object in parentheses.

str(iris)

To find the class an object belongs to, call class() with the object in parentheses.

class(iris)
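As a small sketch of how these two functions behave on different objects, the lines below apply them to the built-in iris data and to a short vector. The name small_vector is only illustrative.

```r
# 'small_vector' is a made-up name used only for this illustration.
small_vector <- c(10, 20, 30)

class(iris)           # "data.frame"
class(iris$Species)   # "factor"
class(small_vector)   # "numeric"

# str() prints a compact description: type, length and first values.
str(small_vector)
```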

Working Directory

To find the current working directory (where inputs are found and outputs are sent), write in your console getwd(). You will get a path to the storage location of your document.

getwd()

To change the current working directory, call setwd() in your console with the path of the new directory in quotes between the parentheses. Use straight quotes and forward slashes in the path.

For Windows:

setwd("C:/file/path")

For macOS:

setwd("~/file/path")
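The following sketch shows one cautious way to change the working directory and then come back to where you started. It uses tempdir(), the temporary directory R creates for each session, so it should run on any machine; the names old_dir and new_dir are illustrative.

```r
# Remember the current working directory before changing it.
old_dir <- getwd()

# Switch to R's per-session temporary directory (it always exists).
setwd(tempdir())
new_dir <- getwd()
new_dir

# Go back to the original working directory.
setwd(old_dir)
getwd()
```

Saving the old path first is a simple habit that avoids losing track of where your files are.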

Input and Output

Assignment

The simplest way to store a number is through an assignment. For example, here we assign the number 3 to a variable called “three”.

three <- 3

The “<-” tells R to take the number to the right of the symbol (here: 3) and store it in a variable whose name is given on the left (here: three). You can also use the “=” symbol instead of “<-”.

When you make an assignment, R does not print out any information. If you want to see what value a variable has, type the name of the variable in the console and press Enter (in an RStudio script, press Cmd ⌘ + Enter on a Mac, or Ctrl + Enter on Windows, to run the current line).

three
[1] 3

If you want to store several numbers, the simplest way to do it is through an assignment using the c() function.

The idea is that the numbers are stored together under a given name, and the name is used to refer to the data. As an example, we can create a new variable, called “randomNumbers”, which will contain the numbers 4, 2, 8, and 5:

randomNumbers <- c(4,2,8,5)

When you enter this command, the vector “randomNumbers” appears in the Environment tab of the top right panel.

To see what numbers are included in randomNumbers type “randomNumbers” and press the enter key:

randomNumbers
[1] 4 2 8 5

If you wish to see one number in particular, you can access it using the variable name followed by square brackets containing the position of the number:

randomNumbers[3]
[1] 8
randomNumbers[1]
[1] 4

Notice that positions are counted from 1, so randomNumbers[1] returns the first number.

You can also store strings (words) using either single or double quotes, and you can use the c() function with strings to store multiple words in a vector.

a <- "apple"
b <- 'banana' 
ab <- c("apple", "banana")
ba <- c('banana', 'apple')
a
[1] "apple"
b
[1] "banana"
ab
[1] "apple"  "banana"
ba
[1] "banana" "apple" 

Reading and writing data

csv

To read a data set from a “comma separated values” (csv) file, you can use the read.csv() function. In the csv format, each line contains a row of values, which can be numbers or text, and the values are separated by commas.

df <- read.csv(file="nameOfTheFile.csv", header=TRUE, sep=",")

There are 3 options inside read.csv(): file is the name (or path) of the file to read, header=TRUE says that the first line of the file contains the column names, and sep="," says that the values are separated by commas.

To write a csv file, you can use the write.csv() function:

write.csv(df, "file.csv")

There are 2 options in write.csv(): the first is the data frame to write, and the second is the name of the file to create.
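To make the csv workflow concrete, here is a minimal round trip that writes a small data frame to a temporary file and reads it back. The data and the names scores, csv_path and scores_back are invented for the illustration; row.names = FALSE is an extra argument (not used above) that stops write.csv() from adding a column of row numbers.

```r
# Hypothetical example data, invented for this sketch.
scores <- data.frame(name = c("Ann", "Bob"), score = c(90, 85))

# Write to a temporary file so nothing is left in the working directory.
csv_path <- tempfile(fileext = ".csv")
write.csv(scores, csv_path, row.names = FALSE)

# Read the file back; the first line is used as column names.
scores_back <- read.csv(file = csv_path, header = TRUE, sep = ",")
scores_back
```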

txt

To read and write a delimited text file:

df <- read.table("file.txt")
write.table(df, "file.txt")
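The same round trip can be sketched for a delimited text file. The example below assumes a tab-separated file and spells out the options explicitly; the data and the names measurements, txt_path and measurements_back are illustrative.

```r
# Hypothetical example data, invented for this sketch.
measurements <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))

txt_path <- tempfile(fileext = ".txt")

# sep = "\t" writes tab-separated values; row.names = FALSE skips row numbers;
# quote = FALSE leaves text values unquoted.
write.table(measurements, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# header = TRUE tells read.table() that the first line holds the column names.
measurements_back <- read.table(txt_path, header = TRUE, sep = "\t")
measurements_back
```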

rdata

To read and write an R data file, a file type specific to R (use the same file name, including upper and lower case, in both calls):

load("file.RData")
save(df, file = "file.RData")
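As a hedged illustration of how save() and load() work together, the sketch below saves a vector to a temporary file, removes it from the environment, and restores it. The names counts, rdata_path and loaded_names are illustrative.

```r
counts <- c(1, 2, 3)
rdata_path <- tempfile(fileext = ".RData")

# save() stores the object together with its name.
save(counts, file = rdata_path)

# Remove the object, then restore it from the file.
rm(counts)
loaded_names <- load(rdata_path)

loaded_names   # load() returns the names of the restored objects: "counts"
counts         # the vector is back under its original name
```

Unlike read.csv(), load() does not need an assignment: the objects come back under the names they were saved with.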

Data Types

Numbers

As we saw earlier, we can store a number or a series of numbers in a variable.

x <- 4
x
[1] 4

You can do all sorts of basic operations and save the numbers:

y <- sqrt(x*x+3)
y
[1] 4.358899

If you want a list of the variables that you have defined in the current session, you can list them all using the ls() function:

ls()
[1] "a"             "ab"            "b"             "ba"           
[5] "randomNumbers" "three"         "x"             "y"            

As you now know, you are not limited to storing a single number. You can create a series of numbers (called a “vector”) using the c() function:

x <- c(1,2,3,4,5)
x
[1] 1 2 3 4 5

Notice that we stored the series of numbers 1,2,3,4,5 in the variable x, which already held the number 4. That is not a problem: assigning to an existing variable name overwrites its previous value. So x now holds the series of numbers (1,2,3,4,5) and no longer the single number 4.

Another way to write a series of numbers like 1,2,3,4,5 is as follows:

y <- 1:5
y
[1] 1 2 3 4 5

We can do some maths on the vectors x and y.

mean(x)
[1] 3
var(x)
[1] 2.5
mean(y)
[1] 3
var(y)
[1] 2.5
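To see where the value 2.5 comes from, the sketch below recomputes the variance by hand. var() returns the sample variance, which divides the sum of squared deviations from the mean by n - 1; the names n and manual_var are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
n <- length(x)

# Sum of squared deviations from the mean, divided by n - 1.
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_var                    # 2.5

# all.equal() compares numbers while tolerating tiny rounding differences.
all.equal(manual_var, var(x)) # TRUE
```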

As we saw above, you can get access to particular entries in the vector in the following manner:

x[1]
[1] 1
x[3]
[1] 3
x[6]
[1] NA
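Asking for position 6 returns NA because x only has five entries. Square brackets also accept more than a single position; the following sketch shows a few common forms on the same vector.

```r
x <- c(1, 2, 3, 4, 5)

x[c(1, 3)]   # several positions at once: 1 3
x[2:4]       # a range of positions: 2 3 4
x[-1]        # everything except the first entry: 2 3 4 5
x[x > 3]     # entries that satisfy a condition: 4 5
length(x)    # number of entries: 5
```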

If you wish to determine the data type of a variable, use typeof(). Here, a is the string "apple" that we stored earlier:

typeof(a)
[1] "character"
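For comparison, here is what typeof() reports for numbers. This is a small sketch; the L suffix, which is not used elsewhere in this course, marks a whole number as an integer.

```r
typeof(4)            # "double": plain numbers are stored as decimals
typeof(4L)           # "integer": the L suffix asks for an integer
typeof(1:5)          # "integer": the colon operator produces integers
typeof(c(1, 2, 3))   # "double"
```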

Strings

As you know, you are not limited to just storing numbers. You can also store strings. A string is specified by using quotes. Both single and double quotes will work:

x <- "hello"
x
[1] "hello"
y <- c("hello","there")
y
[1] "hello" "there"
y[1]
[1] "hello"

The name of the type given to strings is character:

typeof(x)
[1] "character"
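A few base R functions for working with strings are sketched below. They are not covered above and are shown only as a brief illustration; the name greeting is illustrative.

```r
greeting <- "hello"

nchar(greeting)             # number of characters: 5
toupper(greeting)           # "HELLO"
paste(greeting, "there")    # joins with a space: "hello there"
paste0(greeting, "there")   # joins without a space: "hellothere"
```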

Factors

Another important way R can store data is as a factor, which holds categorical data with a fixed set of possible values called levels. R comes with some built-in data sets, and “iris” is one of them. We are going to look at a specific column of the iris data called “Species”.

summary(iris$Species)
    setosa versicolor  virginica 
        50         50         50 
levels(iris$Species)
[1] "setosa"     "versicolor" "virginica" 

The summary() function tells us that there are three levels (setosa, versicolor, virginica), each containing 50 rows. The levels() function lists the three levels but says nothing about the number of rows.
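You can also build a factor yourself with factor(). The sketch below uses invented data (the name sizes is illustrative) and shows that a factor stores each value as an integer code pointing to one of its levels.

```r
# Hypothetical categorical data, invented for this sketch.
sizes <- factor(c("small", "large", "small", "medium"))

levels(sizes)       # levels are sorted alphabetically: "large" "medium" "small"
summary(sizes)      # counts per level: large 1, medium 1, small 2
as.integer(sizes)   # underlying codes: 3 1 3 2
```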

Data Frames

Another way to store information is in a data frame. A data frame takes several vectors of the same length, possibly of different types, and stores them as the columns of a single variable. For example, one column may hold numbers, another strings, and another a factor.

There are different ways to create and manipulate data frames. One example of how to create a data frame is given below:

a <- c(2,4,6,8)
b <- c(4,8,12,16)
levels <- factor(c("A","B","A","B"))
bubba <- data.frame(first=a, second=b, f=levels)
bubba
  first second f
1     2      4 A
2     4      8 B
3     6     12 A
4     8     16 B
summary(bubba)
     first         second   f    
 Min.   :2.0   Min.   : 4   A:2  
 1st Qu.:3.5   1st Qu.: 7   B:2  
 Median :5.0   Median :10        
 Mean   :5.0   Mean   :10        
 3rd Qu.:6.5   3rd Qu.:13        
 Max.   :8.0   Max.   :16        
bubba$first
[1] 2 4 6 8
bubba$second
[1]  4  8 12 16
bubba$f
[1] A B A B
Levels: A B
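Continuing with the same data frame, the sketch below shows a few common manipulations: checking its size, keeping only some rows, and adding a column. The names only_a and total are illustrative.

```r
a <- c(2, 4, 6, 8)
b <- c(4, 8, 12, 16)
levels <- factor(c("A", "B", "A", "B"))
bubba <- data.frame(first = a, second = b, f = levels)

nrow(bubba)   # number of rows: 4
ncol(bubba)   # number of columns: 3

# Keep only the rows where f equals "A" (note the comma before the bracket closes).
only_a <- bubba[bubba$f == "A", ]
only_a$first  # 2 6

# Add a new column computed from two existing ones.
bubba$total <- bubba$first + bubba$second
bubba$total   # 6 12 18 24
```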

Logical

Another important data type is the logical type. There are two logical constants, TRUE and FALSE:

a = TRUE
typeof(a)
[1] "logical"
b = FALSE
typeof(b)
[1] "logical"

The standard logical operators can be used:

Logical operators   Meaning
<                   less than
>                   greater than
<=                  less than or equal to
>=                  greater than or equal to
==                  equal to
!=                  not equal to
|                   element-wise or
||                  or (for single values)
!                   not
&                   element-wise and
&&                  and (for single values)
xor(a,b)            exclusive or
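The difference between the element-wise operators and their doubled forms is easiest to see on small vectors. In this sketch the names p and q are illustrative.

```r
p <- c(TRUE, FALSE, TRUE)
q <- c(TRUE, TRUE, FALSE)

# Element-wise operators compare the vectors position by position.
p & q        # TRUE FALSE FALSE
p | q        # TRUE  TRUE  TRUE
!p           # FALSE  TRUE FALSE
xor(p, q)    # FALSE  TRUE  TRUE

# && and || are meant for single TRUE/FALSE values, e.g. inside if().
TRUE && FALSE   # FALSE
TRUE || FALSE   # TRUE

# Comparisons also work element-wise and return logical vectors.
c(1, 5, 10) > 4 # FALSE  TRUE  TRUE
```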

Maths Functions

Here are some common maths functions.

x <- c(1,2,3,4)

min(x) # Smallest element. 
[1] 1
max(x) # Largest element. 
[1] 4
median(x) # Median.
[1] 2.5
mean(x) # Mean.
[1] 2.5
quantile(x) # Percentage quantiles.
  0%  25%  50%  75% 100% 
1.00 1.75 2.50 3.25 4.00 
rank(x) # Rank of elements.
[1] 1 2 3 4
log(x) # Natural log. 
[1] 0.0000000 0.6931472 1.0986123 1.3862944
sum(x) # Sum.
[1] 10
exp(x) # Exponential
[1]  2.718282  7.389056 20.085537 54.598150
var(x) # The variance.
[1] 1.666667
sd(x) # The standard deviation.
[1] 1.290994
y <- c(2,4,6,8)

cor(x, y) # Correlation. 
[1] 1
x <- c(1.42,2.55,3.86,4.21)

round(x, 1) # Round to n decimal places.
[1] 1.4 2.5 3.9 4.2
signif(x, 1) # Round to n significant figures.
[1] 1 3 4 4
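Several of these functions are related to each other. As a sketch, the standard deviation is the square root of the variance, and the correlation is the covariance divided by the product of the two standard deviations. cov() is a base R function not listed above; the names manual_sd and manual_cor are illustrative.

```r
x <- c(1, 2, 3, 4)
y <- c(2, 4, 6, 8)

# Standard deviation as the square root of the variance.
manual_sd <- sqrt(var(x))
all.equal(manual_sd, sd(x))        # TRUE

# Correlation as covariance scaled by both standard deviations.
manual_cor <- cov(x, y) / (sd(x) * sd(y))
all.equal(manual_cor, cor(x, y))   # TRUE
```

The correlation equals 1 here because y is exactly twice x.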

Packages and Libraries

The functions you have seen so far are included with base R. You can install additional packages from CRAN (the Comprehensive R Archive Network).

To download and install a package from CRAN:

install.packages("package")

To load the package into the session, making all its functions available to use:

library(package)

To use a particular function from a package:

package::function()
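As a sketch that runs without installing anything, the example below uses the :: notation with the stats package, which ships with R. It then checks whether dplyr (the package used in the help example earlier) is installed, without loading or installing it; the name has_dplyr is illustrative.

```r
# median() lives in the 'stats' package, which comes with every R installation.
stats::median(c(1, 2, 3, 4))   # 2.5

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise.
has_dplyr <- requireNamespace("dplyr", quietly = TRUE)
if (!has_dplyr) {
  message("dplyr is not installed; run install.packages(\"dplyr\") to get it.")
}
```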

As you remember, we used a built-in data set called “iris”. R comes with other data sets, such as “mtcars”. To load a built-in data set into the environment:

data("iris")
data("mtcars")

Environment

Sometimes you want to clean up your Environment. Remember that the Environment tab is in the top right panel, next to the History tab. As you can see, it lists the objects you have created so far.

To get a list of the variables in the environment use the ls() function.

ls()
 [1] "a"             "ab"            "b"             "ba"           
 [5] "bubba"         "iris"          "levels"        "mtcars"       
 [9] "randomNumbers" "three"         "x"             "y"            

To remove a specific variable from the environment, use the rm() function. Here we remove the x variable:

rm(x)

To remove all variables from the environment:

rm(list = ls()) 
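To check that a removal worked, you can ask R whether a name still exists. The sketch below uses a throwaway variable (the names tmp_value, before and after are illustrative) so that it does not touch the rest of your environment.

```r
tmp_value <- 10

before <- exists("tmp_value")   # TRUE: the variable is defined
rm(tmp_value)
after <- exists("tmp_value")    # FALSE: the variable is gone

before
after
```

Be careful with rm(list = ls()): it removes every variable at once and cannot be undone.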

Acknowledgments

This course is based on the Base R Cheat Sheet and the R Tutorial.


Citation

For attribution, please cite this work as

Warin (2019, May 4). Thierry Warin, PhD: [R Course] R Basics. Retrieved from https://warin.ca/posts/rcourse-rbasics/

BibTeX citation

@misc{warin2019rbasics,
  author = {Warin, Thierry},
  title = {Thierry Warin, PhD: [R Course] R Basics},
  url = {https://warin.ca/posts/rcourse-rbasics/},
  year = {2019}
}