Introduction

Welcome to this book on “Data Pipeline with R”.

In this book, you will learn the fundamentals of coding with R, its grammar and its vocabulary, as well as some best practices for handling data, wrangling it, and visualizing it in graphs and dashboards.

You will learn how to use R with data and how to produce your desired output, such as a graph or a full report. To that end, we will introduce Markdown as a second language for the production part of your work, and Git (with GitHub) so that you go through the full cycle of a data pipeline.

The goal is to make you comfortable with the new technologies used in data science. This may seem overwhelming at first, but with just R, Markdown, and Git, you will build your data pipeline. This pipeline will serve as the basis for any further analysis.

To carry out such analyses, you will need to build your models on top of this data pipeline. This book stops at the pipeline itself. For exploratory data analysis and modeling, please refer to the other book, titled Quantitative Methods in International Business with R.

This book draws on research projects that I have been involved in over the years, which used various data pipelines and models. I will use their datasets and databases as examples in the different applications you will encounter.

This book is designed to “get your hands dirty.” It is not meant to be read from cover to cover. Instead, it is organized into modules that complement and build on one another.