HEC Montréal - MATH60033A.H2021

Thierry Warin (HEC Montréal & Cirano (Montréal))



School’s regulations

Pedagogical guidelines for learning assessments

Exams: check the following before the big day (when the exam is not online!):

  1. The validity of your student card. To know more about it…
  2. The exam schedule and room (in HEC online)
  3. The documentation authorized for the exam (on my ZoneCours site)
  4. The conformity of my calculator

Research Project


This research project is designed to prepare you for the supervised project or research project you will carry out during your years in the MSc. You will do it in a team of three people, so it is good training before your solitary venture into your own supervised or research project.

Of course, since we look at everything from a data science perspective, this research project has to be evidence-based, i.e. based on data, and on well-analyzed data (= well modeled).

Doing research is a profoundly humbling exercise. It is about the researcher recognizing that he or she knows very little. As researchers, we need to look at the previous contributions. We do not start a research project based on an intuition per se. We start by looking at the results other researchers have produced. Then we ask what the gaps could be, and what our contribution could be as someone who knows data science. What new data can I bring to serve as a better proxy? What new models can I build to refine the results? Footnote: refining very often means adding complexity to the analysis.

Once we know how we can contribute to the literature, we formulate our research question and methodological design.

At this stage, we take the previous results from the literature review as the relevant hypotheses for our own modeling. These hypotheses will be challenged by our data and models, and we will produce new results. These new results will thus serve as hypotheses for other researchers.

Research is thus a continuum of trial and error, of validation and invalidation of previous theories. According to Karl Popper, this is the essence of science.

I expect you to follow this process. I am not expecting you to produce a Nobel Prize paper, so take the pressure off. That said, if you can produce a Nobel Prize paper, please do not hold back.

A research paper is organized as follows:

  1. Introduction: research question and context
  2. Literature review
  3. Exploratory Data Analysis
  4. Models
  5. Results and implications (policies, strategies, etc.)
  6. Conclusion
  7. References

This research project has to be done with the tools we provide you. Otherwise, it is not fun.


To help you break your research project down into manageable steps, in a lean project management spirit, here is a schedule of what you should be aiming at:

  1. Research question and research context: session 4 (Please fill out this form: [here])
  2. First draft of the outline of your literature review: session 6 (by this week, you should know who does what in the literature)
  3. EDA: session 8 (by this week, you should have identified your data and worked with them a bit to make sure they will be useful for your project)
  4. Models: session 10 (by this week, you should have tried to create a model for your research question)
  5. Final paper: session 12

The research paper length should be a maximum of 5,000 words including the references. For your report, within RStudio, click on the edit menu and go to “wordcount” to get the… word count.

Where can I find research questions feasible in R?

What you are learning in the course

You are learning how to use R for quantitative-based research. On the “quantitative-based” side, this means we need a quick refresher of your statistics knowledge. On the “research” side, we need to develop some skills in study design, i.e. what a model is and how to create one.

You are also learning how to put all this knowledge into a Markdown document, in order to have a fully programmatic approach rather than a software-based one. Why is it important? First, because we love the principles of reproducible research. Second, because it is in fact great to benefit from economies of scale in your workflow (I think now you know what I mean by that).

As a modeler, you need to pay particular attention to your research question (which means the literature review as well) and to how you can test it.

Hypothesis testing

A hypothesis is a tentative statement that proposes a possible explanation for some phenomenon or event. A useful hypothesis is a testable statement, which may include a prediction. This is why in statistics we form H0 and H1. The more important of the two is H0, because the conclusion of a test is always phrased in terms of H0: either “we reject H0” or “we cannot reject H0”. This is not only a philosophical position: it means we can never prove H1. Just think about it for a minute. You will see how big it is ;-).

A hypothesis should not be confused with a theory. Theories are general explanations based on a large amount of data. For example, the theory of multinational corporations’ internationalization is comprehensive and is based on a wide range of observations. However, there are many things about this theory that are not fully understood such as gaps in some propositions. Many hypotheses have been proposed and tested.

The key word is testable. That is, you will perform a test of how two variables might be related. This is where you run a real experiment: you are testing variables. Usually, a hypothesis is based on some previous observation, such as noticing a potential correlation. Are these two events connected? How?
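
To make the idea of a testable statement concrete, here is a minimal sketch in base R. It is only an illustration under stated assumptions: the data are simulated, the variables `x` and `y` are hypothetical, and the 0.05 threshold is a common convention rather than a rule of this course. It tests H0 ("the correlation between the two variables is zero") and phrases the conclusion in terms of H0.

```r
# Simulated data only: x and y are hypothetical variables, not course data.
set.seed(42)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # y is constructed to depend on x, plus noise

# H0: the true correlation between x and y is 0.
# H1: the true correlation is different from 0.
test <- cor.test(x, y)    # Pearson correlation test from base R (stats)

alpha <- 0.05             # assumed significance level (a convention)

# The conclusion is always stated about H0; H1 is never "proven".
if (test$p.value < alpha) {
  decision <- "reject H0"
} else {
  decision <- "cannot reject H0"
}

print(test$estimate)      # sample correlation
print(test$p.value)       # p-value of the test
print(decision)
```

With your own data, you would replace the simulated `x` and `y` with two columns of your data frame. The logic stays the same: state H0 first, run the test, and report whether you reject H0 or not.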


The first sessions of the course (sessions #1 to #3) insist a lot on the “magic in the data”, and I think you realized that it can have a nice side and a dark side. So think about the data you need to test your model, and about their potential biases.

Competitions to consider

Your research paper should definitely be considered for student competitions.