Session 5

Hans Rosling

Hans Rosling, TED 2006

Automated Intelligence

Machine Learning

MIT: Machine learning has been used to automatically translate long-lost languages

A Machine Learning Approach to Automated Customer Satisfaction Surveys

Algorithms

101 Machine Learning Algorithms

Augmented Intelligence

Big Data

Data Flow

Structured Data and Unstructured Data

According to Oracle and IDC (a provider of market intelligence and advisory services), unstructured data accounts for almost 80% of total enterprise data and is growing at 42% per year, versus just 22% per year for structured data.

Understanding Data Bias

Cat or Guacamole?

AI and Ethics

Déclaration de Montréal

Objectives

At the end of this session, you should be able to:

  1. Rename and delete columns
  2. Make new variables
  3. Reshape and subset data
  4. Handle missing values
  5. Combine tables
  6. Aggregate data
  7. Adjust the case of strings
  8. Replace words, letters, numbers or specific characters
  9. Extract strings
  10. Clean a dataset
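
As a rough illustration of most of the objectives above, here is a minimal base R sketch. The data frames `sales` and `regions`, and all their column names and values, are hypothetical and not taken from the course files. The session itself may use other packages or functions, and reshaping is left out for brevity.

```r
# Hypothetical toy data: none of these names or values come from the course files.
sales <- data.frame(
  ID     = c(1, 2, 3, 4),
  City   = c("nice", "paris", "nice", "lille"),
  amount = c(10, NA, 30, 50),
  tmp    = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)
regions <- data.frame(
  city   = c("NICE", "PARIS", "LILLE"),
  region = c("South", "North", "North"),
  stringsAsFactors = FALSE
)

# Rename a column, then delete one that is not needed
names(sales)[names(sales) == "City"] <- "city"
sales$tmp <- NULL

# Handle missing values: fill NA with the mean of the observed amounts
sales$amount[is.na(sales$amount)] <- mean(sales$amount, na.rm = TRUE)

# Make a new variable
sales$is_large <- sales$amount >= 30

# Adjust the case of strings so the join keys match
sales$city <- toupper(sales$city)

# Combine tables (left join on city; merge() sorts rows by the key)
combined <- merge(sales, regions, by = "city", all.x = TRUE)

# Subset rows
north <- combined[combined$region == "North", ]

# Aggregate: total amount per region
totals <- aggregate(amount ~ region, data = combined, FUN = sum)

# Replace characters and extract substrings
combined$city_label <- gsub("I", "i", combined$city, fixed = TRUE)
combined$city_code  <- substr(combined$city, 1, 3)
```

Filling missing values with the mean is only one possible choice. Depending on the dataset, dropping the incomplete rows or using another replacement value may be more appropriate.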

Plan

  • 5.1 Data Wrangling
  • 5.2 Combining Tables
  • 5.3 Computing Summary Statistics
  • 5.4 Converting Text Case
  • 5.5 Pattern Matching and Replacement
  • 5.6 Extracting Substrings
  • 5.7 Mission Impossible: Clean a Dataset

TL;DR

Syntax

Rename columns

Deleting columns and rows

Counting columns and rows

Making new variables

Handling Missing Values

Reshaping data

Subsetting data

Mission Impossible

Your mission, should you choose to accept it…

… is to clean a dataset on SKEMA Quantum Studio!

You must clean the mi2 csv and combine it with the locations data set. Make sure that you follow the steps explained in the Rmd exercise file.

Pay attention, every little detail is important!

We recommend that you review your course materials on the Virtual Campus.