## Regression analysis

## Quadratic effects

A linear regression model was fit to predict how much money a customer would spend at an online retailer (in CAN$) based on the amount of time they were browsing the website (ranging between 1 and 100 minutes) along with the quadratic term for time. The coefficients table from the R output is:

| Term | Estimate | SE | t-value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| Intercept | 6.902146 | 0.900384 | 7.666 | 9.41e-14 | *** |
| time | 1.598175 | 0.122716 | 13.023 | <2e-16 | *** |
| I(time^2) | -0.008558 | 0.002852 | -3.001 | 0.00282 | ** |
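
As a hedged illustration of how these estimates can be read, the sketch below plugs them into the fitted quadratic. The helper names `predicted_spend` and `marginal_effect` and the 30-minute input are illustrative assumptions, not part of the original R output.

```r
# Coefficient estimates copied from the coefficients table
b0 <- 6.902146    # intercept
b1 <- 1.598175    # time
b2 <- -0.008558   # I(time^2)

# Hypothetical helper: predicted spending (CAN$) after `time` minutes of browsing
predicted_spend <- function(time) b0 + b1 * time + b2 * time^2

# Slope of the fitted curve at a given time: derivative b1 + 2 * b2 * time
marginal_effect <- function(time) b1 + 2 * b2 * time

# Browsing time at which predicted spending peaks (the vertex, since b2 < 0)
peak_time <- -b1 / (2 * b2)

predicted_spend(30)   # predicted spend at an illustrative 30 minutes
marginal_effect(30)   # change in predicted CAN$ per extra minute at 30 minutes
peak_time             # roughly 93.4 minutes, inside the observed 1 to 100 range
```

Because the quadratic coefficient is negative, the effect of one more minute shrinks as browsing time grows, so the `time` coefficient alone does not describe the slope at every point.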

## Binary & categorical independent variables

The mean selling price of a sample of condos in Montreal was calculated to be $740,000, while the mean selling price of single family homes was calculated to be $975,000. Suppose a regression model was fit to predict the selling price of a home from a binary predictor for whether it was a condo (x = 1 represents the condominium group).
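
One way to see what such a fit returns, assuming ordinary least squares with this single binary predictor: the fitted values reproduce the two group means, so the intercept is the mean of the reference group (\(x = 0\), single family homes) and the slope is the difference between the condo mean and that reference mean.

\[\hat y = 975{,}000 + (740{,}000 - 975{,}000)\,x = 975{,}000 - 235{,}000\,x\]

\[x = 0:\ \hat y = 975{,}000 \qquad\qquad x = 1:\ \hat y = 975{,}000 - 235{,}000 = 740{,}000\]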

## Interactions

A regression model was fit to predict the selling price of condos and single family homes in Montreal from \(x_1 =\) house size, \(x_2 =\) a binary independent variable for whether a home is a single family home (\(x_2 = 1\) for single family homes), and the interaction between the two. The estimated regression model is given below:

\[\hat y = 428 + 0.286 x_1 + 104 x_2 - 0.140 (x_1 \times x_2)\]
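
Setting \(x_2\) to each of its two values separates the fitted model into one line per home type, which can make the coefficients easier to read:

\[x_2 = 0 \text{ (condos)}: \quad \hat y = 428 + 0.286\,x_1\]

\[x_2 = 1 \text{ (single family homes)}: \quad \hat y = (428 + 104) + (0.286 - 0.140)\,x_1 = 532 + 0.146\,x_1\]

Read this way, 104 is the difference in intercepts between the two groups and \(-0.140\) is the difference in their slopes on house size.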

The regression model from the previous part is repeated here. It predicts the selling price of condominiums and single-family homes in Montreal from \(x_1 =\) house size, \(x_2 =\) a binary predictor for whether a home is a single family home (\(x_2 = 1\) for single family homes), and the interaction between the two. The estimated regression model is given below:

\[\hat y = 428 + 0.286 x_1 + 104 x_2 - 0.140(x_1 \times x_2)\]
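
A minimal R sketch of how the fitted equation can be evaluated numerically. The helper names `fitted_price` and `group_gap` and the input size of 500 are hypothetical, and the units are whatever the original data used, which the source does not state.

```r
# Coefficients copied from the estimated regression model
b <- c(intercept = 428, size = 0.286, single_family = 104, interaction = -0.140)

# Hypothetical helper: fitted price for a size and home type
# (single_family = 0 for condos, 1 for single family homes)
fitted_price <- function(size, single_family) {
  unname(b["intercept"] + b["size"] * size +
           b["single_family"] * single_family +
           b["interaction"] * size * single_family)
}

# Fitted gap between the groups at a given size: 104 - 0.140 * size
group_gap <- function(size) fitted_price(size, 1) - fitted_price(size, 0)

# Size at which the two fitted lines cross (the gap equals zero)
crossover_size <- unname(-b["single_family"] / b["interaction"])

group_gap(500)    # gap at an illustrative size
crossover_size    # about 742.9
```

Whether the crossover size falls inside the range of observed house sizes is not stated in the source, so it may be an extrapolation.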

## Comparing models

Two regression models are to be considered:

Model 1: \(y = \beta_0 + \beta_1 x_1\)

Model 2: \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\)
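
Here Model 1 is nested in Model 2 (it is Model 2 with \(\beta_2 = \beta_3 = 0\)), so a partial F-test is one standard way to compare them. The sketch below uses simulated data purely for illustration; the variable values and the names `model1`, `model2`, and `f_test` are assumptions, not from the original.

```r
# Simulated data for illustration only; not from the original document
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + 0.5 * x2 + rnorm(n)

model1 <- lm(y ~ x1)             # Model 1
model2 <- lm(y ~ x1 + x2 + x3)   # Model 2, which contains Model 1

# Partial F-test of H0: beta_2 = beta_3 = 0 (two extra coefficients)
f_test <- anova(model1, model2)
f_test
```

The second row of the printed table reports the F statistic and its p-value; a small p-value suggests that at least one of the added predictors improves the fit.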

Two regression models are to be considered:

Model 1: \(y = \beta_0 + \beta_1 x_1\)

Model 2: \(y = \beta_0 + \beta_1 x_2 + \beta_2 x_3\)
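
In this pair neither model is a special case of the other, so the partial F-test for nested models does not apply. A hedged alternative is to compare criteria such as AIC or adjusted \(R^2\); the sketch below does so on simulated data, with all values and object names being illustrative assumptions.

```r
# Simulated data for illustration only; not from the original document
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)

model1 <- lm(y ~ x1)        # Model 1
model2 <- lm(y ~ x2 + x3)   # Model 2, not nested with Model 1

# AIC for both models; lower values indicate a better trade-off of fit and size
aic_values <- AIC(model1, model2)

# Adjusted R-squared penalizes additional predictors; higher is better
adj_r2 <- c(model1 = summary(model1)$adj.r.squared,
            model2 = summary(model2)$adj.r.squared)

aic_values
adj_r2
```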

## Automatic model selection

## Model diagnostics

You’d like to determine whether the normal distribution assumption is reasonable for a simple linear regression model.

Let us fit a linear regression:

```
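# Load the real estate sample (the CSV is assumed to be in the working directory)
re <- read.csv("Real_Estate_Sample.csv")
# Simple linear regression of Price on year
lm1 <- lm(Price ~ year, data = re)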
```

The following histogram was produced after fitting the previous simple linear regression:

`hist(lm1$residuals,col="gray")`

The following boxplot was produced after fitting a simple linear regression of Y on X:

`boxplot(lm1$residuals,col="gray")`
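
A normal Q-Q plot is a common complement to the histogram and boxplot when judging normality of residuals. The sketch below is self-contained and therefore uses simulated stand-in data rather than `Real_Estate_Sample.csv`; the names `fit`, `res`, and `sw` are illustrative assumptions.

```r
# Simulated stand-in data; the original CSV is not reproduced here
set.seed(3)
x <- runif(80, 0, 10)
y <- 5 + 2 * x + rnorm(80)

fit <- lm(y ~ x)
res <- residuals(fit)

# Normal Q-Q plot: points close to the reference line are consistent
# with normally distributed errors
qqnorm(res)
qqline(res, lwd = 2)

# Shapiro-Wilk test as a numerical companion to the plots
sw <- shapiro.test(res)
sw$p.value
```

With the model from this section, the same calls would be applied to `residuals(lm1)` instead of `res`.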

The following residual-versus-predicted scatterplot was produced after fitting a simple linear regression of Y on X:

```
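# Residuals against fitted values; the accessors avoid partial matching of lm1$fitted
plot(residuals(lm1) ~ fitted(lm1), cex = 0.7)
# Horizontal reference line at zero
abline(h = 0, lwd = 2)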
```