04-Linear Regression Code Demo

Author

Dr. Cheng-Han Yu

R implementation

advertising_data <- read.csv("../data/Advertising.csv")
advertising_data <- advertising_data[, 2:5]
head(advertising_data)
     TV radio newspaper sales
1 230.1  37.8      69.2  22.1
2  44.5  39.3      45.1  10.4
3  17.2  45.9      69.3   9.3
4 151.5  41.3      58.5  18.5
5 180.8  10.8      58.4  12.9
6   8.7  48.9      75.0   7.2
lm_out <- lm(sales ~ ., data = advertising_data)
summary(lm_out)

Call:
lm(formula = sales ~ ., data = advertising_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.8277 -0.8908  0.2418  1.1893  2.8292 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.938889   0.311908   9.422   <2e-16 ***
TV           0.045765   0.001395  32.809   <2e-16 ***
radio        0.188530   0.008611  21.893   <2e-16 ***
newspaper   -0.001037   0.005871  -0.177     0.86    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.686 on 196 degrees of freedom
Multiple R-squared:  0.8972,    Adjusted R-squared:  0.8956 
F-statistic: 570.3 on 3 and 196 DF,  p-value: < 2.2e-16
confint(lm_out)
                  2.5 %     97.5 %
(Intercept)  2.32376228 3.55401646
TV           0.04301371 0.04851558
radio        0.17154745 0.20551259
newspaper   -0.01261595 0.01054097

Python implementation

import pandas as pd
import numpy as np
advertising_data = pd.read_csv("../data/Advertising.csv")
advertising_data = advertising_data.iloc[:, 1:5]
X = advertising_data.drop(columns=["sales"])
y = advertising_data["sales"]

scikit-learn

from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(X, y)
reg.intercept_
2.9388893694594014
reg.coef_
array([ 0.04576465,  0.18853002, -0.00103749])
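
The intercept and coefficients above are the solution of an ordinary least-squares problem. As a rough illustration of what that means, the sketch below solves the same kind of problem directly with NumPy. The toy data, the "true" coefficients, and all variable names are assumptions made up for this example; it does not use the Advertising data and is not meant to mirror scikit-learn's internals.

```python
import numpy as np

# Hypothetical toy data (not the Advertising data): two predictors, five rows.
X_toy = np.array([
    [230.1, 37.8],
    [44.5, 39.3],
    [17.2, 45.9],
    [151.5, 41.3],
    [180.8, 10.8],
])

# Noise-free response built from known coefficients so the fit can be checked.
true_intercept = 3.0
true_coef = np.array([0.05, 0.2])
y_toy = true_intercept + X_toy @ true_coef

# Prepend a column of ones so the first fitted coefficient is the intercept.
design = np.column_stack([np.ones(len(X_toy)), X_toy])

# Least-squares solution of design @ beta ≈ y_toy.
beta, *_ = np.linalg.lstsq(design, y_toy, rcond=None)

intercept_hat = beta[0]
coef_hat = beta[1:]
print(intercept_hat, coef_hat)
```

Because the toy response has no noise, the fitted values should match the assumed coefficients up to floating-point error.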

statsmodels

from statsmodels.formula.api import ols
ols_out = ols(formula='sales ~ TV + radio + newspaper', data=advertising_data).fit()
ols_out.params
Intercept    2.938889
TV           0.045765
radio        0.188530
newspaper   -0.001037
dtype: float64
print(ols_out.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Mon, 13 Jan 2025   Prob (F-statistic):           1.58e-96
Time:                        11:56:31   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.9389      0.312      9.422      0.000       2.324       3.554
TV             0.0458      0.001     32.809      0.000       0.043       0.049
radio          0.1885      0.009     21.893      0.000       0.172       0.206
newspaper     -0.0010      0.006     -0.177      0.860      -0.013       0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
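
As a quick check on the summary, the adjusted R-squared and the residual degrees of freedom can be recomputed from the printed values. The sketch below uses the standard adjustment formula and the rounded R-squared from the R output (0.8972), so it only reproduces the reported figure up to rounding; the variable names are chosen for this example.

```python
# Values read from the summary output above.
n_obs = 200          # No. Observations
n_predictors = 3     # Df Model
r_squared = 0.8972   # Multiple R-squared as printed by R (rounded)

# Residual degrees of freedom: observations minus predictors minus intercept.
df_resid = n_obs - n_predictors - 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_resid

print(df_resid)                 # 196
print(round(adj_r_squared, 4))  # 0.8956
```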
coef_summary = ols_out.summary2().tables[1]  # Get the coefficients table
print(coef_summary)
              Coef.  Std.Err.          t         P>|t|    [0.025    0.975]
Intercept  2.938889  0.311908   9.422288  1.267295e-17  2.323762  3.554016
TV         0.045765  0.001395  32.808624  1.509960e-81  0.043014  0.048516
radio      0.188530  0.008611  21.893496  1.505339e-54  0.171547  0.205513
newspaper -0.001037  0.005871  -0.176715  8.599151e-01 -0.012616  0.010541
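
Each value in the `t` column is the coefficient estimate divided by its standard error. A minimal sketch that recomputes it from the rounded values printed above is shown below; because the inputs are rounded to six decimals, the results agree with the table only to about two decimal places. The dictionary names are made up for this example.

```python
# (estimate, standard error) pairs copied from the coefficient table above.
coef_table = {
    "Intercept": (2.938889, 0.311908),
    "TV": (0.045765, 0.001395),
    "radio": (0.188530, 0.008611),
    "newspaper": (-0.001037, 0.005871),
}

# t statistic for H0: coefficient = 0 is estimate / standard error.
t_values = {name: est / se for name, (est, se) in coef_table.items()}

for name, t_val in t_values.items():
    print(f"{name:<10} t = {t_val:.2f}")
```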
conf_intervals = ols_out.conf_int()
print(conf_intervals)
                  0         1
Intercept  2.323762  3.554016
TV         0.043014  0.048516
radio      0.171547  0.205513
newspaper -0.012616  0.010541
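
These 95% intervals follow the usual form: estimate ± t critical value × standard error, with the residual degrees of freedom (196) from the summary. The sketch below recomputes them from the rounded estimates and standard errors printed earlier, assuming SciPy is available (statsmodels already depends on it). The variable names are chosen for this example, and the results match the output above only up to rounding.

```python
from scipy import stats

# (estimate, standard error) pairs copied from the coefficient table.
coef_table = {
    "Intercept": (2.938889, 0.311908),
    "TV": (0.045765, 0.001395),
    "radio": (0.188530, 0.008611),
    "newspaper": (-0.001037, 0.005871),
}

df_resid = 196  # Df Residuals from the summary
# Two-sided 95% interval uses the 97.5th percentile of the t distribution.
t_crit = stats.t.ppf(0.975, df_resid)

intervals = {
    name: (est - t_crit * se, est + t_crit * se)
    for name, (est, se) in coef_table.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name:<10} [{lower:.6f}, {upper:.6f}]")
```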