R for Spatial Statistics


Linear Regression

To get started, we'll use "cars", a small table of speeds and stopping distances for cars. It is one of the example data sets that ship with R (in the "datasets" package, which is loaded by default alongside "stats"). Enter the code below to make sure the "stats" library is loaded and then plot distance against speed.
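A minimal sketch of this step, assuming the built-in "cars" data frame with its "speed" (mph) and "dist" (ft) columns; the axis labels are added here for illustration only:

```r
# "stats" is normally attached at start-up; calling library() again is harmless
library(stats)

# plot(x, y): speed on the x-axis, stopping distance on the y-axis
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)",
     ylab = "Stopping distance (ft)")
```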


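You can display data by just typing the name of the variable. You can also check the number of elements in a vector or matrix with the "length()" function. Note that for a data frame such as "cars", "length()" returns the number of columns; use "nrow()" for the number of rows.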
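For example, a short sketch using the built-in "cars" data frame (the variable name "n_obs" is only illustrative):

```r
# Typing a name prints the object; head() limits the display to the first rows
head(cars)

# Number of elements in one column (a vector)
n_obs <- length(cars$speed)
n_obs

# For a data frame, length() counts columns and nrow() counts rows
length(cars)
nrow(cars)
```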


To create a linear model, you can use the "lm()" function. The code below specifies the column "dist" within the "cars" table and the column "speed". The tilde character (~) separates the response (dependent) variable on the left from the predictor (independent) variable on the right, so "cars$dist ~ cars$speed" regresses distance on speed. Note that the order is the opposite of the "plot()" command, which takes x first and then y.
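A sketch of the model fit, using the formula and the model name "RegressionModel" that appear in the printed output later in this section:

```r
# Response on the left of ~, predictor on the right
RegressionModel <- lm(cars$dist ~ cars$speed)
```

An equivalent and often more convenient form is lm(dist ~ speed, data = cars), which labels the slope "speed" rather than "cars$speed".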


We can then add the fitted regression line to the plot with the "abline()" function, which draws a straight line with intercept "a" and slope "b".
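As a sketch, the plot and model are recreated here so the snippet runs on its own. When "abline()" is given a fitted model, it takes the intercept and slope from the model's coefficients:

```r
# Scatter plot of the data
plot(cars$speed, cars$dist)

# Fit the model and overlay the fitted line on the existing plot
RegressionModel <- lm(cars$dist ~ cars$speed)
abline(RegressionModel)
```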


Your plot should now appear similar to the one below.

You can print out the coefficients for the model by just entering the name of the model:

       > RegressionModel

       Call:
       lm(formula = cars$dist ~ cars$speed)

       Coefficients:
       (Intercept)   cars$speed
           -17.579        3.932

The output above shows the original call that was made and the intercept and slope of the line for the linear regression. The image below shows how the coefficients in R relate to the coefficients in a multiple linear equation.

Coefficients for lm
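For this single-predictor model the fitted equation is dist = intercept + slope * speed. The sketch below pulls both coefficients out of the model and evaluates the equation by hand. The speed of 10 mph and the variable names are only illustrative:

```r
RegressionModel <- lm(cars$dist ~ cars$speed)

# coef() returns a named vector; the names match the printed output
b <- coef(RegressionModel)
intercept <- b[["(Intercept)"]]
slope     <- b[["cars$speed"]]

# Predicted stopping distance at 10 mph (about 21.7 ft)
predicted <- intercept + slope * 10
predicted
```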

There is more information available if you use the summary() function. Take some time reviewing the values below with the definitions that follow as this is a typical output from a regression function in R and you'll be seeing a lot of them.
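A sketch of the call, with the model refit so the snippet is self-contained. The name "ModelSummary" is only illustrative. The summary object is a list, so individual values can also be extracted directly:

```r
RegressionModel <- lm(cars$dist ~ cars$speed)

# Print the full summary table
ModelSummary <- summary(RegressionModel)
ModelSummary

# Individual pieces of the summary
ModelSummary$coefficients   # estimates, standard errors, t values, p-values
ModelSummary$sigma          # residual standard error
ModelSummary$r.squared      # multiple R-squared
```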

       Call:
       lm(formula = cars$dist ~ cars$speed)

       Residuals:
           Min      1Q  Median      3Q     Max
       -29.069  -9.525  -2.272   9.215  43.201

       Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
       (Intercept) -17.5791     6.7584  -2.601   0.0123 *
       cars$speed    3.9324     0.4155   9.464 1.49e-12 ***
       ---
       Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

       Residual standard error: 15.38 on 48 degrees of freedom
       Multiple R-squared:  0.6511,    Adjusted R-squared:  0.6438
       F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Call: Original function that was executed

Residuals: The min/max and quartiles of the distribution of the residuals. We would expect the "Median" to be near 0.

Coefficients: The estimated coefficients of the fitted function, with their standard errors, t values, and p-values, which indicate how significant they are.

Residual standard error: An estimate of the standard deviation of the residuals. If the residuals are normally distributed, the 1st and 3rd quartiles should lie at roughly -0.67 and +0.67 times the residual standard error.
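One way to check the residual quartiles against a normal distribution, as a sketch: the quartiles of a standard normal distribution are qnorm(0.25) and qnorm(0.75), about -0.674 and +0.674, so scaling them by the residual standard error gives the values we would expect. The variable names are only illustrative:

```r
RegressionModel <- lm(cars$dist ~ cars$speed)

res <- residuals(RegressionModel)
s   <- summary(RegressionModel)$sigma   # residual standard error

# Observed 1st and 3rd quartiles of the residuals
q_obs <- quantile(res, c(0.25, 0.75))

# Quartiles expected for a normal distribution with standard deviation s
q_norm <- qnorm(c(0.25, 0.75)) * s

q_obs
q_norm
```

For the "cars" model the observed quartiles (about -9.5 and 9.2) are close to the expected ones (about -10.4 and 10.4).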

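F-statistic: A test of whether the fitted model is better than a model with fewer parameters, by default one with only an intercept. A small p-value indicates that the predictors improve the fit.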
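The same comparison can be made explicitly, as a sketch, by fitting an intercept-only model and comparing it with the full model using "anova()". The names "NullModel", "FullModel", and "Comparison" are only illustrative:

```r
# Model with only an intercept (fewer parameters)
NullModel <- lm(cars$dist ~ 1)

# Model with speed as the predictor
FullModel <- lm(cars$dist ~ cars$speed)

# F test comparing the two nested models
Comparison <- anova(NullModel, FullModel)
Comparison

# The F value in the second row matches the F-statistic reported by summary()
Comparison$F[2]
```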

Warning: Remember that "explaining" variance is an indication of correlation and not necessarily causation.

Additional Resources

Easy to read explanation of lm in R