This lab will introduce you to modeling continuous response variables with Generalized Additive Models or "GAMs". GAMs allow a level of flexibility that is not available with linear models and more control than polynomial models.
Refer to the "R for Spatial Stats" web site for information on running GAMs for this lab. You'll also want to review the sections on loops and sub-sampling data.
Note that you will need the mgcv library for this lab.
Below is some code that will generate a polynomial up to the 4th order with optional noise injection.
##################################################################### # Function to Create data from a polynomial up to order 4. # # Inputs: # NumEntries - The number of y-values to create # RandomStdDev - The standard of deviation for the random component # B0, B1, B2, B3, B4 - coefificents for the polynomial # # Outputs: # A DataFrame with a column for the Xs and another column for the "Measures" # # By: Jim Graham # February 23rd, 2014 ##################################################################### Poly2D=function(NumEntries=100,RandomStdDev=0,B0=0,B1=0,B2=0,B3=0,B4=0) { Xs=as.vector(array(1:NumEntries)) # Create the independent variable Measures=as.vector(array(1:NumEntries)) # create the response variable for (Index in 1:NumEntries) { X=Xs[Index] Measures[Index]=B0+B1*X+B2*(X**2)+B3*(X**3)+B4*(X**4) # compute the response if (RandomStdDev>0) # if a random value was provided { # add some raondomness Measures[Index]= Measures[Index]+rnorm(1, 0, sd = RandomStdDev); } } # create the data frame TheDataFrame = data.frame(Xs, Measures) return(TheDataFrame) }
Use the code above to create a series of polynomials with different levels of noise as below. Then, create a GAM model as below and examine the model and the AIC values.
TheData=Poly2D(100,100,10,10,-.5,+0.004) # create some data plot(TheData) # take a look at the data TheModel=gam(Measures~s(Xs),data=TheData,gamma=1.4) # Recommended smoothing factor plot(TheModel) # plot the model (how does it compare with the data?) summary(TheModel) AIC(TheModel)
Now, change the polynomial coeficients and the amount of randomness to see how well a GAM can model the effect in the data.
Your results will appear something like the following:
Family: gaussian Link function: identity Formula: Measures ~ s(Xs) Parametric coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -157.453 1.017 -154.8 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Approximate significance of smooth terms: edf Ref.df F p-value s(Xs) 8.724 8.978 2635 <2e-16 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 R-sq.(adj) = 0.996 Deviance explained = 99.6% GCV = 125.22 Scale est. = 103.51 n = 100
The coefficnets are similar to a linear model as the spline is used to linearize the function.
Deviance explained is the proportion of the deviance for this model divided by the total possible deviance where deviance is the difference between the likelhood of a model that fits the data perfectly minus the liklihood of this model. See Generalized Additive Models: an introduction with R by Wood for more information.
What we are really interested in is AIC which is obtained with the AIC(TheModel) function call.
Here is a zip file that contains the Doug-Fir data and R scripts with GAMs for one and two covariate models and associated graphs.
Using a data set with a continuous response varible (the covariates can be continuous or categorical), create a number of GAM models with different predictor layers and determine the "best" model. Try adjusting the "gamma" value to set the smoothing to a level you are comfortable with.
© Copyright 2018 HSU - All rights reserved.