R for Spatial Statistics

 

Writing Scripts

You're going to get tired of writing the same commands over and over again in the console pretty quickly. Fortunately, you can write scripts in RStudio and then execute all or part of a script very easily.

Below, you'll learn to use the basic R commands that will allow you to work with individual values (or "scalar" values) and data that is stored in vectors and data tables. Data tables are key to spatial analysis as this is how we work with point datasets in R.

The Script Panel

In RStudio, go to the "File" menu and select "New" (or "New File" in more recent versions) and then "R Script". A new, blank script will appear.

To start, enter some of the code we have been working with into the script like the following:

x=12
y=34
z=x*y

As you type the code, it is not executed right away. Select the lines and then press "Run" at the top of the panel. All three lines of the script should run and the output will appear in the console. In this way, you can select just the lines of code you are working on and execute them. If there is a problem, you can change the line with the problem and run it again. From now on, you'll want to work almost entirely with scripts.

Take a look at the "Workspace" panel (named "Environment" in more recent versions) in the upper right of RStudio. This panel shows the objects that are currently defined in your R session. You can select "Clear All" to remove all the objects.

In the lower right of RStudio is the panel that will show your plots. You can see the different plots you've generated by selecting the back and forward arrows. You can also clear all your plots by clicking "Clear All".

You can also clear the Console panel by pressing Ctrl+L while the panel is selected.

Note: When you close RStudio, it will save your "workspace" including all your open files, even if you have not saved them before.

Print

You've already seen that you can just send R the name of an object and it will print it to the console. You can also print something to the console with the "print()" command. This will come in handy later. Try writing some code like the following and execute it.

x=100
print(x)
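
One place where "print()" matters is inside a loop, where R does not print an object just because you typed its name. The sketch below is only an illustration; the names "Total" and "i" are made up for it.

```r
Total = 0
for (i in 1:3) {
  Total = Total + i  # update a running total
  Total              # typing the name alone prints nothing inside a loop
  print(Total)       # print() shows the value on each pass
}
```

Running these lines should show one value in the console for each pass through the loop.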

Comments

It's just as important to add comments to R code as to any other software. If you add a pound sign (#) to a line, everything after the # is ignored and can be used as a comment.

x=12 # this is a comment

You can also use # to add header blocks to your code. Each script should start with a header block that gives the author (you), the date it was written, and a description of what the code does.

###############################################
# Script to do some really neat stuff
# Author: Jim Graham
# Date: 4/16/2013
###############################################

Vector

As you've seen before, a vector in R is a linear sequence of values. These values can be integers, floating point numbers, strings, or logical (TRUE/FALSE) values, but all the values in one vector have the same type. A matrix adds dimensions to a vector to allow it to be used as a multidimensional vector. A list is a more general kind of vector whose elements can be other vectors.

Note: In R, the basic data type is called a "Vector". This can be a number of different things and in most languages would be called an "Array". You will see vectors called arrays from time to time.

TheVector=1:12 # create a vector with entries 1,2,3,4...12
TheVector=c(12,4,3,1.23) # create a vector with the entries shown

You can access the values in a vector using brackets ("[]"). Note that the first element in a vector is indexed with "1", not zero.

x=TheVector[1] # get the first value in the vector 

SubVector=TheVector[1:2] # get two entries from the vector, starting at 1

NewVector=TheVector[-1] # get a new vector with all the entries from "TheVector" except the first one
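
The indexing forms above can be tried on a small made-up vector. In this sketch the name "Heights" and its values are assumptions used only for illustration.

```r
Heights = c(10.5, 20, 30.25, 40) # a made-up vector with four entries

First = Heights[1]         # 10.5, the first entry (indexes start at 1)
FirstTwo = Heights[1:2]    # 10.5 and 20
AllButFirst = Heights[-1]  # 20, 30.25, and 40
Picked = Heights[c(1,4)]   # 10.5 and 40, picked with a vector of indexes

print(AllButFirst)
```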

You can "concatenate" multiple vectors and scalar values together to make new vectors.

TheVector3 = c(TheVector1,0,TheVector2) # Concatenates the vectors together into a new vector
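
The line above assumes that "TheVector1" and "TheVector2" already exist. A self-contained sketch, with made-up values for both, might look like this:

```r
TheVector1 = c(1,2,3) # made-up values for illustration
TheVector2 = c(7,8)

TheVector3 = c(TheVector1,0,TheVector2) # 1 2 3 0 7 8
print(length(TheVector3))               # 6 entries in the new vector
```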

There are a number of functions that operate on vectors.

NumEntries=length(TheVector) # return the number of entries in TheVector

print(summary(TheVector)) # provide a summary, including min, max, and mean, for a vector

The "summary(...)" function is very handy and works on most data types in R to provide detailed information on an object.
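
As a quick sketch of these functions in use, the code below runs a few of them on a made-up vector. The name "Elevations" and its values are assumptions for illustration only.

```r
Elevations = c(120, 340, 95, 410, 275) # made-up elevation values

print(length(Elevations))  # 5
print(min(Elevations))     # 95
print(max(Elevations))     # 410
print(mean(Elevations))    # 248
print(summary(Elevations)) # min, quartiles, median, mean, and max together
```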

Subset vectors based on conditions

You can subset a vector by first creating a vector with Boolean values (TRUE or FALSE) and then using that vector to sample the original vector. The code below creates a Boolean vector from the condition "Xs>40" and then uses that vector to sample the Xs vector and pull out only the values that are above 40.

BooleanVector=Xs>40 # creates a vector with TRUE and FALSE based on the condition

Test=Xs[BooleanVector] # subsets the vector, keeping the entries where BooleanVector is TRUE
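
Here is a self-contained sketch of the same idea. The values in "Xs" are made up, and the names "Above40" and "Same" are used only for this illustration.

```r
Xs = c(12, 45, 38, 77, 40, 51) # made-up values for illustration

BooleanVector = Xs>40       # FALSE TRUE FALSE TRUE FALSE TRUE
Above40 = Xs[BooleanVector] # 45 77 51
Same = Xs[Xs>40]            # the condition can also go directly in the brackets

print(Above40)
```

Note that 40 itself is dropped because the condition is "greater than", not "greater than or equal to".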

Matrices

A matrix in R is a vector that has a dimension added to it.

TheDimension=c(3,4) # create a dimension for the matrix           
TheMatrix=array(TheVector,TheDimension) # create a new matrix variable with the vector and the dimensions

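You can also add dimensions to an existing vector as follows. The product of the dimensions must equal the number of entries in the vector.

TheVector=1:24 # a vector with 24 entries
dim(TheVector) = c(3,4,2) # 3 rows, 4 columns, and 2 layers (3*4*2 = 24 entries)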

Matrices can use the same arithmetic operators as scalar values and vectors. Matrices also have special functions (see the R reference for more information).
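
A short sketch of working with a matrix is shown below. It builds its own small matrix from the values 1 to 12; the name "Doubled" is made up for the example.

```r
TheMatrix = array(1:12, c(3,4)) # 3 rows and 4 columns, filled column by column

print(dim(TheMatrix))  # 3 4
print(TheMatrix[2,3])  # row 2, column 3: the value 8
print(TheMatrix[1,])   # the first row: 1 4 7 10

Doubled = TheMatrix*2  # arithmetic is applied to every element
print(Doubled[3,4])    # 24
```

Elements are addressed as [row, column], and leaving one of the two blank returns a whole row or column.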

Lists

Lists in R contain a sequence of elements, but those elements can be of different types, including vectors. The following code shows how to create a list that has vectors as its elements.

Vector1=c(2,3,6,3)
Vector2=c("Dorothy","Scarecrow","Lion","Toto")
TheList=list(Vector1,Vector2)
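
Once a list exists, you will usually want to get its elements back out. The sketch below uses double brackets and named elements, neither of which is covered above, so treat it as one possible approach; "NamedList", "Counts", and "Names" are made-up names.

```r
Vector1 = c(2,3,6,3)
Vector2 = c("Dorothy","Scarecrow","Lion","Toto")
TheList = list(Vector1,Vector2)

Names = TheList[[2]]   # double brackets return the element itself (a vector)
print(Names[3])        # "Lion"
print(length(TheList)) # 2, the list has two elements

NamedList = list(Counts=Vector1, Names=Vector2) # elements can also be given names
print(NamedList$Counts[1])                      # 2
```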

Data Frames

Data frames in R are tables with columns, rows, and names for each column. R implements data frames as a list where each entry in the list is a column in the table. You can create data frames directly from text files and from R data files.

HousePrice = read.table("houses.data") 
Test=read.table("C:/Temp/Table.txt", header=TRUE) # tabular data with a header
TheData = read.csv('C:/ProjectsR/Clustering/TwoClusters.csv')

We use data frames extensively for spatial analysis in R as this is how we load tables of points, typically from CSV files, into R.

Once you load a data frame into R, you can access the columns of the data frame using the dollar sign ("$") symbol.

MaxValue=max(TheDataTable$Elev) # return the maximum value from the column "Elev"
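
If you don't have one of the files above handy, you can try the same ideas on a data frame built in memory. In this sketch the column names "X", "Y", and "Elev", the values, and the name "HighPoints" are all made up for illustration.

```r
# A made-up table of three points; a real one would usually come from read.csv()
TheDataTable = data.frame(
  X = c(500100, 500250, 500400),
  Y = c(4500100, 4500300, 4500250),
  Elev = c(1210, 1345, 1280)
)

print(names(TheDataTable))        # "X" "Y" "Elev"
print(nrow(TheDataTable))         # 3 rows (points)
MaxValue = max(TheDataTable$Elev) # 1345

HighPoints = TheDataTable[TheDataTable$Elev>1250,] # rows where Elev is above 1250
print(HighPoints)
```

The comma inside the brackets matters here: the condition before the comma selects rows, and the empty space after it keeps all the columns.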

Definitions

As a reminder, the following definitions are provided to show the relationship between vectors, matrices, lists, and data frames.

Vector: n elements in a linear set

Matrix: a Vector with dimensions added

List: a Vector that contains vectors as its elements

Data Frame: a List where all the vectors (columns) must have the same length and the columns and rows must have names.
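
You can check these relationships yourself with R's "is" functions. The objects in this sketch are made up just for the test.

```r
TheVector = 1:6
TheMatrix = array(TheVector, c(2,3))
TheList = list(TheVector, c("a","b"))
TheFrame = data.frame(ID=1:2, Name=c("a","b"))

print(is.vector(TheVector))    # TRUE
print(is.matrix(TheMatrix))    # TRUE
print(is.list(TheList))        # TRUE
print(is.list(TheFrame))       # TRUE, a data frame is stored as a list of columns
print(is.data.frame(TheFrame)) # TRUE
```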

Factors

Categorical data

Note: This section is incomplete.

Special Numbers

There are special numbers available in R. "NA" plays the role that "NULL" plays in most other languages and means a value is "not available" (missing). A mathematical function whose result is not defined, such as 0/0, returns the related value "NaN" ("not a number"), which is.na() also reports as missing.

NA # not available
is.na(x) # tests if a value is NA 

Numeric values can also be "infinite".

Inf # infinite
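
The sketch below shows how these special values behave in a few common situations. The vector "Values" is made up for illustration.

```r
Values = c(4, NA, 10) # a vector with one missing value

print(is.na(Values))            # FALSE TRUE FALSE
print(mean(Values))             # NA, because one value is missing
print(mean(Values, na.rm=TRUE)) # 7, the missing value is dropped first

print(1/0)            # Inf
print(is.finite(1/0)) # FALSE
print(0/0)            # NaN ("not a number"), an undefined result
print(is.na(0/0))     # TRUE, is.na() also reports NaN
```

Many summary functions, such as mean(), max(), and sum(), accept "na.rm=TRUE" to skip missing values.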

Mode (Type) Conversion

There are a large number of object types available in the R base package. The "as" functions create new objects and convert between object types. Below are a few examples:

x = as.character(123)
x = as.integer("123")
x = as.double(123)
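
To see what the conversions actually produce, you can check the results with "class()". This is only a sketch; the variable names are made up, and suppressWarnings() is used to hide the warning R gives when a conversion fails.

```r
Text = as.character(123)  # the string "123"
Whole = as.integer("123") # the integer 123
Real = as.double("12.5")  # the floating point value 12.5
Cut = as.integer(12.9)    # 12, the fractional part is dropped (not rounded)

print(class(Text))  # "character"
print(class(Whole)) # "integer"
print(class(Real))  # "numeric"

Bad = suppressWarnings(as.integer("abc")) # text that is not a number becomes NA
print(is.na(Bad))                         # TRUE
```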

Additional Resources

R Base Package - all the functions in the base package

R Tutor: Lists

Debugging With RStudio - RStudio has a rather unique debugger but this is a good tutorial for it.