Tutorials

Learn how to use R.

Install ggplot2


First download and install the ggplot2 package.

install.packages( 'ggplot2' )
library( 'ggplot2' )

Here is a useful cheatsheet for those of you wanting to dig deeper into ggplot2.

Cheat-sheet

R base graphics


plot() and barplot() create an initial plot.

Then lines(), text(), and legend() will write on top of your existing plot.
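
As a minimal sketch of this two-step pattern, the example below first opens a plot and then draws on top of it. The vectors distance and air_time are made-up values for illustration and are not taken from the airline file.

```r
# Toy data (hypothetical values, not from the airline file)
distance <- c(200, 450, 800, 1200, 1600)
air_time <- c(45, 80, 125, 175, 230)

# Step 1: plot() creates the initial plot
plot(distance, air_time, main = "Air time vs. distance",
     xlab = "Distance", ylab = "Air time")

# Step 2: lines(), text(), and legend() write on top of the existing plot
lines(distance, air_time, col = "blue")
text(distance, air_time, labels = seq_along(distance), pos = 3)
legend("topleft", legend = "toy flights", col = "blue", lty = 1)
```

Calling lines(), text(), or legend() before any plot exists raises an error, which is why the order matters.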

Histogram example

Load airline data and have a look at the distribution of delays.

airline <- read.csv( 'http://latul.be/mba_565/data/airline.csv' )
hist(airline$DepDelay)

Make the histogram look better by truncating the data and adding a title and labels.

index <- airline$DepDelay<60
hist(airline$DepDelay[index], main = "Departure Delay",
        xlab="minutes of delay")

Draw the same histogram with ggplot2 instead of base graphics.

index <- airline$DepDelay<60
g1 <- ggplot(airline[index,], aes(x = DepDelay))
g1 + geom_histogram(stat = "bin", binwidth=1)

Over 4000 flights have a delay of zero.

Relation between departure delay and arrival delay

g1 <- ggplot( subset(airline[index,], UniqueCarrier == "AA"),
       aes(y = DepDelay, x = ArrDelay))
g1 + geom_point()

We can use text instead of points.

g1 + geom_text(aes(label=Origin), size = 3)

An example using airline data

plot() and barplot() create an initial plot. Then abline(), lines(), text(), and legend() write on top of your existing plot. Try the following.

plot( airline$Distance, airline$AirTime )
abline( a = 50, b = 1/10, col = 2 )

Same thing with ggplot2. Notice how we use the ‘+’ to add elements (layers) to the initial plot. That is ggplot2 syntax.

ggplot( data = airline ) +
    geom_point( mapping = aes( x = Distance, 
        y = AirTime ) ) +
    geom_abline( intercept = 50, slope = 1/10, color = 'red' )

  • Pie charts
  • Bar graphs
  • Comparison bar graphs
  • Box plots
  • Kernel density estimates (to show skew)
  • Preview: scatterplots

Use ggplot


  • ggplot() creates a new plot object.
  • geom_point() adds a layer of points to the initial object.
  • aes() is a function to define the aesthetics.
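
The three pieces above can be sketched one step at a time. This example assumes ggplot2 is installed and uses the built-in mtcars data as a stand-in for the airline data, so the column names wt and mpg are not part of the tutorial's data set.

```r
library(ggplot2)

# ggplot() creates a new, empty plot object
p <- ggplot(data = mtcars)

# geom_point() adds a layer of points; aes() says which columns go on x and y
p <- p + geom_point(mapping = aes(x = wt, y = mpg))

# Printing the object draws the plot
print(p)
```

Because the plot is an ordinary R object, it can be stored in a variable and extended later with more layers.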

Aesthetics


Continuous aesthetics

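Here we plot air time against distance and map arrival delay to the size of each point. Note that you can replace size = with color =, shape =, or alpha =, where alpha is a measure of transparency. (shape = only accepts a discrete variable, so it will not work with a continuous one such as ArrDelay.)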

index <- airline$AirTime >= 0 
index2 <- airline$ArrDelay >= 60 & airline$ArrDelay <= 300 
ggplot( data = airline[index & index2,] ) + 
    geom_point( mapping = aes( x = Distance, 
        y = AirTime, 
        size = ArrDelay ) )
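
To see what swapping the aesthetic looks like, here is a small sketch that encodes the same variable three ways. The flights data frame is invented for illustration; only its column names mirror the airline data.

```r
library(ggplot2)

# Hypothetical flights, invented for illustration
flights <- data.frame(
  Distance = c(300, 600, 900, 1200, 1500),
  AirTime  = c(55, 95, 130, 170, 215),
  ArrDelay = c(65, 120, 90, 240, 180)
)

base <- ggplot(data = flights)

# Same points, with ArrDelay shown as size, color, or transparency
by_size  <- base + geom_point(aes(x = Distance, y = AirTime, size  = ArrDelay))
by_color <- base + geom_point(aes(x = Distance, y = AirTime, color = ArrDelay))
by_alpha <- base + geom_point(aes(x = Distance, y = AirTime, alpha = ArrDelay))

print(by_color)
```

Print by_size or by_alpha in the same way to compare the three encodings.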

Fitted Line


We can draw the estimated mean of ArrDelay for a given DepDelay, along with its confidence interval, using geom_smooth().

index <- airline$ArrDelay > 0 
ggplot( data = airline[index,] ) + 
    geom_smooth( mapping = aes( x = DepDelay, 
        y = ArrDelay ) )
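
When no method is given, geom_smooth() picks a smoother on its own. The sketch below shows one way to request a straight-line fit explicitly and to set the confidence level. It runs on simulated delays (the data frame sim is an assumption, not the airline file), and the choice of a linear model is for illustration only.

```r
library(ggplot2)

# Simulated delays, for illustration only
set.seed(1)
sim <- data.frame(DepDelay = runif(200, min = 0, max = 120))
sim$ArrDelay <- sim$DepDelay + rnorm(200, sd = 15)

# Putting aes() inside ggplot() shares the mapping with every layer
p_fit <- ggplot(data = sim, mapping = aes(x = DepDelay, y = ArrDelay)) +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95)

print(p_fit)
```

Setting se = FALSE would draw the line without the shaded confidence band.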

Fitted Line and Scatterplot

Note that the more observations we have, the narrower our confidence interval will be. In addition, you can overlay the actual data points with geom_point().

index <- airline$UniqueCarrier == "OO" 
ggplot( data = airline[index,] ) + 
    geom_smooth( mapping = aes( x = DepDelay, 
        y = ArrDelay ) ) +
    geom_point( mapping = aes( x = DepDelay, 
        y = ArrDelay ) )
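
The link between sample size and interval width can be checked numerically. The sketch below uses simulated delays and a plain linear model from base R, which is an assumption made for simplicity and not necessarily the smoother geom_smooth() would choose. The helper ci_width() is hypothetical.

```r
set.seed(42)

# Width of the 95% confidence interval for the fitted mean at DepDelay = 30,
# from a linear fit on n simulated flights
ci_width <- function(n) {
    dep <- runif(n, min = 0, max = 120)
    arr <- dep + rnorm(n, sd = 15)
    fit <- lm(arr ~ dep)
    ci <- predict(fit, newdata = data.frame(dep = 30),
                  interval = "confidence", level = 0.95)
    unname(ci[1, "upr"] - ci[1, "lwr"])
}

width_small <- ci_width(50)     # few observations
width_large <- ci_width(5000)   # many observations

print(c(small = width_small, large = width_large))
```

The interval from 5000 observations comes out far narrower than the one from 50, which is the same effect seen as a thinner shaded band around the fitted line.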