Problems

Hone your R skills by doing problems.

The assignment is due next Friday (24 November). You are encouraged to work in groups and to hand in a single copy per group.

1 Nanaimo housing

The table at ‘http://latul.be/mba_565/data/nanaimo.csv’ gives, among other variables, the floor area (area) and the price (price) of houses listed in the Nanaimo area in 2015. Load these data into a data frame.

1.1 Create a histogram of house area (use the ‘area’ variable).

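# Load the listings; keep text columns as character vectors
nanaimo <- read.csv('http://latul.be/mba_565/data/nanaimo.csv',
    stringsAsFactors=FALSE)
library(ggplot2)
# Keep only listings with a positive floor area
g <- ggplot(subset(nanaimo, area>0), aes(area))
g + geom_histogram(colour = 'black')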
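With no arguments, geom_histogram() falls back to 30 bins and prints a message asking you to pick a better value. A minimal sketch of setting the bin width explicitly, on a small made-up data frame (the `toy` values and the 500-unit width are assumptions for illustration, not the Nanaimo data):

```r
library(ggplot2)

# Made-up floor areas, for illustration only (not the Nanaimo listings)
toy <- data.frame(area = c(800, 950, 1200, 1350, 1400, 1800, 2100, 2600))

# binwidth is expressed in the units of the x variable
p <- ggplot(toy, aes(area)) +
    geom_histogram(binwidth = 500, colour = 'black')
p
```

Try a few widths; a choice that is too narrow or too wide can hide the shape of the distribution.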

1.2 Break out the histogram of house area by type.

    # ColorBrewer 'Spectral' palette (note: RColorBrewer does not flag
    # it as colour-blind friendly)
    g <- ggplot(subset(nanaimo, area>0), aes(area, fill=type))
    g + geom_histogram(colour = 'black') +
        scale_fill_brewer(palette="Spectral")
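RColorBrewer ships a table, brewer.pal.info, that flags which of its palettes are considered colour-blind friendly. A quick sketch for checking a palette name before passing it to scale_fill_brewer() (the helper `is_safe` is a hypothetical name introduced here):

```r
library(RColorBrewer)

# brewer.pal.info has one row per palette and a logical 'colorblind' column
info <- brewer.pal.info
safe <- rownames(info)[info$colorblind]

# Hypothetical helper: is a palette flagged as colour-blind friendly?
is_safe <- function(palette) palette %in% safe

is_safe("Spectral")
is_safe("Set2")
```

Keep in mind that each palette also has a maximum number of colours (the `maxcolors` column), which must cover the number of groups being filled.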

1.3 Create a histogram of house area and use zoning to colour the histogram bars.

    # ColorBrewer 'Spectral' palette (note: RColorBrewer does not flag
    # it as colour-blind friendly)
    g <- ggplot(subset(nanaimo, area>0), aes(area, fill=zoning))
    g + geom_histogram(colour = 'black') +
        scale_fill_brewer(palette="Spectral")

1.4 Create a scatterplot (geom_point) of house area vs. land area, and colour the points by type.

    # Wes Anderson 'Royal1' palette (4 colours, so at most 4 house types);
    # an alternative is scale_color_brewer(palette="YlOrRd")
    g <- ggplot(subset(nanaimo, area>0 & landarea<1),
        aes(x=area, y = landarea, colour=type))
    g + geom_point(size=4, alpha=.8) +
        scale_color_manual(values=wesanderson::wes_palette(n=4,
            name="Royal1"))

2 Temperature in Saudi Arabia

Scrape the climate data table from Wikipedia and create a data frame. Clean up the data, e.g. rename the variables.

url <- "https://en.wikipedia.org/wiki/Climate_of_Saudi_Arabia"
sa_page <- RCurl::getURL(url)
sa_table <- XML::readHTMLTable(sa_page, which = 1,
    stringsAsFactors = FALSE)
    
# Negative values use the Unicode minus sign (U+2212, "\u2212" in R);
# replace it with an ASCII hyphen so as.numeric() can parse them
sa_table[6,] <- gsub("\u2212", "-", sa_table[6,], fixed = TRUE)
month <- as.character(sa_table[1, 2:13])

# Create a data frame, keeping only the text before the first newline
sa <- data.frame(
    month = factor(month, levels = month),
    high = as.numeric(gsub("\n.*", "", sa_table[2, 2:13])),
    low  = as.numeric(gsub("\n.*", "", sa_table[6, 2:13])),
    prec = as.numeric(gsub("\n.*", "", sa_table[7, 2:13]))
    )
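The cleaning steps can be tried on a few hand-written strings without touching the network. The sample values below are made up and only mimic the assumed shape of a scraped cell (a number, a newline, then a bracketed conversion):

```r
# Made-up cells mimicking the assumed shape of the scraped table
cells <- c("31.5\n(88.7)", "\u22125.4\n(22.3)", "0.0\n(32.0)")

# 1. Swap the Unicode minus sign (U+2212) for an ASCII hyphen
cells <- gsub("\u2212", "-", cells, fixed = TRUE)

# 2. Drop everything from the first newline onwards
cells <- sub("\n.*", "", cells)

# 3. Convert the remaining text to numbers
values <- as.numeric(cells)
values
```

If as.numeric() returns NA with a coercion warning, some character in the cell was not cleaned; printing the offending string usually shows which one.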

2.1 Create a Cleveland Dot Plot to show Record High by Month.

library(ggplot2)
ggplot(sa, aes(high, month)) +
        geom_point(colour = 'red', size = 3) +
        theme(panel.grid.major = element_blank())

2.2 Put Record High on the x-axis.

Order Month by the value of Record High, and size the points according to precipitation.

library(ggplot2)
# Sort the month levels by record high (run once, on the unsorted factor)
sa$month <- factor(sa$month, levels = levels(sa$month)[order(sa$high)])
ggplot(sa[order(-sa$high),], aes(high, month, size = prec)) +
        geom_point(colour = 'red') +
        theme(panel.grid.major = element_blank())
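An alternative to building the level order by hand is reorder() from the stats package, which sorts factor levels by a second variable. A minimal sketch on made-up numbers (the `toy` data frame is hypothetical, not the scraped table):

```r
# Made-up record highs for four months, for illustration only
toy <- data.frame(
    month = factor(c("Jan", "Apr", "Jul", "Oct"),
        levels = c("Jan", "Apr", "Jul", "Oct")),
    high = c(30, 42, 49, 41)
)

# reorder() sorts the levels by 'high' (ascending by default)
toy$month <- reorder(toy$month, toy$high)
levels(toy$month)
```

Because reorder() recomputes the order from the data each time, it gives the same result if the line is run twice, which is not the case when indexing the current levels with order().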

2.3 Ensure the axis labels are human readable.
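One way to do this is labs(), which sets the axis and legend titles. A sketch on a made-up data frame; the label text, the units (degrees Celsius, millimetres) and the `toy` values are assumptions to adapt to the scraped table:

```r
library(ggplot2)

# Made-up values standing in for the scraped table
toy <- data.frame(
    month = factor(c("Jan", "Jul"), levels = c("Jan", "Jul")),
    high = c(30, 49),
    prec = c(12, 0)
)

# labs() names each aesthetic; units here are assumed, check the source table
p <- ggplot(toy, aes(high, month, size = prec)) +
    geom_point(colour = 'red') +
    labs(x = "Record high (\u00B0C)", y = "Month",
        size = "Precipitation (mm)")
p
```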

2.4 Colour the points according to Record Low.

library(ggplot2)
ggplot(sa[order(-sa$high),], aes(high, month,
            size = prec, colour = low)) +
        geom_point() +
        theme(panel.grid.major = element_blank())