Problems

Hone your R skills by doing problems.

Please attempt all questions.

The assignment is due Friday, 10 November. You are encouraged to work in groups and to hand in a single copy per group.

1 Scrape the data table from Monorail Systems and create a data frame. Clean up the data, e.g. rename variables.


1.1 Clean the data

Give the columns reasonable names and convert each column to the correct data type. Name the resulting data frame ‘rail’ and show me part of the result with head(rail).
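
A minimal base-R sketch of the cleaning step. The real table's columns are not described here, so it runs on a small hypothetical data frame raw: the original column names, the values, the footnote markers such as "[1]" and the helper clean_num() are all assumptions. Adapt the names and patterns to whatever the scraped table actually contains.

```r
# Hypothetical stand-in for the scraped table: awkward names, every column as text
raw <- data.frame(
  "Country/Region"  = c("Japan", "USA", "Japan"),
  "System name"     = c("Line A", "Line B", "Line C"),
  "Year opened"     = c("1964", "1959", "2000 [1]"),
  "No. of stations" = c("11", "9", "19"),
  check.names = FALSE, stringsAsFactors = FALSE
)

rail <- raw
# Short, syntactic column names
names(rail) <- c("country", "system", "year", "stations")

# Hypothetical helper: drop footnote markers like "[1]", commas and spaces, then parse
clean_num <- function(x) as.numeric(gsub("\\[[0-9]+\\]|[,[:space:]]", "", x))

rail$year     <- as.integer(clean_num(rail$year))
rail$stations <- as.integer(clean_num(rail$stations))
rail$country  <- factor(rail$country)
head(rail)
```

Converting year and stations to integers and country to a factor is one reasonable choice; keep country as character if you prefer.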

1.2 Create a new column

Add a column to the data frame that records the relative frequency of stations per country.
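
A minimal sketch, assuming "relative frequency of stations per country" means each country's share of the total station count. The data frame below is hypothetical. ave() repeats each country's total on every row of that country, so the new column lines up with the existing rows. If the intended meaning is instead the share of rows (systems) per country, prop.table(table(rail$country)) gives that.

```r
# Hypothetical cleaned data: one row per system
rail <- data.frame(
  country  = c("Japan", "USA", "Japan", "China"),
  stations = c(10L, 5L, 20L, 15L)
)

# Each country's share of all stations, repeated on every row of that country
rail$rel_freq <- ave(rail$stations, rail$country, FUN = sum) / sum(rail$stations)
rail
```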

1.3 Stations and outliers

Create a plot of the total number of stations per year. Do you think any year is an outlier? Remove the outlier(s) and recreate the plot.
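
One way to approach this, sketched in base R on hypothetical data (the years and counts below are invented). It sums stations by year with aggregate(), then flags years lying more than 1.5 times the interquartile range beyond the quartiles (Tukey's rule). That rule is only one common convention; whether a flagged year really is an outlier still deserves a look at the data.

```r
# Hypothetical cleaned data: stations per system and opening year
rail <- data.frame(
  year     = c(1990L, 1990L, 1995L, 2000L, 2005L, 2010L, 2015L),
  stations = c(4L, 6L, 8L, 12L, 9L, 150L, 11L)
)

# Total number of stations per year
per_year <- aggregate(stations ~ year, data = rail, FUN = sum)
plot(per_year$year, per_year$stations, type = "b",
     xlab = "Year", ylab = "Total stations")

# Flag years outside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
q      <- unname(quantile(per_year$stations, c(0.25, 0.75)))
iqr    <- q[2] - q[1]
is_out <- per_year$stations < q[1] - 1.5 * iqr | per_year$stations > q[2] + 1.5 * iqr
per_year[is_out, ]

# Drop the flagged years and redraw the plot
trimmed <- per_year[!is_out, ]
plot(trimmed$year, trimmed$stations, type = "b",
     xlab = "Year", ylab = "Total stations")
```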

2 Load the inaugural addresses of American presidents from the quanteda package.

president <- quanteda::data_corpus_inaugural

2.1 Use keywords in context (KWIC) to return the context around ‘cotton’.

Report your results. Do you notice any pattern?
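
With quanteda, KWIC is available as kwic(tokens(president), pattern = "cotton"). To show what such a call returns, here is a minimal base-R sketch of the idea on invented toy speeches. The function kwic_simple() and its window argument are hypothetical illustrations, not part of quanteda.

```r
# Hypothetical toy speeches (invented text, not from the real corpus)
speeches <- c(
  A = "The cotton crop and the wheat crop sustain the nation",
  B = "Trade in cotton and tobacco grew each year",
  C = "We honor the flag"
)

# Minimal keyword-in-context: one row per hit, with up to `window` words either side
kwic_simple <- function(texts, keyword, window = 3) {
  hits <- list()
  for (doc in names(texts)) {
    words <- strsplit(tolower(texts[[doc]]), "[^a-z']+")[[1]]
    words <- words[words != ""]
    n <- length(words)
    for (i in which(words == tolower(keyword))) {
      pre  <- if (i > 1) words[max(1, i - window):(i - 1)] else character(0)
      post <- if (i < n) words[(i + 1):min(n, i + window)] else character(0)
      hits[[length(hits) + 1]] <- data.frame(
        docname = doc,
        pre     = paste(pre, collapse = " "),
        keyword = words[i],
        post    = paste(post, collapse = " "),
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, hits)  # NULL when there are no hits
}

kwic_simple(speeches, "cotton")
```

Looking at the docname column (in the real corpus, the year and president) across the hits is a natural place to start looking for a pattern.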

2.2 What are the five most frequent words in the speeches?

Are they stopwords? Punctuation?
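
In quanteda, topfeatures(dfm(tokens(president)), 5) lists the most frequent features, and tokens(president, remove_punct = TRUE) followed by tokens_remove(..., stopwords("en")) drops punctuation and stopwords. The base-R sketch below shows the same contrast on one invented sentence; its five-word stopword list is a hypothetical stand-in for a real one.

```r
# Hypothetical one-line "speech" (invented, not from the corpus)
text <- "We the people, and the nation, will prosper. The nation is ours."
low  <- tolower(text)
# Tokens are runs of letters or single punctuation marks
toks <- regmatches(low, gregexpr("[a-z]+|[[:punct:]]", low))[[1]]
freq <- sort(table(toks), decreasing = TRUE)
head(freq, 5)  # "the" comes first, and punctuation ranks high

# Hypothetical, tiny stopword list; quanteda's stopwords("en") is much longer
stop_en <- c("we", "the", "and", "will", "is")
content <- toks[!grepl("^[[:punct:]]$", toks) & !(toks %in% stop_en)]
head(sort(table(content), decreasing = TRUE), 5)
```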

2.3 Does any president use the word ‘cesspools’ in one of their speeches?

Which one? What is the context?
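
The KWIC call from 2.1 with pattern = "cesspools" answers both questions at once, because its output includes the document name. As a minimal base-R illustration on invented texts (the document names are hypothetical), a whole-word search finds the document and a regular expression pulls out up to three words on either side.

```r
# Hypothetical speeches; the names stand in for document names
speeches <- c(
  "1801-A" = "Peace and commerce with all nations",
  "1901-B" = "Clear the cesspools of corruption in our cities",
  "2001-C" = "We renew our commitment"
)

# Which documents contain the whole word, ignoring case?
has_word <- grepl("\\bcesspools\\b", speeches, ignore.case = TRUE, perl = TRUE)
names(speeches)[has_word]

# Context: the match plus up to three words on either side
ctx <- regmatches(speeches,
                  regexpr("(\\S+\\s+){0,3}cesspools(\\s+\\S+){0,3}", speeches,
                          ignore.case = TRUE, perl = TRUE))
ctx
```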

2.4 Which president is the most concerned with foreign policy? Tell me why.
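
One way to support an answer is a dictionary approach: count foreign-policy terms in each speech and divide by the speech's length, so that long speeches are not automatically favoured. quanteda provides dictionary() and tokens_lookup() for this. The base-R sketch below uses invented speeches and a hand-picked term list; both are hypothetical, and a real answer needs a more carefully chosen dictionary and a reading of the top-scoring speeches.

```r
# Hypothetical speeches keyed by (hypothetical) president
speeches <- c(
  A = "Our allies abroad and foreign nations must trust our treaties",
  B = "Jobs and schools at home come first for our people",
  C = "We seek peace with foreign powers and strong trade at home"
)
# Hypothetical, hand-picked foreign-policy terms
foreign_terms <- c("foreign", "allies", "abroad", "treaty", "treaties",
                   "war", "peace", "nations")

# Lower-case word tokens per speech
words_by_doc <- lapply(strsplit(tolower(speeches), "[^a-z]+"),
                       function(w) w[w != ""])
# Share of each speech's words that are foreign-policy terms
share <- vapply(words_by_doc, function(w) mean(w %in% foreign_terms), numeric(1))
names(share) <- names(speeches)
sort(share, decreasing = TRUE)
```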

2.5 Compare the word cloud from Trump’s speech with the one from Obama’s two speeches.
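
The clouds themselves can be drawn with textplot_wordcloud() from the quanteda.textplots package (its comparison = TRUE option plots groups side by side). Because a word cloud scales words by frequency, the comparison boils down to comparing relative word frequencies. The base-R sketch below does that on two invented groups of texts standing in for the two sets of speeches; the texts and the helper rel_freq() are hypothetical.

```r
# Hypothetical texts standing in for the two groups of speeches
group_a <- "build roads build bridges protect jobs"
group_b <- c("hope for change", "we hope we can hope")

# Relative frequency of each word in a group, largest first
rel_freq <- function(texts) {
  w <- unlist(strsplit(tolower(texts), "[^a-z]+"))
  w <- w[w != ""]
  sort(table(w), decreasing = TRUE) / length(w)
}

fa <- rel_freq(group_a)
fb <- rel_freq(group_b)
head(fa, 3)
head(fb, 3)

# Words that would appear in one cloud but not the other
only_a <- setdiff(names(fa), names(fb))
only_b <- setdiff(names(fb), names(fa))
only_a
only_b
```

Removing stopwords first (as in 2.2) usually makes the clouds, and this comparison, far more informative.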