Peer critique of a visual analysis report of demographic patterns in Ohio, USA
In this take-home exercise, a classmate's submission for take-home exercise 1 is selected and critiqued in terms of clarity and aesthetics. The original design is then remade using the data visualisation principles and best practices learnt in Lessons 1 and 2.
For this exercise, I will be critiquing the take-home exercise 1 submission of Heranshan Subramaniam.
Before we get started, we need to ensure that the required R packages are installed. If they are, we load them; if not, we install them first and then load them into the R environment.
The code chunk below will do the trick.
packages = c('tidyverse', 'patchwork', 'ggthemes')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The code chunk below imports Participants.csv from the data folder into R by using the read_csv() function of readr and saves it as a tibble data frame called participants.
participants <- read_csv("data/Participants.csv")
The following code chunk was used by Subramaniam to change the data type of householdSize in participants to character using the as.character() function.
participants$householdSize <- as.character(participants$householdSize)
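To illustrate what this conversion changes, here is a minimal sketch on a made-up vector (household_size is a hypothetical stand-in, not the actual column). Once the values are stored as text, ggplot2 treats the variable as discrete rather than continuous, which is typically what is wanted for a small set of household sizes.

```r
# Hypothetical household sizes standing in for participants$householdSize
household_size <- c(1, 2, 3, 2, 1)
class(household_size)        # numeric before conversion

# Convert to character so the values behave as category labels
household_size_chr <- as.character(household_size)
class(household_size_chr)    # character after conversion
unique(household_size_chr)   # the distinct labels that would appear on an axis
```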
The following code chunk was used by Subramaniam to order the education levels in participants using the factor() function. The order runs from the least advanced to the most advanced qualification.
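As a rough sketch of how such an ordering can be set with factor(), the example below uses a toy vector. The level names in education_order are an assumption about how the categories are spelled in the data and should be checked against the actual values of educationLevel.

```r
# Assumed level names, ordered from least to most advanced (not confirmed by the source)
education_order <- c("Low", "HighSchoolOrCollege", "Bachelors", "Graduate")

# Toy vector standing in for participants$educationLevel
education <- c("Graduate", "Low", "Bachelors", "HighSchoolOrCollege", "Low")

# Passing levels explicitly fixes the order used on plot axes and legends
education_fct <- factor(education, levels = education_order)
levels(education_fct)   # follows education_order rather than alphabetical order
```

Without the levels argument, factor() would sort the categories alphabetically, which would place "Bachelors" before "Low" on an axis.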
Subramaniam performed the data preparation well and was attentive to the variable formats and the order of categorical variables. This will be important when plotting the graphs later on.
The graph below was plotted by Subramaniam:
The changes made were the removal of the x-axis tick marks and a change of graph title. The new graph was generated with the code chunk below.
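The two changes described above can be sketched generically as follows. This is not the chunk used for the remake: the data frame toy, its columns, the geom, and the title text are all hypothetical placeholders, chosen only to show where the tick marks and the title are controlled.

```r
library(ggplot2)

# Hypothetical counts; the data and geom of the original plot are assumptions
toy <- data.frame(group = c("A", "B", "C"), count = c(12, 30, 21))

p <- ggplot(toy, aes(x = group, y = count)) +
  geom_col() +
  # placeholder title; a title phrased around the finding is usually clearer
  labs(title = "A descriptive title stating the question or finding") +
  # remove the x-axis tick marks only, leaving the y-axis ticks in place
  theme(axis.ticks.x = element_blank())
p
```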
The graph below was plotted by Subramaniam:
Similar to the previous graph, the vertical gridlines were removed and the title was changed. The colour and size of the mean points were also modified. The code chunk is shown below:
ggplot(data = participants, aes(x = educationLevel, y = joviality)) +
  geom_boxplot(notch = TRUE) +
  # mark the mean of each education level
  stat_summary(geom = "point",
               fun = "mean",
               colour = "salmon1",
               size = 2) +
  labs(y = "Joviality", x = "Education Level",
       title = "How does Education Level Affect Joviality?") +
  # label each mean, rounded to 3 decimal places
  stat_summary(geom = "text",
               aes(label = paste("mean=", round(after_stat(y), 3))),
               fun = "mean",
               colour = "salmon1",
               size = 3,
               vjust = -2) +
  # white background, no ticks, horizontal gridlines only
  # (linewidth replaces size for line elements in ggplot2 >= 3.4.0)
  theme(axis.ticks = element_blank(),
        panel.background = element_rect(fill = "white"),
        panel.grid.major = element_line(linewidth = 0.5, linetype = 'solid',
                                        colour = "grey"),
        panel.grid.major.x = element_blank())
The graph below was plotted by Subramaniam:
The colour scheme was changed to a continuous gradient from white to red, and the frequency was recalculated with all participants in the denominator. Lastly, the frequency values were rounded to 4 decimal places.
participants %>%
  group_by(educationLevel, interestGroup) %>%
  summarise(n = n(), .groups = "drop") %>%
  # divide by the total number of participants (1011 in this data set)
  mutate(freq = round(n / nrow(participants), 4)) %>%
  ggplot(aes(y = interestGroup, x = educationLevel, fill = freq)) +
  geom_tile() +
  geom_text(aes(label = freq)) +
  scale_fill_continuous(low = "white", high = "red") +
  # the legend belongs to the fill aesthetic, so it is titled via fill
  labs(y = 'Interest Group', x = 'Education Level', fill = 'Freq',
       title = 'Education Level vs Interest Group') +
  theme_minimal()
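The effect of the choice of denominator can be illustrated with a small worked example. The counts below are made up, and the sketch assumes the alternative to the overall denominator is a per-education-level total; the source does not state which denominator the original heatmap used.

```r
# Toy cross-tabulation with made-up counts:
# rows = education level, columns = interest group
counts <- matrix(c(2, 3,
                   1, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(education = c("Low", "Graduate"),
                                 interest  = c("A", "B")))

# Share of each cell within its education level (each row sums to 1)
within_level <- round(counts / rowSums(counts), 4)

# Share of each cell among all participants (the whole table sums to 1)
overall <- round(counts / sum(counts), 4)

within_level
overall
```

With the overall denominator, every tile is comparable with every other tile, which suits a single colour scale across the whole heatmap.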