First, make sure there is a folder called data in your working directory.
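The folder can also be created from R rather than by hand. A minimal sketch, assuming the folder should be named data and sit directly in the current working directory; `data_dir` is a name introduced here for illustration:

```r
# Assumption: the folder is named "data" and lives in the current working directory
data_dir <- "data"

# Create the folder only if it is missing, so existing files are left alone
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}

# Show the full path of the folder to confirm where it ended up
file.path(getwd(), data_dir)
```

Checking with dir.exists() first avoids the warning that dir.create() gives when the folder is already there.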
Let us download the gapminder_data.csv dataset into our project:
download.file(url="https://raw.githubusercontent.com/cambiotraining/reproducibility-training/master/data/gapminder_data.csv", destfile="data/gapminder_data.csv")
Start by loading the libraries and the dataset:
# Load all libraries
library(tidyverse) # use for data manipulation and visualisation
library(kableExtra) # used for kbl()
library(rmarkdown) # used for paged_table function
library(ggpubr) # used for ggarrange function
# read file into R
pop_data = read_csv("data/gapminder_data.csv")
# create a table with data from European countries in 2007 - ordered by life expectancy
euro_data_tbl = pop_data %>%
  # Filter by continent and year
  filter(continent == "Europe" & year == 2007) %>%
  # remove continent and year (since they are all the same)
  select(-continent, -year) %>%
  # arrange by life expectancy
  arrange(desc(lifeExp)) %>%
  # Rename columns to how we want them
  rename(Country = country,
         "Population size" = pop,
         "Life Expectancy" = lifeExp,
         "GDP" = gdpPercap)
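To see what each step of this pipeline does, it can help to run the same verbs on a tiny table first. A minimal sketch, assuming only dplyr is needed; `toy_data` and `toy_tbl` are hypothetical names and the values are invented for illustration, not taken from gapminder:

```r
library(dplyr)

# Hypothetical toy data with the same column names as the gapminder file
toy_data <- tibble(
  country   = c("A", "B", "C", "D"),
  continent = c("Europe", "Europe", "Asia", "Europe"),
  year      = c(2007, 2007, 2007, 2002),
  lifeExp   = c(70, 80, 75, 65)
)

toy_tbl <- toy_data %>%
  # keep only European rows from 2007 (drops C and D)
  filter(continent == "Europe" & year == 2007) %>%
  # drop the columns that are now constant
  select(-continent, -year) %>%
  # highest life expectancy first
  arrange(desc(lifeExp)) %>%
  # new_name = old_name
  rename(Country = country, "Life Expectancy" = lifeExp)

toy_tbl
```

Note that rename() takes the new name on the left and the existing column on the right, and that names containing spaces need quotes.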
We will use the kableExtra package to design some tables. The vignette for kableExtra can be found here
The results in euro_data_tbl are displayed in the table below:
euro_data_tbl %>%
  kbl() %>%
  # full_width is an argument of kable_styling(), not a bootstrap option
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>%
  scroll_box(width = "100%", height = "200px")
Country | Population size | Life Expectancy | GDP |
---|---|---|---|
Iceland | 301931 | 81.757 | 36180.789 |
Switzerland | 7554661 | 81.701 | 37506.419 |
Spain | 40448191 | 80.941 | 28821.064 |
Sweden | 9031088 | 80.884 | 33859.748 |
France | 61083916 | 80.657 | 30470.017 |
Italy | 58147733 | 80.546 | 28569.720 |
Norway | 4627926 | 80.196 | 49357.190 |
Austria | 8199783 | 79.829 | 36126.493 |
Netherlands | 16570613 | 79.762 | 36797.933 |
Greece | 10706290 | 79.483 | 27538.412 |
Belgium | 10392226 | 79.441 | 33692.605 |
United Kingdom | 60776238 | 79.425 | 33203.261 |
Germany | 82400996 | 79.406 | 32170.374 |
Finland | 5238460 | 79.313 | 33207.084 |
Ireland | 4109086 | 78.885 | 40675.996 |
Denmark | 5468120 | 78.332 | 35278.419 |
Portugal | 10642836 | 78.098 | 20509.648 |
Slovenia | 2009245 | 77.926 | 25768.258 |
Czech Republic | 10228744 | 76.486 | 22833.309 |
Albania | 3600523 | 76.423 | 5937.030 |
Croatia | 4493312 | 75.748 | 14619.223 |
Poland | 38518241 | 75.563 | 15389.925 |
Bosnia and Herzegovina | 4552198 | 74.852 | 7446.299 |
Slovak Republic | 5447502 | 74.663 | 18678.314 |
Montenegro | 684736 | 74.543 | 9253.896 |
Serbia | 10150265 | 74.002 | 9786.535 |
Hungary | 9956108 | 73.338 | 18008.944 |
Bulgaria | 7322858 | 73.005 | 10680.793 |
Romania | 22276056 | 72.476 | 10808.476 |
Turkey | 71158647 | 71.777 | 8458.276 |
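The life expectancy and GDP columns above are printed with three decimal places. One possible refinement is the digits argument of kbl(), which rounds numeric columns before the table is drawn. A minimal sketch, assuming HTML output; `toy_tbl` and `rounded_tbl` are names introduced for illustration, and the two rows are copied by hand from the table above:

```r
library(kableExtra)

# First two rows of the table above, typed in by hand
toy_tbl <- data.frame(
  Country = c("Iceland", "Switzerland"),
  `Life Expectancy` = c(81.757, 81.701),
  GDP = c(36180.789, 37506.419),
  check.names = FALSE  # keep the space in "Life Expectancy"
)

rounded_tbl <- toy_tbl %>%
  # round numeric columns to one decimal place
  kbl(format = "html", digits = 1) %>%
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

rounded_tbl
```

Rounding happens only in the displayed table; the values stored in the data frame are unchanged.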
A better way to display long tables is the paged_table() function in the rmarkdown package:
paged_table(euro_data_tbl)
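paged_table() also takes an options list that controls how the table is paged. A minimal sketch, assuming the built-in mtcars data frame as a stand-in for euro_data_tbl; `paged` is a name introduced here, and the rows.print value of 5 is an arbitrary choice:

```r
library(rmarkdown)

# mtcars ships with R and stands in for euro_data_tbl here
paged <- paged_table(
  mtcars,
  # rows.print sets how many rows appear on each page of the table
  options = list(rows.print = 5)
)

# In a rendered HTML document this prints as a paged table
paged
```

The paging only takes effect in HTML output from R Markdown; in the console the object prints like an ordinary data frame.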
Adding images is straightforward and does not require using a specific R Markdown function.
Create a new dataset euro_data_fig by filtering the pop_data tibble to contain only data from Europe. Draw a plot to display lifeExp on the y axis and year on the x axis. Use geom_violin() to draw this as a violin plot to show the distribution of the data across each year and save it in a euro_plot variable.
euro_plot = pop_data %>%
  filter(continent == "Europe") %>%
  select(-continent) %>%
  # Add factor() to ensure year is treated as discrete
  ggplot(aes(x = factor(year), y = lifeExp)) +
  # use geom_violin
  geom_violin() +
  # plot the median as a point on the violin
  stat_summary(fun = median, geom = "point") +
  ylim(40, 85)
# display plot
euro_plot
Create a new dataset uk_data_fig by filtering the pop_data tibble to contain only data from the United Kingdom. Draw a scatter plot to display lifeExp on the y axis and year on the x axis and save it in a uk_plot variable. Draw the euro_plot created in the previous challenge next to uk_plot using the ggarrange() function. Label the plots A and B respectively.
uk_plot = pop_data %>%
  # filter for United Kingdom
  filter(country == "United Kingdom") %>%
  # use mutate to convert year to factor
  mutate(year = as_factor(year)) %>%
  # Plot a scatterplot with geom_point
  ggplot(aes(x = year, y = lifeExp)) +
  geom_point() +
  ylim(40, 85)
# display plot
uk_plot
ggarrange() is a function in the ggpubr package that can place two or more figures next to each other.
# Use ggarrange to arrange the uk_plot and euro_plot
ggarrange(uk_plot, euro_plot, ncol = 2, nrow = 1, labels = c("A", "B"))
First, render the files you want to publish on GitHub using the following commands:
rmarkdown::render("inserting-code-in-rmarkdown.Rmd")
rmarkdown::render("index.Rmd")
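When more files are added later, the same calls can be written as a loop. A minimal sketch, assuming the two file names above sit in the working directory; `rmd_files`, `existing` and `missing` are names introduced here for illustration:

```r
# The files to render, as named in the commands above
rmd_files <- c("inserting-code-in-rmarkdown.Rmd", "index.Rmd")

# Split the list into files that are present and files that are not
existing <- rmd_files[file.exists(rmd_files)]
missing  <- setdiff(rmd_files, existing)

# Report anything that could not be found instead of stopping with an error
if (length(missing) > 0) {
  message("Skipping missing files: ", paste(missing, collapse = ", "))
}

# Render each file that was found
for (rmd in existing) {
  rmarkdown::render(rmd)
}
```

Skipping missing files is a design choice for this sketch; stopping with an error may be preferable when every file must be published.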
To download Git, first go here. Then set your user name and email and link the project to the remote repository:
git config --global user.name "yourGitHubUsername"
git config --global user.email "name@provider.com"
git remote add origin https://github.com/Another-Goodman/R-Markdown-Basics.git