Importing and manipulating Data

First, make sure there is a folder called data in your working directory.
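If the folder does not exist yet, it can be created from the R console. The snippet below is a minimal sketch that assumes your working directory is already the project folder:

```r
# Create the data folder in the current working directory if it is missing
if (!dir.exists("data")) {
  dir.create("data")
}

# Confirm that the folder is now present
dir.exists("data")
```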

Let us download the gapminder_data.csv dataset into our project

download.file(url="https://raw.githubusercontent.com/cambiotraining/reproducibility-training/master/data/gapminder_data.csv", destfile="data/gapminder_data.csv")

Next, load the required libraries and read the dataset into R:

# Load all libraries
library(tidyverse) # used for data manipulation and visualisation
library(kableExtra) # used for kbl()
library(rmarkdown) # used for paged_table()
library(ggpubr) # used for ggarrange()

# read file into R 
pop_data = read_csv("data/gapminder_data.csv")

# create a table with data from European countries in 2007, ordered by life expectancy
euro_data_tbl = pop_data %>% 
  # Filter by continent and year 
  filter(continent == "Europe" & year == 2007) %>% 
  # remove continent and year (since they are all the same)
  select(-continent, -year) %>% 
  # arrange by life expectancy, highest first
  arrange(desc(lifeExp)) %>% 
  # rename the columns for display
  rename(Country = country, 
         "Population size" = pop, 
         "Life Expectancy" = lifeExp,
         "GDP" = gdpPercap)
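To see what each step of this pipeline does, it can help to run the same verbs on a very small table. The sketch below uses made-up rows (toy_data and toy_tbl are hypothetical names, and the values are not taken from gapminder_data.csv):

```r
library(dplyr)

# Made-up rows for illustration only; not values from gapminder_data.csv
toy_data <- tibble(
  country   = c("A", "B", "C", "D"),
  continent = c("Europe", "Europe", "Asia", "Europe"),
  year      = c(2007, 2007, 2007, 2002),
  lifeExp   = c(70, 80, 75, 65)
)

toy_tbl <- toy_data %>%
  # keep only European rows from 2007 (drops C and D)
  filter(continent == "Europe" & year == 2007) %>%
  # drop the columns that are now constant
  select(-continent, -year) %>%
  # highest life expectancy first (B before A)
  arrange(desc(lifeExp)) %>%
  # rename the columns for display
  rename(Country = country, "Life Expectancy" = lifeExp)

toy_tbl
```

Only rows A and B survive the filter, and B is listed first because arrange(desc()) sorts in descending order.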

 

Tables in R Markdown

We will use the kableExtra package to design some tables. The vignette for kableExtra can be found here

The results in euro_data_tbl are displayed in the table below:

euro_data_tbl %>% 
  kbl() %>% 
  # full_width is an argument of kable_styling(), not a bootstrap option
  kable_styling(bootstrap_options = "striped", full_width = FALSE) %>% 
  scroll_box(width = "100%", height = "200px")
| Country | Population size | Life Expectancy | GDP |
|---|---|---|---|
| Iceland | 301931 | 81.757 | 36180.789 |
| Switzerland | 7554661 | 81.701 | 37506.419 |
| Spain | 40448191 | 80.941 | 28821.064 |
| Sweden | 9031088 | 80.884 | 33859.748 |
| France | 61083916 | 80.657 | 30470.017 |
| Italy | 58147733 | 80.546 | 28569.720 |
| Norway | 4627926 | 80.196 | 49357.190 |
| Austria | 8199783 | 79.829 | 36126.493 |
| Netherlands | 16570613 | 79.762 | 36797.933 |
| Greece | 10706290 | 79.483 | 27538.412 |
| Belgium | 10392226 | 79.441 | 33692.605 |
| United Kingdom | 60776238 | 79.425 | 33203.261 |
| Germany | 82400996 | 79.406 | 32170.374 |
| Finland | 5238460 | 79.313 | 33207.084 |
| Ireland | 4109086 | 78.885 | 40675.996 |
| Denmark | 5468120 | 78.332 | 35278.419 |
| Portugal | 10642836 | 78.098 | 20509.648 |
| Slovenia | 2009245 | 77.926 | 25768.258 |
| Czech Republic | 10228744 | 76.486 | 22833.309 |
| Albania | 3600523 | 76.423 | 5937.030 |
| Croatia | 4493312 | 75.748 | 14619.223 |
| Poland | 38518241 | 75.563 | 15389.925 |
| Bosnia and Herzegovina | 4552198 | 74.852 | 7446.299 |
| Slovak Republic | 5447502 | 74.663 | 18678.314 |
| Montenegro | 684736 | 74.543 | 9253.896 |
| Serbia | 10150265 | 74.002 | 9786.535 |
| Hungary | 9956108 | 73.338 | 18008.944 |
| Bulgaria | 7322858 | 73.005 | 10680.793 |
| Romania | 22276056 | 72.476 | 10808.476 |
| Turkey | 71158647 | 71.777 | 8458.276 |

A better way to display long tables is the paged_table() function from the rmarkdown package:

paged_table(euro_data_tbl)
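paged_table() also accepts an options list. The sketch below sets rows.print to control how many rows appear on each page; toy_tbl is a hypothetical stand-in table so that the example runs on its own, and in this tutorial you would pass euro_data_tbl instead:

```r
library(rmarkdown)

# Hypothetical stand-in table so the sketch runs on its own;
# in the tutorial you would pass euro_data_tbl instead
toy_tbl <- data.frame(
  Country = paste("Country", 1:30),
  Value   = seq_len(30)
)

# rows.print sets how many rows each page of the paged table shows
paged <- paged_table(toy_tbl, options = list(rows.print = 5))
```

Placing paged on its own line in an R Markdown chunk should then display the table five rows at a time when the document is rendered to HTML.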

 

Adding images

Adding images is straightforward and does not require using a specific R Markdown function.

Create a new dataset euro_data_fig by filtering the pop_data tibble to contain only data from Europe. Draw a plot that displays lifeExp on the y axis and year on the x axis. Use geom_violin() to draw this as a violin plot showing the distribution of the data in each year, and save it in a euro_plot variable.

euro_plot = pop_data %>% 
  filter(continent == "Europe") %>% 
  select(-continent) %>% 
  # Add factor() to ensure year is treated as discrete 
  ggplot(aes(x = factor(year), y = lifeExp)) +
  # use geom_violin
  geom_violin() +
  # plot the median as a point on the violin 
  stat_summary(fun = median, geom = "point") +
  ylim(40, 85)

# display plot 
euro_plot

Create a new dataset uk_data_fig by filtering the pop_data tibble to contain only data from the United Kingdom. Draw a scatter plot that displays lifeExp on the y axis and year on the x axis, and save it in a uk_plot variable. Draw the euro_plot created in the previous challenge next to uk_plot using the ggarrange() function. Label the plots A and B respectively.

uk_plot = pop_data %>% 
  # filter for United Kingdom 
  filter(country == "United Kingdom") %>%
  # use mutate to convert year to factor 
  mutate(year = as_factor(year)) %>% 
  # Plot a scatterplot with geom_point
  ggplot(aes(x = year, y = lifeExp)) +
  geom_point() +
  ylim(40, 85)

# display plot 
uk_plot

ggarrange() is a function in the ggpubr package that places two or more figures next to each other:

# Use ggarrange to arrange the uk_plot and euro_plot 
ggarrange(uk_plot, euro_plot, ncol = 2, nrow = 1, labels = c("A", "B"))

Send to GitHub

First, render the files to be published on GitHub using the following commands:

rmarkdown::render("inserting-code-in-rmarkdown.Rmd")
rmarkdown::render("index.Rmd")

To download Git, first go here

  1. Then click Tools > Version Control > Project Setup and select Git as the version control system.
  2. If asked whether you want to initialise a new Git repository for this project, click Yes.
  3. If asked whether RStudio needs to be restarted, click Yes.
  4. Next we need to configure Git. Go to Tools > Shell. This should take you to the Terminal window in RStudio. Make sure you are in the working directory of your project first (run pwd in the terminal, or getwd() in the R console, to check). Then set the Git username and email using the following two commands:
git config --global user.name "yourGitHubUsername"
git config --global user.email "name@provider.com"
  5. Next create a repository in GitHub.
  6. After you click on the Create repository button, a Quick setup page is displayed which provides code for different scenarios. Copy the first line of the “…or push an existing repository from the command line” section and run it in the RStudio terminal. The line of code should look something like this: git remote add origin https://github.com/Another-Goodman/R-Markdown-Basics.git
  7. From the Git tab in RStudio, select the files you would like to commit to GitHub.
  8. Press the Commit button, enter a commit message and then press the Commit button in the new pop-up commit window.