Homework I: Tables and visualization

Published

January 26, 2026

Important
  • Due: March 7, 2026
  • Work in pairs
  • Deliver your work as qmd files through a GitHub repository (your GitHub repository should be organized and named according to the rules)
  • Use the quarto package for reproducible research (see Projects)
  • I should be able to render the report and presentation at least in HTML format

Objectives

This homework is dedicated to table wrangling and visualization. You will choose data from public sources, for example:

Using data gathered from those public sources, build data frames with a gapminder-like schema (territorial units and timestamps as keys, statistics in the other columns). Using ggplot2 and plotly, design plots and animations like those in Rosling's address to the BBC. Display tables using the gt package (or any other nice competing tool).
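As a rough illustration of what a gapminder-like schema can look like, here is a minimal sketch with a toy table displayed through gt. The country codes, column names and values are all invented for the example; they are not OECD figures and the assignment does not prescribe them.

```r
library(tibble)
library(gt)

# Toy data, invented for illustration only (not real OECD figures).
toy <- tribble(
  ~country, ~year, ~life_exp, ~gdp_per_capita, ~population,
  "AAA",    2000,  70.1,      21000,           5.0e6,
  "AAA",    2010,  72.4,      25500,           5.4e6,
  "BBB",    2000,  65.3,      9000,            4.0e7,
  "BBB",    2010,  68.9,      12800,           4.4e7
)

# (country, year) acts as the key: each pair should appear exactly once.
stopifnot(!anyDuplicated(toy[c("country", "year")]))

# Build a gt table grouped by territorial unit, with formatted statistics.
toy_table <- toy |>
  gt(groupname_col = "country") |>
  tab_header(title = "Toy gapminder-like table") |>
  fmt_number(columns = life_exp, decimals = 1) |>
  fmt_number(columns = c(gdp_per_capita, population), decimals = 0) |>
  cols_label(
    year = "Year",
    life_exp = "Life expectancy (years)",
    gdp_per_capita = "GDP per capita",
    population = "Population"
  ) |>
  tab_source_note("Values are made up to illustrate the layout.")

# Printing `toy_table` in a qmd chunk renders it as an HTML table.
```

Checking the key early (here with `anyDuplicated()`) is cheap and catches duplicated country-year rows before they distort a plot.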

Your data and graphical pipelines should be designed so that they can be at least partially reused to perform gapminder-like animations with other kinds of data gathered from public sources. For example, one could replace life expectancy at birth with life expectancy at 60, infant mortality, or any other health index. One could replace GDP per capita with social expenditure per capita, or something similar.
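One possible way to make the graphical pipeline reusable is to pass the column roles as function arguments. The sketch below assumes a gapminder-like data frame; `gap_plot()` and the toy data are hypothetical names and values introduced only for this example. It relies on tidy evaluation (`{{ }}`) in ggplot2 and on the `frame` and `ids` aesthetics that `plotly::ggplotly()` reads to build animations.

```r
library(ggplot2)
library(plotly)

# Hypothetical helper: the column roles are arguments, so the same code
# serves any pair of indicators stored in a gapminder-like data frame.
gap_plot <- function(df, x, y, size, frame, id, title = NULL) {
  ggplot(df, aes(x = {{ x }}, y = {{ y }}, size = {{ size }})) +
    # `frame` and `ids` are not ggplot2 aesthetics; ggplotly() reads them
    # to build animation frames (ggplot2 warns that it ignores them).
    geom_point(aes(frame = {{ frame }}, ids = {{ id }}), alpha = 0.6) +
    scale_x_log10() +
    labs(title = title)
}

# Toy data, invented for illustration only.
toy <- data.frame(
  country = rep(c("AAA", "BBB"), each = 2),
  year = rep(c(2000, 2010), times = 2),
  life_exp = c(70.1, 72.4, 65.3, 68.9),
  gdp_per_capita = c(21000, 25500, 9000, 12800),
  population = c(5.0e6, 5.4e6, 4.0e7, 4.4e7)
)

p <- gap_plot(toy, gdp_per_capita, life_exp, population, year, country,
              title = "Toy data: life expectancy vs GDP per capita")

# Convert the static plot into an animated plotly widget (one frame per year).
anim <- ggplotly(p)
```

Swapping in another indicator then amounts to passing a different column name, for instance a life expectancy at 60 column in place of `life_exp`.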

Any reader should be able to retrieve the data and metadata from the public sources using your pipeline. This is necessary if your study aims at checkability and reproducibility.

Your deliverable shall consist of qmd files (one for the report, one for the presentation) that can be rendered in HTML format.

You shall describe the downloaded data.

Plots shall have titles, legends and captions. Plot comments should not boil down to AI-generated trivialities.

Data pipelines and graphical pipelines shall be given in an appendix.

Data

Data have to be downloaded from https://data.oecd.org (the old site https://stats.oecd.org, which could be queried using the OECD package, is shutting down).

Use the OECD Data Explorer to find the data you need. Follow the guidelines given in the OECD Data Explorer API.

Once you have decided which data you need, you shall build an extraction pipeline to download the data, using httr and rvest or tools from the OECD package.
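A minimal sketch of such an extraction step with httr follows. It assumes the SDMX REST layout `https://sdmx.oecd.org/public/rest/data/<agency>,<dataflow>,<version>/<key>` and the `format=csvfilewithlabels` query parameter; both should be checked against the OECD Data Explorer API documentation. The helper names `oecd_url()` and `oecd_download()` and the identifiers `OECD.FOO` and `DSD_FOO@DF_BAR` are hypothetical placeholders.

```r
# Assumed endpoint layout of the OECD SDMX REST API; verify against the
# OECD Data Explorer API documentation before relying on it.
oecd_base <- "https://sdmx.oecd.org/public/rest/data"

# Build a data query URL. `agency`, `dataflow`, `version` and `key` are meant
# to be copied from the Data Explorer for the chosen table.
oecd_url <- function(agency, dataflow, version = "", key = "all",
                     start = NULL, end = NULL) {
  flow <- paste(agency, dataflow, version, sep = ",")
  query <- c(
    if (!is.null(start)) paste0("startPeriod=", start),
    if (!is.null(end)) paste0("endPeriod=", end),
    "format=csvfilewithlabels"
  )
  paste0(oecd_base, "/", flow, "/", key, "?", paste(query, collapse = "&"))
}

# Download once and keep the file on disk, so that rendering the report does
# not hit the API every time. Requires the httr package.
oecd_download <- function(url, path, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) return(invisible(path))
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  resp <- httr::GET(url, httr::write_disk(path, overwrite = TRUE))
  if (httr::http_error(resp)) {
    unlink(path)                 # do not keep an error page as data
    httr::stop_for_status(resp)  # raise an informative error
  }
  invisible(path)
}

# Hypothetical identifiers, for illustration only.
url <- oecd_url("OECD.FOO", "DSD_FOO@DF_BAR", "1.0", start = 2000)

# Usage (not run here, since the identifiers above are placeholders):
# oecd_download(url, file.path("DATA", "OECD.foo.csv"))
```

Keeping URL construction separate from the download makes the query itself easy to print, document and review in the appendix.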

Table wrangling should be performed using tidyverse tools.
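As a sketch of what this wrangling may involve, the example below reshapes a long extract (one row per observation) into a gapminder-like wide table with dplyr and tidyr. The column names `REF_AREA`, `TIME_PERIOD`, `MEASURE` and `OBS_VALUE`, the measure codes and all values are assumptions made for the example; the actual columns depend on the dataflow you download.

```r
library(dplyr)
library(tidyr)

# Toy long-format extract, invented for illustration only.
raw <- tibble::tribble(
  ~REF_AREA, ~TIME_PERIOD, ~MEASURE,   ~OBS_VALUE,
  "AAA",     2000,         "LIFE_EXP", 70.1,
  "AAA",     2000,         "GDP_PC",   21000,
  "AAA",     2010,         "LIFE_EXP", 72.4,
  "AAA",     2010,         "GDP_PC",   25500,
  "BBB",     2000,         "LIFE_EXP", 65.3,
  "BBB",     2000,         "GDP_PC",   9000,
  "BBB",     2010,         "LIFE_EXP", 68.9
)

# Lookup from (hypothetical) measure codes to readable column names.
indicator_names <- c(LIFE_EXP = "life_exp", GDP_PC = "gdp_per_capita")

wide <- raw |>
  transmute(
    country = REF_AREA,
    year = as.integer(TIME_PERIOD),
    indicator = unname(indicator_names[MEASURE]),
    value = OBS_VALUE
  ) |>
  # One row per (country, year), one column per indicator;
  # observations absent from the extract become NA.
  pivot_wider(names_from = indicator, values_from = value) |>
  arrange(country, year)
```

Keeping the code-to-name lookup in one named vector means that adding an indicator only requires one new entry.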

Your extraction pipeline shall be reproducible and shall be given in an appendix.

Take advantage of the fact that sites like OECD, EUROSTAT, … use SDMX and stick to the FAIR principles.

Keep the downloaded data in a separate subdirectory. Your working directory (working tree) should look something like this:

.
├── .git/
├── DATA/
│   ├── OECD.foo.csv
│   ├── OECD.foobar.csv
│   ├── OECD.bar.csv
│   :
├── _extensions/
├── _outdir/
├── _metadata.yml
├── _quarto.yml
├── my_report.qmd
├── my_presentation.qmd
├── :
└── README.md

Report organization

The first part (introduction) of the report shall be dedicated to the description of the data you have downloaded. You shall motivate your choice and explain non-trivial aspects of the data (for example, if you were discussing GDP per capita against life expectancy, you should remind the reader of the definitions of life expectancy and GDP). You shall also give a hint about why you intend to plot some variables against others.

The second part (results) shall be dedicated to plots and animations. Commenting on a plot is not paraphrasing it. It consists in adding information and explanations that are not already in and around the plot (this includes the plot itself, title, subtitle, caption, and guides). It also consists in stating the questions and issues that the plot raises. For example, in the Gapminder presentation, the apparent connection between life expectancy and GDP per capita deserves to be discussed (is it stationary? is it homogeneous across continents? …). Refrain from overplaying your hand: your plots are not likely to provide causal explanations. Comment on the data, all the data, and nothing but the data.

The third part is the appendix. The first two parts should be text and plots only. The third part should be code only.

The appendix shall be dedicated to the description of the data wrangling pipeline. You shall give the code.

You shall also give the code of the table and graphical pipelines in the appendix.

You shall avoid copy-paste coding. Don’t Repeat Yourself. The tidyverse is your friend. knitr provides the tools to organize the Quarto file so that you can write your code once and use it many times: once for data wrangling and plotting (without echoing), then again for listing and explanation.

Tip for organizing the report

Look at the fake report organized along these principles. Note that this works with the knitr engine.

Have a look at R Markdown: The Definitive Guide to learn about knitr tricks (these work with Quarto provided you use the knitr engine).

This trick is described in Section 4.19.
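One knitr idiom that fits the "write once, use many times" description is the `ref.label` chunk option, which re-inserts the code of a labelled chunk elsewhere in the document. The fragment below is a minimal sketch, assuming the knitr engine; the chunk label `wrangle`, the heading and the toy code are placeholders, and it is not necessarily the exact layout of the fake report mentioned above.

````markdown
```{r wrangle, echo=FALSE}
# Runs here, in the body of the report, without showing the code.
toy <- data.frame(x = 1:3, y = c(2, 4, 8))
summary(toy)
```

## Appendix

```{r, ref.label="wrangle", echo=TRUE, eval=FALSE}
```
````

The second chunk is deliberately empty: knitr fills it with the code of the chunk labelled `wrangle`, and `eval=FALSE` lists that code without running it a second time.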