LAB: Fake report

Motivation

'Labore modi labore voluptatem non ipsum dolor. Ut neque tempora voluptatem porro numquam dolor adipisci. Consectetur velit eius est. Modi ipsum ut aliquam tempora quaerat consectetur eius. Neque labore porro porro labore. Sed consectetur velit velit sed sed porro. Etincidunt quiquia dolorem est dolor magnam. Magnam tempora modi etincidunt consectetur etincidunt.'

Reproducible data gathering

Data and metadata should be retrieved programmatically.

Here is an example from the OECD API documentation. Other providers, such as Eurostat, have their own APIs.

url = "https://sdmx.oecd.org/public/rest/data/OECD.SDD.STES,DSD_STES@DF_CLI/.M.LI...AA...H?startPeriod=2023-02&dimensionAtObservation=AllDimensions&format=csvfilewithlabels"

# df = pd.read_csv(url)

# df.info()

Note that in your report, data gathering chunks should go to the appendix.
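
To make the query easier to read and to modify, the URL above can be assembled from its parts. The following is a minimal sketch, assuming the usual SDMX REST layout (base, dataflow, key, then query parameters). The helper name build_sdmx_url is hypothetical; every value is copied from the URL quoted above.

```python
from urllib.parse import urlencode

# Assumption: the query follows the usual SDMX REST layout
#   <base>/data/<dataflow>/<key>?<parameters>
# All values below are copied from the URL quoted in this section.
BASE = "https://sdmx.oecd.org/public/rest"
DATAFLOW = "OECD.SDD.STES,DSD_STES@DF_CLI"
KEY = ".M.LI...AA...H"  # dot-separated dimension filter, kept verbatim


def build_sdmx_url(base, dataflow, key, **params):
    """Assemble a data query URL from its parts (hypothetical helper)."""
    return f"{base}/data/{dataflow}/{key}?{urlencode(params)}"


url = build_sdmx_url(
    BASE,
    DATAFLOW,
    KEY,
    startPeriod="2023-02",
    dimensionAtObservation="AllDimensions",
    format="csvfilewithlabels",
)
print(url)
```

Changing the start period or the output format then means editing one keyword argument rather than a long string. The resulting url can be passed to pd.read_csv as in the commented lines above.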

200 countries, 50 years, 20 lines of code

You should build your report and your presentation around plots and tables.

Display tables with style (no kitsch please).

   Continent  level_1  Country                 Year  Life Expectancy  Population (Millions Inhab.)  GDP per Capita
0  Africa     1262     Reunion                 1962  57.7             0.4                           $3,174
1  Africa     1348     Sierra Leone            1972  35.4             2.9                           $1,354
2  Americas   1115     Nicaragua               2007  72.9             5.7                           $2,749
3  Americas   1610     United States           1962  70.2             186.5                         $16,173
4  Asia       702      India                   1982  56.6             708.0                         $856
5  Asia       1213     Philippines             1957  51.3             26.1                          $1,548
6  Europe     1148     Norway                  1992  77.3             4.3                           $33,966
7  Europe     150      Bosnia and Herzegovina  1982  70.7             4.2                           $4,127
8  Oceania    68       Australia               1992  77.6             17.5                          $23,425
9  Oceania    1096     New Zealand             1972  71.9             2.9                           $16,046
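
The number formats in the table come from ordinary Python format specifications, which the appendix passes to df.style.format. The sketch below applies the same three specifications to a single row using only the standard library. The row values and the helper format_row are made up for illustration; they are not taken from the dataset.

```python
# Format specifications copied from the appendix
formats = {
    "Population (Millions Inhab.)": "{:.1f}",
    "Life Expectancy": "{:.1f}",
    "GDP per Capita": "${:,.0f}",
}

# Illustrative row: these values are invented, not taken from gapminder
row = {
    "Country": "Exampleland",
    "Population (Millions Inhab.)": 12.3456,
    "Life Expectancy": 70.04,
    "GDP per Capita": 12345.678,
}


def format_row(row, formats):
    """Apply a format string to each column that has one (hypothetical helper)."""
    return {
        column: formats[column].format(value) if column in formats else str(value)
        for column, value in row.items()
    }


formatted = format_row(row, formats)
print(formatted)
```

One decimal is enough for life expectancy and population in millions, and the thousands separator keeps the GDP column readable without adding decoration.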

Visualization matters

Wizardry

Appendix

The appendix should contain the code that helped you create the document.




#
# Load the data
#
import pandas as pd
import plotly.express as px
from gapminder import gapminder

# Use plotly as the plotting backend for dataframes
pd.options.plotting.backend = "plotly"




#
# Sample 2 rows per continent and format for display
#
df = (
  gapminder
    .groupby("continent", group_keys=True)
    .apply(lambda g: g.sample(n=2, random_state=42), include_groups=False)
    .reset_index()
    .assign(pop=lambda x: x["pop"] / 1e6)
)

df = df.rename(columns={
    "country": "Country",
    "continent": "Continent",
    "year": "Year",
    "lifeExp": "Life Expectancy",
    "pop": "Population (Millions Inhab.)",
    "gdpPercap": "GDP per Capita",
})

df.style.format(
    {
        "Population (Millions Inhab.)": "{:.1f}",
        "Life Expectancy": "{:.1f}",
        "GDP per Capita": "${:,.0f}",
    }
)



#
# A neater color scale
#
neat_color_scale = {
    "Africa": "#01d4e5",
    "Americas": "#7dea01",
    "Asia": "#fc5173",
    "Europe": "#fde803",
    "Oceania": "#536227",
}




#
# Scatter plot for a single, randomly sampled year
#


a_year = gapminder["year"].sample(1, random_state=42).iloc[0]
df = gapminder[gapminder["year"] == a_year].copy()

p = px.scatter(
    data_frame=df,
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_data={
        "country": True,
        "pop": ":,",
        "gdpPercap": ":,.0f",
        "lifeExp": ":.1f",
    },
    log_x=True,
    size_max=15,
    color_discrete_map=neat_color_scale,
    opacity=0.5,
    title=f"Gapminder {a_year}",
    labels={
        "gdpPercap": "Yearly Income per Capita",
        "lifeExp": "Life Expectancy",
    },
)

p.update_layout(
    xaxis_title="Yearly Income per Capita",
    yaxis_title="Life Expectancy",
    annotations=[
        dict(
            text="From sick and poor (bottom left) to healthy and rich (top right)",
            xref="paper",
            yref="paper",
            x=0.5,
            y=-0.12,
            showarrow=False,
            font=dict(size=10),
        )
    ],
)
p



#
# Animated scatter on the whole gapminder dataframe, year as frame
#
p_animated = px.scatter(
    data_frame=gapminder,
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    animation_frame="year",
    hover_data={"country": True, "pop": ":,", "gdpPercap": ":,.0f", "lifeExp": ":.1f"},
    log_x=True,
    size_max=15,
    color_discrete_map=neat_color_scale,
    opacity=0.5,
    title="Gapminder",
    labels={
        "gdpPercap": "Yearly Income per Capita",
        "lifeExp": "Life Expectancy",
    },
)
p_animated.update_layout(
    height=500,
    width=750,
    xaxis_title="Yearly Income per Capita",
    yaxis_title="Life Expectancy",
    showlegend=True,
)
p_animated