Rows: 2,320,744
Columns: 21
$ Country <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "AB…
$ Region <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",…
$ Residence <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",…
$ Ethnicity <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",…
$ SocDem <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0",…
$ Version <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
$ `Ref-ID` <chr> "1708.01", "1708.01", "1708.01", "1708.01", "1708.01", "170…
$ Year1 <int> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,…
$ Year2 <int> 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010, 2010,…
$ TypeLT <int> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
$ Sex <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ Age <int> 0, 1, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70…
$ AgeInt <dbl> 1, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,…
$ `m(x)` <dbl> 0.00715, 0.00000, 0.00000, 0.00026, 0.00026, 0.00147, 0.001…
$ `q(x)` <dbl> 0.00710, 0.00000, 0.00000, 0.00132, 0.00132, 0.00730, 0.009…
$ `l(x)` <dbl> 100000, 99290, 99290, 99290, 99159, 99028, 98305, 97348, 96…
$ `d(x)` <dbl> 710, 0, 0, 131, 131, 723, 957, 835, 1228, 1237, 1606, 3046,…
$ `L(x)` <dbl> 99341, 397160, 496450, 496122, 495468, 493332, 489132, 4846…
$ `T(x)` <dbl> 7387752, 7288411, 6891251, 6394801, 5898679, 5403211, 49098…
$ `e(x)` <dbl> 73.88, 73.41, 69.41, 64.41, 59.49, 54.56, 49.95, 45.41, 40.…
$ `e(x)Orig` <chr> "73.88", "73.41", "69.41", "64.41", "59.48", "54.56", "49.9…
Lab: Life Tables (1948-2016)
M1 MIDS/MFA/LOGOS | Year 2025
Introduction
Life tables
A period life table is supposed to represent the mortality conditions at a specific moment in time. Period life tables are said to be synthetic in that each age group of data comes from a different birth cohort.
Demography
- History
- Perspective
Getting the data
First attempt at understanding
Glossary
Connections between terms
Data wrangling
Schema
Region: string
Residence: string
Ethnicity: string
SocDem: string
Version: string
Ref-ID: string
Year1: int32
Year2: int32
TypeLT: int32
Age: int32
AgeInt: double
m(x): double
q(x): double
l(x): double
d(x): double
L(x): double
T(x): double
e(x): double
e(x)Orig: string
Country: string
Sex: int32
Saving and reloading
The evolution of mortality quotients in Western Europe and the USA since WWII
Facets
Animations
Focus on infant mortality
Focus on young adults
Older people
========================================================================================
Definitions
Check on http://www.mortality.org the meaning of the different columns.
See: Demography: Measuring and Modeling Population Processes by SH Preston, P Heuveline, and M Guillot. Blackwell. Oxford. 2001.
Document Tables de mortalité françaises pour les XIXe et XXe siècles et projections pour le XXIe siècle contains detailed information on the construction of Life Tables for France.
Period tables versus cohort tables
Two kinds of life tables can be distinguished. Period tables (Table du moment) contain, for each period (here a period is a calendar year), the mortality risks at different age ranges (here, one-year ranges) for that very period. Cohort tables (Tables de génération) contain, for a given birth year, the mortality risks to which an individual born during that year was exposed.
The life tables investigated in this lab are period tables (Table du moment). According to the cited documents (Methodology, Vallin and Meslé), building life tables requires decisions and doctoring (imputations, smoothing, …).
Lexis diagrams
Lexis diagrams provide a graphical device that summarizes the construction of mortality quotients (and other rates in demography).
Each line represents the life line of an individual born during years 1999 and 2000 and deceased between mid-2009 and mid-2012. In order to compute the mortality quotient at age 10 for year 2010, we have to compute the relevant number of occurrences, that is the number of segments ending in the grey rectangle, and the sum of exposure times, which is proportional to the sum of the lengths of the segments crossing the grey rectangle.
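The counting of occurrences and exposure times can be sketched in base R. The four birth and death dates below are made up for illustration (they are not data from the lab), and the helper `add_years` is a hypothetical name. The result is an occurrence-to-exposure ratio for the rectangle "age 10, year 2010"; it is only a sketch of the principle, not the procedure used to build published tables.

```r
# Made-up life lines (illustration only, not real data)
birth <- as.Date(c("1999-03-15", "1999-09-01", "2000-02-10", "2000-06-30"))
death <- as.Date(c("2010-05-20", "2012-01-10", "2010-08-01", "2009-12-01"))

age <- 10L
year_start <- as.numeric(as.Date("2010-01-01"))  # dates handled as day counts
year_end   <- as.numeric(as.Date("2011-01-01"))

# Hypothetical helper: date of the n-th birthday
add_years <- function(d, n) {
  lt <- as.POSIXlt(d)
  lt$year <- lt$year + n
  as.Date(lt)
}

b_lo <- as.numeric(add_years(birth, age))       # 10th birthday
b_hi <- as.numeric(add_years(birth, age + 1L))  # 11th birthday
dth  <- as.numeric(death)

# Time spent inside the rectangle (age 10 AND year 2010), in days
enter <- pmax(b_lo, year_start)
leave <- pmin(b_hi, year_end, dth)
exposure_days <- pmax(leave - enter, 0)

# Life lines ending inside the rectangle
died_inside <- dth >= year_start & dth < year_end & dth >= b_lo & dth < b_hi

occurrences <- sum(died_inside)
exposure_years <- sum(exposure_days) / 365.25
ratio <- occurrences / exposure_years
```

Only the third individual dies inside the rectangle; the first one dies in 2010 but after the 11th birthday, and the fourth one dies before reaching age 10, so neither is counted as an occurrence.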
Have a look at Lexis diagram and at Preston et al.
Definitions can be obtained from www.lifeexpectancy.org. We translate them into mathematical (rather than demographic) language.
The mortality quotients define a probability distribution over \(\mathbb{N}\). This probability distribution is a construction that reflects the health situation in a population at a given time. This probability distribution does not describe the sequence of health situations experienced by a cohort (people born during a specific year).
One works with a period, or current, life table (table du moment). This summarizes the mortality experience of persons across all ages in a short period, typically one year or three years. More precisely, the death probabilities \(q_x\) for every age \(x\) are computed for that short period, often using census information gathered at regular intervals. These \(q_x\)’s are then applied to a hypothetical cohort of \(100 000\) people over their life span to produce a life table.
Columns q(x) and m(x) (for mortality quotient and central death rate) are displayed in scientific notation to stress the fact that their range extends over several orders of magnitude.
| Age | q(x) | m(x) | l(x) | d(x) | L(x) | T(x) | e(x) |
|---|---|---|---|---|---|---|---|
| 0 | 3.43 × 10−3 | 3.44 × 10−3 | 100000 | 343 | 99678 | 8150587 | 81.51 |
| 1 | 2.80 × 10−4 | 2.80 × 10−4 | 99657 | 28 | 99643 | 8050909 | 80.79 |
| 2 | 1.70 × 10−4 | 1.70 × 10−4 | 99629 | 17 | 99620 | 7951266 | 79.81 |
| 3 | 1.20 × 10−4 | 1.20 × 10−4 | 99612 | 12 | 99606 | 7851646 | 78.82 |
| 4 | 9.00 × 10−5 | 9.00 × 10−5 | 99600 | 9 | 99596 | 7752040 | 77.83 |
| 5 | 1.00 × 10−4 | 1.00 × 10−4 | 99591 | 10 | 99586 | 7652444 | 76.84 |
| 6 | 9.00 × 10−5 | 9.00 × 10−5 | 99581 | 9 | 99576 | 7552858 | 75.85 |
| 7 | 8.00 × 10−5 | 8.00 × 10−5 | 99572 | 8 | 99568 | 7453282 | 74.85 |
| 8 | 7.00 × 10−5 | 7.00 × 10−5 | 99564 | 7 | 99560 | 7353714 | 73.86 |
| 9 | 8.00 × 10−5 | 8.00 × 10−5 | 99557 | 8 | 99553 | 7254153 | 72.86 |
| 64 | 9.20 × 10−3 | 9.25 × 10−3 | 88340 | 813 | 87934 | 1913253 | 21.66 |
| 80 | 3.88 × 10−2 | 3.96 × 10−2 | 66010 | 2564 | 64728 | 644765 | 9.77 |
| 81 | 4.40 × 10−2 | 4.50 × 10−2 | 63446 | 2793 | 62050 | 580037 | 9.14 |
| 82 | 5.00 × 10−2 | 5.13 × 10−2 | 60653 | 3034 | 59136 | 517987 | 8.54 |
| 83 | 5.68 × 10−2 | 5.85 × 10−2 | 57619 | 3275 | 55982 | 458851 | 7.96 |
| 84 | 6.39 × 10−2 | 6.60 × 10−2 | 54344 | 3471 | 52608 | 402870 | 7.41 |
| 85 | 7.27 × 10−2 | 7.55 × 10−2 | 50873 | 3701 | 49022 | 350261 | 6.89 |
| 86 | 8.24 × 10−2 | 8.60 × 10−2 | 47172 | 3889 | 45228 | 301239 | 6.39 |
| 87 | 9.26 × 10−2 | 9.71 × 10−2 | 43283 | 4007 | 41280 | 256011 | 5.91 |
| 88 | 1.05 × 10−1 | 1.10 × 10−1 | 39276 | 4110 | 37221 | 214732 | 5.47 |
Understanding the columns of the life table
In the sequel, we denote by \(F_{t}\) the cumulative distribution function for year \(t\). We agree on \(\overline{F}_t = 1 - F_t\) and \(F_t(-1)=0\). Henceforth, \(\overline{F}\) is called the survival function.
q(x)- (age-specific) risk of death at age \(x\), or mortality quotient at given age \(x\) for given year \(t\), denoted by \(q_{t,x}\).
Defining and computing \(q_{t,x}\) does not boil down to knowing the number of people at age \(x\) at the beginning of year \(t\) and knowing how many of them died during year \(t\). If we want to be rigorous, we need to know all life lines in the Lexis diagram, or equivalently, how many people at Age \(x\) were alive on each day of Year \(t\). That is, we need accurate death counts and exposures to risk.
For a given year \(t\), the sequence of mortality quotients defines a survival function \(\overline{F}_t\) using the following recursion:
\[q_{t,x} = \frac{\overline{F}_t(x-1) - \overline{F}_t(x)}{\overline{F}_t(x-1)}\] with boundary condition \(\overline{F}_t(-1) =1\).
This recursion can also be read as:
\[\overline{F}_{t}(x+1) = \overline{F}_{t}(x) \times (1-q_{t,x+1})\, .\]
Up to (non-trivial) corrections, this artificial probability distribution is used to define and compute life expectancies.
\(q_{t,x}\) is the hazard rate of \(\overline{F}_t\) at age \(x\).
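As a minimal sketch of this recursion, the snippet below builds \(\overline{F}_t(0), \ldots, \overline{F}_t(9)\) from the first ten values of q(x) in the excerpt above, using the convention \(\overline{F}_t(-1) = 1\) and \(\overline{F}_t(x+1) = \overline{F}_t(x) \times (1-q_{t,x+1})\). The variable names are illustrative, and the vector is truncated at age 9, so the last survival value collects all the mass beyond that age.

```r
# q(x) for ages 0..9, copied from the excerpt of the table
q <- c(3.43e-3, 2.80e-4, 1.70e-4, 1.20e-4, 9.00e-5,
       1.00e-4, 9.00e-5, 8.00e-5, 7.00e-5, 8.00e-5)

# Survival function: Fbar(x) = prod_{y <= x} (1 - q_y), for x = 0..9
surv <- cumprod(1 - q)

# Fbar(x - 1) for x = 0..9, with the boundary condition Fbar(-1) = 1
surv_prev <- c(1, head(surv, -1))

# Going back from the survival function to the quotients (hazard rates)
q_back <- (surv_prev - surv) / surv_prev

# Probability of dying at exact age x: P(X = x) = Fbar(x - 1) * q_x
pmf <- surv_prev * q
```

The probabilities `pmf` and the remaining mass `tail(surv, 1)` add up to 1, which is the telescoping form of the recursion.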
m(x)- central death rate at age \(x\) during year \(t\). This is connected with \(q_{t,x}\) by \[m_{t,x} = -\log(1- q_{t,x}) \,,\]
or equivalently \[q_{t,x} = 1 - \exp(-m_{t,x})\]
If we want to define an absolutely continuous probability distribution \(G\) over \([0,\infty)\) so that \(G\) and \(F\) coincide over integers and \(G\) has piecewise constant hazard rate, we can pick \(m_{t,x}\) as the piecewise constant hazard rate.
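The relation between \(m_{t,x}\) and \(q_{t,x}\) can be checked against the excerpt above. The sketch below uses the displayed (rounded) values for ages 80 to 88, so agreement can only be expected up to display rounding; the variable names are illustrative.

```r
# q(x) and m(x) for ages 80..88, copied from the excerpt of the table
q <- c(3.88e-2, 4.40e-2, 5.00e-2, 5.68e-2, 6.39e-2,
       7.27e-2, 8.24e-2, 9.26e-2, 1.05e-1)
m <- c(3.96e-2, 4.50e-2, 5.13e-2, 5.85e-2, 6.60e-2,
       7.55e-2, 8.60e-2, 9.71e-2, 1.10e-1)

# Piecewise constant hazard rate implied by q
m_from_q <- -log(1 - q)

# Quotient implied by a constant hazard m over one year
q_from_m <- 1 - exp(-m)

# Relative discrepancy between tabulated and recomputed central death rates
rel_gap <- abs(m_from_q - m) / m
```

On these rows the relative gaps stay around one percent or less, which appears compatible with rounding of the displayed figures. Note that \(m_{t,x} > q_{t,x}\) always holds under this relation, and that the gap widens as mortality increases.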
l(x)- the so-called survival function: the scaled proportion of persons alive at age \(x\). These values are computed recursively from the \(q_{t,x}\) values using the formula
\[l_{t,x+1} = l_{t,x} \times (1-q_{t,x}) \, ,\] with \(l_{t,0}\), the radix of the table, (arbitrarily) set to \(100000\). In the table, l(x) is rounded to the nearest integer.
Functions \(l_{t,\cdot}\) and \(\overline{F}_t\) are connected by
\[l_{t,x + 1} = l_{t,0} \times \overline{F}_t(x)\,.\]
d(x)- \(d_{t,x} = q_{t,x} \times l_{t,x}\). The fictitious number of deaths occurring at age \(x\) during year \(t\). Again this is a rounded quantity.
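The two formulas for l(x) and d(x) can be replayed on the first ten rows of the excerpt above. This is a sketch with illustrative variable names; it starts from the displayed (rounded) q(x) values, so the comparison with the table allows a tolerance of one unit.

```r
radix <- 100000

# q(x), l(x), d(x) for ages 0..9, copied from the excerpt of the table
q <- c(3.43e-3, 2.80e-4, 1.70e-4, 1.20e-4, 9.00e-5,
       1.00e-4, 9.00e-5, 8.00e-5, 7.00e-5, 8.00e-5)
l_table <- c(100000, 99657, 99629, 99612, 99600,
             99591, 99581, 99572, 99564, 99557)
d_table <- c(343, 28, 17, 12, 9, 10, 9, 8, 7, 8)

# l(0) = radix and l(x + 1) = l(x) * (1 - q(x))
l <- radix * c(1, head(cumprod(1 - q), -1))

# d(x) = q(x) * l(x)
d <- q * l

comparison <- data.frame(Age = 0:9,
                         l_table, l_computed = round(l),
                         d_table, d_computed = round(d))
```

Printing `comparison` shows the recomputed columns next to the tabulated ones.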
L(x)- Total number of person-years lived by the cohort from age \(x\) to \(x+1\). This is the sum of the years lived by the \(l_{t, x+1}\) persons who survive the interval, and the \(d_{t,x}\) persons who die during the interval. The former contribute exactly \(1\) year each, while the latter contribute, on average, approximately half a year, so that \(L_{t,x} = l_{t,x+1} + 0.5 \times d_{t,x}\). This approximation assumes that deaths occur, on average, halfway through the age interval \(x\) to \(x+1\). It is satisfactory except at age 0 and at the oldest age, where other approximations are often used. Column T(x) then accumulates these person-years over all ages from \(x\) onwards: \(T_{t,x} = \sum_{y \geq x} L_{t,y}\).
Compare with the denominator in the definition of q(x) and its description using the Lexis diagram.
We will stick to the simplified version \(L_{t,x}= l_{t,x+1}\).
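The half-year approximation and the role of the person-years columns can be checked on the first ten rows of the excerpt above. In this sketch (illustrative variable names), the identity \(T_{t,x} - T_{t,x+1} = L_{t,x}\) is the standard life-table relation between the two columns; all comparisons allow for rounding in the displayed figures.

```r
# l(x), d(x), L(x), T(x) for ages 0..9, copied from the excerpt of the table
l <- c(100000, 99657, 99629, 99612, 99600,
       99591, 99581, 99572, 99564, 99557)
d <- c(343, 28, 17, 12, 9, 10, 9, 8, 7, 8)
L_table <- c(99678, 99643, 99620, 99606, 99596,
             99586, 99576, 99568, 99560, 99553)
T_table <- c(8150587, 8050909, 7951266, 7851646, 7752040,
             7652444, 7552858, 7453282, 7353714, 7254153)

n <- length(l)

# Half-year approximation for ages 0..8: L(x) = l(x + 1) + 0.5 * d(x)
L_half <- l[-1] + 0.5 * d[-n]
gap_half <- abs(L_half - L_table[-n])

# Simplified version used in the lab: L(x) = l(x + 1)
gap_simple <- abs(l[-1] - L_table[-n])

# Successive differences of T(x) should give back L(x) for ages 0..8
L_from_T <- -diff(T_table)
```

On these rows the half-year rule matches the table within one unit for ages 1 to 8, while at age 0 it overshoots by about 150 person-years, in line with the remark that another approximation is used for infants.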
e(x)- Residual life expectancy at age \(x\) and year \(t\).
This is the expectation of \(X -x\) for a random variable \(X\) distributed according to \(\overline{F}_t\), conditionally on the event \(\{ X \geq x \}\). That is, \(e_{t,x}\) is the expectation of the probability distribution defined by \(\overline{F}_t(\cdot + x-1)/\overline{F}_t(x-1)\).
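In a tabulated life table, residual life expectancy is usually obtained as the ratio of the person-years remaining beyond age \(x\) to the number of survivors at age \(x\), that is \(e_{t,x} = T_{t,x}/l_{t,x}\). This identity is not spelled out in the text above; it is the standard life-table relation, and the sketch below checks it on the first ten rows of the excerpt (illustrative variable names).

```r
# l(x), T(x), e(x) for ages 0..9, copied from the excerpt of the table
l <- c(100000, 99657, 99629, 99612, 99600,
       99591, 99581, 99572, 99564, 99557)
T_x <- c(8150587, 8050909, 7951266, 7851646, 7752040,
         7652444, 7552858, 7453282, 7353714, 7254153)
e_table <- c(81.51, 80.79, 79.81, 78.82, 77.83,
             76.84, 75.85, 74.85, 73.86, 72.86)

# Residual life expectancy: person-years left divided by survivors
e_computed <- T_x / l

check <- data.frame(Age = 0:9, e_table, e_computed = round(e_computed, 2))
```

In this excerpt, residual life expectancy drops by slightly less than one year per year of age after age 1, since very few deaths occur at these ages.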
Check dependencies between columns
Western countries around 1950
Several pictures share a common canvas:
Plot mortality quotients (q(x)) against age using a logarithmic scale on the \(y\) axis. Countries are identified by aesthetics (shape, color, linetype).
- Use faceting to plot q(x) of all countries at all ages for years 1950, 1960, …, 2010.
- Use plotly to build an animated plot using Year for the frame aesthetics.
Abiding by the DRY principle, define a prototype ggplot (alternatively plotly) object.
The prototype will then be fed with different datasets and decorated and arranged for the different figures.
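One possible way to set up such a prototype is a small function that returns a ggplot object for whatever dataset it is given. The sketch below assumes ggplot2 is installed; the function name `proto_plot` and the toy data are hypothetical, and only the column names (Age, q(x), Country) follow the text.

```r
library(ggplot2)

# Hypothetical toy data (made-up values) with the columns used in the text
toy <- data.frame(
  Country = rep(c("A", "B"), each = 3),
  Age = rep(c(0L, 10L, 60L), times = 2),
  `q(x)` = c(5e-2, 1e-3, 2e-2, 3e-2, 8e-4, 1.5e-2),
  check.names = FALSE
)

# Prototype: mappings, geometry and scales shared by all the figures
proto_plot <- function(df) {
  ggplot(df, aes(x = Age, y = `q(x)`,
                 colour = Country, linetype = Country)) +
    geom_line() +
    scale_y_log10() +
    labs(x = "Age", y = "Mortality quotient q(x)")
}

# The same prototype decorated in two different ways
p_single <- proto_plot(toy) + ggtitle("Mortality quotients (toy data)")
p_facets <- proto_plot(toy) + facet_wrap(vars(Country))
```

Each figure then differs from the others only by the data passed to the prototype and by the layers added on top of it (title, facets), so the shared settings are written once.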
In 1948, NE (Northern Europe) and the USA exhibit comparable mortality quotients at all ages for the two genders, the USA looking like a more dangerous place for young adults. Spain lags behind, Italy and France showing up at intermediate positions.
By year 1962, SE (Northern Europe) has almost caught up with the USA. Italy and Spain still have higher infant mortality while mortality quotients in the USA and France are almost identical at all ages for both genders. Mortality quotients attain a minimum around ages 10-12 for both genders. In Spain, the minimum central death rate was divided by almost ten between 1948 and 1962.
If we dig further, we observe that the shape of the male mortality quotients curve changes over time. In 1962, in the USA and France, mortality quotients exhibit a sharp increase between ages 12 and 18, then remain almost constant between 20 and 30, and increase again afterwards. This pattern shows up in other countries but in a less spectacular way.
Twenty years later, during years 1980-1985, death rates at age 0 have decreased to around \(1\%\) in all countries, whereas they stood at \(7\%\) in Spain in 1948. The male curve exhibits a plateau between ages 20 and 30. Mortality quotients at these ages look higher in France and the USA.
By year 2000, France is back amongst European countries (at least with respect to mortality quotients). Young adult mortality rates are higher in the USA than in Europe. This phenomenon became more pronounced during decade 2010-2020.
Plot ratios between mortality quotients (q(x)) in European countries and mortality quotients in the USA in 1948.
This animation reveals less than the preceding one since we just have ratios with respect to the USA. But the patterns followed by European societies emerge in a more transparent way. The divide between northern and southern Europe at the onset of the period is even more visible. The ratios are large across the continent: there is a factor of 10 between Spanish and Swedish infant mortality rates. But the ratios at ages 50 and above tend to be similar. By the early 60s, the gap between southern and northern Europe has shrunk. By now, the ratios between mortality quotients tend to be within a factor of 2 across all ages, and even less at ages 50 and above.
Death rates evolution since WW II
Plot mortality quotients (column q(x)) for both genders as a function of Age for years 1946, 1956, ... up to 2016. Use aesthetics to distinguish years. You will need to categorize the Year column (forcats:: may be helpful).
- Facet by Gender and Country.
- Pay attention to axes labels, to legends. Assess logarithmic scales.
Write a function ratio_mortality_rates with signature function(df, reference_year=1946, target_years=seq(1946, 2016, 10)) that takes as input:
- a dataframe df with the same schema as life_table,
- a reference year reference_year, and
- a sequence of years target_years
and that returns a dataframe with schema:
| Column Name | Column Type |
|---|---|
| Year | integer |
| Age | integer |
| q(x) | double |
| q(x).ref_year | double |
| Country | factor |
| Gender | factor |
where (Country, Year, Age, Gender) serves as a primary key, q(x) denotes the mortality quotient at Age for Year and Gender in Country, whereas q(x).ref_year denotes the mortality quotient at Age for argument reference_year in Country for Gender.
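A possible base R sketch of such a function is given below. It is one way to meet the specification, not a reference solution: it relies on `merge` rather than on dplyr joins, and the toy data frame used to exercise it contains made-up values.

```r
ratio_mortality_rates <- function(df, reference_year = 1946,
                                  target_years = seq(1946, 2016, 10)) {
  key <- c("Country", "Gender", "Age")

  # Mortality quotients for the reference year, one row per (Country, Gender, Age)
  ref <- df[df$Year == reference_year, c(key, "q(x)")]
  names(ref)[names(ref) == "q(x)"] <- "q(x).ref_year"

  # Mortality quotients for the target years
  tgt <- df[df$Year %in% target_years, c("Year", key, "q(x)")]

  # Attach the reference quotient to every target row
  out <- merge(tgt, ref, by = key)
  out <- out[order(out$Country, out$Gender, out$Year, out$Age), ]
  rownames(out) <- NULL
  out[, c("Year", "Age", "q(x)", "q(x).ref_year", "Country", "Gender")]
}

# Toy input with made-up quotients (illustration only)
toy <- data.frame(
  Year = rep(c(1946L, 1956L), each = 2),
  Age = rep(c(0L, 1L), times = 2),
  Country = factor("FRA"),
  Gender = factor("Female"),
  `q(x)` = c(0.08, 0.01, 0.04, 0.004),
  check.names = FALSE
)

res <- ratio_mortality_rates(toy, reference_year = 1946,
                             target_years = c(1946, 1956))
ratios <- res$`q(x)` / res$`q(x).ref_year`
```

Rows of the reference year get a ratio of 1 by construction, which gives a quick sanity check on real data.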
Draw plots displaying the ratio \(q_{x,t}/q_{x, 1946}\) for ages \(x \in \{1, \ldots, 90\}\) and years \(t \in \{1946, \ldots, 2013\}\), where \(q_{x,t}\) is the mortality quotient at age \(x\) during year \(t\).
- Handle both genders and all countries
- One properly facetted plot is enough.
During the last seventy years, death rates decreased at all ages in all seven countries. This progress has not been uniform across ages, genders and countries. Across most countries, infant mortality dramatically improved during the first post-war decade while death rates at age 50 and above remained stable until the mid seventies.
Trends
We noticed that mortality quotients did not evolve in the same way across all ages: first, the decay has been much more significant at low ages; second, the decay of mortality quotients at old ages (above 60) mostly took place during the last four decades. It is worth looking separately at what happened at different stages of life.
All European countries achieved the same infant mortality rates after year 2000. The USA now lags behind.
During years 1940-1945, in the Netherlands and France, gains obtained before 1940 were reversed. Year 1945 was particularly difficult in the Netherlands.
Plot mortality quotients at ages \(15, 20, 40, 60\) as a function of time. Facet by Gender and Country.
While death rates at ages 15 and 20 among women are close across all societies, death rates are higher at age 20 than at age 15 among men. In France, at age 20, death rates declined from 1945 until 1960, and then increased back to their initial level until 1980. Male death rates at age 60 started to decline around 1980. Female death rates at age 60 declined steadily throughout the 7 decades. Years 1940-1945 exhibit disruptions with different shapes and intensities in Italy, France, England & Wales, and the Netherlands.
Appendix
res_path <- fs::path('~/Dropbox/HMD/res') |> fs::path_expand()
res_url <- 'https://www.lifetable.de/File/GetDocument/data/hld.zip'
dest_path <- fs::path_join(c(fs::path_dir(res_path), fs::path_file(res_url)))
dpath <- fs::path('~/Dropbox/HMD/res') |> fs::path_expand()
if (!fs::file_exists(res_path)) {
download.file(res_url, destfile = dest_path)
zip::unzip(dest_path)
}
res_col_spec <- cols(
Country = col_character(),
Region = col_character(),
Residence = col_character(),
Ethnicity = col_character(),
SocDem = col_character(),
Version = col_character(),
`Ref-ID` = col_character(),
Year1 = col_integer(),
Year2 = col_integer(),
TypeLT = col_integer(),
Sex = col_integer(),
Age = col_integer(),
AgeInt = col_double(),
`m(x)` = col_double(),
`q(x)` = col_double(),
`l(x)` = col_double(),
`d(x)` = col_double(),
`L(x)` = col_double(),
`T(x)` = col_double(),
`e(x)` = col_double(),
`e(x)Orig` = col_character()
)
ds_path <- '~/Dropbox/HMD/hld-part.parquet'
if (!fs::file_exists(ds_path)){
tb |>
group_by(Country, Sex) |>
arrow::write_dataset(path=ds_path)
}
country_iso3 <- c(
"France"="FRA",
"Canada"="CAN",
"Netherlands"="NLD",
"Sweden"="SWE",
"Italy"="ITA",
"Spain"="ESP",
"Portugal"="PRT",
"USA"="USA"
)
tb_iso3 <- tibble(
name=names(country_iso3), iso3=country_iso3
)
ds <- open_dataset(ds_path)
ds$schema
# Simulated birth and death dates for 20 fictitious individuals (Lexis diagram)
birth_dates <- lubridate::as_date("1999-01-01") + lubridate::duration(sample(2*365, size=20, replace=TRUE),units="day")
death_dates <- lubridate::as_date("2009-07-01") + lubridate::duration(sample(3*365, size=20, replace=TRUE),units="day")
b_period <- lubridate::as_date("2010-01-01")
b_frame <- lubridate::as_date(b_period - lubridate::duration(1, units = "year"))
b_age <- 10L
tb_ld <- tibble(birth=birth_dates, death=death_dates)
tb_ld |>
ggplot() +
geom_segment(aes(x=b_frame,
xend=death,
y=lubridate::interval(birth, b_frame)/lubridate::years(1),
yend=lubridate::interval(birth, death)/lubridate::years(1))
) +
annotate(geom="rect",
xmin=b_period,
xmax=b_period + lubridate::duration(1, units = "year"),
ymin=b_age,
ymax=b_age + 1L,
fill="grey",
alpha=.5) +
ylab("Age") +
xlab("Time") +
coord_cartesian(xlim=c(lubridate::as_date(b_period - lubridate::duration(6, units = "months")),
lubridate::as_date(b_period + lubridate::duration(18, units = "months"))),
ylim=c(b_age - .5, b_age+1.5)) +
labs(
title="A Lexis diagram",
subtitle = "for mortality quotient at Age 10 during Year 2010-11"
) +
theme_minimal()
tb |>
filter(Country=='FRA', Year1== 2010, Sex==2, Age<10|Age==64|between(Age, 80, 88)) |>
select(Age, `q(x)`, `m(x)`, `l(x)`, `d(x)`, `L(x)`, `T(x)`, `e(x)`) |>
gt::gt() |>
gt::tab_caption("Excerpt of French Period Life Table for Year 2010, Females") |>
gt::fmt_scientific(columns=c(`q(x)`, `m(x)`))
life_tables |>
filter( Year>=1948, Age < 90, Gender != "Both") |>
group_by(Country, Year, Gender) |>
summarise(m1 =max(abs(lx -dx -lead(lx))/lx, na.rm = T),
m2 =max(abs(lx * qx -dx)/dx, na.rm=T),
m3 =max(abs(Lx -lx * (1 + qx * (ax-1)))/Lx, na.rm=T),
m4 =max(abs(1-exp(-mx)-qx)/qx, na.rm=T),
.groups = "drop") |>
select(Year, Country, Gender, m1, m2, m3, m4) |>
rename(lx=m1, dx=m2, Lx=m3, qx=m4) |>
group_by(Country, Gender) |>
slice_max(order_by = desc(qx), n = 1) |>
ungroup() |>
gt() |>
tab_header(
title = "Life table (relative discrepancies)",
subtitle = ""
) |>
fmt_engineering(columns = ends_with("x"),
decimals=2,
drop_trailing_zeros = T ) |>
tab_source_note(source_note = "From https://mortality.org")