Code
params <- list(
year_e=2017,
year_p=1948
)M1 MIDS/MFA/LOGOS |
| Année 2025 |
params <- list(
year_e=2017,
year_p=1948
)datafile <- '../../resources/full_life_table.Rds'
fpath <- datafile #str_c("./DATA/", datafile) # here::here('DATA', datafile) # check getwd() if problem
if (! file.exists(fpath)) {
download.file("https://stephane-v-boucheron.fr/data/tamed_life_table.Rds",
fpath,
mode="wb")
}
life_table <- readr::read_rds(fpath)life_table <- life_table |>
mutate(Country = as_factor(Country)) |>
mutate(Country = fct_relevel(Country,
"Spain", "Italy", "France", "England & Wales", "Netherlands", "Sweden", "USA")) |>
mutate(Gender = as_factor(Gender)) |>
mutate(Area = fct_collapse(Country,
SE = c("Spain", "Italy", "France"),
NE = c("England & Wales", "Netherlands", "Sweden"),
USA = "USA")) Check on http://www.mortality.org the meaning of the different columns.
See: Demography: Measuring and Modeling Population Processes by SH Preston, P Heuveline, and M Guillot. Blackwell. Oxford. 2001.
Document Tables de mortalité françaises pour les XIXe et XXe siècles et projections pour le XXIe siècle contains detailed information on the construction of Life Tables for France.
Two kinds of Life Tables can be distinguished: Period tables (Table du moment) which contain for each period (here a period is a calendar year), the mortality risks at different age ranges (here, we have one year ranges) for that very period; and Tables de génération which contain for a given birthyear, the mortality risks at which an individual born during that year has been exposed.
The life tables investigated in this lab are Table du moment. According to the document by Vallin and Meslé, building the life tables required ,decisions and doctoring.
Lexis diagrams provide a graphical device that summarizes the construction of mortality quotients (and other rates in demography).
birth_dates <- as_date("1999-01-01") + duration(sample(2*365, size=20, replace=T),units="day")
death_dates <- as_date("2009-07-01") + duration(sample(3*365, size=20, replace=T),units="day")
b_period <- as_date("2010-01-01")
b_frame <- as_date(b_period - duration(1, units = "year"))
b_age <- 10L
tb_ld <- tibble(birth=birth_dates, death=death_dates)
tb_ld |>
ggplot() +
geom_segment(aes(x=b_frame,
xend=death,
y=interval(birth, b_frame)/years(1),
yend=interval(birth, death)/years(1))
) +
annotate(geom="rect",
xmin=b_period,
xmax=as_date(b_period + duration(1, units = "year")),
ymin=b_age,
ymax=b_age + 1L,
fill="grey",
alpha=.5) +
ylab("Age") +
xlab("Time") +
coord_cartesian(xlim=c(as_date(b_period - duration(6, units = "months")),
as_date(b_period + duration(18, units = "months"))),
ylim=c(b_age - .5, b_age+1.5)) +
labs(
title="A Lexis diagram",
subtitle = "for mortality quotient at Age 10 during Year 2010-11"
)Each line represents the life line of an individual born during years 1999 and 2000 and deceased beetween mid 2009 and mid 2012. In order to compute the mortality quotient at age 10 for year 2010, we have to compute the relevant number of occurrences, that is the number of segments ending in the grey rectangle, and the sum of exposure times, which is proportional to the sum of the lengths of the segments crossing the grey rectangle.
Have a look at Lexis diagram or at Preston et al.
Definitions can be obtained from www.lifeexpectancy.org. We translate it into mathematical (rather than demographic) language.
The mortality quotients define a probability distribution over \(\mathbb{N}\). This probability distribution is a construction that reflects the health situation in a population at a given time. This probability distribution does not describe the sequence of sanitary situations experienced by a cohort (people born during a specific year).
One works with a period, or current, life table (table du moment). This summarizes the mortality experience of persons across all ages in a short period, typically one year or three years. More precisely, the death probabilities \(q_x\) for every age \(x\) are computed for that short period, often using census information gathered at regular intervals. These \(q_x\)’s are then applied to a hypothetical cohort of \(100 000\) people over their life span to produce a life table.
small_tb <- life_table |>
filter(Country=='France', Year== 2010, Gender=='Female', Age < 10 | between(Age, 80, 89)) |>
select(Age, qx, mx, lx, dx, Lx, Tx, ex)
small_tb# A tibble: 20 × 8
Age qx mx lx dx Lx Tx ex
<int> <dbl> <dbl> <int> <int> <int> <int> <dbl>
1 0 0.00324 0.00325 100000 324 99722 8465207 84.6
2 1 0.00032 0.00032 99676 32 99660 8365484 83.9
3 2 0.00015 0.00015 99645 15 99637 8265824 83.0
4 3 0.00011 0.00011 99630 11 99624 8166187 82.0
5 4 0.00008 0.00008 99619 8 99615 8066563 81.0
6 5 0.00005 0.00005 99611 5 99608 7966948 80.0
7 6 0.00008 0.00008 99606 8 99602 7867339 79.0
8 7 0.00008 0.00008 99598 8 99594 7767737 78.0
9 8 0.00008 0.00008 99590 8 99586 7668143 77
10 9 0.00007 0.00007 99582 7 99578 7568557 76
11 80 0.0298 0.0302 75619 2252 74493 802295 10.6
12 81 0.0346 0.0352 73367 2535 72099 727802 9.92
13 82 0.0398 0.0406 70832 2818 69423 655702 9.26
14 83 0.0464 0.0475 68014 3158 66435 586280 8.62
15 84 0.0539 0.0554 64856 3493 63109 519845 8.02
16 85 0.0610 0.0630 61362 3745 59490 456736 7.44
17 86 0.0699 0.0725 57617 4029 55603 397246 6.89
18 87 0.0793 0.0826 53588 4249 51464 341643 6.38
19 88 0.0922 0.0966 49339 4547 47066 290180 5.88
20 89 0.105 0.111 44792 4706 42440 243114 5.43
The table above is not as readable as it should. Use package gt to get a more tunable outpout.
Reorder and filter the columns so that Age comes first (they identify rows), then qx, mx up to ex. You can use select or relocate, or both to do this. Note that Gender, Country, Year are constant in this tibble and need to be reported in the table header, but nowhere else.
Columns qx and mx (for mortality quotient and central death rate) should be dsplayed in scientific notation so that the fact that their range extends over several orders of magnitude shows up.
Columns lx, dx, Lx, Tx contain integer values.
Column ex (residual life expectancy) is a (fictional) decimal number of years
| Life table (extract) | |||||||
| France, Women, 2010 | |||||||
| Age | qx | mx | lx | dx | Lx | Tx | ex |
|---|---|---|---|---|---|---|---|
| 0 | 3.24e−03 | 3.25e−03 | 100,000 | 324 | 99,722 | 8,465,207 | 84.65 |
| 1 | 320e−06 | 320e−06 | 99,676 | 32 | 99,660 | 8,365,484 | 83.93 |
| 2 | 150e−06 | 150e−06 | 99,645 | 15 | 99,637 | 8,265,824 | 82.95 |
| 3 | 110e−06 | 110e−06 | 99,630 | 11 | 99,624 | 8,166,187 | 81.97 |
| 4 | 80e−06 | 80e−06 | 99,619 | 8 | 99,615 | 8,066,563 | 80.97 |
| 5 | 50e−06 | 50e−06 | 99,611 | 5 | 99,608 | 7,966,948 | 79.98 |
| 6 | 80e−06 | 80e−06 | 99,606 | 8 | 99,602 | 7,867,339 | 78.98 |
| 7 | 80e−06 | 80e−06 | 99,598 | 8 | 99,594 | 7,767,737 | 77.99 |
| 8 | 80e−06 | 80e−06 | 99,590 | 8 | 99,586 | 7,668,143 | 77.00 |
| 9 | 70e−06 | 70e−06 | 99,582 | 7 | 99,578 | 7,568,557 | 76.00 |
| 80 | 29.78e−03 | 30.24e−03 | 75,619 | 2,252 | 74,493 | 802,295 | 10.61 |
| 81 | 34.55e−03 | 35.16e−03 | 73,367 | 2,535 | 72,099 | 727,802 | 9.92 |
| 82 | 39.78e−03 | 40.59e−03 | 70,832 | 2,818 | 69,423 | 655,702 | 9.26 |
| 83 | 46.44e−03 | 47.54e−03 | 68,014 | 3,158 | 66,435 | 586,280 | 8.62 |
| 84 | 53.86e−03 | 55.36e−03 | 64,856 | 3,493 | 63,109 | 519,845 | 8.02 |
| 85 | 61.03e−03 | 62.95e−03 | 61,362 | 3,745 | 59,490 | 456,736 | 7.44 |
| 86 | 69.93e−03 | 72.46e−03 | 57,617 | 4,029 | 55,603 | 397,246 | 6.89 |
| 87 | 79.29e−03 | 82.56e−03 | 53,588 | 4,249 | 51,464 | 341,643 | 6.38 |
| 88 | 92.15e−03 | 96.6e−03 | 49,339 | 4,547 | 47,066 | 290,180 | 5.88 |
| 89 | 105.05e−03 | 110.88e−03 | 44,792 | 4,706 | 42,440 | 243,114 | 5.43 |
| From https://mortality.org | |||||||
In the sequel, we denote by \(F_{t}\) the cumulative distribution function for year \(t\). We agree on \(\overline{F}_t = 1 - F_t\) and \(F_t(-1)=0\). Henceforth, \(\overline{F}\) is called the survival function.
qxrisk of death at age \(x\), or mortality quotient at given age \(x\) for given year \(t\).
q_{t,x}
Defining and computing q_{t,x} does not boil down to knowing the number of people at age \(x\) at the beginning of ear \(t\) and knowing how many of them died during year \(t\). If we want to be rigorous, we need to know all life lines in the Lexis diagram, or equivalently, how many people at Age \(x\) were alive on each day of Year \(t\).
For a given year \(t\), the sequence of mortality quotients define a survival function \(\overline{F}_t\) using the following recursion:
\[q_{t,x} = \frac{\overline{F}_t(x) - \overline{F}_t(x+1)}{\overline{F}_t(x)}\] with boundary condition \(\overline{F}_t(-1) =1\).
This recursion can also be read as:
\[\overline{F}_{t}(x+1) = \overline{F}_{t}(x) \times (1-q_{t,x+1})\, .\]
This artificial probability distribution is used to define and compute life expectancies.
\(q_{t,x}\) is the hazard rate of \(\overline{F}_t\) at age \(x\).
mxor equivalently \[q_{t,x} = 1 - \exp(-m_{t,x})\]
If we want to define a continuous probability distribution \(G\) over \([0,\infty)\) so that \(G\) and \(F\) coincide over integers and \(G\) has piecewise constant hazard rate, we can pick \(m_{t,x}\) as the piecewise constant hazard rate.
lx\[l_t(x+1) = l_t(x) \times (1-q_{t,x}) \, ,\] with \(l_{t,0}\), the radix of the table, (arbitrarily) set to \(100000\). In the table lx is rounded to the next integer
Function \(l_{t,\cdot}\) and \(\overline{F}_t\) are connected by
\[l_{t,x + 1} = l_{t,0} \times \overline{F}_t(x)\,.\]
dxTxCompare with the denominator in the definition of qx and its description using the Lexis diagram.
We will stick to a simplified vision \(L_{t,x}= l_{t,x+1}\)
ex:This is the expectation of \(X -x\) for a random variable \(X\) distributed according to \(\overline{F}_t\) conditionnally on the event \(\{ X \geq x \}\). That is \(e_{t,x}\) is the expectation of the probability distribution defined by \(\overline{F}_t(\cdot + x-1)/\overline{F}_t(x-1)\).
Check dependencies between columns
life_table |>
filter( Year>=1948, Age < 90, Gender != "Both") |>
group_by(Country, Year, Gender) |>
summarise(m1 =max(abs(lx -dx -lead(lx))/lx, na.rm = T),
m2 =max(abs(lx * qx -dx)/dx, na.rm=T),
m3 =max(abs(Lx -lx * (1 + qx * (ax-1)))/Lx, na.rm=T),
m4 =max(abs(1-exp(-mx)-qx)/qx, na.rm=T),
.groups = "drop") |>
select(Year, Country, Gender, m1, m2, m3, m4) |>
rename(lx=m1, dx=m2, Lx=m3, qx=m4) |>
group_by(Country, Gender) |>
slice_max(order_by = desc(qx), n = 1) |>
ungroup() |>
gt() |>
tab_header(
title = "Life table (relative discrepancies)",
subtitle = ""
) |>
fmt_engineering(columns = ends_with("x"),
decimals=2,
drop_trailing_zeros = T ) |>
tab_source_note(source_note = "From https://mortality.org")| Life table (relative discrepancies) | ||||||
| Year | Country | Gender | lx | dx | Lx | qx |
|---|---|---|---|---|---|---|
| 2013 | England & Wales | Female | 20.33 × 10−6 | 77.95 × 10−3 | 10.96 × 10−6 | 1.51 × 10−3 |
| 2002 | England & Wales | Male | 55.97 × 10−6 | 41.69 × 10−3 | 31.12 × 10−6 | 3.15 × 10−3 |
| 2005 | France | Female | 10.7 × 10−6 | 31.67 × 10−3 | 12.56 × 10−6 | 1.93 × 10−3 |
| 2007 | France | Male | 34.2 × 10−6 | 105.17 × 10−3 | 28.96 × 10−6 | 2.22 × 10−3 |
| 2010 | Italy | Female | 19.78 × 10−6 | 49.96 × 10−3 | 16.84 × 10−6 | 1.5 × 10−3 |
| 2012 | Italy | Male | 31.08 × 10−6 | 95.38 × 10−3 | 27.53 × 10−6 | 2.31 × 10−3 |
| 2008 | Netherlands | Female | 21.37 × 10−6 | 85.69 × 10−3 | 14.31 × 10−6 | 2.08 × 10−3 |
| 2009 | Netherlands | Male | 25.74 × 10−6 | 65.96 × 10−3 | 27.43 × 10−6 | 2.74 × 10−3 |
| 2015 | Spain | Female | 17 × 10−6 | 10.9 × 10−3 | 11.22 × 10−6 | 2.16 × 10−3 |
| 2016 | Spain | Male | 31.74 × 10−6 | 33.98 × 10−3 | 21.2 × 10−6 | 2.14 × 10−3 |
| 2004 | Sweden | Female | 19.22 × 10−6 | 25.87 × 10−3 | 19.81 × 10−6 | 1.87 × 10−3 |
| 2005 | Sweden | Male | 28.5 × 10−6 | 32.16 × 10−3 | 26.8 × 10−6 | 2.88 × 10−3 |
| 2012 | USA | Female | 25.87 × 10−6 | 17.3 × 10−3 | 14.73 × 10−6 | 1.98 × 10−3 |
| 2010 | USA | Male | 40.17 × 10−6 | 62 × 10−3 | 36.57 × 10−6 | 2.53 × 10−3 |
| From https://mortality.org | ||||||
Several pictures share a common canvas:
Plot mortality quotients (qx) against age using a logarithmic scale on the \(y\) axis. Countries are identified by aesthetics (shape, color, linetype).
qx of all countries at all ages for years 1950, 1960, …, 2010.plotly to build an animated plot using Year for the frame aesthetics.Abiding to the DRY principle, define a prototype ggplot (alternatively plotly) object.
The prototype will then be fed with different datasets and decorated and arranged for the different figures.
p_1 |>
plotly::ggplotly()proto_plt2 <-
ggplot() +
aes(x=Age, y=qx, colour=Area, frame=Year, linetype=Country) +
geom_point(size=.1) +
geom_line(size=.1) +
scale_y_log10() +
labs(linetype=c("Country")) +
scale_x_continuous(breaks = c(seq(0, 100, 10), 109)) +
xlab("Age") +
ylab("Mortality quotients") +
facet_grid(cols=vars(Gender)) +
theme_minimal()p_2 |>
plotly::ggplotly()In 1948, NE and the USA exhibit comparable mortality quotients at all ages for the two genders, the USA looking like a more dangerous place for young adults. Spain lags behind, Italy and France showing up at intermediate positions.
By year 1962, SE has almost caught up the USA. Italy and Spain still have higher infant mortality while mortality quotients in the USA and France are almost identical at all ages for both genders. Mortality quotients attain a minimum around 10-12 for both genders. In Spain the minium central death rate has been divided by almost ten between 1948 and 1962.
If we dig further we observe that the shape of the male mortality quotients curve changes over time. In 1962, in the USA and France, mortality quotients exhibit a sharp increase between years 12 and 18, then remain almost constant between 20 and 30 and afterwards increase again. This pattern shows up in other countries but in a less spectacular way.
Twenty years afterwards, during years 1980-1985, death rates at age 0 have decreased at around \(1\%\) in all countries while it was \(7\%\) in Spain in 1948. The male central death curve exhibits a plateau between ages 20 and 30. Mortality quotients at this age look higher in France and the USA.
By year 2000, France is back amongst European countries (at least with respect to mortality quotients). Young adult mortality rates are higher in the USA than in Europe. This phenomenon became more pregnant during the last decade.
Plot ratios between mortality quotients (qx) in European countries and mortality quotients in the USA in 1948.
p <- with(params,
(filter(eur_us_table, Year %% 10==0) |>
ggplot() +
aes(x=Age,
y=Ratio,
frame=Year) +
aes(linetype=Country, show.legend = FALSE) +
aes(color=Area, show.legend = FALSE) +
scale_y_log10() +
scale_x_continuous(breaks = c(seq(0, 100, 10), 109)) +
geom_point(size=.1) +
geom_smooth(method="loess",
formula= 'y~ x',
se=FALSE,
span=.1,
size=.1) +
ylab("Ratio of mortality quotients with respect to US") +
ggtitle(label = glue("European countries with respect to US,{year_p}-{year_e}"),
subtitle = "Sweden consistently ahead") +
facet_grid(rows = vars(Gender))
))
gp <- p |>
ggplotly()
gp for (i in seq_along(gp$x$data)) {
gp$x$data[[i]]$showlegend <- FALSE
}
gpThis animation reveals less than the preceding one since we just have ratios with respect to the USA. But the patterns followed by European societies emerge in a more transparent way. The divide between northern and southern Europe at the onset of the period is even more visible. The ratios are important across the continent: there is a factor of 10 between spanish and swedish infant mortality rates. But the ratios at ages 50 and above tend to be similar. By the early 60s, the gap between southern and northern Europe has shrinked. By now, the ratios between mortality quotients tend to be within a factor of 2 across all ages, and even less at ages 50 and above.
Plot mortality quotients (column qx) for both genders as a function of Age for years 1946, 1956, ... up to 2016 . Use aesthetics to distinguish years. You will need to categorize the Year column (forcats:: may be helpful).
Gender and Country
p_3 |> ggplotly()Write a function ratio_mortality_rates with signature function(df, reference_year=1946, target_years=seq(1946, 2016, 10)) that takes as input:
life_table,ref_year andtarget_years
and that returns a dataframe with schema:
| Column Name | Column Type |
|---|---|
| Year | integer |
| Age | integer |
| mx | double |
| mx.ref_year | double |
| Country | factor |
| Gender | factor |
where (Country, Year, Age, Gender) serves as a primary key, mx denotes the central death rate at Age for Year and Gender in Country whereas mx_ref_year denotes central death rate at Age for argument reference_year in Country for Gender.
ratio_mortality_rates <- function(df,
reference_year=1946,
target_years=seq(1946, 2016, 10)){
jbe <- join_by(Age, Gender, Country)
right_df <- df |>
filter(Year==reference_year) |>
select(Age, Gender, Country, qx)
df |>
filter(Year %in% target_years, Age <90) |>
select(Age, Area, Gender, Country, qx, Year) |>
inner_join(right_df, by = jbe)
}Draw plots displaying the ratio \(q_{x,t}/q_{x, 1946}\) for ages \(x \in 1, \ldots, 90\) and year \(t\) for \(t \in 1946, \ldots, 2013\) where \(q_{x,t}\) is the mortality quotient at age \(x\) during year \(t\).
geom_smooth_line <- geom_smooth(method="loess",
formula = y ~ x,
se= FALSE,
size =.2,
span= .1
)q <- df_ratios |>
ggplot() +
aes(x=Age,
y=qx.x/qx.y) +
geom_smooth_line +
scale_y_log10()
q1 <- q +
aes(linetype=as_factor(Year),
col=as_factor(Year)) +
ylab("Ratio of mortality rates, reference Year 1946") +
labs(linetype="Year", col="Year") +
scale_colour_brewer() +
theme(legend.position = "none") +
facet_grid(
rows = vars(Country),
cols =vars(Gender),
scales = "free_y"
)
q1q1 + (
df_ratios |>
filter(Age <= 20)
)