12  Characterizations of probability distributions

12.1 Motivation

In full generality, a probability distribution is a complex and opaque object. It is a \([0,1]\)-valued function defined over a \(\sigma\)-algebra of subsets of some universe. A concrete \(\sigma\)-algebra, let alone the abstract notion of \(\sigma\)-algebra, is not easily grasped. Hence, looking for simpler characterizations of probability distributions is a sensible goal. When facing a question like "are two probability distributions equal?", we know it suffices to check that the two distributions coincide on a generating \(\pi\)-class (see Theorem 11.4 and its consequences). This makes Cumulative Distribution Functions (CDFs) precious tools. Cumulative Distribution Functions and their generalized inverses (quantile functions, see Chapter 13) are very convenient when handling maxima, minima, or more generally order statistics of collections of independent random variables, but when it comes to handling sums of independent random variables or branching processes, cumulative distribution functions are of moderate help.

In this lesson, we review two related ways of characterizing probability distributions through functions defined on the real line: Laplace transforms (Section 12.2) and characteristic functions, which extend Fourier transforms to probability distributions (Section 12.3). The two methods differ in scope, but they rely on the same idea as Probability Generating Functions (Chapter 6) and share common features.

Indeed, Probability Generating Functions can be seen as a special case of Laplace transforms, and the latter can in turn be seen as special cases of Fourier transforms. All three methods characterize probability distributions. They are equipped with inversion formulae. All three provide a seamless treatment of sums of independent random variables, and all three relate the integrability of probability distributions to the smoothness of their transforms.

In the next lessons, we shall see that the three transforms also characterize convergence in distribution.

Probability generating functions, Laplace transforms and characteristic functions bring important analytical machinery to Probability Theory. From Analysis, we get off-the-shelf arguments to establish smoothness properties of the transforms, and with a little more work, we can construct inversion formulae.

12.2 Laplace transform

Laplace transforms characterize probability distributions on \([0, \infty).\)

12.2.1 Definition and elementary properties

Definition 12.1 Let \(P\) be a probability distribution over \([0,\infty)\) with cumulative distribution function \(F\). The Laplace transform of \(P\) is the function \(U\) from \([0,\infty)\) to \([0,1]\) defined by

\[ U(\lambda) = \mathbb{E}\left[\mathrm{e}^{- \lambda X}\right] = \int_{[0,\infty)} \mathrm{e}^{- \lambda x} \mathrm{d}F(x) \, \]

where \(X \sim P\).

A probability distribution \(P\) over \(\mathbb{N}\) is also a probability distribution over \([0,\infty)\); as such, it has both a probability generating function \(G\) and a Laplace transform \(U\). They are connected by

\[U(\lambda) = G(\mathrm{e}^{-\lambda}) \, .\]
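As a quick numerical sanity check of this identity (with arbitrary parameter choices), one can compare the closed-form Poisson PGF \(G(s)=\mathrm{e}^{\mu(s-1)}\) evaluated at \(\mathrm{e}^{-\lambda}\) with a Monte Carlo estimate of \(\mathbb{E}[\mathrm{e}^{-\lambda X}]\):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, lam = 3.0, 0.7                      # Poisson mean and Laplace argument (arbitrary choices)
X = rng.poisson(mu, size=200_000)

G = lambda s: np.exp(mu * (s - 1.0))    # PGF of Poisson(mu)
U_mc = np.exp(-lam * X).mean()          # Monte Carlo estimate of E[exp(-lam X)]

print(G(np.exp(-lam)), U_mc)            # the two values agree up to Monte Carlo error
```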

Which properties of Probability Generating Functions are also satisfied by Laplace transforms?

Proposition 12.1 If \(U: [0,\infty) \to [0,1]\) is the Laplace transform of a probability distribution \(P\) over \([0, \infty)\), then

  • \(U(0)=1\);
  • \(U\) is continuous;
  • \(U\) is non-increasing;
  • \(U\) is convex.

Exercise 12.1 Check the assertions in the proposition.

Can we recognize Laplace transforms of probability distributions over \([0,\infty)\)? This is the content of the next theorem (whose proof is beyond the reach of this course).

Theorem 12.1 (Bernstein’s Theorem) A function \(U: (0, \infty) \to (0,\infty)\) is the Laplace transform of a probability distribution over \([0,\infty)\) iff

  • \(U\) is infinitely differentiable over \((0, \infty)\);
  • \(\lim_{\lambda \downarrow 0} U(\lambda)=1\);
  • \(U\) is completely monotone: \((-1)^k U^{(k)} \geq 0\) over \((0, \infty)\) for all \(k \in \mathbb{N}\).

Using the connection between Probability Generating Functions and Laplace transforms, we are in a position to characterize those power series that are Probability Generating Functions.

Corollary 12.1 A function \(G: [0, 1] \to [0,1]\) is the Probability Generating Function of a probability distribution over \(\mathbb{N}\) iff

  • \(G\) is infinitely differentiable over \((0,1)\);
  • \(\lim_{s \uparrow 1} G(s)=1\);
  • \(G\) is absolutely monotone: \(G^{(k)} \geq 0\) over \((0, 1)\) for all \(k \in \mathbb{N}\).

Example 12.1 Let \(X\) be \(\text{Gamma}(p, \nu)\)-distributed. The Laplace transform of (the distribution of) \(X\) is

\[ \begin{array}{rl} U(\lambda) & = \int_0^\infty \nu \mathrm{e}^{-\lambda x} \mathrm{e}^{-\nu x} \frac{(\nu x)^{p-1}}{\Gamma(p)} \mathrm{d} x \\ & = \frac{\nu^p}{(\lambda +\nu)^p} \int_0^\infty (\lambda +\nu) \mathrm{e}^{-(\lambda +\nu) x} \frac{((\nu+\lambda) x)^{p-1}}{\Gamma(p)} \mathrm{d} x \\ & = \frac{\nu^p}{(\lambda +\nu)^p} \, . \end{array} \]
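The closed form is easy to confirm numerically; the following sketch (arbitrary parameter values, numerical quadrature with scipy) integrates \(\mathrm{e}^{-\lambda x}\) against the \(\text{Gamma}(p,\nu)\) density and compares the result with \(\nu^p/(\lambda+\nu)^p\):

```python
import numpy as np
from scipy import integrate
from scipy.stats import gamma

p, nu, lam = 2.5, 1.7, 0.9              # shape, rate, Laplace argument (arbitrary choices)

# numerical Laplace transform: integrate e^{-lam x} against the Gamma(p, nu) density
U_num, _ = integrate.quad(
    lambda x: np.exp(-lam * x) * gamma.pdf(x, a=p, scale=1.0 / nu), 0.0, np.inf)
U_closed = (nu / (lam + nu)) ** p

print(U_num, U_closed)                  # the two values agree to quadrature precision
```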

12.2.2 Injectivity of Laplace transforms and an inversion formula

Theorem 12.2 (Widder’s Theorem) A probability distribution on \([0, \infty)\) is characterized by its Laplace transform.

The construction of the inversion formula relies on deviation inequalities for the Poisson distribution. The next statement is easily checked by using Markov’s inequality with exponential functions and optimization.

Theorem 12.3 (Tail bounds for the Poisson distribution) Let \(Z\) be Poisson distributed. Let \(h(x) = \mathrm{e}^x - x -1\) and let \(h^*(x)= (x+1)\log (x+1) -x\), \(x\geq -1\), denote its convex dual. Then for all \(\lambda \in \mathbb{R}\)

\[\log \mathbb{E} \mathrm{e}^{\lambda (Z-\mathbb{E}Z)} = \mathbb{E}Z h(\lambda) \, .\]

For \(t\geq 0\) \[ \Pr \Big\{ Z \geq \mathbb{E}Z + t \Big\} \leq \mathrm{e}^{-\mathbb{E}Z h^*\Big(\frac{t}{\mathbb{E}Z}\Big)} \] and for \(0 \leq t \leq \mathbb{E}Z\) \[ \Pr \Big\{ Z \leq \mathbb{E}Z -t \Big\} \leq \mathrm{e}^{-\mathbb{E}Z h^*\Big(\frac{-t}{\mathbb{E}Z}\Big)} \, . \]

Remark 12.1.

  • See Section 4.3 for the notion of convex duality.
  • The next bounds on \(h^*\) deliver looser but easier tail bounds:

\[\begin{array}{rll} h^*(t) & \geq \frac{t^2}{2(1 + t/3)} & \text{for } t >0 \\ h^*(t) & \geq \frac{t^2}{2} & \text{for } t <0 \, . \end{array} \]

Corollary 12.2 For all positive \(x, y\) with \(y \neq x\), \[ \lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} e^{-n y} \frac{(ny)^k}{k!} = \mathbb{I}_{y<x} \,. \]

We shall check in one of the next lessons that for \(x >0\): \[ \lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} e^{-n x} \frac{(nx)^k}{k!} = \frac{1}{2} \, . \]
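Both limits can be visualized numerically: the partial sum \(\sum_{k\le \lfloor nx\rfloor} e^{-ny}(ny)^k/k!\) is the Poisson(\(ny\)) cumulative distribution function evaluated at \(\lfloor nx \rfloor\). The sketch below (arbitrary values of \(x\) and \(y\)) illustrates the three regimes \(y<x\), \(y=x\), and \(y>x\):

```python
import numpy as np
from scipy.stats import poisson

x = 1.0
for n in (10, 100, 1000, 10_000):
    # Poisson(n*y) CDF evaluated at floor(n*x), for y below, at, and above x
    below = poisson.cdf(np.floor(n * x), mu=n * 0.8)   # y = 0.8 < x : tends to 1
    at    = poisson.cdf(np.floor(n * x), mu=n * 1.0)   # y = 1.0 = x : tends to 1/2
    above = poisson.cdf(np.floor(n * x), mu=n * 1.2)   # y = 1.2 > x : tends to 0
    print(n, round(below, 4), round(at, 4), round(above, 4))
```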


Proof. Let \(P\) be a probability distribution over \([0,\infty)\), let \(F\) be its cumulative distribution function and \(U\) its Laplace transform. Let \(X \sim P\).

It suffices to show that \(F(x)\) can be computed from \(U\) at any \(x\) where \(F\) is continuous: since \(F\) is right-continuous and its continuity points are dense, these values determine \(F\) (and hence \(P\)).

The function \(U\) is infinitely differentiable on \((0, \infty)\); differentiation under the integral sign (justified by dominated convergence) gives, for \(k\in \mathbb{N},\)

\[ \frac{\mathrm{d}^kU}{\mathrm{d}\lambda^k}(\lambda) = (-1)^k \int_{[0,\infty)} x^k e^{-\lambda x} \mathrm{d}F(x) \, . \] Moreover, \(U\) has a power series expansion around every \(\lambda \in (0,\infty)\): for \(\lambda'\) with \(|\lambda' - \lambda| < \lambda\),

\[\begin{array}{rl} U(\lambda') & = \sum_{k=0}^\infty \frac{(\lambda' -\lambda)^k}{k!} \frac{\mathrm{d}^kU}{\mathrm{d}\lambda^k}(\lambda) \, . \end{array} \]

By Corollary 12.2, for all \(0 < y \neq x\), \(\lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} e^{-n y} \frac{(ny)^k}{k!} = \mathbb{I}_{y<x}\).

\[\begin{array}{rl} F(x) & = \int_{[0,\infty)} \mathbb{I}_{y\leq x} \mathrm{d}F(y) \\ & = \int_{[0, x)} \mathbb{I}_{y< x} \mathrm{d}F(y) + \int_{\{x\}} 1 \, \mathrm{d}F(y) + \int_{(x, \infty)} \mathbb{I}_{y< x} \mathrm{d}F(y) \\ & = \int_{[0, x) \cup (x, \infty)} \lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} e^{-n y} \frac{(ny)^k}{k!} \mathrm{d}F(y) + P\{X=x\} \\ & = \lim_{n \to \infty} \int_{[0, x) \cup (x, \infty)} \sum_{k=0}^{\lfloor nx\rfloor} e^{-n y} \frac{(ny)^k}{k!} \mathrm{d}F(y) + P\{X=x\} \\ & \qquad \text{by dominated convergence (the integrands take values in } [0,1]\text{)} \\ & = \lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} \frac{(-n)^k}{k!}\int_{[0, x) \cup (x, \infty)} e^{-n y} {(-y)^k} \mathrm{d}F(y) + P\{X=x\} \\ & = \lim_{n \to \infty} \left( \sum_{k=0}^{\lfloor nx\rfloor} \frac{(-n)^k}{k!} \frac{\mathrm{d}^kU}{\mathrm{d}\lambda^k}_{\mid \lambda=n} - \sum_{k=0}^{\lfloor nx\rfloor} e^{-n x} \frac{(nx)^k}{k!}\, P\{X=x\} \right) + P\{X=x\} \\ & = \lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} \frac{(-n)^k}{k!} \frac{\mathrm{d}^kU}{\mathrm{d}\lambda^k}_{\mid \lambda=n} + \frac{1}{2} P\{X=x\} \, , \end{array} \] where the last step uses \(\lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} e^{-n x} \frac{(nx)^k}{k!} = \frac{1}{2}\) (stated above for \(x>0\)).

If \(F\) is continuous at \(x\), \[ F(x) = \lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} \frac{(-n)^k}{k!} \frac{\mathrm{d}^kU}{\mathrm{d}\lambda^k}_{\mid \lambda=n} \, . \]

If \(F\) jumps at \(x\),

\[ F(x) - \frac{P\{X=x\}}{2} =\lim_{n \to \infty} \sum_{k=0}^{\lfloor nx\rfloor} \frac{(-n)^k}{k!} \frac{\mathrm{d}^kU}{\mathrm{d}\lambda^k}_{\mid \lambda=n} \, . \]

This shows that the Laplace transform contains enough information to reconstruct the cumulative distribution function, which in turn characterizes the probability distribution.
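To see the inversion recipe at work, consider the exponential distribution, for which \(U(\lambda) = 1/(1+\lambda)\) and \(U^{(k)}(\lambda) = (-1)^k k!/(1+\lambda)^{k+1}\). The sketch below evaluates the partial sums and compares them with \(F(x) = 1 - \mathrm{e}^{-x}\) (the value \(x=1/2\) is an arbitrary choice):

```python
import numpy as np

def inversion_estimate(x, n):
    # sum_{k <= floor(n x)} (-n)^k / k! * U^(k)(n)  with  U^(k)(n) = (-1)^k k! / (1+n)^(k+1);
    # the general term simplifies to (n/(1+n))^k / (1+n)
    k = np.arange(0, int(np.floor(n * x)) + 1)
    return np.sum((n / (1.0 + n)) ** k) / (1.0 + n)

x = 0.5
for n in (10, 100, 1000, 10_000):
    print(n, inversion_estimate(x, n), 1 - np.exp(-x))   # converges to F(x) = 1 - e^{-x}
```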


Laplace transforms of sums of independent non-negative random variables are easily obtained from the Laplace transforms of the summands.


Proposition 12.2 Let \(X\) and \(Y\) be two independent \([0,\infty)\)-valued random variables, with Laplace transforms \(U_X\) and \(U_Y\). The Laplace transform of (the distribution of) \(X+Y\) is

\[ U_{X+Y} = U_X \times U_Y \, . \]

Proof. \[ \begin{array}{rl} U_{X+Y}(\lambda) & = \mathbb{E}\Big[\mathrm{e}^{-\lambda (X+Y)}\Big] \\ & = \mathbb{E}\Big[\mathrm{e}^{-\lambda X} \times \mathrm{e}^{-\lambda Y}\Big] \\ & = \mathbb{E}\Big[\mathrm{e}^{-\lambda X} \Big] \times \mathbb{E}\Big[\mathrm{e}^{-\lambda Y}\Big]\\ & \qquad \text{by independence}\\ & = U_X(\lambda) \times U_Y(\lambda) \, . \end{array} \]

Combining the injectivity theorem (Theorem 12.2) and the explicit formula for the Laplace transform of Gamma distributions, we recover the fact that the sum of independent Gamma-distributed random variables with the same intensity parameter is also Gamma distributed.

Corollary 12.3 If \(X \sim \text{Gamma}(p, \nu)\) is independent from \(Y \sim \text{Gamma}(q, \nu)\), then \(X+Y\) has Laplace transform \(\lambda \mapsto \Big(\frac{\nu}{\lambda+\nu}\Big)^{p+q}\) and is therefore \(\text{Gamma}(p+q, \nu)\)-distributed.
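A quick simulation (arbitrary shape parameters, Kolmogorov-Smirnov test from scipy) makes the corollary tangible: the empirical distribution of \(X+Y\) is consistent with \(\text{Gamma}(p+q,\nu)\).

```python
import numpy as np
from scipy.stats import gamma, kstest

rng = np.random.default_rng(1)
p, q, nu = 2.0, 3.5, 1.3                            # arbitrary shapes and common rate

X = gamma.rvs(a=p, scale=1.0 / nu, size=100_000, random_state=rng)
Y = gamma.rvs(a=q, scale=1.0 / nu, size=100_000, random_state=rng)

# Kolmogorov-Smirnov test of X+Y against Gamma(p+q, nu): no evidence of mismatch expected
print(kstest(X + Y, gamma(a=p + q, scale=1.0 / nu).cdf))
```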

12.2.3 Laplace transform smoothness and integrability

12.3 Characteristic functions and Fourier transforms

The Laplace transform characterizes probability distributions supported by \([0, \infty)\). Characteristic functions deal with general probability distributions. They extend to multivariate distributions.

12.3.1 Characteristic function

The next transform can be defined for all probability distributions over \(\mathbb{R}\). And the definition can be extended to distributions on \(\mathbb{R}^k, k\geq 1\).

Definition 12.2 (Characteristic function) Let the real-valued random variable \(X\) be distributed according to \(P\) with cumulative distribution function \(F\), the characteristic function of distribution \(P\) is the function from \(\mathbb{R}\) to \(\mathbb{C}\) defined by \[ \widehat{F}(t) = \mathbb{E}\left[\mathrm{e}^{i t X}\right] = \int_{\mathbb{R}} \mathrm{e}^{i t x} \mathrm{d}F(x) = \int_{\mathbb{R}} \cos(t x) \mathrm{d}F(x) + i \int_{\mathbb{R}} \sin(t x) \mathrm{d}F(x) \, . \]

Remark 12.2. If \(F\) is absolutely continuous with density \(f\) then \(\widehat{F}\) is (up to a multiplicative constant) the Fourier transform of \(f\).

Proposition 12.3 Let the real-valued random variable \(X\) be distributed according to \(P\) with characteristic function \(\widehat{F}\).

  • \(\widehat{F}\) is (uniformly) continuous over \(\mathbb{R}\)
  • \(\widehat{F}(0)=1\)
  • If \(X\) is symmetric, \(\widehat{F}\) is real-valued
  • The characteristic function of the distribution of \(a X +b\) is \[\mathrm{e}^{it b} \widehat{F}(at) \, .\]

Proof. Let us check the continuity property. The three others are left as exercises.

Trigonometric calculus leads to \[ \begin{array}{rl} \Big| \mathrm{e}^{i(t+ \delta)x} - \mathrm{e}^{itx}\Big| & = \Big| \mathrm{e}^{itx}\Big| \times \Big|\mathrm{e}^{i\delta x} - 1\Big|\\ & \leq \Big|\mathrm{e}^{i\delta x} - 1\Big| \\ & \leq 2 \Big( 1 \wedge \big| \delta x \big| \Big) \end{array} \]

for every \(t\in \mathbb{R}, \delta \in \mathbb{R}, x \in \mathbb{R}\). Integrating with respect to \(F\),

\[ \begin{array}{rl} \Big| \widehat{F}(t+\delta) - \widehat{F}(t)\Big| & \leq \int 2 \Big( 1 \wedge \big| \delta x \big| \Big) \mathrm{d}F(x) \,. \end{array} \]

Resorting to the dominated convergence theorem, we conclude
\[ \lim_{\delta \to 0} \Big| \widehat{F}(t+\delta) - \widehat{F}(t)\Big| = 0 \]

uniformly in \(t\).


Exercise 12.2 The next properties are easily checked:

  • \(|\widehat{F}(t)|\leq 1\) for every \(t\in \mathbb{R}\);

Exercise 12.3 Compute the characteristic function of:

  • The Poisson distribution with parameter \(\lambda>0\);
  • The uniform distribution on \([-1,1]\);
  • The triangle distribution on \([-1,1]\) (density: \(1-|x|\) on \([-1,1]\));
  • The Laplace distribution, with density \(\frac{1}{2} \exp(-|x|)\);
  • The exponential distribution, with density \(\exp(-x)\) on \([0,+\infty)\).

Just as with Probability Generating Functions and Laplace transforms, characteristic functions of sums of independent random variables have a simple structure.

Proposition 12.4 Let \(X\) and \(Y\) be independent random variables with cumulative distribution functions \(F_X\) and \(F_Y\), then

\[\widehat{F}_{X+Y}(t) = \widehat{F}_X(t) \times \widehat{F}_Y(t)\]

for all \(t \in \mathbb{R}\).

Proof. The third equality is a consequence of the independence of \(X\) and \(Y\): \[\begin{array}{rl} \widehat{F}_{X+Y}(t) & = \mathbb{E}\Big[\mathrm{e}^{it (X+Y)}\big] \\ & = \mathbb{E}\Big[\mathrm{e}^{it X} \mathrm{e}^{it Y}\big] \\ & = \mathbb{E}\Big[\mathrm{e}^{it X} \big] \times \mathbb{E}\big[\mathrm{e}^{it Y}\big] \\ & = \widehat{F}_X(t) \times \widehat{F}_Y(t) \, . \end{array} \]
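A numerical illustration (arbitrary sample size, Uniform\([-1,1]\) summands): the empirical characteristic function of the sum of two independent Uniform\([-1,1]\) variables should be close to \((\sin t / t)^2\), the product of the characteristic functions of the summands.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=200_000)
Y = rng.uniform(-1, 1, size=200_000)

t = np.linspace(0.1, 5.0, 5)
ecf_sum = np.array([np.mean(np.exp(1j * s * (X + Y))) for s in t])  # empirical cf of X+Y
cf_prod = (np.sin(t) / t) ** 2                                      # product of the two cfs

print(np.abs(ecf_sum - cf_prod).max())   # small: the empirical cf matches the product
```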

Exercise 12.4 Use a counter-example to prove that \[ \Big(\forall t \in \mathbb{R}, \quad \widehat{F}_{X+Y}(t) = \widehat{F}_X(t) \times \widehat{F}_Y(t) \Big) \not\Rightarrow X \perp\!\!\!\perp Y \, . \]

12.3.2 Characteristic function of a univariate Gaussian distribution

It is possible to compute characteristic functions by resorting to Complex Analysis. But we shall refrain from this when computing the most important characteristic function, the characteristic function of the standard Gaussian distribution.

Proposition 12.5 Let \(\widehat{\Phi}\) denote the characteristic function of the standard univariate Gaussian distribution \(\mathcal{N}(0,1)\), the following holds

\[\widehat{\Phi}(t) = \mathrm{e}^{-\frac{t^2}{2}} \, .\]

Proof. Recall that as the standard Gaussian density is even, the characteristic function is real-valued and even.

Moreover, \(\widehat{\Phi}\) is differentiable and the derivative can be computed by interchanging expectation and differentiation with respect to \(t\). \[ \begin{array}{rl} \widehat{\Phi}'(t) & = - \mathbb{E}\left[X \sin(t X) \right] \\ & = - \frac{1}{\sqrt{2 \pi}}\int_{\mathbb{R}} x \sin(tx) \mathrm{e}^{-\frac{x^2}{2}} \mathrm{d}x \\ & = \frac{1}{\sqrt{2 \pi}} \Big[\sin(tx) \mathrm{e}^{-\frac{x^2}{2}} \Big]_{-\infty}^{\infty} - t \frac{1}{\sqrt{2 \pi}}\int_{\mathbb{R}} \cos(tx) \mathrm{e}^{-\frac{x^2}{2}} \mathrm{d}x \\ & = - t \widehat{\Phi}(t) \,. \end{array} \] Hence, \(\widehat{\Phi}\) is a solution of the differential equation \(g'(t) = -t g(t)\) with \(g(0)=1\).

The differential equation is readily solved, and the solution is \(g(t)= \mathrm{e}^{- \frac{t^2}{2}}\).


Exercise 12.5 Why is \(\widehat{\Phi}\) differentiable? Why are we allowed to interchange expectation and derivation?


Note that a byproduct of Proposition 12.5 is the following integral representation of the Gaussian density.

\[ \phi(x) = \frac{1}{2 \pi} \int_{\mathbb{R}} \widehat{\Phi}(t) \mathrm{e}^{-itx} \mathrm{d}t \, . \]

It does not look interesting, but it is a milestone for the derivation of the general inversion formula below.
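The representation can be checked numerically (numerical quadrature with scipy): integrating \(\widehat{\Phi}(t)\mathrm{e}^{-itx}\) over \(t\) and dividing by \(2\pi\) reproduces the Gaussian density.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def phi_from_cf(x):
    # real part of (1/2pi) * integral of e^{-t^2/2} e^{-itx} dt (imaginary part vanishes by symmetry)
    integrand = lambda t: np.exp(-t ** 2 / 2) * np.cos(t * x)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return val / (2 * np.pi)

for x in (0.0, 0.5, 1.0, 2.0):
    print(x, phi_from_cf(x), norm.pdf(x))   # the two columns agree
```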

12.3.3 Sums of independent random variables and convolutions

The interplay between Characteristic functions/Fourier transforms and summation of independent random variables is one of the most attractive features of this transformation. In order to understand it, we shall need an operation stemming from analysis. Recall that if \(f\) and \(g\) are two integrable functions, the convolution of \(f\) and \(g\) is defined as \[ f \star g (x) = \int_{\mathbb{R}} f(x-y)g(y) \mathrm{d}y = \int_{\mathbb{R}} g(x-y)f(y) \mathrm{d}y \, . \]

Note that \(f \star g\) is also integrable. It is not too hard to check that if \(f\) and \(g\) are two probability densities then so is \(f \star g\); moreover, \(f \star g\) is the density of the distribution of \(X+Y\) where \(X \sim f\) is independent from \(Y \sim g\). The next proposition extends this observation.

Proposition 12.6 Let \(X,Y\) be two independent random variables with distributions \(P_X\) and \(P_Y\). Assume that \(P_X\) is absolutely continuous with density \(p_X\). Then the distribution of \(X+Y\) is absolutely continuous and has density \[ p_X \star P_Y (z) = \int_{\mathbb{R}} p_X(z -y ) \mathrm{d}P_Y(y) \, . \]

Proof. Let \(B\) be a Borel subset of \(\mathbb{R}\). \[ \begin{array}{rl} P \Big\{ X+Y \in B\Big\} & = \int_{\mathbb{R}} \Big( \int_{\mathbb{R}} \mathbb{I}_B(x+y) p_X(x)\mathrm{d}x\Big) \mathrm{d}P_Y(y) \\ & = \int_{\mathbb{R}} \Big(\int_{\mathbb{R}} \mathbb{I}_B(z) p_X(z-y)\mathrm{d}z\Big) \mathrm{d}P_Y(y) \\ & = \int_{\mathbb{R}} \mathbb{I}_B(z) \Big(\int_{\mathbb{R}} p_X(z-y) \mathrm{d}P_Y(y) \Big) \mathrm{d}z \\ & = \int_{\mathbb{R}} \mathbb{I}_B(z) p_X \star P_Y (z) \mathrm{d}z \end{array} \] where the first equality follows from the Tonelli-Fubini Theorem, the second equality is obtained by the change of variable \(x \mapsto z = x+y\) for every \(y\), and the third equality follows again from the Tonelli-Fubini Theorem.
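As a numerical illustration, the density of the sum of two independent standard exponential random variables can be approximated by discretizing the convolution; it should match the \(\text{Gamma}(2,1)\) density \(z \mapsto z\mathrm{e}^{-z}\) (the grid and step size below are arbitrary choices):

```python
import numpy as np

dx = 0.001
x = np.arange(0, 20, dx)
f = np.exp(-x)                               # standard exponential density on a grid

# discrete approximation of (f * f)(z) = integral of f(z-y) f(y) dy
conv = np.convolve(f, f)[: len(x)] * dx

z = 2.0
k = int(z / dx)
print(conv[k], z * np.exp(-z))               # both close to the Gamma(2,1) density at z = 2
```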


Remark 12.3. Convolution is not tied to Probability theory.

  • In Analysis, convolution is known to be a regularizing (smoothing) operation. This also holds in Probability theory: if the distribution of either \(X\) or \(Y\) has a density and \(X \perp\!\!\!\perp Y\), then the distribution of \(X+Y\) has a density.
  • Convolution with smooth distributions plays an important role in non-parametric statistics; it is at the root of kernel density estimation.
  • Convolution is an important tool in Signal Processing.

Exercise 12.6 Check that if \(X\) and \(Y\) are independent with densities \(f_X\) and \(f_Y\), \(f_X \star f_Y\) is a density of the distribution of \(X+Y\).


If \(Y =0\) almost surely (its distribution is \(\delta_0\)), then \(p_X \star \delta_0 = p_X\).

What happens in Proposition 12.6 if we consider the distributions of \(\sigma X +Y\) and let \(\sigma\) decrease to \(0\)? This is the content of the next proposition.

Proposition 12.7 Let \(X,Y\) be two independent random variables with distributions \(P_X\) and \(P_Y\). Assume that \(P_X\) is absolutely continuous with density \(p_X\) and that \(P_X(-\infty, 0] = \alpha \in (0,1)\). Then \[ \lim_{\sigma \downarrow 0} \mathbb{P}\big\{ Y + \sigma X \leq a \Big\} = P_Y(-\infty, a) + \alpha P_Y\{a\} \, . \]

Proof. \[\begin{array}{rl} \mathbb{P}\big\{ Y + \sigma X \leq a \Big\} & = \int_{\mathbb{R}} \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-y}{\sigma}} p_X(x) \mathrm{d}x \mathrm{d}P_Y(y) \\ & = \int_{(-\infty,a)} \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-y}{\sigma}} p_X(x) \mathrm{d}x \mathrm{d}P_Y(y) \\ & + \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-a}{\sigma}} p_X(x) \mathrm{d}x P_Y\{a\} \\ & + \int_{(a, \infty)} \int_{\mathbb{R}} \mathbb{I}_{x \leq \frac{a-y}{\sigma}} p_X(x) \mathrm{d}x \mathrm{d}P_Y(y) \end{array} \]

By monotone convergence, the first and third integrals converge respectively to \(P_Y(-\infty, a)\) and \(0\) while the second term equals \(\alpha P_Y\{a\}\).

12.3.4 Injectivity Theorem and inversion formula

The characteristic function maps probability measures to \(\mathbb{C}\)-valued functions. The main result of this section is that the characteristic function (the Fourier transform) defines an injective mapping on the set of probability measures on the real line.

Theorem 12.4 If two probability distributions \(P\) and \(Q\) have the same characteristic function, then they are equal.

The injectivity property follows from an explicit inversion recipe. The characteristic function allows us to recover the cumulative distribution function at all its continuity points (just as the Laplace transform did). Again, as continuity points of cumulative distribution functions are dense on \(\mathbb{R}\), this is enough.

Proposition 12.8 Let \(X \sim F\) and \(Z \sim \mathcal{N}(0,1)\) be independent. Then:

  • the distribution of \(X+ \sigma Z\) has characteristic function \[ \widehat{F}_\sigma(t) = \widehat{\Phi}(t\sigma) \times \widehat{F}(t) = \mathrm{e}^{- \frac{t^2 \sigma^2}{2}} \widehat{F}(t) \]

  • the distribution of \(X + \sigma Z\) is absolutely continuous with respect to Lebesgue measure

  • a version of the density of the distribution of \(X+ \sigma Z\) is given by \[ y \mapsto \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}} \widehat{F}(t)\mathrm{e}^{-ity} \mathrm{d}t = \frac{1}{{2 \pi}}\int_{\mathbb{R}} \widehat{F}_\sigma(t)\mathrm{e}^{-ity} \mathrm{d}t \,. \]

Exercise 12.7 Why can we take for granted the existence of a probability space with two independent random variables \(X, Z\) distributed as above?

The proposition states that a density of the distribution of \(X + \sigma Z\) can be recovered from the characteristic function of the distribution of \(X + \sigma Z\) by the Fourier inversion formula for functions with integrable Fourier transforms.

Proof. The fact that for any \(\sigma >0\), the distribution of \(Y = X + \sigma Z\) is absolutely continuous with respect to Lebesgue measure comes from Proposition 12.6.

A density of the distribution of \(X + \sigma Z\) is given by

\[ \int_{\mathbb{R}} \frac{1}{\sigma} \phi\Big(\frac{y -x}{\sigma}\Big) \mathrm{d}F(x) \]

The characteristic function of \(X+\sigma Z\) at \(t\) is \(\mathrm{e}^{- \frac{t^2 \sigma^2}{2}} \widehat{F}(t)\).

\[\begin{array}{rl} \mathbb{P}\Big\{ X+ \sigma Z \leq u\Big\} & = \int_{-\infty}^u \int_{\mathbb{R}} \frac{1}{\sigma} \phi\Big(\frac{y -x}{\sigma}\Big) \mathrm{d}F(x) \mathrm{d}y \\ & = \int_{-\infty}^u \int_{\mathbb{R}} \frac{1}{\sigma} \left(\frac{1}{{2 \pi}} \int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2}{2}} \mathrm{e}^{-it \frac{y-x}{\sigma}} \mathrm{d}t\right) \mathrm{d}F(x) \mathrm{d}y \\ & = \int_{-\infty}^u \left(\int_{\mathbb{R}} \frac{1}{\sigma} \frac{1}{{2 \pi}} \mathrm{e}^{- \frac{t^2}{2}} \mathrm{e}^{-\frac{ity}{\sigma}} \left(\int_{\mathbb{R}} \mathrm{e}^{\frac{itx}{\sigma}} \mathrm{d}F(x)\right) \mathrm{d}t\right) \mathrm{d}y \\ & = \int_{-\infty}^u \left(\int_{\mathbb{R}} \frac{1}{\sigma} \frac{1}{{2 \pi}} \mathrm{e}^{- \frac{t^2}{2}} \mathrm{e}^{-\frac{ity}{\sigma}} \widehat{F}(t/\sigma) \mathrm{d}t\right) \mathrm{d}y \\ & = \int_{-\infty}^u \left( \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \right) \mathrm{d}y \, \end{array} \]

where

  • the first equality comes from Proposition 12.6 and the Tonelli-Fubini Theorem;
  • the second equality comes from the integral representation of the Gaussian density;
  • the third equality comes from the Tonelli-Fubini Theorem again;
  • the last equality follows from the change of variable \(s = t/\sigma\) in the inner integral.

The quantity \(\left( \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \right)\) is a version of the density of the distribution of \(Y = X + \sigma Z\) (why?). Note that it is obtained from the same inversion formula that readily worked for the Gaussian density.

Now we have to show that an inversion formula works for all probability distributions, not only for the smooth probability distributions obtained by adding Gaussian noise. We shall check that we can recover the distribution function from the Fourier transform.

Theorem 12.5 Let \(X\) be distributed according to \(P\), with cumulative distribution function \(F\) and characteristic function \(\widehat{F}\).

Then: \[ \lim_{\sigma \downarrow 0} \int_{-\infty}^u \left( \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{-ity} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\widehat{F}(t)\mathrm{d}t \right) \mathrm{d}y = F(u_-) + \frac{1}{2} P\{u\} \]

where \[ F(u_-) = \lim_{v \uparrow u} F(v) = P(-\infty, u)\, . \]

Proof. The proof consists in combining Proposition 12.7 and Proposition 12.8.

Note that Theorem 12.5 does not directly deliver the distribution function \(F\). Indeed, if \(F\) is not continuous, \(u \mapsto \widetilde{F}(u) = F(u_-) + \frac{1}{2} P\{u\}\) is not a distribution function. But the right-continuous modification of \(\widetilde{F}\), \(u \mapsto \lim_{v \downarrow u} \widetilde{F}(v)\), coincides with \(F\). We have established Theorem 12.4.

When the distribution function is absolutely continuous, Fourier inversion is simpler.

Theorem 12.6 Let \(X\) be distributed according to \(P\), with cumulative distribution function \(F\) and characteristic function \(\widehat{F}\). Assume that \(\widehat{F}\) is integrable (with respect to Lebesgue measure). Then:

  • \(P\) is absolutely continuous with respect to Lebesgue measure;
  • \(y \mapsto \frac{1}{{2 \pi}} \int_{\mathbb{R}} \widehat{F}(t) \mathrm{e}^{-ity} \mathrm{d}t\) is a uniformly continuous version of the density of \(P\).

Proof. Let \(X\) be distributed according to \(P\) with cumulative distribution function \(F\) and characteristic function \(\widehat{F}\). Let \(Z\) be independent from \(X\) and \(\mathcal{N}(0,1)\)-distributed. Let \(x\) be a continuity point of \(F\).

\[ \lim_{\sigma \downarrow 0} P\Big\{ X + \sigma Z \leq x \Big\} = F(x) \]

\[\begin{array}{rl} \lim_{\sigma \downarrow 0} P\Big\{ X + \sigma Z \leq x \Big\} & = \lim_{\sigma \downarrow 0} \int_{-\infty}^x \left( \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \right) \mathrm{d}y \\ & = \int_{-\infty}^x \frac{1}{{2 \pi}}\int_{\mathbb{R}} \lim_{\sigma \downarrow 0} \mathrm{e}^{- \frac{t^2 \sigma^2}{2}}\mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \mathrm{d}y \\ & = \int_{-\infty}^x \frac{1}{{2 \pi}}\int_{\mathbb{R}} \mathrm{e}^{-ity} \widehat{F}(t)\mathrm{d}t \mathrm{d}y \, \end{array} \]

where the interchange of limit and integration is justified by dominated convergence (the integrand is dominated by \(|\widehat{F}(t)|\), which is integrable by assumption).
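As a numerical illustration of Theorem 12.6, the density of the standard Laplace distribution can be recovered from its characteristic function \(t \mapsto 1/(1+t^2)\) (computed in Section 12.3.6) by the inversion integral; the quadrature below uses scipy:

```python
import numpy as np
from scipy import integrate

def laplace_density_from_cf(y):
    # (1/2pi) * integral of (1/(1+t^2)) e^{-ity} dt; imaginary part vanishes by symmetry
    integrand = lambda t: np.cos(t * y) / (1 + t ** 2)
    val, _ = integrate.quad(integrand, -np.inf, np.inf)
    return val / (2 * np.pi)

for y in (0.0, 1.0, 2.0):
    print(y, laplace_density_from_cf(y), 0.5 * np.exp(-abs(y)))   # both columns agree
```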

We close this section by an alternative inversion formula.

Theorem 12.7 (Inversion formula) Let \(P\) be a probability distribution over \(\mathbb{R}\) with cumulative distribution function \(F\), then \[ \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^T \frac{\mathrm{e}^{-it a} - \mathrm{e}^{-it b}}{it} \widehat{F}(t) \mathrm{d}t = F(b_-) - F(a) + \frac{1}{2} \Big(P\{b\} + P\{a\}\Big) \, . \]

The proof of Theorem 12.7 can be found in textbooks like (Durrett, 2010) or (Billingsley, 2012).

Corollary 12.4 Let \(\widehat{F}\) denote the characteristic function of the probability distribution \(P\), if \(\widehat{F}(t) = \mathrm{e}^{-\frac{t^2}{2}}\), then \(P\) is the standard univariate Gaussian distribution (\(\mathcal{N}(0,1)\)).

Corollary 12.5 Let \(\widehat{F}\) denote the characteristic function of probability distribution \(P\), if \(\widehat{F}(t) = \mathrm{e}^{i\mu t -\frac{\sigma^2 t^2}{2}}\), then \(P\) is the Gaussian distribution ( \(\mathcal{N}(\mu,\sigma^2)\) ).

Another important byproduct of the proof of injectivity of the characteristic function is Stein’s identity, an important property of the standard Gaussian distribution.

Theorem 12.8 (Stein’s identity) Let \(X \sim \mathcal{N}(0,1)\), and \(g\) be a differentiable function such that \(\mathbb{E}|g'(X)|< \infty\), then

\[ \mathbb{E}[g'(X)] = \mathbb{E}[Xg(X)] \, . \]

Conversely, if \(X\) is a random variable such that

\[ \mathbb{E}[g'(X)] = \mathbb{E}[X g(X)] \]

holds for any differentiable function \(g\) such that \(g'\) is integrable, then \(X \sim \mathcal{N}(0,1)\).

Proof. The direct part follows by integration by parts.

To check the converse, note that if \(X\) satisfies the identity in the Theorem, then the functions \(t \mapsto \mathbb{E} \cos(tX)\) and \(t \mapsto \mathbb{E} \sin(tX)\) both satisfy the differential equation \(g'(t) = -t g(t)\), with initial conditions \(\mathbb{E} \cos(0X)=1\) and \(\mathbb{E} \sin(0X) =0\). This entails \(\mathbb{E} \mathrm{e}^{itX} = \exp\Big(-\frac{t^2}{2}\Big)\), that is \(X \sim \mathcal{N}(0,1)\).
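A quick Monte Carlo check of the direct part, with the arbitrary test function \(g = \tanh\): the two expectations in Stein's identity should agree up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal(1_000_000)

g = np.tanh
g_prime = lambda x: 1.0 - np.tanh(x) ** 2      # derivative of tanh

print(np.mean(g_prime(X)), np.mean(X * g(X)))  # E[g'(X)] and E[X g(X)] should be close
```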


12.3.5 Differentiability and integrability

Differentiability of the Fourier transform at \(0\) and integrability are intimately related.

Theorem 12.9 If \(X\) is \(p\)-integrable for some \(p \in \mathbb{N}\), then the Fourier transform of the distribution of \(X\) is \(p\)-times differentiable at \(0\) and its \(p^{\text{th}}\) derivative at \(0\) equals \(i^p\, \mathbb{E}X^p\).

Proof. The proof relies on a Taylor expansion with remainder of \(x \mapsto \mathrm{e}^{ix}\) at \(x=0\):

\[ \mathrm{e}^{ix} - \sum_{k=0}^n \frac{(ix)^k}{k!} = \frac{i^{n+1}}{n!} \int_0^x (x-s)^n \mathrm{e}^{is} \mathrm{d}s \, . \] The modulus of the right hand side can be upper-bounded in two different ways. First, \[ \frac{1}{n!}\Big| \int_0^x (x-s)^n \mathrm{e}^{is} \mathrm{d}s \Big| \leq \frac{|x|^{n+1}}{(n+1)!} \, , \] which is useful when \(|x|\) is small. To handle large values of \(|x|\), integration by parts leads to \[ \frac{i^{n+1}}{n!} \int_0^x (x-s)^n \mathrm{e}^{is} \mathrm{d}s = \frac{i^{n}}{(n-1)!} \int_0^x (x-s)^{n-1} \left(\mathrm{e}^{is}-1\right) \mathrm{d}s \,, \] whose modulus can be upper-bounded by \(2|x|^n/n!\).

Applying this Taylor expansion to \(x=t X\), using the pointwise upper bounds and taking expectations leads to \[\begin{array}{rl} \Big| \widehat{F}(t) - \sum_{k=0}^n \mathbb{E}\frac{(itX)^k}{k!} \Big| & \leq \mathbb{E} \Big[\min\Big( \frac{|tX|^{n+1}}{(n+1)!} ,2 \frac{|tX|^n}{n!}\Big)\Big] \\ & = \frac{|t|^n}{(n+1)!} \mathbb{E} \Big[\min\Big(|t||X|^{n+1} ,2 (n+1) |X|^n \Big)\Big] \, . \end{array} \] Note that the right hand side is well defined as soon as \(\mathbb{E}|X|^n < \infty\). Now, by dominated convergence, \[ \lim_{t \to 0} \mathbb{E} \Big[\min\Big(|t||X|^{n+1} ,2 (n+1) |X|^n \Big)\Big] = 0\, \] Hence we have established that if \(\mathbb{E}|X|^n < \infty\), \[ \widehat{F}(t) = \sum_{k=0}^n i^k \mathbb{E}X^k \frac{t^k}{k!} + o(|t|^n) \, . \]

In the other direction, the connection is not as simple: differentiability of the Fourier transform does not imply integrability. But the following holds.

Theorem 12.10 If the Fourier transform \(\widehat{F}\) of the distribution of \(X\) satisfies

\[ \lim_{h \downarrow 0} \frac{2 - \widehat{F}(h) - \widehat{F}(-h)}{h^2} = \sigma^2 < \infty \]

then \(\mathbb{E}X^2 = \sigma^2\).

Proof. Note that

\[ 2 - \widehat{F}(h) - \widehat{F}(-h) = 2\mathbb{E}\Big[1 - \cos(hX)\Big] \, , \]

and using the Taylor formula with integral remainder for \(\cos\) at \(0\):

\[ 1 - \cos x = \int_0^x \cos(s) (x-s) \mathrm{d}s = x^2 \int_0^1 \cos(sx) (1-s) \mathrm{d}s \, . \]

Note that \(\int_0^1 \cos(sx) (1-s) \mathrm{d}s\geq 0\) for all \(x \in \mathbb{R}\) (see Exercise 12.8).

\[ \begin{array}{rl} \frac{2\mathbb{E}\Big[1 - \cos(hX)\Big]}{h^2} & = 2 \frac{\mathbb{E}\Big[ h^2 X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s\Big]}{h^2} \\ & = 2 \mathbb{E}\Big[ X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s\Big] \, . \end{array} \]

By Fatou’s Lemma:

\[ \sigma^2 = \lim_{h \downarrow 0} 2\mathbb{E}\Big[ X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s\Big] \geq 2\mathbb{E}\Big[\liminf_{h \downarrow 0} X^2 \int_0^1 \cos(shX) (1-s) \mathrm{d}s \Big] \]

but for all \(x \in \mathbb{R}\), by dominated convergence,

\[ \liminf_{h \downarrow 0} x^ 2 \int_0^1 \cos(shx) (1-s) \mathrm{d}s = \frac{x^2}{2} \, . \]

Hence \[ \sigma^2 \geq \mathbb{E} X^2 \, . \]

The proof is completed by invoking Theorem 12.9: since \(\mathbb{E}X^2 \leq \sigma^2 < \infty\), we have \(\widehat{F}(h) = 1 + ih\,\mathbb{E}X - \frac{h^2}{2}\mathbb{E}X^2 + o(h^2)\), so that \(\frac{2 - \widehat{F}(h) - \widehat{F}(-h)}{h^2} \to \mathbb{E}X^2\) as \(h \downarrow 0\), and therefore \(\sigma^2 = \mathbb{E}X^2\).

12.3.6 Another application: understanding Cauchy distribution

Assume \(U\) is uniformly distributed over \((0,1)\), and let the real-valued random variable \(X\) be defined by

\[ X = \tan\left(\frac{\pi}{2} (2 \times U -1)\right)\, . \]

As \(\tan\) is continuous and increasing from \(-\infty\) to \(\infty\) on \((-\pi/2, \pi/2)\), the cumulative distribution function of the distribution of \(X\) is

\[ \begin{array}{rl} \mathbb{P}\{ X \leq x\} & = \mathbb{P}\left\{\tan\left(\frac{\pi}{2}(2U-1)\right) \leq x\right\} \\ & = \mathbb{P}\left\{U \leq \frac{1}{2} + \frac{1}{\pi}\arctan(x) \right\} \\ & = \frac{1}{2} + \frac{1}{\pi}\arctan(x) \end{array} \]

for \(x \in \mathbb{R}\).

As \(\arctan\) has derivative \(x \mapsto \frac{1}{1+x^2}\), the cumulative distribution function is absolutely continuous with density:

\[\frac{1}{\pi} \frac{1}{1 + x^2}\]

This is the density of the Cauchy distribution.

Note that \(\mathbb{E}X_+ = \mathbb{E} X_- = \mathbb{E}|X| =\infty\): the Cauchy distribution is not integrable.

Now, assume \(X_1, X_2, \ldots, X_n\) are i.i.d. and Cauchy distributed. Let \(Z = \sum_{i=1}^n X_i/n\). How is \(Z\) distributed? We might compute the \(n\)-fold convolution of the Cauchy density. It turns out that starting from the characteristic function is much simpler.

We refrain from computing the characteristic function of the Cauchy distribution directly; we take a roundabout route.

Let \(Y\) be distributed according to Laplace distribution, that is with density \(y \mapsto \frac{1}{2} \exp(-|y|)\) for \(y \in \mathbb{R}\). The random variable \(Y\) is symmetric (\(Y \sim -Y\)). Let \(\widehat{F}_Y\) denote the characteristic function of (the distribution of) \(Y\).

\[ \begin{array}{rl} \widehat{F}_Y(t) & = \mathbb{E}\mathrm{e}^{itY} \\ & = \mathbb{E}\cos(tY) \\ & = \int_0^{\infty} \mathrm{e}^{-y} \cos(ty) \mathrm{d}y \\ & = \left[- \mathrm{e}^{-y} \cos(ty)\right]_0^\infty - t \int_{0}^\infty \mathrm{e}^{-y} \sin(ty) \mathrm{d}y \\ & = 1 - t \int_{0}^\infty \mathrm{e}^{-y} \sin(ty) \mathrm{d}y \\ & = 1 -t \left[- \mathrm{e}^{-y} \sin(ty)\right]_0^\infty - t^2 \int_0^\infty \mathrm{e}^{-y} \cos(ty) \mathrm{d}y \\ & = 1 - t^2 \widehat{F}_Y(t) \end{array} \]

where we have performed integration by parts twice.

The characteristic function \(\widehat{F}_Y\) satisfies

\[\widehat{F}_Y(t) = \frac{1}{1+ t^2}\, ,\]

up to the factor \(\frac{1}{\pi}\), this is the density of the Cauchy distribution. This allows us to compute the characteristic function of the Cauchy distribution by Fourier inversion (Theorem 12.6):

\[ \begin{array}{rl} \widehat{F}_X(t) & = \mathbb{E}\mathrm{e}^{itX}\\ & = \int_{-\infty}^{\infty} \frac{1}{\pi} \frac{1}{1+x^2} \cos(tx)\mathrm{d}x \\ & = \frac{2}{\pi} \int_0^\infty \cos(tx) \widehat{F}_Y(x) \mathrm{d}x \\ & = 2 \times \frac{1}{2\pi} \int_{-\infty}^{\infty} \mathrm{e}^{-itx} \widehat{F}_Y(x) \mathrm{d}x \\ & = 2 \times \frac{1}{2} \mathrm{e}^{-|t|} = \mathrm{e}^{-|t|} \end{array} \]

where we have used the inversion formula.

Now, the characteristic function of the distribution of \(Z\) is \[\widehat{F}_Z(t) = \Big(\widehat{F}_X(t/n)\Big)^n = \left(\mathrm{e}^{-\frac{|t|}{n}}\right)^n= \mathrm{e}^{-|t|} = \widehat{F}_X(t)\]

which means that \(Z\) is distributed as \(X_1\): the empirical mean of \(n\) i.i.d. Cauchy random variables is again Cauchy distributed.
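This stability property can be illustrated by simulation (arbitrary sample sizes, Kolmogorov-Smirnov test from scipy): the empirical mean of \(n\) i.i.d. standard Cauchy variables behaves like a single standard Cauchy variable.

```python
import numpy as np
from scipy.stats import cauchy, kstest

rng = np.random.default_rng(4)
n, n_rep = 50, 20_000

# n_rep independent replicates of the mean of n standard Cauchy random variables
samples = cauchy.rvs(size=(n_rep, n), random_state=rng)
means = samples.mean(axis=1)

print(kstest(means, cauchy.cdf))   # no evidence against the standard Cauchy distribution
```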

The basic tools of characteristic functions theory allow us to

  • compute the characteristic function of the Laplace distribution
  • compute the characteristic function of the Cauchy distribution by inversion
  • compute the characteristic function of sums of independent Cauchy random variables
  • show that the Cauchy distribution is \(1\)-stable.

Remark 12.4. The density of the Laplace distribution is not differentiable at \(0\); this is reflected in the slow decay of its Fourier transform (the characteristic function of the Laplace distribution): \(t \mapsto t\, \widehat{F}_Y(t)\) is not integrable.

Conversely the lack of integrability of the Cauchy distribution is reflected in the non-differentiability of its characteristic function at \(0\).

Exercise 12.8 Check that \(\int_0^1 \cos(sx) (1-s) \mathrm{d}s\geq 0\) for all \(x \in \mathbb{R}\).

Hint: Check that \(t \mapsto 2\int_0^1 \cos(tw) (1-w) \mathrm{d}w\) is the characteristic function of the tent distribution, which has density \(1-|w|\) over \([-1,1]\). Check that this characteristic function is the squared modulus of another characteristic function.

12.4 Bibliographic remarks

Wilf (2005) explores the interplay between combinatorics, algorithm analysis and generating function theory.

Widder (2015) is a classic reference on Laplace transforms. Laplace transforms play an important role in Point Process Theory, and Extreme Value Theory, to name a few fields of application.

The first part of Chapter 9 from Pollard (2002) describes characteristic functions as Fourier transforms. Properties and applications of characteristic functions are thoroughly discussed in (Durrett, 2010), (Billingsley, 2012).

Billingsley, P. (2012). Probability and measure. John Wiley & Sons, Inc., Hoboken, NJ.
Durrett, R. (2010). Probability: Theory and examples. Cambridge University Press.
Pollard, D. (2002). A user’s guide to measure theoretic probability (Vol. 8, p. xiv+351). Cambridge University Press, Cambridge.
Widder, D. V. (2015). Laplace transform (PMS-6). Princeton University Press.
Wilf, H. S. (2005). Generatingfunctionology. AK Peters/CRC Press.