7 Product distributions
In this lesson, we construct product measures. We start with two measure spaces \((\mathcal{X}, \mathcal{F}, \mu)\) and \((\mathcal{Y}, \mathcal{G}, \nu)\).
Our goal is to build a measure space \((\mathcal{X}\times \mathcal{Y}, \mathcal{H}, \rho)\) and two measurable functions \(X : \mathcal{X}\times \mathcal{Y} \to \mathcal{X}\) and \(Y : \mathcal{X}\times \mathcal{Y} \to \mathcal{Y}\) with the additional requirements that
\[\mu = \rho \circ X^{-1} \quad \text{and} \quad \nu = \rho \circ Y^{-1}\]
as well as
\[\rho(A \times B) = \mu(A) \, \nu(B) \qquad \forall A \in \mathcal{F}, B\in \mathcal{G} \,.\]
Note that requiring \(X\) and \(Y\) to be measurable is premature: we have not yet defined the \(\sigma\)-algebra \(\mathcal{H}\) over \(\mathcal{X}\times \mathcal{Y}\).
7.1 Product \(\sigma\)-algebras
In order to achieve our goal, we first define a \(\sigma\)-algebra \(\mathcal{H}\) of subsets of \(\mathcal{X} \times \mathcal{Y}\). We use the so-called product \(\sigma\)-algebra.
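Although the definition is abstract, on finite spaces the product \(\sigma\)-algebra (the \(\sigma\)-algebra generated by the measurable rectangles \(A \times B\)) can be computed by brute force. Below is a minimal Python sketch, with all names ours and chosen for illustration only, that closes the family of rectangles under complementation and union:

```python
from itertools import product, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def generated_sigma_algebra(omega, generators):
    """Close a family of subsets of the finite set `omega` under
    complementation and (finite) union until a fixed point is reached.
    On a finite space this closure is the generated sigma-algebra."""
    sigma = set(generators) | {frozenset(), frozenset(omega)}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for a in current:
            comp = frozenset(omega) - a
            if comp not in sigma:
                sigma.add(comp); changed = True
        for a in current:
            for b in current:
                u = a | b
                if u not in sigma:
                    sigma.add(u); changed = True
    return sigma

# Two tiny measurable spaces: X = {0, 1} and Y = {'a', 'b'}, both with power sets.
X, Y = [0, 1], ['a', 'b']
rectangles = [frozenset(product(A, B)) for A in powerset(X) for B in powerset(Y)]
H = generated_sigma_algebra(set(product(X, Y)), rectangles)
print(len(H))  # 16 = 2^4: singletons are rectangles, so we get the full power set
```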
7.2 Product measures
Once we are equipped with the product \(\sigma\)-algebra, we can proceed to the definition of product measures.
Recall the definition of \(\sigma\)-finite measures from Section 2.6.
A measure \(\mu\) on \((\Omega, \mathcal{F})\) is \(\sigma\)-finite iff there exists a sequence \((A_n)_n\) of sets in \(\mathcal{F}\) with \(\Omega \subseteq \cup_n A_n\) and \(\mu(A_n) < \infty\) for each \(n\).
Finite measures (which include probability measures) are \(\sigma\)-finite. The Lebesgue measure is \(\sigma\)-finite. The counting measure on \(\mathbb{R}\) is not \(\sigma\)-finite.
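To witness these claims: \[\mathbb{R} = \bigcup_{n \geq 1} [-n, n] \quad \text{with} \quad \lambda([-n, n]) = 2n < \infty \,,\] so the Lebesgue measure is \(\sigma\)-finite; on the other hand, any set with finite counting measure is a finite set, and no countable union of finite sets covers the uncountable set \(\mathbb{R}\).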
Remark 7.1. Assuming that both \(\mu\) and \(\nu\) are \(\sigma\)-finite is essential. Choose \(\mu\) as the counting measure on \([0,1]\) and \(\nu\) as the Lebesgue measure on \([0,1]\). Consider the diagonal \(E = \{(x,x) : x \in [0,1]\}\). The set \(E\) belongs to \(\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) = \mathcal{B}(\mathbb{R}^2)\) (check this). But interchanging the order of integration leads to different results:
\[\begin{array}{rl} 1 & = \int_{[0,1]} \Big(\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\mu(x)\Big) \, \mathrm{d}\nu(y) \\ 0 & = \int_{[0,1]} \Big(\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\nu(y)\Big) \, \mathrm{d}\mu(x) \end{array}\]
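Indeed, for fixed \(y \in [0,1]\) the section \(\{x : (x,y) \in E\} = \{y\}\) is a singleton, so \(\int_{[0,1]} \mathbb{I}_E(x,y) \, \mathrm{d}\mu(x) = \mu(\{y\}) = 1\), whereas for fixed \(x\), \(\int_{[0,1]} \mathbb{I}_E(x,y) \, \mathrm{d}\nu(y) = \nu(\{x\}) = 0\) since singletons are Lebesgue-negligible.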
Theorem 7.1 contains three statements:
- existence of a measure over \((\mathcal{X} \times \mathcal{Y}, \mathcal{F} \otimes \mathcal{G})\) that satisfies the product property over rectangles;
- uniqueness of this measure;
- the possibility of computing the measure of \(E \in \mathcal{F} \otimes \mathcal{G}\) by iterated integration in arbitrary order.
The first statement (existence) is proved using an extension theorem. The second statement (uniqueness) follows from a monotone class argument (Theorem 2.4): rectangles form a generating \(\pi\)-class, so the case where both \(\mu\) and \(\nu\) are finite measures is settled; if either \(\mu\) or \(\nu\) is just \(\sigma\)-finite, consider restrictions to rectangles with finite measure and proceed by approximation. The third statement trivially holds for rectangles.
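On finite spaces, both the product property over rectangles and order-free iterated integration can be checked by direct computation. A minimal Python sketch, where the measures and sets are our own toy choices:

```python
from itertools import product

# Two finite measure spaces given by point masses (here: probabilities).
mu = {'x1': 0.3, 'x2': 0.7}                # measure on X
nu = {'y1': 0.5, 'y2': 0.25, 'y3': 0.25}   # measure on Y

# Product measure on X x Y: rho({(x, y)}) = mu({x}) * nu({y}).
rho = {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}

def measure(m, points):
    return sum(m[p] for p in points)

# Product property on rectangles: rho(A x B) = mu(A) * nu(B).
A, B = ['x1'], ['y1', 'y3']
rect = [(x, y) for x in A for y in B]
assert abs(measure(rho, rect) - measure(mu, A) * measure(nu, B)) < 1e-12

# Iterated "integration" (here: summation) of an arbitrary E, in both orders.
E = [('x1', 'y1'), ('x2', 'y2'), ('x2', 'y3')]
by_x_first = sum(nu[y] * sum(mu[x] for x in mu if (x, y) in E) for y in nu)
by_y_first = sum(mu[x] * sum(nu[y] for y in nu if (x, y) in E) for x in mu)
assert abs(by_x_first - measure(rho, E)) < 1e-12
assert abs(by_y_first - measure(rho, E)) < 1e-12
print(measure(rho, E))  # 0.15 + 0.175 + 0.175 = 0.5
```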
Remark 7.2. If \(\mu, \nu\) are probability measures, then the product measure \(\mu \otimes \nu\) is a probability measure; it is called a product probability measure.
7.3 Tonelli-Fubini theorem
In this section, we consider product measures that are built from \(\sigma\)-finite measures as in Theorem 7.1. The Tonelli-Fubini Theorem shows that (under mild conditions) integration with respect to a product measure reduces to iterated integration over the component measures.
Proof. A complete proof can be found in (Dudley, 2002); here is a sketch. The argument establishes the statement for larger and larger classes of measurable functions.
Note first that Theorem 7.1 settles the case for indicators of measurable subsets of \(\mathcal{X}\times \mathcal{Y}\).
From this observation, using linearity, simple positive functions are handled. Then settling the case of non-negative measurable functions over \(\mathcal{X}\times \mathcal{Y}\) uses a monotone convergence argument (Theorem 3.1).
The general case is handled by decomposing the measurable function as the difference of its positive and negative parts.
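The conclusion (not the proof) is easy to test numerically: for an integrable function on a product of intervals, the two iterated integrals must agree. A sketch using midpoint Riemann sums; the test function \(f(x,y) = e^{-x-y}\) and the grid size are our choices:

```python
import numpy as np

# f(x, y) = exp(-x - y) on [0,1]^2; its integral factorizes: (1 - 1/e)^2.
n = 2000
xs = (np.arange(n) + 0.5) / n   # midpoint grid on [0, 1]
ys = (np.arange(n) + 0.5) / n
f = np.exp(-xs[:, None] - ys[None, :])   # f[i, j] = f(x_i, y_j)

inner_over_y = f.mean(axis=1)    # x -> integral of f(x, .) dy (midpoint rule)
x_then_y = inner_over_y.mean()   # then integrate over x
inner_over_x = f.mean(axis=0)    # y -> integral of f(., y) dx
y_then_x = inner_over_x.mean()   # then integrate over y

exact = (1 - np.exp(-1)) ** 2
print(x_then_y, y_then_x, exact)   # all three agree to high accuracy
```

Here the product structure of \(f\) gives the closed form \((1 - e^{-1})^2\) to compare against.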
The following characterization of the expectation of non-negative random variables as the integral of the tail function is a simple consequence of the Tonelli-Fubini Theorem.
Proof. \[\begin{array}{rl} \mathbb{E}X & = \int_{\Omega} X(\omega) \, \mathrm{d}P(\omega) \\ & = \int_{\Omega} \Big( \int_{[0,\infty)} \mathbb{I}_{X(\omega)> t} \mathrm{d}t \Big)\, \mathrm{d}P(\omega) \\ & = \int_{[0,\infty)} \Big( \int_{\Omega} \mathbb{I}_{X(\omega)> t} \, \mathrm{d}P(\omega) \Big) \mathrm{d}t \\ & = \int_{[0,\infty)} \Big( P\{ \omega : X(\omega) > t \} \Big) \mathrm{d}t \end{array}\]
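The identity is also easy to check by simulation. A sketch with \(X\) exponential with unit mean, where both sides equal \(1\); the distribution, sample size, and truncation point are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)    # X ~ Exp(1), so E[X] = 1

ts = np.linspace(0.0, 20.0, 2001)               # truncate the tail integral at t = 20
x_sorted = np.sort(x)
# Empirical survival function: P{X > t} = 1 - (# of samples <= t) / n.
survival = 1.0 - np.searchsorted(x_sorted, ts, side='right') / x.size
# Trapezoidal rule for the tail integral.
tail_integral = float(np.sum((survival[:-1] + survival[1:]) / 2 * np.diff(ts)))

print(x.mean(), tail_integral)                  # both close to E[X] = 1
```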
7.4 Joint distributions, independence and product distributions
Let the two random variables \(X, Y\) map \((\Omega, \mathcal{F})\) to \((\mathcal{X}, \mathcal{G})\) and \((\mathcal{Y}, \mathcal{H})\) respectively. Equip \((\Omega, \mathcal{F})\) with a probability distribution \(P\). Let \(Q_X = P \circ X^{-1}\) and \(Q_Y = P \circ Y^{-1}\) be the two image distributions (called the marginal distributions). We may define a mapping \(Z: \Omega \to \mathcal{X} \times \mathcal{Y}\) by \(Z(\omega) = (X(\omega), Y(\omega))\); this mapping is \(\mathcal{F}/\sigma(\mathcal{G}\times \mathcal{H})\)-measurable.
Let \(Q\) be the joint distribution of \(Z = (X,Y)\) under \(P\), that is, the probability distribution over \(\mathcal{X} \times \mathcal{Y}\) (endowed with \(\sigma(\mathcal{G}\times \mathcal{H})\)) that is uniquely defined by
\[Q( A \times B) = P\Big\{ \omega: X(\omega) \in A, Y(\omega) \in B \Big\} \, .\]
Note that \(Q\) is not necessarily a product distribution.
The next (trivial) proposition tells us that two random variables are independent iff their joint distribution is a product distribution (in fact the product distribution defined by the two marginal distributions).
\[X \perp\!\!\!\perp Y \text{ under } P \Longleftrightarrow Q = Q_X \otimes Q_Y \, ,\]
in words, \(X\) and \(Y\) are independent iff their joint distribution is the product of their marginal distributions.
Proof. If \(X \perp\!\!\!\perp Y\) under \(P\), then for all \(A \in \mathcal{G}\) and \(B \in \mathcal{H}\), \(Q(A \times B) = P\{X \in A, Y \in B\} = P\{X \in A\} \, P\{Y \in B\} = Q_X(A) \, Q_Y(B)\). Hence \(Q\) and \(Q_X \otimes Q_Y\) coincide on measurable rectangles, which form a \(\pi\)-class generating \(\sigma(\mathcal{G}\times \mathcal{H})\); by the uniqueness argument of Theorem 7.1, \(Q = Q_X \otimes Q_Y\). Conversely, if \(Q = Q_X \otimes Q_Y\), then \(P\{X \in A, Y \in B\} = Q(A \times B) = Q_X(A) \, Q_Y(B) = P\{X \in A\} \, P\{Y \in B\}\) for all \(A, B\), which is the definition of independence.
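On a finite sample space the equivalence can be verified exhaustively: store \(Q\) as a table, compute the marginals, and compare with their outer product. A minimal sketch, with our own toy tables:

```python
import numpy as np

# Joint distribution Q of (X, Y) as a table: rows = values of X, cols = values of Y.
Q_indep = np.array([[0.06, 0.14],    # = outer product of (0.2, 0.8) and (0.3, 0.7)
                    [0.24, 0.56]])
Q_dep = np.array([[0.30, 0.00],      # mass concentrated on the diagonal
                  [0.00, 0.70]])

def is_product(Q, tol=1e-12):
    """True iff the joint table equals the outer product of its marginals."""
    qx = Q.sum(axis=1)               # marginal Q_X
    qy = Q.sum(axis=0)               # marginal Q_Y
    return np.allclose(Q, np.outer(qx, qy), atol=tol)

print(is_product(Q_indep))  # True:  Q is the product of its marginals, X and Y independent
print(is_product(Q_dep))    # False: here X determines Y
```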
7.5 Independence of collections of \(\sigma\)-algebras
In many applications, independence between two \(\sigma\)-algebras or a finite collection of \(\sigma\)-algebras is not enough. This is the case when deriving or using laws of large numbers, where we have to deal with a countable collection of independent random variables. In other words, we have to work with a countable collection of \(\sigma\)-algebras and we need a notion of independence for such a collection.
Let \((\Omega, \mathcal{F}, P)\) be a probability space. Let \(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\) be a countable collection of sub-\(\sigma\)-algebras.
The collection \(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\) is said to be independent under \(P\) if every finite sub-collection is independent under \(P\).
Example 7.1. Consider the uniform probability distribution over \([0,1]\) and define \(X_1, X_2, \ldots\) by
\[X_n(\omega) = \operatorname{sign}\Big(\sin\big(2^{n+1} \pi \omega \big)\Big)\]
then \(X_1, \ldots, X_n, \ldots\) form a countable collection of independent random variables.
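These \(X_n\) are (up to indexing) the classical Rademacher functions. A quick simulation, with sample size our own choice, suggesting that pairs behave like independent fair \(\pm 1\) coins:

```python
import numpy as np

rng = np.random.default_rng(42)
omega = rng.uniform(0.0, 1.0, size=200_000)   # omega ~ Uniform([0, 1])

def X(n, w):
    """X_n(w) = sign(sin(2^{n+1} * pi * w)), valued in {-1, +1} almost surely."""
    return np.sign(np.sin(2.0 ** (n + 1) * np.pi * w))

x1, x2, x3 = X(1, omega), X(2, omega), X(3, omega)

# Each X_n is a fair +/-1 coin, and pairs look independent:
for a, b, name in [(x1, x2, "X1,X2"), (x1, x3, "X1,X3"), (x2, x3, "X2,X3")]:
    p_joint = np.mean((a == 1) & (b == 1))
    p_prod = np.mean(a == 1) * np.mean(b == 1)
    print(name, round(p_joint, 3), round(p_prod, 3))   # both ~ 0.25
```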
7.6 Infinite product spaces
In many modeling scenarios (random walks, branching processes, asymptotic statistics, …), we rely on the availability of an infinite collection of independent random variables. While it is (relatively) easy to come up with the notion of a finite product probability space, the notion of an infinite product probability space is more puzzling. And this remains true even if the individual components are finite probability spaces (for example \(\{0, 1\}\), equipped with its power set and the uniform distribution).
Think of \(\Omega_i = \{0,1\}\), each \(P_i\) being the balanced Bernoulli distribution. Let \(\omega\) be an infinite sequence of \(0\)s and \(1\)s; then \(\{\omega\} = \prod_{i=1}^\infty \{ \omega_i \}\) is an infinite Cartesian product of events, each of probability \(1/2\). What should its probability be in the infinite product probability space? (Consistency forces it to be at most \(2^{-m}\) for every \(m\), hence \(0\).) Is there a way to assign probabilities in a consistent way? If the answer is positive, is there a unique way to perform this operation?
Observe that cylinders form a \(\pi\)-class.
If each \((\Omega_n, \mathcal{F}_n)\) is endowed with a probability distribution, assigning a probability to cylinders looks straightforward: \[\mathbb{P} \left( \prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \right) = \prod_{n=1}^m P_n(A_n) \times \prod_{n=m+1}^\infty P_n(\Omega_n) = \prod_{n=1}^m P_n(A_n) \, .\]
The question is: does \(\mathbb{P}\) extend to the cylinder \(\sigma\)-algebra? If an extension exists, is it unique? The answer to both questions is yes.
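For the Bernoulli example above, the cylinder formula is explicit and shows in passing that singletons must receive probability \(\lim_m 2^{-m} = 0\). A minimal Python sketch (function names ours):

```python
from math import prod

# Infinite product of balanced Bernoulli spaces: Omega_n = {0, 1}, P_n uniform.
def P_n(A):
    """Probability of A, a subset of {0, 1}, under the balanced Bernoulli law."""
    return len(set(A) & {0, 1}) / 2

def cylinder_prob(constraints):
    """P of the cylinder prod_{n<=m} A_n x prod_{n>m} Omega_n,
    where `constraints` lists A_1, ..., A_m."""
    return prod(P_n(A) for A in constraints)

# The cylinder {omega : omega_1 = 1, omega_2 in {0, 1}, omega_3 = 0}:
print(cylinder_prob([{1}, {0, 1}, {0}]))   # (1/2) * 1 * (1/2) = 0.25

# Approximating a singleton {omega}: constrain the first m coordinates.
omega = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
for m in (1, 5, 10):
    print(m, cylinder_prob([{w} for w in omega[:m]]))  # 2^{-m} -> 0
```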
7.7 Bibliographic remarks
The material covered in this lesson can be found in any book on measure and integration theory. Section 4.4 of (Dudley, 2002) is dedicated to product measures.
Complete proofs of the Tonelli-Fubini Theorem can be found in (Dudley, 2002).
The existence theorem for infinite product probabilities is from Section 8.2 of (Dudley, 2002); a full proof of the theorem can be found there.