7 Product distributions
In this lesson, we construct product measures. We start with two measure spaces \((\mathcal{X}, \mathcal{F}, \mu)\) and \((\mathcal{Y}, \mathcal{G}, \nu)\).
Our goal is to build a measure space \((\mathcal{X}\times \mathcal{Y}, \mathcal{H}, \rho)\) and two measurable functions \(X : \mathcal{X}\times \mathcal{Y} \to \mathcal{X}\) and \(Y : \mathcal{X}\times \mathcal{Y} \to \mathcal{Y}\) with the additional requirements that
\[\mu = \rho \circ X^{-1} \quad \text{and} \quad \nu = \rho \circ Y^{-1}\]
as well as
\[\rho(A \times B) = \mu(A) \, \nu(B) \qquad \forall A \in \mathcal{F}, B\in \mathcal{G} \,.\]
Note that requiring \(X\) and \(Y\) to be measurable is premature: we have not yet defined the \(\sigma\)-algebra \(\mathcal{H}\) over \(\mathcal{X}\times \mathcal{Y}\).
7.1 Product \(\sigma\)-algebras
In order to achieve our goal, we first define a \(\sigma\)-algebra \(\mathcal{H}\) of subsets of \(\mathcal{X} \times \mathcal{Y}\). We use the so-called product \(\sigma\)-algebra.
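Although the definition is abstract, on finite spaces the product \(\sigma\)-algebra (the \(\sigma\)-algebra generated by the measurable rectangles \(A \times B\)) can be computed by brute force. Below is a minimal Python sketch, with all names ours and chosen for illustration only, that closes the family of rectangles under complementation and union:

```python
from itertools import product, combinations

def powerset(s):
    """All subsets of s, as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def generated_sigma_algebra(omega, generators):
    """Close a family of subsets of the finite set `omega` under
    complementation and (finite) union until a fixed point is reached.
    On a finite space this closure is the generated sigma-algebra."""
    sigma = set(generators) | {frozenset(), frozenset(omega)}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for a in current:
            comp = frozenset(omega) - a
            if comp not in sigma:
                sigma.add(comp); changed = True
        for a in current:
            for b in current:
                u = a | b
                if u not in sigma:
                    sigma.add(u); changed = True
    return sigma

# Two tiny measurable spaces: X = {0, 1} and Y = {'a', 'b'}, both with power sets.
X, Y = [0, 1], ['a', 'b']
rectangles = [frozenset(product(A, B)) for A in powerset(X) for B in powerset(Y)]
H = generated_sigma_algebra(set(product(X, Y)), rectangles)
print(len(H))  # 16 = 2^4: singletons are rectangles, so we get the full power set
```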
7.2 Product measures
Once we are equipped with the product \(\sigma\)-algebra, we can proceed to the definition of product measures.
Recall the definition of \(\sigma\)-finite measures from Section 2.6.
A measure \(\mu\) on \((\Omega, \mathcal{F})\) is \(\sigma\)-finite iff there exists a sequence \((A_n)_n\) of sets in \(\mathcal{F}\) with \(\Omega \subseteq \cup_n A_n\) and \(\mu(A_n) < \infty\) for each \(n\).
Finite measures (which include probability measures) are \(\sigma\)-finite. The Lebesgue measure is \(\sigma\)-finite. The counting measure on \(\mathbb{R}\) is not \(\sigma\)-finite.
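To witness these claims: \[\mathbb{R} = \bigcup_{n \geq 1} [-n, n] \quad \text{with} \quad \lambda([-n, n]) = 2n < \infty \,,\] so the Lebesgue measure is \(\sigma\)-finite; on the other hand, any set with finite counting measure is a finite set, and no countable union of finite sets covers the uncountable set \(\mathbb{R}\).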
Remark 7.1. Assuming that both \(\mu\) and \(\nu\) are \(\sigma\)-finite is essential. Choose \(\mu\) as the counting measure on \([0,1]\) and \(\nu\) as the Lebesgue measure on \([0,1]\). Consider the diagonal \(E = \{(x,x) : x \in [0,1]\}\). The set \(E\) belongs to \(\mathcal{B}(\mathbb{R}) \otimes \mathcal{B}(\mathbb{R}) = \mathcal{B}(\mathbb{R}^2)\) (check this). But interchanging the order of integration leads to different results:
\[\begin{array}{rl} 1 & = \int_{[0,1]} \Big(\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\mu(x)\Big) \, \mathrm{d}\nu(y) \\ 0 & = \int_{[0,1]} \Big(\int_{[0,1]} \mathbb{I}_E (x,y) \, \mathrm{d}\nu(y)\Big) \, \mathrm{d}\mu(x) \end{array}\]
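Indeed, for fixed \(y \in [0,1]\) the section \(\{x : (x,y) \in E\} = \{y\}\) is a singleton, so \(\int_{[0,1]} \mathbb{I}_E(x,y) \, \mathrm{d}\mu(x) = \mu(\{y\}) = 1\), whereas for fixed \(x\), \(\int_{[0,1]} \mathbb{I}_E(x,y) \, \mathrm{d}\nu(y) = \nu(\{x\}) = 0\) since singletons are Lebesgue-negligible.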
Theorem 7.1 contains three statements:
- existence of a measure over \((\mathcal{X} \times \mathcal{Y}, \mathcal{F} \otimes \mathcal{G})\) that satisfies the product property over rectangles;
- uniqueness of this measure;
- the possibility of computing the measure of \(E \in \mathcal{F} \otimes \mathcal{G}\) by iterated integration in arbitrary order.
The first statement (existence) is proved using an extension theorem. The second statement (uniqueness) follows from a monotone class argument (Theorem 2.4): rectangles form a generating \(\pi\)-class, so the case where both \(\mu\) and \(\nu\) are finite measures is settled; if either \(\mu\) or \(\nu\) is just \(\sigma\)-finite, consider restrictions to rectangles with finite measure and proceed by approximation. The third statement trivially holds for rectangles.
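On finite spaces, both the product property over rectangles and order-free iterated integration can be checked by direct computation. A minimal Python sketch, where the measures and sets are our own toy choices:

```python
from itertools import product

# Two finite measure spaces given by point masses (here: probabilities).
mu = {'x1': 0.3, 'x2': 0.7}                # measure on X
nu = {'y1': 0.5, 'y2': 0.25, 'y3': 0.25}   # measure on Y

# Product measure on X x Y: rho({(x, y)}) = mu({x}) * nu({y}).
rho = {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}

def measure(m, points):
    return sum(m[p] for p in points)

# Product property on rectangles: rho(A x B) = mu(A) * nu(B).
A, B = ['x1'], ['y1', 'y3']
rect = [(x, y) for x in A for y in B]
assert abs(measure(rho, rect) - measure(mu, A) * measure(nu, B)) < 1e-12

# Iterated "integration" (here: summation) of an arbitrary E, in both orders.
E = [('x1', 'y1'), ('x2', 'y2'), ('x2', 'y3')]
by_x_first = sum(nu[y] * sum(mu[x] for x in mu if (x, y) in E) for y in nu)
by_y_first = sum(mu[x] * sum(nu[y] for y in nu if (x, y) in E) for x in mu)
assert abs(by_x_first - measure(rho, E)) < 1e-12
assert abs(by_y_first - measure(rho, E)) < 1e-12
print(measure(rho, E))  # 0.15 + 0.175 + 0.175 = 0.5
```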
Remark 7.2. If \(\mu, \nu\) are probability measures, then the product measure \(\mu \otimes \nu\) is a probability measure; it is called a product probability measure.
7.3 Tonelli-Fubini theorem
In this section, we consider product measures that are built from \(\sigma\)-finite measures as in Theorem 7.1. The Tonelli-Fubini Theorem shows that (under mild conditions) integration with respect to a product measure reduces to iterated integration over the component measures.
Proof. A complete proof can be found in (Dudley, 2002); here is a sketch. The argument establishes the statement for larger and larger classes of measurable functions.
Note first that Theorem 7.1 settles the case for indicators of measurable subsets of \(\mathcal{X}\times \mathcal{Y}\).
From this observation, using linearity, simple positive functions are handled. Then settling the case of non-negative measurable functions over \(\mathcal{X}\times \mathcal{Y}\) uses a monotone convergence argument (Theorem 3.1).
The general case is handled by decomposing the measurable function as the difference of its positive and negative parts.
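The conclusion (not the proof) is easy to test numerically: for an integrable function on a product of intervals, the two iterated integrals must agree. A sketch using midpoint Riemann sums; the test function \(f(x,y) = e^{-x-y}\) and the grid size are our choices:

```python
import numpy as np

# f(x, y) = exp(-x - y) on [0,1]^2; its integral factorizes: (1 - 1/e)^2.
n = 2000
xs = (np.arange(n) + 0.5) / n   # midpoint grid on [0, 1]
ys = (np.arange(n) + 0.5) / n
f = np.exp(-xs[:, None] - ys[None, :])   # f[i, j] = f(x_i, y_j)

inner_over_y = f.mean(axis=1)    # x -> integral of f(x, .) dy (midpoint rule)
x_then_y = inner_over_y.mean()   # then integrate over x
inner_over_x = f.mean(axis=0)    # y -> integral of f(., y) dx
y_then_x = inner_over_x.mean()   # then integrate over y

exact = (1 - np.exp(-1)) ** 2
print(x_then_y, y_then_x, exact)   # all three agree to high accuracy
```

Here the product structure of \(f\) gives the closed form \((1 - e^{-1})^2\) to compare against.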
The following characterization of the expectation of non-negative random variables as the integral of the tail function is a simple consequence of the Tonelli-Fubini Theorem.
Proof. \[\begin{array}{rl} \mathbb{E}X & = \int_{\Omega} X(\omega) \, \mathrm{d}P(\omega) \\ & = \int_{\Omega} \Big( \int_{[0,\infty)} \mathbb{I}_{X(\omega)> t} \mathrm{d}t \Big)\, \mathrm{d}P(\omega) \\ & = \int_{[0,\infty)} \Big( \int_{\Omega} \mathbb{I}_{X(\omega)> t} \, \mathrm{d}P(\omega) \Big) \mathrm{d}t \\ & = \int_{[0,\infty)} \Big( P\{ \omega : X(\omega) > t \} \Big) \mathrm{d}t \end{array}\]
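The identity is also easy to check by simulation. A sketch with \(X\) exponential with unit mean, where both sides equal \(1\); the distribution, sample size, and truncation point are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)    # X ~ Exp(1), so E[X] = 1

ts = np.linspace(0.0, 20.0, 2001)               # truncate the tail integral at t = 20
x_sorted = np.sort(x)
# Empirical survival function: P{X > t} = 1 - (# of samples <= t) / n.
survival = 1.0 - np.searchsorted(x_sorted, ts, side='right') / x.size
# Trapezoidal rule for the tail integral.
tail_integral = float(np.sum((survival[:-1] + survival[1:]) / 2 * np.diff(ts)))

print(x.mean(), tail_integral)                  # both close to E[X] = 1
```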
7.4 Joint distributions, independence and product distributions
Let the two random variables \(X, Y\) map \((\Omega, \mathcal{F})\) to \((\mathcal{X}, \mathcal{G})\) and \((\mathcal{Y}, \mathcal{H})\) respectively. Equip \((\Omega, \mathcal{F})\) with a probability distribution \(P\). Let \(Q_X = P \circ X^{-1}\) and \(Q_Y = P \circ Y^{-1}\) be the two image distributions (called the marginal distributions). We may define a mapping \(Z: \Omega \to \mathcal{X} \times \mathcal{Y}\) by \(Z(\omega) = (X(\omega), Y(\omega))\); this mapping is \(\mathcal{F}/\sigma(\mathcal{G}\times \mathcal{H})\)-measurable.
Let \(Q\) be the joint distribution of \(Z = (X,Y)\) under \(P\), that is, the probability distribution over \(\mathcal{X} \times \mathcal{Y}\) (endowed with \(\sigma(\mathcal{G}\times \mathcal{H})\)) that is uniquely defined by
\[Q( A \times B) = P\Big\{ \omega: X(\omega) \in A, Y(\omega) \in B \Big\} \, .\]
Note that \(Q\) is not necessarily a product distribution.
The next (trivial) proposition tells us that two random variables are independent iff their joint distribution is a product distribution (in fact the product distribution defined by the two marginal distributions).
\[X \perp\!\!\!\perp Y \text{ under } P \Longleftrightarrow Q = Q_X \otimes Q_Y \, ,\]
in words, \(X\) and \(Y\) are independent iff their joint distribution is the product of their marginal distributions.
Proof. If \(X \perp\!\!\!\perp Y\) under \(P\), then for all \(A \in \mathcal{G}\) and \(B \in \mathcal{H}\), \(Q(A \times B) = P\{X \in A, Y \in B\} = P\{X \in A\} \, P\{Y \in B\} = Q_X(A) \, Q_Y(B)\). Hence \(Q\) and \(Q_X \otimes Q_Y\) coincide on measurable rectangles, which form a \(\pi\)-class generating \(\sigma(\mathcal{G}\times \mathcal{H})\); by the uniqueness argument of Theorem 7.1, \(Q = Q_X \otimes Q_Y\). Conversely, if \(Q = Q_X \otimes Q_Y\), then \(P\{X \in A, Y \in B\} = Q(A \times B) = Q_X(A) \, Q_Y(B) = P\{X \in A\} \, P\{Y \in B\}\) for all \(A, B\), which is the definition of independence.
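On a finite sample space the equivalence can be verified exhaustively: store \(Q\) as a table, compute the marginals, and compare with their outer product. A minimal sketch, with our own toy tables:

```python
import numpy as np

# Joint distribution Q of (X, Y) as a table: rows = values of X, cols = values of Y.
Q_indep = np.array([[0.06, 0.14],    # = outer product of (0.2, 0.8) and (0.3, 0.7)
                    [0.24, 0.56]])
Q_dep = np.array([[0.30, 0.00],      # mass concentrated on the diagonal
                  [0.00, 0.70]])

def is_product(Q, tol=1e-12):
    """True iff the joint table equals the outer product of its marginals."""
    qx = Q.sum(axis=1)               # marginal Q_X
    qy = Q.sum(axis=0)               # marginal Q_Y
    return np.allclose(Q, np.outer(qx, qy), atol=tol)

print(is_product(Q_indep))  # True:  Q is the product of its marginals, X and Y independent
print(is_product(Q_dep))    # False: here X determines Y
```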
7.5 Independence of collections of \(\sigma\)-algebras
In many applications, independence between two \(\sigma\)-algebras or a finite collection of \(\sigma\)-algebras is not enough. This is the case when deriving or using laws of large numbers, where we have to deal with a countable collection of independent random variables. In other words, we have to work with a countable collection of \(\sigma\)-algebras and we need a notion of independence for such a collection.
Let \((\Omega, \mathcal{F}, P)\) be a probability space. Let \(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\) be a countable collection of sub-\(\sigma\)-algebras.
The collection \(\mathcal{G}_1, \ldots, \mathcal{G}_n, \ldots\) is said to be independent under \(P\) if every finite sub-collection is independent under \(P\).
Example 7.1. Consider the uniform probability distribution over \([0,1]\) and define \(X_1, X_2, \ldots\) by
\[X_n(\omega) = \operatorname{sign}\Big(\sin\big(2^{n+1} \pi \omega \big)\Big)\]
then \(X_1, \ldots, X_n, \ldots\) form a countable collection of independent random variables.
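These \(X_n\) are (up to indexing) the classical Rademacher functions. A quick simulation, with sample size our own choice, suggesting that pairs behave like independent fair \(\pm 1\) coins:

```python
import numpy as np

rng = np.random.default_rng(42)
omega = rng.uniform(0.0, 1.0, size=200_000)   # omega ~ Uniform([0, 1])

def X(n, w):
    """X_n(w) = sign(sin(2^{n+1} * pi * w)), valued in {-1, +1} almost surely."""
    return np.sign(np.sin(2.0 ** (n + 1) * np.pi * w))

x1, x2, x3 = X(1, omega), X(2, omega), X(3, omega)

# Each X_n is a fair +/-1 coin, and pairs look independent:
for a, b, name in [(x1, x2, "X1,X2"), (x1, x3, "X1,X3"), (x2, x3, "X2,X3")]:
    p_joint = np.mean((a == 1) & (b == 1))
    p_prod = np.mean(a == 1) * np.mean(b == 1)
    print(name, round(p_joint, 3), round(p_prod, 3))   # both ~ 0.25
```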
7.6 Infinite product spaces
In many modeling scenarios (random walks, branching processes, asymptotic statistics, …), we rely on the availability of an infinite collection of independent random variables. While it is (relatively) easy to come up with the notion of a finite product probability space, the notion of an infinite product probability space is more puzzling. And this remains true even if the individual components are finite probability spaces (for example \(\{0, 1\}\), equipped with its power set and the uniform distribution).
Think of \(\Omega_i = \{0,1\}\), each \(P_i\) being the balanced Bernoulli distribution. Let \(\omega\) be an infinite sequence of \(0\)s and \(1\)s; then \(\{\omega\} = \prod_{i=1}^\infty \{ \omega_i \}\) is an infinite Cartesian product of events, each of probability \(1/2\). What should its probability be in the infinite product probability space? (Consistency forces it to be at most \(2^{-m}\) for every \(m\), hence \(0\).) Is there a way to assign probabilities in a consistent way? If the answer is positive, is there a unique way to perform this operation?
Observe that cylinders form a \(\pi\)-class.
If each \((\Omega_n, \mathcal{F}_n)\) is endowed with a probability distribution, assigning a probability to cylinders looks straightforward: \[\mathbb{P} \left( \prod_{n=1}^m A_n \times \prod_{n=m+1}^\infty \Omega_n \right) = \prod_{n=1}^m P_n(A_n) \times \prod_{n=m+1}^\infty P_n(\Omega_n) = \prod_{n=1}^m P_n(A_n) \, .\]
The question is: does \(\mathbb{P}\) extend to the cylinder \(\sigma\)-algebra? If an extension exists, is it unique? The answer to both questions is yes.
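For the Bernoulli example above, the cylinder formula is explicit and shows in passing that singletons must receive probability \(\lim_m 2^{-m} = 0\). A minimal Python sketch (function names ours):

```python
from math import prod

# Infinite product of balanced Bernoulli spaces: Omega_n = {0, 1}, P_n uniform.
def P_n(A):
    """Probability of A, a subset of {0, 1}, under the balanced Bernoulli law."""
    return len(set(A) & {0, 1}) / 2

def cylinder_prob(constraints):
    """P of the cylinder prod_{n<=m} A_n x prod_{n>m} Omega_n,
    where `constraints` lists A_1, ..., A_m."""
    return prod(P_n(A) for A in constraints)

# The cylinder {omega : omega_1 = 1, omega_2 in {0, 1}, omega_3 = 0}:
print(cylinder_prob([{1}, {0, 1}, {0}]))   # (1/2) * 1 * (1/2) = 0.25

# Approximating a singleton {omega}: constrain the first m coordinates.
omega = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
for m in (1, 5, 10):
    print(m, cylinder_prob([{w} for w in omega[:m]]))  # 2^{-m} -> 0
```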
7.7 Bibliographic remarks
The material covered in this lesson can be found in any book on measure and integration theory. Section 4.4 of (Dudley, 2002) is dedicated to product measures.
Complete proofs of the Tonelli-Fubini Theorem can be found in (Dudley, 2002).
The existence theorem for infinite product probabilities is from Section 8.2 of (Dudley, 2002); a full proof of the theorem can be found there.