23  Linear Transformations and the Multivariate Normal

Linear transformations play a central role in probability and statistics, especially when working with multivariate normal distributions. In earlier sections, we saw how a linear transformation of two independent standard normals can create correlation. Here, we generalise that idea and show how any multivariate normal distribution can be constructed from a standard one using a matrix transformation.

Why This Matters

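Linear transformations of multivariate normals underpin both the simulation of correlated normal data and the calculation of how means and covariances change when random vectors are combined linearly.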

This section forms the conceptual bridge between probability theory and practical statistical modelling.

23.1 Linear Transformations of Random Vectors

Let
\[ \mathbf{X} = (X_1, X_2, \dots, X_k)^\top \] be a random vector, and let \(A\) be a fixed \(m \times k\) matrix. A linear transformation of \(\mathbf{X}\) is

\[ \mathbf{Y} = A\mathbf{X} + \mathbf{b}, \] where \(\mathbf{b} \in \mathbb{R}^m\) is a constant vector, so \(\mathbf{Y}\) is \(m\)-dimensional.

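Writing \(\boldsymbol{\mu} = E[\mathbf{X}]\) and \(\Sigma = \operatorname{Cov}(\mathbf{X})\), this transformation:

  • rotates, stretches, compresses or reflects the space (and projects it onto fewer dimensions when \(m < k\)),
  • maps the mean \(\boldsymbol{\mu}\) to \(A\boldsymbol{\mu} + \mathbf{b}\), so \(\mathbf{b}\) shifts the centre of the distribution,
  • and reshapes the covariance through \(A\) alone: \(\operatorname{Cov}(\mathbf{Y}) = A\Sigma A^\top\), while the shift \(\mathbf{b}\) has no effect on it.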
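These rules for the mean and covariance hold for any random vector with finite second moments, not only for normal ones. A minimal R sketch, assuming (for illustration only) that \(\mathbf{X}\) has independent Uniform(0, 1) components, so \(\boldsymbol{\mu} = (1/2, 1/2)^\top\) and \(\Sigma = I/12\); the particular \(A\) and \(\mathbf{b}\) are arbitrary choices:

```r
set.seed(1)

n <- 20000
# X has independent Uniform(0, 1) components: mean 1/2, variance 1/12 (illustrative choice)
X <- matrix(runif(2 * n), n, 2)
mu    <- c(0.5, 0.5)
Sigma <- diag(1 / 12, 2)

# An arbitrary 2 x 2 transformation and shift, chosen only for illustration
A <- matrix(c(2, 1,
              0, 3), 2, 2, byrow = TRUE)
b <- c(-1, 4)

# Rows of X are observations, so y = A x + b becomes X %*% t(A) plus b on each row
Y <- X %*% t(A) + matrix(b, n, 2, byrow = TRUE)

# Theoretical moments from the linear-transformation rules
mean_theory <- as.vector(A %*% mu + b)   # (0.5, 5.5)
cov_theory  <- A %*% Sigma %*% t(A)      # (1/12) * [5 3; 3 9]

round(colMeans(Y), 3)
round(mean_theory, 3)
round(cov(Y), 3)
round(cov_theory, 3)
```

The sample moments of the (non-normal) \(\mathbf{Y}\) agree with \(A\boldsymbol{\mu} + \mathbf{b}\) and \(A\Sigma A^\top\) up to Monte Carlo error; what normality adds, as the next subsection shows, is that the distribution of \(\mathbf{Y}\) is again of the same family.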

23.2 The Multivariate Normal Distribution

A random vector \(\mathbf{X}\) is multivariate normal if every linear combination of its components is normally distributed.

If
\[ \mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma), \] then:

  • \(\boldsymbol{\mu}\) is the mean vector,
  • \(\Sigma\) is the covariance matrix (symmetric and positive semi-definite; the density below requires it to be positive-definite, i.e. invertible).

The density is

\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{k/2} |\Sigma|^{1/2}} \exp\!\left( -\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \right). \]
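The density can be transcribed directly into R. The function name `dmvnorm_manual` is hypothetical (written here for illustration, not taken from a package). As a sanity check it uses a diagonal \(\Sigma\), where the components are independent and the joint density must equal a product of univariate normal densities:

```r
# Direct transcription of the multivariate normal density (illustrative sketch)
dmvnorm_manual <- function(x, mu, Sigma) {
  k    <- length(mu)
  diff <- x - mu
  # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using solve() instead of an explicit inverse
  quad <- as.numeric(t(diff) %*% solve(Sigma, diff))
  exp(-0.5 * quad) / ((2 * pi)^(k / 2) * sqrt(det(Sigma)))
}

# Diagonal Sigma: independent components, so the density factorises
mu    <- c(1, -2, 0.5)
Sigma <- diag(c(4, 1, 9))
x     <- c(0, -1, 2)

dmvnorm_manual(x, mu, Sigma)
prod(dnorm(x, mean = mu, sd = sqrt(diag(Sigma))))
```

The two printed values coincide; with a non-diagonal \(\Sigma\) the same function applies unchanged, but the factorisation check no longer does.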

A key property makes the multivariate normal especially convenient:

\[ \mathbf{X} \sim N(\boldsymbol{\mu}, \Sigma) \quad \Longrightarrow \quad A\mathbf{X} + \mathbf{b} \sim N(A\boldsymbol{\mu} + \mathbf{b},\; A\Sigma A^\top). \]

This means:

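  • linear (affine) transformations of multivariate normals are again multivariate normal,
  • the mean transforms linearly: \(\boldsymbol{\mu} \mapsto A\boldsymbol{\mu} + \mathbf{b}\),
  • the covariance transforms quadratically in \(A\): \(\Sigma \mapsto A\Sigma A^\top\).

This is why the multivariate normal is so widely used: the family is closed under linear operations, and the new mean and covariance follow from matrix algebra alone.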
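Taking \(A\) to be a single row \(\mathbf{a}^\top\) recovers the definition at the start of this subsection: \(\mathbf{a}^\top\mathbf{X} \sim N(\mathbf{a}^\top\boldsymbol{\mu},\, \mathbf{a}^\top\Sigma\mathbf{a})\). A quick simulation sketch, reusing \(\boldsymbol{\mu}\) and \(\Sigma\) from the bivariate example that follows and the illustrative choice \(\mathbf{a} = (1, 1)^\top\), for which \(\mathbf{a}^\top\boldsymbol{\mu} = 3\) and \(\mathbf{a}^\top\Sigma\mathbf{a} = 4 + 2 \cdot 1 + 1 = 7\):

```r
library(MASS)  # for mvrnorm()

set.seed(42)
mu    <- c(1, 2)
Sigma <- matrix(c(4, 1,
                  1, 1), 2, 2, byrow = TRUE)
a <- c(1, 1)  # illustrative choice: the sum X1 + X2

X <- mvrnorm(10000, mu = mu, Sigma = Sigma)
s <- as.vector(X %*% a)   # one value of a^T x per simulated row

# Theory: mean a^T mu = 3, variance a^T Sigma a = 7
c(theory_mean = sum(a * mu), sample_mean = mean(s))
c(theory_var  = as.numeric(t(a) %*% Sigma %*% a), sample_var = var(s))
```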

Example: Linear Transformation of a Bivariate Normal

We simulate \(\mathbf{X}\sim N(\mu ,\Sigma )\) with mean vector and covariance matrix:

\[ \mu = (1,2)^{\top}, \qquad \Sigma =\left( \begin{matrix}4&1\\ 1&1\end{matrix}\right). \]

The entries encode \(\text{Var}(X_1) = 4\), \(\text{Var}(X_2) = 1\) and \(\text{Cov}(X_1,X_2) = 1\).

Then apply \(\mathbf{Y}=A\mathbf{X}+\mathbf{b}\) with

\[ A =\left( \begin{matrix}1&0\\ -1&2\end{matrix}\right), \qquad \mathbf{b} = (0, 1)^{\top}. \]

Compare the scatterplots.

set.seed(1234)

library(MASS)

# Original distribution
mu  <- c(1, 2)
Sigma <- matrix(c(4, 1,
                  1, 1), 2, 2)

n <- 5000
X <- mvrnorm(n, mu = mu, Sigma = Sigma)

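# Linear transformation
# byrow = TRUE so that A matches the matrix written above;
# matrix() fills column by column by default, which would give t(A)
A <- matrix(c( 1, 0,
              -1, 2), 2, 2, byrow = TRUE)
b <- c(0, 1)

# Rows of X are observations, so y = A x + b becomes X %*% t(A) plus b on each row
Y <- X %*% t(A) + matrix(b, n, 2, byrow = TRUE)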

par(mfrow = c(1, 2))

plot(X[,1], X[,2], pch = 16, cex = 0.4, col = "steelblue",
     main = "Original X ~ N(mu, Sigma)",
     xlab = "X1", ylab = "X2")

plot(Y[,1], Y[,2], pch = 16, cex = 0.4, col = "tomato",
     main = "Transformed Y = A X + b",
     xlab = "Y1", ylab = "Y2")

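  • The original cloud is centred at \((1, 2)\) with correlation \(1/\sqrt{4 \cdot 1} = 0.5\), so it tilts upward.
  • The transformed cloud is centred at \(A\boldsymbol{\mu} + \mathbf{b} = (1, 4)^\top\) and has covariance \(A\Sigma A^\top = \begin{pmatrix}4 & -2\\ -2 & 4\end{pmatrix}\), i.e. correlation \(-0.5\), so it tilts downward, exactly as the transformation rule predicts.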
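The prediction can also be checked numerically rather than by eye. This sketch regenerates the same simulation (with `byrow = TRUE`, so that `A` is the matrix written in the text) and compares the theoretical mean and covariance with their Monte Carlo estimates:

```r
library(MASS)  # for mvrnorm()

set.seed(1234)
mu    <- c(1, 2)
Sigma <- matrix(c(4, 1,
                  1, 1), 2, 2, byrow = TRUE)
A <- matrix(c( 1, 0,
              -1, 2), 2, 2, byrow = TRUE)
b <- c(0, 1)

n <- 5000
X <- mvrnorm(n, mu = mu, Sigma = Sigma)
Y <- X %*% t(A) + matrix(b, n, 2, byrow = TRUE)

# Theoretical moments: A mu + b = (1, 4), A Sigma A^T = [4 -2; -2 4]
A %*% mu + b
A %*% Sigma %*% t(A)

# Monte Carlo estimates, which should be close to the values above
colMeans(Y)
cov(Y)
```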

23.3 Constructing a Multivariate Normal via Cholesky

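If \(\Sigma\) is a positive-definite covariance matrix, it can be factorised as

\[ \Sigma = LL^\top, \]

where \(L\) is the Cholesky factor: lower triangular, and unique once its diagonal entries are required to be positive.
If \(\mathbf{Z} \sim N(\mathbf{0}, I_k)\) is a vector of \(k\) independent standard normals, then

\[ \mathbf{X} = L\mathbf{Z} + \boldsymbol{\mu} \]

has distribution \(N(\boldsymbol{\mu}, \Sigma)\). This follows from the transformation rule in Section 23.2 with \(A = L\) and \(\mathbf{b} = \boldsymbol{\mu}\): the mean is \(L\mathbf{0} + \boldsymbol{\mu} = \boldsymbol{\mu}\) and the covariance is \(L I_k L^\top = \Sigma\).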
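In R the construction can be written almost literally in column-vector form. A minimal sketch, assuming the bivariate \(\boldsymbol{\mu}\) and \(\Sigma\) from the example in Section 23.2. Note that R's `chol()` returns the upper-triangular factor, so the lower-triangular \(L\) of the text is its transpose:

```r
set.seed(1234)

mu    <- c(1, 2)
Sigma <- matrix(c(4, 1,
                  1, 1), 2, 2, byrow = TRUE)

# chol() returns the UPPER factor U with t(U) %*% U == Sigma,
# so the lower-triangular L in Sigma = L L^T is its transpose
L <- t(chol(Sigma))

n <- 5000
Z <- matrix(rnorm(2 * n), nrow = 2)   # each column is one draw of Z ~ N(0, I)
X <- L %*% Z + mu                     # X = L Z + mu; mu is recycled down each column

rowMeans(X)   # close to mu
cov(t(X))     # cov() expects observations in rows, hence t(); close to Sigma
```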

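This generalises the earlier 2D construction

\[ X = Z_1, \qquad Y = \rho Z_1 + \sqrt{1-\rho^2}\, Z_2, \]

which is exactly \((X, Y)^\top = L\mathbf{Z}\) with \(L = \begin{pmatrix}1 & 0 \\ \rho & \sqrt{1-\rho^2}\end{pmatrix}\), the Cholesky factor of \(\begin{pmatrix}1 & \rho \\ \rho & 1\end{pmatrix}\).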
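A two-line check that the coefficients of the 2D construction really are R's Cholesky factor (transposed, since `chol()` returns the upper factor), here with \(\rho = 0.8\):

```r
rho   <- 0.8
Sigma <- matrix(c(1,   rho,
                  rho, 1), 2, 2, byrow = TRUE)

# Lower-triangular factor read off from X = Z1, Y = rho*Z1 + sqrt(1 - rho^2)*Z2
L_explicit <- matrix(c(1,   0,
                       rho, sqrt(1 - rho^2)), 2, 2, byrow = TRUE)

all.equal(L_explicit, t(chol(Sigma)))             # same as R's Cholesky factor
all.equal(L_explicit %*% t(L_explicit), Sigma)    # and it reproduces Sigma
```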

Example: Constructing Correlated Normals via Cholesky

Define the covariance matrix

\[\Sigma =\left( \begin{matrix}1&\rho \\ \rho &1\end{matrix}\right) =\left( \begin{matrix}1&0.8\\ 0.8&1\end{matrix}\right).\]

Both components \(X_1\) and \(X_2\) have variance 1 and covariance \(\rho = 0.8\); since the variances are 1, this is also their correlation.

set.seed(1234)

n <- 5000
rho <- 0.8

# Covariance matrix
Sigma <- matrix(c(1, rho,
                  rho, 1), 2, 2)

# Cholesky factor: R's chol() returns the upper-triangular U = L^T,
# so that t(U) %*% U equals Sigma
U <- chol(Sigma)

# Generate 2n independent N(0,1) values
Z <- matrix(rnorm(2*n), n, 2)

# Each row of Z is one draw z; x = L z becomes the row product z^T L^T = z^T U
X <- Z %*% U

par(mfrow = c(1, 2))

plot(Z[,1], Z[,2], pch = 16, cex = 0.4, col = "gray60",
     main = "Z ~ N(0, I)",
     xlab = "Z1", ylab = "Z2")

plot(X[,1], X[,2], pch = 16, cex = 0.4, col = "firebrick",
     main = "X = Z L^T (Correlated Normals)",
     xlab = "X1", ylab = "X2")

  • \(\mathbf{Z}\) is a round cloud (independent standard normals).
  • \(\mathbf{X}\) is a tilted ellipse with correlation \(\rho = 0.8\).
  • Multiplying by the Cholesky factor is what induces this covariance.
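The code stores draws as rows, so it multiplies by \(L^\top\) on the right instead of by \(L\) on the left. The sketch below confirms that the two conventions give identical samples from the same \(\mathbf{Z}\), and that the sample correlation is close to \(\rho\):

```r
set.seed(1234)
n   <- 5000
rho <- 0.8
Sigma <- matrix(c(1, rho,
                  rho, 1), 2, 2)

U <- chol(Sigma)   # upper triangular, U = t(L)
L <- t(U)          # lower-triangular Cholesky factor

Z <- matrix(rnorm(2 * n), n, 2)   # rows are draws

X_rows <- Z %*% U                 # row convention: x^T = z^T L^T
X_cols <- t(L %*% t(Z))           # column convention: x = L z, transposed back to rows

all.equal(X_rows, X_cols)         # identical up to floating-point rounding
cor(X_rows)[1, 2]                 # close to rho = 0.8
```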

Example: General k‑Dimensional Construction

Simulate \(\mathbf{X}\sim N(\mu ,\Sigma )\) using Cholesky.

Define a mean vector and a covariance matrix:

\[ \mu =(0,\; 1,\; 2)^{\top }, \qquad \Sigma =\left( \begin{matrix}1&0.5&0.2\\ 0.5&2&0.3\\ 0.2&0.3&1\end{matrix}\right). \]

set.seed(1234)

# 3D mean and covariance
mu <- c(0, 1, 2)
Sigma <- matrix(c(1, 0.5, 0.2,
                  0.5, 2, 0.3,
                  0.2, 0.3, 1), 3, 3)

n <- 5000

# Cholesky factor: chol() returns the upper-triangular U = L^T
U <- chol(Sigma)

# Standard normals
Z <- matrix(rnorm(3*n), n, 3)

# Construct X = Z L^T + mu (row form of x = L z + mu)
X <- Z %*% U + matrix(mu, n, 3, byrow = TRUE)

# Quick diagnostic: sample covariance
round(cov(X), 2)
     [,1] [,2] [,3]
[1,] 0.98 0.54 0.21
[2,] 0.54 1.98 0.31
[3,] 0.21 0.31 1.00
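  • The sample covariance of \(X\) agrees with \(\Sigma\) to within a few hundredths, as expected from Monte Carlo error with \(n = 5000\).
  • The construction works in any dimension \(k\): only the sizes of `mu`, `Sigma` and `Z` change.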
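Because only the sizes change, the recipe can be wrapped in a small function. The helper `rmvn_chol` below is hypothetical (written for illustration, not from any package), and the \(4 \times 4\) covariance matrix in the usage example is an arbitrary positive-definite choice:

```r
# Hypothetical helper: n draws from N(mu, Sigma) in any dimension k, one draw per row
rmvn_chol <- function(n, mu, Sigma) {
  k <- length(mu)
  stopifnot(all(dim(Sigma) == c(k, k)))
  U <- chol(Sigma)                  # errors if Sigma is not positive-definite
  Z <- matrix(rnorm(n * k), n, k)   # n independent draws of Z ~ N(0, I_k)
  Z %*% U + matrix(mu, n, k, byrow = TRUE)
}

# Example in k = 4 dimensions: 1 on the diagonal, 0.5 off the diagonal
set.seed(99)
Sigma4 <- 0.5 * diag(4) + 0.5
X4 <- rmvn_chol(10000, mu = 1:4, Sigma = Sigma4)
round(colMeans(X4), 2)
round(cov(X4), 2)
```

Returning draws as rows keeps the output compatible with `colMeans()`, `cov()` and the plotting code used throughout this section.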
# 3D scatterplot
library(scatterplot3d)

scatterplot3d(X[,1], X[,2], X[,3],
              pch = 16, cex.symbols = 0.4,
              color = "steelblue",
              main = "3D Scatterplot of Multivariate Normal Sample",
              xlab = "X1", ylab = "X2", zlab = "X3")

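  • A 3D elliptical cloud centred at \(\boldsymbol{\mu} = (0, 1, 2)^\top\), whose shape reflects the covariance matrix.
  • Its principal axes point along the eigenvectors of \(\Sigma\), with spreads proportional to the square roots of the corresponding eigenvalues.
  • This visually reinforces the construction \(\mathbf{X} = L\mathbf{Z} + \boldsymbol{\mu}\).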
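The link between the cloud's axes and \(\Sigma\) can be checked directly: projecting the centred sample onto the eigenvectors of \(\Sigma\) should give variances close to the eigenvalues. A sketch that regenerates the sample from the \(k\)-dimensional example:

```r
set.seed(1234)
mu <- c(0, 1, 2)
Sigma <- matrix(c(1, 0.5, 0.2,
                  0.5, 2, 0.3,
                  0.2, 0.3, 1), 3, 3)
n <- 5000
X <- matrix(rnorm(3 * n), n, 3) %*% chol(Sigma) + matrix(mu, n, 3, byrow = TRUE)

e <- eigen(Sigma, symmetric = TRUE)   # columns of e$vectors are the principal axes

# Project the centred sample onto each axis and compare spreads
Xc     <- sweep(X, 2, colMeans(X))    # subtract column means
scores <- Xc %*% e$vectors
round(e$values, 2)                    # theoretical variances along the axes
round(apply(scores, 2, var), 2)       # sample variances along the same axes
```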