2 Probability Tools for Simulation

Simulation is grounded in probability theory. When we generate artificial data, compute averages, or estimate probabilities through repetition, we are relying on well-established probabilistic principles. A simulation algorithm may be computational, but its justification is mathematical.

This section revisits the key ideas required for simulation-based reasoning.

2.1 Random Variables

A random variable is a numerical function defined on the outcome of a random experiment. Formally, it is a mapping from a sample space \(\Omega\) to the real numbers. Conceptually, it allows us to translate qualitative randomness into quantitative structure.

If a die is rolled, the outcome is an element of the sample space \(\{1,2,3,4,5,6\}\). Defining \(X\) as “the number shown on the die” converts this outcome into a random variable. Similarly, in a queueing system, we may define random variables representing arrival times, service durations, waiting times, or the number of customers in the system.

In simulation, random variables are the fundamental building blocks. Every stochastic model is constructed by specifying how certain random variables behave and how they interact. Once these variables are generated computationally, the system evolves according to deterministic rules applied to those random inputs.

Random variables are typically classified as either discrete or continuous.

A discrete random variable takes countable values, e.g., the number of arrivals in an hour, the number of failures in a batch, or the number of heads in ten coin tosses.

A continuous random variable takes values in an interval of the real line, e.g., waiting times, lifetimes of components, and measurement errors.

This distinction determines how probabilities and expectations are computed, and it shapes the simulation methods used to generate the variables.

2.2 Probability Distributions (PMF / PDF / CDF)

The behaviour of a random variable is described by its probability distribution.

For a discrete random variable \(X\), the distribution is specified by its probability mass function (PMF),

\[p(x) = P(X = x),\]

which assigns a probability to each possible value. These probabilities must satisfy

\[\sum_x p(x) = 1.\]

For a continuous random variable, the distribution is described by a probability density function (PDF), denoted \(f(x)\), satisfying

\[ f(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1. \]

Probabilities are computed via integration:

\[P(a \le X \le b) = \int_a^b f(x)\,dx.\]

Both discrete and continuous random variables can be described using the cumulative distribution function (CDF),

\[F(x) = P(X \le x).\]

For continuous random variables, the CDF is related to the density by

\[F(x) = \int_{-\infty}^{x} f(t)\,dt.\]

The CDF plays a crucial role in simulation. Many simulation techniques, including inverse transform sampling, rely on transforming Uniform(0,1) random variables through the inverse of the CDF to generate draws from more complex distributions.

2.3 Expectation and Variance

One of the primary goals of simulation is to estimate expectations. The expectation, or mean, represents the long-run average value of a random variable.

For a discrete random variable,

\[E[X] = \sum_x x\,p(x).\]

For a continuous random variable,

\[E[X] = \int x\,f(x)\,dx.\]

Expectation is linear. For constants a and b,

\[E[aX + bY] = aE[X] + bE[Y].\]

This property is especially important in simulation when combining random components of a model.

The variance measures the variability of a random variable around its mean:

\[\operatorname{Var}(X) = E[(X - E[X])^2].\]

An equivalent computational formula is

\[\operatorname{Var}(X) = E[X^2] - (E[X])^2.\]

In simulation, variance determines how much Monte Carlo estimates fluctuate across replications and therefore influences how many repetitions are required to achieve stable results.

Example 1: Discrete Case (Binomial)

Let \[X \sim \text{Binomial}(n=10, p=0.3).\]

Theoretical values: \[E[X] = np = 3, \qquad \operatorname{Var}(X) = np(1-p) = 2.1.\]

set.seed(123)

# Simulate
x_sim <- rbinom(10000, size = 10, prob = 0.3)

# Simulated statistics
cat("Simulated mean:", mean(x_sim),
    ", simulated variance:", var(x_sim))

Simulated mean: 2.9899 , simulated variance: 2.072205

# Theoretical values
cat("Theoretical mean:", 10 * 0.3,
    ", theoretical variance:", 10 * 0.3 * (1 - 0.3))

Theoretical mean: 3 , theoretical variance: 2.1

Example 2: Continuous Case (Exponential)

Let \[X \sim \text{Exp}(\lambda = 2).\]

Theoretical values: \[E[X] = \frac{1}{\lambda} = 0.5, \qquad \operatorname{Var}(X) = \frac{1}{\lambda^2} = 0.25.\]

set.seed(123)

# Simulate
x_sim <- rexp(10000, rate = 2)

# Simulated statistics
cat("Simulated mean:", mean(x_sim),
    ", simulated variance:", var(x_sim))

Simulated mean: 0.5018906 , simulated variance: 0.2499411

# Theoretical values
cat("Theoretical mean:", 1/2,
 ", theoretical variance:", 1 / (2^2))

Theoretical mean: 0.5 , theoretical variance: 0.25

2.4 Law of Large Numbers

The theoretical foundation of simulation is the Law of Large Numbers (LLN).

Suppose \(X_1, X_2, \dots, X_n\) are independent and identically distributed random variables with mean \(\mu\). The sample average

\[\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\]

converges to \(\mu\) as \(n \to \infty\).

This result justifies Monte Carlo estimation. When we simulate a system repeatedly and compute the average outcome, we are approximating an expected value. Increasing the number of replications improves the stability of this estimate.

Example 1: Demonstrate the LLN with coin tosses

Let

\[X \sim \text{Binomial}(n=1, p=0.5).\]

set.seed(123)
x <- rbinom(5000, size = 1, prob = 0.5)   # 1 = Head, 0 = Tail
running_mean <- cumsum(x) / seq_len(5000)

plot(running_mean, type = "l",
     xlab = "Number of tosses (n)",
     ylab = "Running mean (proportion of heads)",
     main = "LLNs: Proportion of Heads",
     )

abline(h = 0.5, lwd = 2)  # true mean

Example 2: Demonstrate the LLN with Exp(1)

Let

\[X \sim \text{Exp}(1).\]

set.seed(123)
x <- rexp(5000, rate = 1)
running_mean <- cumsum(x) / seq_len(5000)

plot(running_mean, type = "l",
     xlab = "Number of samples (n)",
     ylab = "Running mean",
     main = "LLNs: Exponential(1)",
     )

abline(h = 1, lwd = 2)  # true mean

2.5 Central Limit Theorem

The Law of Large Numbers tells us that the sample average eventually stabilises near the true mean. However, it does not describe how the sample mean fluctuates for large but finite sample sizes. The Central Limit Theorem (CLT) fills this gap.

Let \(X_1, X_2, \dots, X_n\) be independent and identically distributed random variables with mean \(\mu\) and variance \(\sigma^2 < \infty\). Define the sample mean

\[\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i.\]

The Central Limit Theorem states that as \(n \to \infty\),

\[\sqrt{n}(\bar{X}_n - \mu)\]

converges in distribution to a Normal distribution with mean \(0\) and variance \(\sigma^2\).

Equivalently, for sufficiently large \(n\),

\[\bar{X}_n \approx \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right).\]

Several important observations follow:

The original distribution of \(X\) does not need to be Normal.
Only the existence of a finite variance is required.
The variability of the sample mean decreases at rate \(1/n\).
The approximation improves as \(n\) increases.

For simulation, this result is fundamental. When we estimate an expectation using Monte Carlo methods, we compute a sample average. The CLT tells us that the error of this estimate is approximately Normally distributed for large \(n\), which allows us to construct confidence intervals and quantify simulation precision.

Example: To see the CLT in action, we simulate from a distribution that is clearly not Normal — the Exponential distribution.

Let

\[X \sim \text{Exp}(1).\]

This distribution is strongly right-skewed.

We repeatedly compute sample means of size \(n = 30\), and examine their distribution.

set.seed(123)

# Parameters
n <- 30          # sample size
R <- 10000       # number of repetitions

# Store sample means
sample_means <- numeric(R)

for (i in 1:R) {
  x <- rexp(n, rate = 1)
  sample_means[i] <- mean(x)
}

# Histogram of sample means
hist(sample_means,
     probability = TRUE,
     breaks = 40,
     col = "lightblue",
     main = "Distribution of Sample Means (n = 30)",
     xlab = "Sample Mean")

# Overlay Normal approximation
curve(dnorm(x, mean = 1, sd = 1/sqrt(n)),
      col = "red",
      lwd = 2,
      add = TRUE)

The red curve represents the Normal distribution predicted by the CLT:

\[\bar{X}_n \approx \mathcal{N}\left(1, \frac{1}{n}\right).\]

Even though the underlying data are skewed, the distribution of the sample mean is approximately symmetric and bell-shaped.

To emphasise the transformation:

par(mfrow = c(1,2))

# Original exponential distribution
hist(rexp(10000, rate = 1),
     probability = TRUE,
     breaks = 40,
     col = "salmon",
     main = "Exponential(1)",
     xlab = "X")

# Distribution of sample means
hist(sample_means,
     probability = TRUE,
     breaks = 40,
     col = "lightblue",
     main = "Sample Means (n = 30)",
     xlab = "Mean")

The first plot is heavily skewed, while the second is approximately Normal. This is the Central Limit Theorem at work.

2.6 Comparison: LLNs vs CLT

Feature	Law of Large Numbers (LLN)	Central Limit Theorem (CLT)
Main Question	Does the sample mean converge to the true mean?	How does the sample mean fluctuate around the true mean?
Statement	\(\bar{X}_n \to \mu\) as \(n \to \infty\)	\(\sqrt{n}(\bar{X}_n - \mu) \Rightarrow \mathcal{N}(0, \sigma^2)\)
Focus	Convergence	Distribution of the error
Type of Result	Deterministic limit result	Distributional approximation
What It Guarantees	Sample average gets closer to \(\mu\)	Sample average is approximately Normal for large \(n\)
Rate Information	Does not specify rate	Variance shrinks at rate \(\sigma^2/n\)
Required Conditions	i.i.d., finite mean	i.i.d., finite variance
Role in Simulation	Justifies Monte Carlo estimation	Justifies confidence intervals & standard errors
Practical Meaning	More simulations → more stable estimate	More simulations → smaller uncertainty