21  Python Workshop Solutions

This workshop consists of two parts:

Part 1: Inverse Transform Sampling

Exercise 1: Precision Method

A mobile game awards players with different types of rewards when a treasure chest is opened. The reward type follows the probability distribution below.

Reward Type \(x\) Coins Gems Energy Rare Item
\(P(X=x)\) 0.50 0.25 0.15 0.10
  1. Construct the cumulative distribution function (CDF) for this random variable and present it in a table.
import numpy as np
import pandas as pd

# Define outcomes and probabilities
x = np.array(["Coins", "Gems", "Energy", "Rare Item"])
p = np.array([0.50, 0.25, 0.15, 0.10])

# Check probabilities sum to 1
p.sum()

# Construct cumulative probabilities
F = np.cumsum(p)

# Create table for PMF and CDF
pd.DataFrame({
    "outcome": x,
    "p": p,
    "F": F
})
outcome p F
0 Coins 0.50 0.50
1 Gems 0.25 0.75
2 Energy 0.15 0.90
3 Rare Item 0.10 1.00
  1. Intervals for the precision method

From the cumulative probabilities,

  • Coins: \(0 < U \le 0.50\)
  • Gems: \(0.50 < U \le 0.75\)
  • Energy: 0\(.75 < U \le 0.90\)
  • Rare Item: \(0.90 < U \le 1.00\)

So the rule is

\[ X = \begin{cases} \text{Coins} & \text{if } 0 < U \le 0.50 \\ \text{Gems} & \text{if } 0.50 < U \le 0.75 \\ \text{Energy} & \text{if } 0.75 < U \le 0.90 \\ \text{Rare Item} & \text{if } 0.90 < U \le 1.00 \end{cases} \]

# Show intervals in a table
lower = np.concatenate(([0], F[:-1]))   # F[:-1] excludes the last item in F
upper = F

pd.DataFrame({
    "outcome": x,
    "lower": lower,
    "upper": upper
})
outcome lower upper
0 Coins 0.00 0.50
1 Gems 0.50 0.75
2 Energy 0.75 0.90
3 Rare Item 0.90 1.00
  1. Classify the given random numbers
u_given = np.array([0.12, 0.63, 0.44, 0.91, 0.27])

# Find the first cumulative probability >= u
result_given = np.array([x[np.where(u <= F)[0][0]] for u in u_given])

pd.DataFrame({
    "u": u_given,
    "reward": result_given
})
u reward
0 0.12 Coins
1 0.63 Gems
2 0.44 Coins
3 0.91 Rare Item
4 0.27 Coins
  • for u in u_given applies the rule to each random number
  • u <= F checks which cumulative probabilities are at least as large as u
  • np.where(u <= F)[0] returns the indices where this is true
  • [0] takes the first such index

Equivalently:

def find_outcome(u):
    i = np.where(u <= F)[0][0]
    return x[i]

result_given = np.array([find_outcome(u) for u in u_given])

pd.DataFrame({
    "u": u_given,
    "reward": result_given
})
u reward
0 0.12 Coins
1 0.63 Gems
2 0.44 Coins
3 0.91 Rare Item
4 0.27 Coins
  1. R function to simulate one reward
def simulate_reward():
    u = np.random.uniform(0, 1)
    return x[np.where(u <= F)[0][0]]

# Example: simulate 20 rewards
np.random.seed(123)
np.array([simulate_reward() for _ in range(20)])
array(['Gems', 'Coins', 'Coins', 'Gems', 'Gems', 'Coins', 'Rare Item',
       'Gems', 'Coins', 'Coins', 'Coins', 'Gems', 'Coins', 'Coins',
       'Coins', 'Gems', 'Coins', 'Coins', 'Gems', 'Gems'], dtype='<U9')
  1. Simulate 10,000 rewards and compare empirical probabilities
np.random.seed(123)

samples = np.array([simulate_reward() for _ in range(10000)])

# Empirical probabilities
empirical = pd.Series(samples).value_counts(normalize=True).reindex(x)
empirical

# Compare theoretical and empirical probabilities
comparison = pd.DataFrame({
    "outcome": x,
    "theoretical": p,
    "empirical": empirical.values
})

comparison
outcome theoretical empirical
0 Coins 0.50 0.4965
1 Gems 0.25 0.2588
2 Energy 0.15 0.1493
3 Rare Item 0.10 0.0954

You can also look at the differences:

comparison["difference"] = comparison["empirical"] - comparison["theoretical"]
comparison
outcome theoretical empirical difference
0 Coins 0.50 0.4965 -0.0035
1 Gems 0.25 0.2588 0.0088
2 Energy 0.15 0.1493 -0.0007
3 Rare Item 0.10 0.0954 -0.0046

The empirical probabilities should be close to the theoretical probabilities, though not exactly equal, because the simulation uses a finite number of draws. With 10,000 replications, the discrepancies are usually small.

  1. Why can the precision method be inefficient for large k?

A simple implementation checks the cumulative probabilities one by one until it finds the correct interval. When the number of outcomes k is large, this search can become slow, especially if repeated many times. More efficient methods or built-in algorithms are often preferred for large discrete distributions.

Optional: more compact R version using np.searchsorted()

def simulate_reward2(n=1):
    u = np.random.uniform(0, 1, size=n)
    idx = np.searchsorted(F, u, side="left")
    return x[idx]

np.random.seed(123)
simulate_reward2(10)

# 10,000 simulated rewards
samples2 = simulate_reward2(10000)
pd.Series(samples2).value_counts(normalize=True).reindex(x)
Coins        0.4964
Gems         0.2586
Energy       0.1495
Rare Item    0.0955
Name: proportion, dtype: float64

One small note: np.searchsorted() finds the insertion position of each value in the cumulative probability array, which corresponds exactly to the first index where \(F_i \ge u\). This makes it a compact Python version of the precision method.

Note

The precision method is essentially the discrete version of the inverse CDF method.

Exercise 2: Load on a bridge (CD 2014)

Consider the pdf for the random variable \(X\) which is the magnitude (in newtons) of a dynamic load on a bridge, given as

\[ f(x)= \left(\frac{1}{8} + \frac{3}{8} x \right), \quad 0 \le x \le 2 \]

\[ f(x)=0, \quad \text{otherwise.} \]

  1. Calculate the exact mean and variance.

\[ \boxed{E(X) = \int_x x f(x)\,dx, \quad \mathrm{Var}(X)= E(X^2) - [E(X)]^2} \]

First, check the pdf:

\[ \int_0^2 f(x)\, dx = \int_0^2 \left(\frac{1}{8} + \frac{3}{8} x \right)\,dx = 1 \rightarrow \text{ valid PDF.} \]

\[ \begin{aligned} E(X) &= \int_0^2 x f(x)\,dx \\ &= \int_0^2 x \left(\frac{1}{8} + \frac{3}{8} x \right)\,dx \\ &= \left[\frac{1}{16} x^2 + \frac{1}{8} x^3\right]_0^2 \\ \therefore E(X) &= \frac{5}{4} = 1.25 \\ E(X^2) &= \int_0^2 x^2 f(x)\,dx \\ &= \int_0^2 x^2 \left(\frac{1}{8} + \frac{3}{8} x \right)\,dx \\ &= \left[\frac{1}{24} x^3 + \frac{3}{32} x^4\right]_0^2 \\ E(X^2) & = \frac{11}{6} \approx 1.8333 \\ Var(X) &= E(X^2) - [E(X)]^2 \\ &= \frac{11}{6} - \left(\frac{5}{4}\right)^2 \\ \therefore Var(X) &= \frac{13}{48} \approx 0.2708 \end{aligned} \]

Alternatively,

import sympy as sp

x = sp.symbols('x')

# pdf
f = 1/sp.Integer(8) + 3*x/8

# E(X)
EX = sp.integrate(x * f, (x, 0, 2))

# E(X^2)
EX2 = sp.integrate(x**2 * f, (x, 0, 2))

# Variance
VarX = EX2 - EX**2

print("E(X) =", EX)
print("E(X^2) =", EX2)
print("Var(X) =", sp.simplify(VarX))
E(X) = 5/4
E(X^2) = 11/6
Var(X) = 13/48
  1. Write a program to simulate values from this distribution using the inverse cdf method. Calculate the mean and variance and compare with the exact results.
  • Step 1. Firstly, we need to find the cdf \(F(x).\) \[ \begin{aligned} F(x) &= \int_0^x f(t)\, dt \\ &= \int_0^x \left(\frac{1}{8} + \frac{3}{8} t \right)\,dt \\ F(x) &= \frac{x}{8} + \frac{3x^2}{16} \end{aligned} \]
import sympy as sp

# Define variables
x = sp.symbols('x')

# pdf
f = 1/sp.Integer(8) + 3*x/8

# cdf
F = sp.integrate(f, (x, 0, x))

print("F(X) =", F)
F(X) = 3*x**2/16 + x/8
  • Step 2. Set \(u = F(x)\) and solve for \(x\) \[ \begin{aligned} u &= \frac{x}{8} + \frac{3x^2}{16} \\ 16u &= 3x^2 + 2x \\ 0 &= 3x^2 + 2x - 16u \\ x &= \frac{-2 \pm \sqrt{2^2 - 4(3)(-16u)}}{2(3)} \\ &= \frac{-1 \pm \sqrt{1+48u} }{3} \\ \therefore x &= \frac{-1 + \sqrt{1+48u} }{3}, \quad 0 \leq x \leq 2. \end{aligned} \]
# Define variables
x = sp.symbols('x')
u = sp.symbols('u')

# Define equation: F = u
equation = sp.Eq(F, u)

# Solve for x
solution = sp.solve(equation, x)
print(solution)
[-sqrt(48*u + 1)/3 - 1/3, sqrt(48*u + 1)/3 - 1/3]
  • Step 3. Simulate \(f(x)\) from \(U \sim \text{Uniform}(0,1)\)
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)
u = np.random.rand(10000)

# Inverse CDF
x = (-1 + np.sqrt(1 + 48*u)) / 3

# Histogram
plt.hist(x, bins=40, density=True, alpha=0.6)

# Overlay true pdf
t = np.linspace(0, 2, 200)
pdf = 1/8 + 3*t/8
plt.plot(t, pdf, linewidth=2)

plt.title("Simulated Distribution")
plt.show()

# Compare mean and variance
print("Theoretical mean =", EX, ", Empirical mean =", np.mean(x))
print("Theoretical variance =", VarX, ", Empirical variance =", np.var(x, ddof=1))

Theoretical mean = 5/4 , Empirical mean = 1.2491798412911999
Theoretical variance = 13/48 , Empirical variance = 0.27390211327069786

The simulated distribution closely matches the theoretical probability density function, as shown by the histogram and overlaid curve Additionally, the empirical mean and variance from the simulated data are very close to the exact values \(E(X)=\frac{5}{4}\) and \(\mathrm{Var}(X)=\frac{13}{48}\), confirming that the simulation is accurate.

Exercise 3: Simulating the distribution of waiting time - Transportation

The article “A Model of Pedestrians’ Waiting Times for Street Crossings at Signalised Intersections” (Transportation Research, 2013: 17–28) suggested that under some circumstances the distribution of waiting time \(X\) could be modeled with the following pdf:

\[ f(x)= \frac{\theta}{\tau}\left( 1- \frac{x}{\tau}\right)^{\theta - 1}, \quad 0 \le x < \tau \]

\[ f(x)=0, \quad \text{otherwise.} \]

  1. Write a function to simulate values from this distribution, implementing the inverse cdf method. Your function should have three inputs: the desired number of simulated values \(n\) and values for the two parameters for \(\theta\) and \(\tau.\)
  • Step 1. Determine the cdf:

    \[ F(x) = \int_0^x f(t) dt = \int_0^x \frac{\theta}{\tau}\left( 1- \frac{t}{\tau}\right)^{\theta - 1} dt \]

    Using integration by substitution method, let \[ w=\left( 1- \frac{t}{\tau}\right), \quad dw=-\frac{1}{\tau}\,dt \]

    \[ \begin{aligned} F(x) &= \int_0^x \left( 1- \frac{t}{\tau}\right)^{\theta - 1}\left(\frac{\theta}{\tau}\right) dt \\ &= \int_1^{(1-x/\tau)} w^{\theta-1}(-\theta)\,dw \\ &= \left[-w^\theta\right]_1^{(1-x/\tau)} \\ \therefore F(x) &= 1 - (1-x/\tau)^{\theta} \end{aligned} \]

  • Step 2. Set \(u=F(x)\) and solve for x

    \[ u= 1 - (1-x/\tau)^{\theta} \quad \rightarrow \quad x= \tau(1 - (1- u)^{1/\theta}) \]

  • Step 3. Write a function that accepts 3 inputs: \(n,\theta,\tau\)

def waittime(n, theta, tau):
    u = np.random.rand(n)
    x = tau * (1 - (1 - u)**(1/theta))
    return x
  1. Use your function in part (a) to simulate 10,000 values from this wait time distribution with \(\theta=4\) and \(\tau=80.\) Estimate \(E(X)\) under these parameter settings.
np.random.seed(1234)
x = waittime(10000, 4, 80)
print("Estimated E(X):", np.mean(x))
print("Estimated Var(X):", np.var(x, ddof=1))
Estimated E(X): 16.000890300264242
Estimated Var(X): 169.98408083749962
  1. How close is your estimate to the correct value of the exact \(E(X)=16\) and \(\mathrm{Var}(X)=170.67\)?

The empirical mean and variance from the simulated data are very close to the exact mean and variance, confirming that the simulation is accurate.


Part 2: Acceptance-Rejection Sampling

Exercise 4: Simulating Beta(2,2) distribution using Uniform distribution (Templ 2016)

  1. Describe the steps. Hints: The pdf of \(Beta(\alpha,\beta)\) is \[ f(x)= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}, \quad 0 \le x \le 1 \]

    The aim is to simulate \(n\) values from a Beta(2,2) distribution with pdf such that by substituting \(\alpha=2, \beta=2\) in the above, we have \[ f(x)= 6x(1-x), \quad 0 \le x \le 1. \]

    Let \(g(x)\) be Uniform(0,1) such that \[ g(x)= 1, \quad 0 \le x \le 1. \]

    and the ratio \(f/g\) is bounded, i.e., there exists a constant \(M\) such that \[ \frac{f(x)}{g(x)} = 6x(1-x) \le M, \quad \forall x. \]

    \[ \frac{d}{dx}6x(1-x) = 6 - 12x = 0 \]

    In this case, the function is maximised at \(x=0.5, f(x)=1.5\) so that we choose \(M = 1.5.\)

    Therefore, the general rule is

    \[ U \leq \frac{f(Y)}{Mg(Y)} = \frac{6y(1-y)}{1.5(1)} = 4y(1-y) \]

    Proceed as follows:

  1. Generate \(Y \sim g \sim \text{Uniform}(0,1).\)

  2. Generate a standard uniform variate, \(U \sim \text{Uniform}(0,1).\)

  3. If \(u\le 4y(1-y),\) then assign \(x=y\) (i.e., “accept” the candidate). Otherwise, discard or reject \(y\) and return to step 1.

  • These steps are repeated until \(n\) candidate values have been accepted.
  • The resulting accepted values \(x_1, . . ., x_n\) form a simulation of a random variable with the original pdf, \(f(x).\)
  1. Check visually the plots of \(f(x)\) and \(Mg(x).\)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

x = np.linspace(0, 1, 500)

# f(x): Beta(2,2)
f = beta.pdf(x, a=2, b=2)

# Mg(x) = 1.5
Mg = 1.5 * np.ones_like(x)

plt.plot(x, f, linewidth=2, label="f(x) = Beta(2,2)")
plt.plot(x, Mg, linestyle="--", linewidth=2, label="Mg(x) = 1.5")

plt.xlabel("x")
plt.legend()
plt.title("f(x) vs Mg(x)")
plt.show()

  1. Perform Accept-Reject method to simulate Beta(2,2) distribution from Uniform distribution. Calculate the acceptance rate. Plot the simulating values of a Beta(2,2) using the accep rejection approach.
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

n = 10000
x = []

while len(x) < n:
    m = n - len(x)
    y = np.random.rand(m)
    u = np.random.rand(m)
    
    accept = u <= 4 * y * (1 - y)
    x.extend(y[accept])

x = np.array(x[:n])

plt.hist(x, bins=40, density=True, alpha=0.6, label="Simulated")

t = np.linspace(0, 1, 500)
plt.plot(t, 6*t*(1-t), linewidth=2, label="True Beta(2,2)")

plt.title("Acceptance-Rejection Sampling")
plt.xlabel("x")
plt.legend()
plt.show()

Exercise 5: Simulating the half-normal distribution using Exponential

The half-normal distribution is defined as:

\[f(x) = \sqrt{\frac{2}{\pi}} \exp (-\frac{x^2}{2}), \quad x \ge 0.\]

Consider simulating values from this distribution using the accept–reject method with a candidate distribution of an Exponential random variable with parameter \(\lambda = 1\)

\[g(x)= e^{-x}, \quad x \ge 0.\]

  1. Find the inverse cdf corresponding to \(g(x).\)

    If \(X \sim Exp(\lambda=1)\) then the pdf and cdf are \[ g(x)= e^{-x}, \quad G(x)= 1- e^{-x}. \]

    Then set \(u=G(x)\) and then solve for \(x\):

    \[ \begin{aligned} u = G(x) &= 1- e^{-x} \\ e^{-x} &= 1-u \\ -x &= \ln(1-u) \\ x &= G^{-1}(u) = -\ln(1-u). \end{aligned} \]

  2. Find the smallest majorisation constant \(M\) so that \(f(x)/g(x)\le M\) for all \(x \ge 0.\)

    The ratio \(f(x)/g(x)\) is equivalent to

    \[ \frac{f(x)}{g(x)}= \frac{\sqrt{\frac{2}{\pi}} \exp (-\frac{x^2}{2})}{\exp(-x)}=\sqrt{\frac{2}{\pi}} \exp (-\frac{x^2}{2}+x). \]

    Use the first derivative test,

    \[ \frac{d}{dx} \left(\exp (-\frac{x^2}{2}+x) \right)=\exp (-\frac{x^2}{2}+x)(1-x)=0. \]

    The absolute maximum on the interval \([0, \infty)\) occurs at \(x = 1\).

    \[ \frac{f(x)}{g(x)} \le \sqrt{\frac{2}{\pi}} \exp (-\frac{1^2}{2}+1)=\sqrt{\frac{2}{\pi}} \exp(1/2)=\sqrt{\frac{2e}{\pi}}=M \]

  3. On the average, how many candidate values will be required to generate 10,000 “accepted” values?

The expected number of candidate values required to generate a single accepted value is \(M,\) so the expected number required to generate 10,000 x-values is \(10,000M \approx 13,155.\)

  1. Write a program to construct 10,000 values from a half-normal distribution.
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1234)

n = 10000
M = np.sqrt(2 * np.exp(1) / np.pi)

x = []

while len(x) < n:
    m = n - len(x)                      # how many more needed
    y = np.random.exponential(scale=1, size=m)  # Exp(1)
    u = np.random.rand(m)                      # Uniform(0,1)
    
    fg_ratio = np.sqrt(2 / np.pi) * np.exp(-y**2 / 2 + y)
    accept = u <= fg_ratio / M
    
    x.extend(y[accept])


# Trim to exactly n samples
x = np.array(x[:n])

plt.hist(x, bins=40, density=True, alpha=0.6, label="Simulated")

t = np.linspace(0, 4, 500)
true_pdf = np.sqrt(2 / np.pi) * np.exp(-t**2 / 2)

plt.plot(t, true_pdf, linewidth=2, label="Half-Normal PDF")

plt.title("Acceptance-Rejection Sampling")
plt.xlabel("x")
plt.legend()
plt.show()