8  Workshop Activities

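This workshop consists of two parts: Part 1 covers univariate distributions in simulation, and Part 2 covers joint distributions and dependence.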

By the end of this workshop, you should be able to:

  1. Use the built-in distribution functions in R (e.g., r*, d*, p*, q*) or the scipy.stats package in Python to simulate and evaluate common distributions.
  2. Visualise and interpret univariate and joint distributions, using histograms/density overlays and scatterplots/contours.
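
As a rough orientation for Python users, the sketch below shows how the four R prefixes map onto methods of a `scipy.stats` distribution object, using the standard normal as an example. The seed and the evaluation points are arbitrary choices made for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed, only for reproducibility

# A "frozen" distribution object bundles the four operations.
dist = stats.norm(loc=0, scale=1)

samples = dist.rvs(size=5, random_state=rng)  # counterpart of rnorm(5)
density = dist.pdf(0.0)                       # counterpart of dnorm(0)
cum_prob = dist.cdf(1.96)                     # counterpart of pnorm(1.96)
quantile = dist.ppf(0.975)                    # counterpart of qnorm(0.975)

print(samples)
print(density, cum_prob, quantile)
```

Discrete distributions follow the same pattern, with `pmf` in place of `pdf`.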

From this workshop onward, no R Notebook or Jupyter Notebook templates will be provided.

You are expected to practise creating and rendering HTML documents independently, as this is an essential component of all the assessments in this unit.

If you find that R Notebook or Jupyter Notebook is somewhat limited in producing polished reports, you may consider using Quarto.

Quarto is a modern and more powerful publishing system (and successor to R Markdown) that supports dynamic content using R and Python. It can be used to create reproducible, production-quality articles, presentations, dashboards, websites, blogs, and books in formats such as HTML, PDF, MS Word, ePub, and more. For instance, this book and all the lecture slides are developed using Quarto.

You can install Quarto and use it directly within RStudio or VS Code.

8.1 Part 1: Univariate distributions in simulation

Exercise 1: Geometric Distribution

Let \[X \sim \text{Geometric}(p),\] where \(p = 0.3\).

  1. Simulate 10,000 observations.
  2. Plot the empirical PMF (barplot of relative frequencies).
  3. Overlay the theoretical PMF.
  4. Compute:
    • empirical mean and variance
    • theoretical mean and variance
  5. Explain:
    • Why is the geometric distribution memoryless? Verify: \(P(X>s+t | X>s)=P(X>t)\)
    • How does changing \(p\) affect skewness?
  6. Compute \(P(X > 5)\) analytically and estimate it via simulation. Compare the results.
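
One possible starting point in Python is sketched below; plotting is left out. It assumes the "number of trials" convention, in which \(X\) takes values 1, 2, ..., as used by NumPy's `Generator.geometric` and `scipy.stats.geom`. R's `rgeom` instead counts failures before the first success (values 0, 1, ...), so results from the two languages differ by a shift of one. The seed is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # arbitrary seed
p, n = 0.3, 10_000

# Number of trials up to and including the first success (support 1, 2, ...).
x = rng.geometric(p, size=n)

# Empirical PMF on the observed support versus the theoretical PMF.
support = np.arange(1, x.max() + 1)
empirical_pmf = np.bincount(x, minlength=x.max() + 1)[1:] / n
theoretical_pmf = stats.geom.pmf(support, p)

# Tail probability: under this convention P(X > 5) = (1 - p)^5.
tail_exact = stats.geom.sf(5, p)
tail_sim = np.mean(x > 5)

print(x.mean(), 1 / p)                 # empirical vs theoretical mean
print(x.var(ddof=1), (1 - p) / p**2)   # empirical vs theoretical variance
print(tail_sim, tail_exact)
```

The arrays `support`, `empirical_pmf`, and `theoretical_pmf` can be passed directly to a bar plot with an overlay.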

Exercise 2: Gamma Distribution

Let \[X \sim \text{Gamma}(\alpha = 3, \beta = 2)\] (Use shape–rate parameterisation.)

  1. Simulate 10,000 observations.
  2. Plot histogram with theoretical density overlay.
  3. Compute empirical vs theoretical mean and variance.
  4. Investigate how changing \(\alpha\) affects:
    • skewness
    • tail behaviour
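
A minimal Python sketch of the simulation part follows. The main pitfall it illustrates is that `scipy.stats.gamma` is parameterised by shape and scale, so the rate \(\beta\) has to be passed as `scale = 1 / beta`. The seed and the alternative shape values in the loop are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # arbitrary seed
alpha, beta = 3.0, 2.0          # shape and rate

# SciPy expects a scale parameter, which is the reciprocal of the rate.
dist = stats.gamma(a=alpha, scale=1.0 / beta)
x = dist.rvs(size=10_000, random_state=rng)

theo_mean, theo_var = dist.mean(), dist.var()   # alpha / beta, alpha / beta^2
print(x.mean(), theo_mean)
print(x.var(ddof=1), theo_var)

# Theoretical skewness for a few illustrative shape values (rate held fixed).
for a in (1.0, 3.0, 10.0):
    print(a, stats.gamma(a=a, scale=1.0 / beta).stats(moments="s"))
```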

Exercise 3: Beta Distribution

Let \[X \sim \text{Beta}(2,5).\]

  1. Simulate 10,000 observations.
  2. Plot histogram and density.
  3. Compute mean and variance.
  4. Repeat for:
    • Beta(0.5, 0.5)
    • Beta(5, 5)
  5. How do the shape parameters affect:
    • symmetry?
    • concentration?
    • boundary behaviour?
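
The three parameter settings can be compared in one loop. The sketch below is one way to do this in Python, using the closed-form moments \(E[X] = a/(a+b)\) and \(\mathrm{Var}(X) = ab/\{(a+b)^2(a+b+1)\}\). The helper name `beta_moments` is made up for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed

def beta_moments(a, b):
    """Closed-form mean and variance of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Each entry holds (empirical mean, empirical variance, theoretical mean, theoretical variance).
results = {}
for a, b in [(2, 5), (0.5, 0.5), (5, 5)]:
    x = rng.beta(a, b, size=10_000)
    results[(a, b)] = (x.mean(), x.var(ddof=1), *beta_moments(a, b))
    print((a, b), results[(a, b)])
```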

Exercise 4: Multinomial Distribution

Let

\[(X_1, X_2, X_3) \sim \text{Multinomial}(n=20, p=(0.2,0.5,0.3))\]

  1. Simulate 5,000 independent multinomial experiments.
  2. Compute:
    • sample means of each component
    • covariance matrix
  3. Verify: \(E[X_i] = np_i\)
  4. Verify that components are negatively correlated.
  5. Visualisation
    • Scatterplot of \(X_1\) vs \(X_2\)
    • Comment on dependence structure.
  6. Conceptual Question: Why must multinomial components be dependent?
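
A possible Python sketch for the numerical parts is given below. It compares the sample moments with the standard multinomial results \(E[X_i] = np_i\) and \(\mathrm{Cov}(X) = n(\mathrm{diag}(p) - pp^\top)\). The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)  # arbitrary seed
n_trials = 20
p = np.array([0.2, 0.5, 0.3])

# Each row is one experiment; the three counts in a row always sum to n_trials.
counts = rng.multinomial(n_trials, p, size=5_000)

sample_means = counts.mean(axis=0)
sample_cov = np.cov(counts, rowvar=False)

# Theory: E[X_i] = n p_i and Cov(X) = n (diag(p) - p p^T).
theo_means = n_trials * p
theo_cov = n_trials * (np.diag(p) - np.outer(p, p))

print(sample_means, theo_means)
print(np.round(sample_cov, 2))
print(np.round(theo_cov, 2))
```

The off-diagonal entries of both matrices are negative, which is the numerical counterpart of the conceptual question.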

8.2 Part 2: Joint distributions and dependence

Exercise 5

Explain why dependence matters in simulation. Give 3 examples.

Exercise 6: Visualising Dependence

Simulate two scenarios:

Case A: Independent variables

Let: \[ X \sim \mathcal{N}(0,1), \quad Y \sim \mathcal{N}(0,1) \]

Generate independently.

Case B: Dependent variables

Define:

\[ Y = 0.8X + \sqrt{1-0.8^2} Z \]

where \(Z \sim \mathcal{N}(0,1)\) independent of \(X\).

  1. Generate 5,000 observations for each case.
  2. Produce scatterplots.
  3. Compute correlation.
  4. Compare visually and numerically.
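
The two cases can be generated as in the following Python sketch, which leaves the scatterplots to the reader. The construction in Case B gives \(Y\) unit variance and \(\mathrm{Cor}(X, Y) = 0.8\). The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)  # arbitrary seed
n, rho = 5_000, 0.8

x = rng.standard_normal(n)

# Case A: Y drawn independently of X.
y_indep = rng.standard_normal(n)

# Case B: Y built from X plus independent noise Z.
z = rng.standard_normal(n)
y_dep = rho * x + np.sqrt(1 - rho**2) * z

corr_a = np.corrcoef(x, y_indep)[0, 1]
corr_b = np.corrcoef(x, y_dep)[0, 1]
print(corr_a, corr_b)
```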

Exercise 7: Constructing a Joint Distribution (Discrete)

Let:

\[ P(X=0,Y=0)=0.2, \quad P(X=0,Y=1)=0.3, \] \[ P(X=1,Y=0)=0.1, \quad P(X=1,Y=1)=0.4 \]

  1. Verify probabilities sum to 1.
  2. Compute marginal distributions.
  3. Check if independent.
  4. Compute covariance manually.
  5. Simulate 10,000 draws from this joint distribution.
  6. Compare empirical and theoretical covariance.
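
One way to organise these steps in Python is to store the joint pmf as a 2 x 2 array, as in the sketch below. Simulation is done by sampling a flattened cell index and mapping it back to \((x, y)\). This is only one of several reasonable approaches, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)  # arbitrary seed

# Rows index x in {0, 1}; columns index y in {0, 1}.
joint = np.array([[0.2, 0.3],
                  [0.1, 0.4]])

p_x = joint.sum(axis=1)   # marginal distribution of X
p_y = joint.sum(axis=0)   # marginal distribution of Y

# Independence would require joint == outer product of the marginals.
independent = np.allclose(joint, np.outer(p_x, p_y))

values = np.array([0, 1])
e_x, e_y = values @ p_x, values @ p_y
e_xy = values @ joint @ values        # sum over x, y of x * y * P(X=x, Y=y)
cov_theory = e_xy - e_x * e_y

# Draw a flat cell index, then recover the (row, column) = (x, y) pair.
cells = rng.choice(joint.size, size=10_000, p=joint.ravel())
x_sim, y_sim = np.unravel_index(cells, joint.shape)
cov_sim = np.cov(x_sim, y_sim)[0, 1]

print(p_x, p_y, independent)
print(cov_theory, cov_sim)
```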

Exercise 8: Zero Correlation ≠ Independence

Let:

\[ X \sim \mathcal{N}(0,1) \]

Define:

\[ Y = X^2 \]

  1. Simulate 10,000 observations.
  2. Compute correlation.
  3. Plot scatterplot.
  4. Explain why they are dependent despite near-zero correlation.
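
A short Python sketch of this experiment is shown below. In addition to the requested correlation, it computes two extra diagnostics that are not part of the exercise: the correlation between \(|X|\) and \(Y\), and the mean of \(Y\) restricted to \(|X| > 1\). Both are included only to make the dependence visible numerically. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
x = rng.standard_normal(10_000)
y = x**2

# Close to zero, because Cov(X, X^2) = E[X^3] = 0 for a standard normal.
corr_xy = np.corrcoef(x, y)[0, 1]

# Far from zero: Y is an increasing function of |X|.
corr_absx_y = np.corrcoef(np.abs(x), y)[0, 1]

# Conditioning on X changes the distribution of Y.
mean_y = y.mean()
mean_y_given_large_x = y[np.abs(x) > 1].mean()

print(corr_xy, corr_absx_y)
print(mean_y, mean_y_given_large_x)
```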

Exercise 9: Conditional Simulation

Let:

\[ X \sim \text{Gamma}(2,1) \]

Given \(X=x,\)

\[ Y|X=x \sim \text{Poisson}(x) \]

  1. Simulate 5,000 pairs.
  2. Plot scatterplot.
  3. Estimate \(E[Y]\) and compare it with the theoretical value obtained from the law of total expectation.

Hint:

\[ E[Y] = E[E(Y|X)] \]
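
The two-stage structure translates directly into code: draw \(X\) first, then draw \(Y\) using each simulated \(x\) as the Poisson mean. The Python sketch below does this. Because the second Gamma parameter equals 1, the shape-rate and shape-scale readings coincide. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)  # arbitrary seed
n = 5_000

# Stage 1: X ~ Gamma(2, 1); rate 1 and scale 1 describe the same distribution.
x = rng.gamma(shape=2.0, scale=1.0, size=n)

# Stage 2: one Poisson draw per simulated x, with mean x.
y = rng.poisson(lam=x)

# Law of total expectation: E[Y] = E[E(Y | X)] = E[X].
print(y.mean(), x.mean())
```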

Exercise 10: Simulating Bivariate Normal

Construct a portfolio simulation:

Let:

  • \(X_1, X_2\) be correlated returns (bivariate normal).
  • Portfolio return:

\[ R = 0.6X_1 + 0.4X_2 \]

Tasks:

  1. Simulate 10,000 returns.
  2. Estimate variance.
  3. Plot scatterplot and explain.
  4. Explain how ignoring dependence affects risk estimation.
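
The exercise does not fix the parameters of the bivariate normal, so the Python sketch below uses hypothetical means, standard deviations, and a correlation of 0.5 purely for illustration; substitute your own values. It compares the simulated portfolio variance with \(w^\top \Sigma w\) and with the value obtained when the covariance term is dropped.

```python
import numpy as np

rng = np.random.default_rng(9)  # arbitrary seed

# Hypothetical parameters: the exercise leaves these to the reader.
mu = np.array([0.05, 0.03])
sd = np.array([0.20, 0.10])
rho = 0.5
cov = np.array([[sd[0]**2, rho * sd[0] * sd[1]],
                [rho * sd[0] * sd[1], sd[1]**2]])
w = np.array([0.6, 0.4])        # portfolio weights from the exercise

returns = rng.multivariate_normal(mu, cov, size=10_000)  # columns are X1, X2
portfolio = returns @ w                                  # R = 0.6 X1 + 0.4 X2

var_sim = portfolio.var(ddof=1)
var_theory = w @ cov @ w              # includes the covariance term
var_if_independent = w**2 @ sd**2     # drops the covariance term

print(var_sim, var_theory, var_if_independent)
```

With a positive correlation, treating the returns as independent understates the variance; with a negative correlation it overstates it.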

Exercise 11

(Jones et al., 2014, Chapter 14, Exercise 23, p. 282)

A diagnostic test is used to determine whether or not a person has a certain disease. If the test is positive, then it is assumed the person has the disease; if negative, that they do not have it. However, the test is not 100% accurate.

  • If a diseased person is tested, it gives a negative result 5% of the time (a false negative).
  • When testing a person free of the disease, it gives a false positive 10% of the time.

Suppose we choose someone at random from a population in which only 1 person in 50 has the disease.

  1. Find the probability that their test result is positive.
  2. Find the probability that their test result is misleading.
  3. Find the probability that they actually have the disease if they test positive.
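
A Monte Carlo check of the hand calculation can be sketched in Python as follows. It simulates disease status first and then the test result conditional on that status. The sample size and seed are arbitrary, and the variable names are chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(10)  # arbitrary seed
n = 200_000

prevalence = 1 / 50
false_negative = 0.05   # P(negative | diseased)
false_positive = 0.10   # P(positive | not diseased)

diseased = rng.random(n) < prevalence

# The probability of a positive result depends on the true status.
u = rng.random(n)
positive = np.where(diseased, u < 1 - false_negative, u < false_positive)

p_positive = positive.mean()
p_misleading = (positive != diseased).mean()      # result disagrees with status
p_diseased_given_positive = diseased[positive].mean()

print(p_positive, p_misleading, p_diseased_given_positive)
```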

Exercise 12

(Jones et al., 2014, Chapter 14, Exercise 24, p. 282)

There are two bus lines which travel between towns \(A\) and \(B\).
Bus line \(A\) runs late 20% of the time, while bus line \(B\) runs late 50% of the time. You travel three times as often by line \(A\) as you do by line \(B\). On a certain day you arrive late. What is the probability that you used bus line \(B\) that day?

Exercise 13

(Jones et al., 2014, Chapter 14, Exercise 26, p. 283)

The dice game craps is played as follows. The player throws two dice, and if the sum is seven or eleven, then he wins. If the sum is two, three, or twelve, then he loses. If the sum is anything else, then he continues throwing until he either throws that number again (in which case he wins) or he throws a seven (in which case he loses). Calculate the probability that the player wins.
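
Before attempting the exact calculation, it can help to estimate the answer by simulation. The Python sketch below implements the rules as stated, using only the standard library. The function name `play_craps`, the number of games, and the seed are choices made for this illustration.

```python
import random

def play_craps(rng):
    """Play one game under the rules stated above; return True on a win."""
    def roll():
        return rng.randint(1, 6) + rng.randint(1, 6)

    first = roll()
    if first in (7, 11):
        return True
    if first in (2, 3, 12):
        return False
    # Otherwise keep throwing until the first sum reappears or a seven shows.
    while True:
        total = roll()
        if total == first:
            return True
        if total == 7:
            return False

rng = random.Random(11)  # arbitrary seed
n_games = 100_000
win_rate = sum(play_craps(rng) for _ in range(n_games)) / n_games
print(win_rate)
```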

Extra Exercises

Distribution Identification

For each of the functions below, check whether it is a valid cumulative distribution function (cdf), probability density function (pdf), or probability mass function (pmf). For each valid cdf, compute the corresponding pdf or pmf. For each valid pdf or pmf, compute the corresponding cdf.

\[ f_X(x) = \begin{cases} 6x - 6x^2 & 0 \le x \le 1 \\ 0 & \text{Otherwise} \end{cases} \]

\[ F_X(x) = \begin{cases} 0 & x < 0 \\ 0.2 & 0 < x \le 1 \\ 0.3 & 1 < x \le 2 \\ 1 & x > 2 \end{cases} \]

Scenario Analysis

For each of the scenarios below, identify:

  • Random variable
  • Distribution and parameters
  • Probability mass or density function
  • Probability of interest

You may use direct calculation, R, or any other computing system.

  1. A husband has 6 tasks on his to-do list and a wife has 10 tasks on her to-do list. Four tasks are picked at random. We are interested in the number of tasks the wife will have to do. What is the probability that the wife will have to do all four tasks?

  2. A couple likes to play chess together. The man is not good at the game and has only a 10% chance of winning each game. The stubborn couple decides to play until the man wins two games. We are interested in the number of games the couple will have to play. What is the probability that the couple will play at least 15 games?

  3. A student is taking a true/false test that consists of 10 questions. The student has approximately an 80% chance of getting any individual question correct. We are interested in the number of questions the student answers correctly. What is the probability that the student gets all ten questions right?

  4. A police officer has found that approximately 0.1% of the vehicles he pulls over fail the alcohol test. He is interested in the number of vehicles that will fail the alcohol test among the next 100 vehicles he pulls over. What is the probability that 5 of the next 100 vehicles he pulls over will fail the alcohol test?

  5. At a certain manufacturing company, approximately 2% of the products are defective. We are interested in the number of products that need to be checked before we hit the first defective item. What is the probability that the third product checked will be the first defective item?

  6. The amount of time one spends in a bank is exponentially distributed with mean 10 minutes. What is the probability that the customer will spend more than 15 minutes in the bank?

  7. The smiling times of babies, in seconds, follow a uniform distribution between 0 and 23 seconds, inclusive. What is the probability that a randomly selected eight-week-old baby smiles between 2 and 18 seconds?

  8. A utility industry consultant predicts a cutback in the utility industry by a percentage specified by a Beta distribution with parameters \(\alpha = 1\) and \(\beta = 0.25\), in a five-year period. What is the probability that the Department of Hydrology will downsize by 10% to 30%?
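
For readers working in Python, the sketch below shows how three of the scenarios (1, 6, and 7) might be set up with `scipy.stats`, mainly to illustrate the parameter conventions. The distribution choices reflect one reading of the scenarios, so check them against your own identification. The remaining scenarios follow the same pattern with other distribution objects.

```python
from math import comb, exp
from scipy import stats

# Scenario 1: 16 tasks in total, 10 belong to the wife, 4 drawn without replacement.
# SciPy's hypergeom takes M = population size, n = success states, N = draws.
wife_tasks = stats.hypergeom(M=16, n=10, N=4)
p_all_four = wife_tasks.pmf(4)

# Scenario 6: exponential time with mean 10 minutes (SciPy's scale is the mean).
bank_time = stats.expon(scale=10)
p_over_15 = bank_time.sf(15)          # survival function, P(T > 15)

# Scenario 7: uniform on [0, 23]; SciPy uses loc = lower end, scale = width.
smile = stats.uniform(loc=0, scale=23)
p_2_to_18 = smile.cdf(18) - smile.cdf(2)

# Each SciPy value is printed next to a direct calculation for comparison.
print(p_all_four, comb(10, 4) / comb(16, 4))
print(p_over_15, exp(-1.5))
print(p_2_to_18, 16 / 23)
```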

Joint Distributions and Dependence

  1. Let \((X_1, X_2)\) and \((Y_1, Y_2)\) be two discrete random vectors with the following probability functions:

Joint pmf of \((X_1, X_2)\):

\[
\begin{array}{c|cc}
x_1 \backslash x_2 & -1 & 1 \\
\hline
0 & 1/6 & 1/6 \\
1/2 & 1/3 & 1/3
\end{array}
\]

Joint pmf of \((Y_1, Y_2)\):

\[
\begin{array}{c|cc}
y_1 \backslash y_2 & -1 & 1 \\
\hline
0 & 1/3 & 0 \\
1 & 1/6 & 1/4 \\
2 & 0 & 1/4
\end{array}
\]

Show that \(X_2\) and \(Y_2\) are identically distributed.

  2. Let \(X \sim N(7, 3^2)\), \(Y \sim N(5, 2^2)\) and \(\mathrm{Cor}(X,Y) = -0.2\).
  • Specify the joint probability density function for \((X,Y)\).
  • Find the distribution for \(Z = 3X + 4Y\), hence find \(P[Z > 50]\).
  • Find the distribution for \(Z = X - Y\), hence find \(P[Z < 0]\).
  • Specify the distribution of \(Z = 3 + 4X + 2Y\).
  • Specify \(E(X \mid Y)\).
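
The linear-combination parts can be checked numerically with the Python sketch below. It assumes that \((X, Y)\) is jointly (bivariate) normal, which the exercise implies but does not state outright; under that assumption \(c + aX + bY\) is normal with mean \(c + a\mu_X + b\mu_Y\) and variance \(a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\,\mathrm{Cov}(X,Y)\). The helper name `linear_combination` is made up for this illustration.

```python
import numpy as np
from scipy import stats

mu = np.array([7.0, 5.0])
sd = np.array([3.0, 2.0])
rho = -0.2
cov = np.array([[sd[0]**2, rho * sd[0] * sd[1]],
                [rho * sd[0] * sd[1], sd[1]**2]])

def linear_combination(a, c=0.0):
    """Distribution of c + a[0]*X + a[1]*Y, assuming (X, Y) is bivariate normal."""
    a = np.asarray(a, dtype=float)
    mean = c + a @ mu
    var = a @ cov @ a
    return stats.norm(loc=mean, scale=np.sqrt(var))

z1 = linear_combination([3, 4])
print(z1.mean(), z1.var(), z1.sf(50))    # P(3X + 4Y > 50)

z2 = linear_combination([1, -1])
print(z2.mean(), z2.var(), z2.cdf(0))    # P(X - Y < 0)
```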

References

Jones, O., Maillardet, R., & Robinson, A. (2014). Introduction to scientific programming and simulation using R (2nd ed.). Chapman & Hall/CRC. https://doi.org/10.1201/b17079