Central Limit Theorem and Normal Approximations

This post discusses the classical Central Limit Theorem and demonstrates its usage through the Normal approximation of the Binomial distribution with a Shiny app.


What is the Central Limit Theorem?

The classical Central Limit Theorem (CLT) states that, when certain conditions are met, the sampling distribution of a (suitably normalized) sample mean converges to a (standard) normal distribution, even if the original variables themselves are not normally distributed.

The formal definition of the classical CLT is as follows:

Central Limit Theorem: Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables from some distribution with $\text{E}(X_i) = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$. Denote $\bar{X}_n$ as the sample mean of the $n$ random variables. Then

\[\sqrt{n}(\bar{X}_n - \mu) \xrightarrow{d} \text{N}(0, \sigma^2),\]

which effectively means that as $n$ gets larger, the distribution of $\bar{X}_n$ gets more “normal”. This “classical” form of the CLT is known as the Lindeberg–Lévy CLT. There are other CLTs with different or relaxed assumptions about independence and identical distribution, but they are beyond the scope of this post.
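
To see the theorem in action, here is a minimal R sketch (not from the original post; the sample size, number of replications, and choice of the exponential distribution are all arbitrary) that simulates sample means from a decidedly non-normal distribution and compares their standardized distribution to the standard normal density:

```r
# Simulate the CLT: sample means of Exponential(rate = 1) draws,
# so mu = 1 and sigma^2 = 1 (both finite, as the theorem requires)
set.seed(123)
n    <- 50      # sample size for each mean
reps <- 10000   # number of simulated sample means

xbar <- replicate(reps, mean(rexp(n, rate = 1)))

# Standardize: sqrt(n) * (xbar - mu) / sigma should be roughly N(0, 1)
z <- sqrt(n) * (xbar - 1) / 1

hist(z, breaks = 50, freq = FALSE, main = "Standardized sample means")
curve(dnorm(x), add = TRUE, lwd = 2)  # overlay the standard normal density
```

With $n = 50$ the histogram should already hug the normal curve; shrinking $n$ lets the skew of the exponential distribution show through.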

Why is the CLT important?

Because the CLT involves population parameters such as $\mu$ and $\sigma$, which are almost never known in practice when working with real data, it may initially seem like a purely theoretical result without much application. However, when used in conjunction with other important ideas, such as Slutsky’s theorem and the Delta method, the CLT allows us to approximate the distributions of many kinds of statistics of interest, including means and variances.

In short, the CLT is the basis of many statistical procedures and is often used to

  • construct confidence intervals
  • perform a variety of statistical tests
  • understand and approximate the distribution of various statistics

Example: Binomial distribution

A simple example that demonstrates the CLT is to use the normal distribution to approximate the binomial distribution.

To understand the binomial distribution, let’s first define a Bernoulli random variable. A random variable $X$ follows the Bernoulli distribution if:

  • $X$ takes only the values 1 (“success”) or 0 (“failure”)
  • $\text{P}(X = 1) = p$, the probability of success

When we have a collection of independent and identically distributed Bernoulli random variables $X_1, X_2, \ldots, X_n$, then the sum $Y = \sum_{i=1}^n X_i$ follows a binomial distribution. Specifically, $Y \sim \text{Binomial}(n, p)$. Simply put, the binomial distribution is a probability distribution around the total number of successes $Y$ out of $n$ trials, where each trial has a probability of success $p$.
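
As a quick sanity check (not from the original post; the values of $n$ and $p$ here are arbitrary), summing $n$ independent Bernoulli($p$) draws in R gives exactly one Binomial($n$, $p$) realization:

```r
# A Binomial(n, p) draw is just the sum of n iid Bernoulli(p) draws
set.seed(1)
n <- 10
p <- 0.5

bern <- rbinom(n, size = 1, prob = p)  # n Bernoulli trials (0/1 outcomes)
sum(bern)                              # one Binomial(n, p) realization
rbinom(1, size = n, prob = p)          # or draw directly from Binomial(n, p)
```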

The probability mass function of the binomial distribution is given by

\[\text{P}(Y = y) = {n\choose{y}} p^y (1-p)^{n-y}.\]
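
In R, this PMF is available through dbinom(); the short sketch below (with arbitrary values of $n$, $p$, and $y$) checks it against the formula:

```r
# Evaluate the binomial PMF two ways: the formula above vs. base R's dbinom()
n <- 10
p <- 0.5
y <- 3

choose(n, y) * p^y * (1 - p)^(n - y)  # formula: 0.1171875
dbinom(y, size = n, prob = p)         # same value from dbinom()
```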

Consider rolling a fair die. If we define “success” as rolling a 5, then there is a 1/6 probability of success on each roll. Similarly, if “success” is rolling an even number, then $p = 3/6 = 1/2$.

  • rolling a die only once ($n = 1$) $\implies$ we have a Bernoulli random variable.
  • rolling a die multiple times ($n > 1$) $\implies$ we have a binomial random variable.

Suppose we roll a fair die 100 times. What’s the probability that we roll a 5 at least 25 times? To compute this, we need to use the following information:

  • $p$ = 1/6 (success probability on each roll (rolling a 5))
  • $n$ = 100 (times rolling the die)
  • $y = 25, 26, \ldots, 100$ (how many 5’s we want out of 100 rolls of a die)

We know that the probability of rolling a 5 exactly 25 times is $\text{P}(Y = 25) = {100 \choose{25}} (1/6)^{25} (5/6)^{75} \approx 0.01$. However, we want to know about rolling a 5 either $25, 26, \ldots,$ up to 100 times. We can calculate this probability as

\[\begin{align*} \text{P}(\text{at least 25 5's in 100 die rolls}) &= \text{P}(Y = 25) + \text{P}(Y = 26) + \ldots + \text{P}(Y = 100)\\ &= \sum_{i=25}^{100} \text{P}(Y = i)\\ &= \sum_{i=25}^{100} {n \choose{i}} p^i (1-p)^{n - i}\\ &= \sum_{i=25}^{100} {100 \choose{i}} (1/6)^i (1-(1/6))^{100 - i} \approx 0.0217. \end{align*}\]

So, there’s approximately a 2.17% chance of rolling at least 25 5’s (or 1’s, 2’s, 3’s, 4’s, or 6’s, since they all have the same probability) in 100 rolls of a fair die.
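
These numbers are easy to verify in R; the sketch below reproduces both the point probability and the tail probability for the die-roll example:

```r
# Die-roll example: Y ~ Binomial(100, 1/6)
n <- 100
p <- 1/6

dbinom(25, size = n, prob = p)                      # P(Y = 25), approximately 0.01
sum(dbinom(25:100, size = n, prob = p))             # P(Y >= 25), approximately 0.0217
pbinom(24, size = n, prob = p, lower.tail = FALSE)  # same tail probability via the CDF
```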

As you can imagine, this expression becomes more difficult to compute as $n$ gets larger, especially because of the factorials in the ${n \choose{y}}$ term. Computationally, this isn’t as big of a problem now as it was decades ago. However, one way to avoid computing all those factorials and sums is to use the normal approximation to the binomial distribution.

We can use the CLT to obtain an approximate distribution for $Y$: since $\text{E}(Y) = np$ and $\text{Var}(Y) = np(1-p)$, $Y$ is approximately distributed as $\text{N}(np, np(1-p))$.

Now, we use the approximate $\text{N}(16.67, 13.89)$ distribution to compute $\text{P}(Y \ge 25)$, which yields a probability of 0.0127. This value is somewhat close to the exact 0.0217, but it’s not quite right.

We can improve the normal approximation with a continuity correction: compute $\text{P}(Y \ge 24.5)$ instead of $\text{P}(Y \ge 25)$, which gives a probability of 0.0178, much closer to the exact 0.0217.
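
Both normal-approximation calculations are one-liners in R; here is a sketch of the computation with and without the continuity correction:

```r
# Normal approximation to Y ~ Binomial(100, 1/6): N(np, np(1 - p))
mu    <- 100 * (1/6)                # 16.67
sigma <- sqrt(100 * (1/6) * (5/6))  # sqrt(13.89)

pnorm(25,   mean = mu, sd = sigma, lower.tail = FALSE)  # no correction: ~0.0127
pnorm(24.5, mean = mu, sd = sigma, lower.tail = FALSE)  # continuity correction: ~0.0178
```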

It turns out that as $n$ increases or as $p$ gets closer to 0.5, the normal approximation to the binomial distribution gets better and better. However, if $p$ is close to 0 or 1, or if $n$ is small, the approximation is not very good.
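
As a rough numerical check (not from the original post; the cutoff of two standard deviations above the mean is an arbitrary choice), the sketch below compares exact and continuity-corrected approximate upper-tail probabilities for $p = 1/6$ as $n$ grows:

```r
# Compare exact vs. approximate P(Y >= cutoff) for increasing n, with p = 1/6
p      <- 1/6
n      <- c(10, 50, 100, 500)
mu     <- n * p
sigma  <- sqrt(n * p * (1 - p))
cutoff <- ceiling(mu + 2 * sigma)   # a point in the upper tail of each distribution

exact_p  <- pbinom(cutoff - 1, size = n, prob = p, lower.tail = FALSE)
approx_p <- pnorm(cutoff - 0.5, mean = mu, sd = sigma, lower.tail = FALSE)

# The two columns should get closer (in relative terms) as n grows
round(data.frame(n, cutoff, exact_p, approx_p), 4)
```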

There are several rules of thumb when using the normal approximation to the binomial distribution. Two simple ones are:

  • $np > 5$
  • $n(1-p) > 5$

Visualizing the normal approximation

I made a simple Shiny app to visually demonstrate the effect of changing $n$ and $p$ on the normal approximation to the binomial distribution. The normal approximation to the Poisson distribution is also included in the app.

See the figure below for an example. We see that when the rules of thumb are violated, the normal approximation doesn’t quite match up to the binomial distribution. However, as $n$ increases, the approximation improves, even for values of $p$ close to 0 or 1.

If you want to test it out, I’ve embedded the Shiny app below. It can also be accessed in full-screen at this website.