Common Multivariate Control Charts

This post introduces basic overviews and examples of two of the most common multivariate statistical process monitoring (MSPM) methods: the $T^2$ and MEWMA control charts.

Code to produce this blog post can be found in this GitHub repository.


Both methods discussed below are multivariate statistical methods for fault detection. To understand the specifics of each method, we establish the following notation:

  • $\mathbf{x}_t = (x_{1t}, x_{2t}, \ldots, x_{pt})$ represents the values of $p$ different variables observed at time $t$
  • $t \in \{1, 2, \ldots, n\}$. That is, there are $n$ time points (observations) where the $p$ variables are observed
  • $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_p)$ is the in-control (IC) mean vector
  • $\boldsymbol{\Sigma}$ is the IC covariance matrix
  • $\boldsymbol{\hat{\mu}}$ and $\boldsymbol{\widehat{\Sigma}}$ are estimates of the mean vector and covariance matrix, respectively

Hotelling’s $T^2$ Chart

The $T^2$ control chart was developed by Harold Hotelling [1] and was the first commonly used approach for multivariate statistical process monitoring (MSPM). The $T^2$ chart is simple, easy to use, and has been shown to perform well at detecting large shifts, but not at detecting small to moderate shifts or gradual drifts [2, 3].

To monitor new data for abnormalities, we can construct the $T^2$ chart by computing

\[T^2_t = (\mathbf{x}_t - \boldsymbol{\mu})' \boldsymbol{\Sigma}^{-1} (\mathbf{x}_t - \boldsymbol{\mu})\]

and checking if

\[T^2_t > \chi^2_{1-\alpha, p},\]

where $\chi^2_{1-\alpha, p}$ is the $1-\alpha$ quantile of the $\chi^2$ distribution with $p$ degrees of freedom.
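As a minimal sketch of this known-parameter chart, assuming NumPy and SciPy are available, the snippet below computes $T^2_t$ for two hypothetical observations and compares them with the $\chi^2$ limit. The false-alarm rate `alpha = 0.005`, the helper name `t2_known`, and the example observations are illustrative assumptions, not values from this post.

```python
import numpy as np
from scipy.stats import chi2

def t2_known(x, mu, sigma):
    """T^2 statistic for each row of x, given a known IC mean and covariance."""
    diff = np.atleast_2d(x) - mu
    # Solve Sigma z = diff' rather than forming the explicit inverse.
    z = np.linalg.solve(sigma, diff.T).T
    return np.einsum("ij,ij->i", diff, z)

p, alpha = 3, 0.005                 # alpha is an assumed false-alarm rate
mu = np.zeros(p)                    # assumed known IC mean
sigma = np.eye(p)                   # assumed known IC covariance
limit = chi2.ppf(1 - alpha, df=p)   # the 1 - alpha quantile of chi-square(p)

x_new = np.array([[0.1, -0.2, 0.3],   # close to the IC mean
                  [3.0, 3.0, 3.0]])   # shifted by 3 in every variable
t2 = t2_known(x_new, mu, sigma)
flags = t2 > limit                    # True marks an observation as OC
```

With an identity covariance matrix, $T^2_t$ is simply the squared Euclidean distance from the mean, so the second observation has $T^2_t = 27$ and is flagged.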

The problem with the chart described above is that it assumes the IC mean and covariance matrix, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, are known. This is almost never true.

For more realistic problems, we instead compute

\[T^2_t = (\mathbf{x}_t - \boldsymbol{\hat{\mu}})' \boldsymbol{\widehat{\Sigma}}^{-1} (\mathbf{x}_t - \boldsymbol{\hat{\mu}})\]

and check if

\[T^2_t > \frac{p(n^2 - 1)}{n(n-p)}F_{1-\alpha, p, n-p}.\]

Observations where the monitoring statistic $T^2_t$ exceeds the upper threshold are flagged as out-of-control (OC), and we say that a fault occurred or is ongoing at that time.
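The estimated-parameter version can be sketched the same way. The snippet below is an illustration only: it assumes simulated standard normal Phase I data, an arbitrary seed, and an assumed `alpha = 0.005`; the helper name `t2_estimated` is hypothetical.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(0)      # arbitrary seed
n, p, alpha = 500, 3, 0.005         # alpha is an assumed false-alarm rate

# Phase I: estimate the IC mean vector and covariance matrix.
phase1 = rng.standard_normal((n, p))
mu_hat = phase1.mean(axis=0)
sigma_hat = np.cov(phase1, rowvar=False)   # unbiased estimate (divides by n - 1)

def t2_estimated(x, mu_hat, sigma_hat):
    """T^2 statistic for each row of x using the Phase I estimates."""
    diff = np.atleast_2d(x) - mu_hat
    z = np.linalg.solve(sigma_hat, diff.T).T
    return np.einsum("ij,ij->i", diff, z)

# Control limit: p(n^2 - 1) / (n(n - p)) times the 1 - alpha quantile of F(p, n - p).
limit = p * (n**2 - 1) / (n * (n - p)) * f_dist.ppf(1 - alpha, p, n - p)

t2_shifted = t2_estimated(np.array([3.0, 3.0, 3.0]), mu_hat, sigma_hat)[0]
is_oc = t2_shifted > limit
```

For large $n$ the limit is only slightly above the $\chi^2$ limit, reflecting the small extra uncertainty from estimating $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.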

Note: The $T^2$ statistic relies upon data only at time $t$ and therefore does not incorporate information from previous points in time.

Assumptions

The $T^2$ chart relies on two main assumptions:

  • independence
  • multivariate normality

In many processes, data is collected at a high frequency over time, violating the assumption of independence. Additionally, many processes with computed or complex variables may violate the assumption of multivariate normality due to heavy-tailed or skewed data.

Mild violations of these assumptions can still yield adequate fault detection results with the $T^2$ chart, but more severe violations typically result in poor performance [4].


Multivariate Exponentially Weighted Moving Average (MEWMA) Chart

The MEWMA chart [5] is a popular approach to monitoring multivariate data that excels at detecting small to moderate shifts.

In practice, the MEWMA is constructed by first computing

\[\mathbf{q}_t = \lambda(\mathbf{x}_t - \boldsymbol{\hat{\mu}}) + (1-\lambda)\mathbf{q}_{t-1},\]

where

  • $\lambda$ is a smoothing parameter, often selected to be between 0.05 and 0.2
    • smaller values of $\lambda$ are better for detecting smaller shifts
  • $\mathbf{q}_0$ is set to 0.

Then, the chart classifies observations as OC when

\[p_{\text{MEWMA}} = \mathbf{q}_t' \boldsymbol{\widehat{\Sigma}}^{-1}_{\mathbf{q}_t} \mathbf{q}_t > h,\]

where

  • $h$ is a threshold selected to achieve a desired IC average run length (ARL), which is the expected number of observations until the chart first falsely classifies an observation as OC while the process is IC
  • $\boldsymbol{\widehat{\Sigma}}_{\mathbf{q}_t} = \frac{\lambda}{2-\lambda}\left[1 - (1-\lambda)^{2t}\right]\boldsymbol{\widehat{\Sigma}}$, which converges to $(\lambda/(2-\lambda))\boldsymbol{\widehat{\Sigma}}$ as $t \to \infty$.
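The recursion and the monitoring statistic can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `mewma_statistics` is hypothetical, and the covariance of $\mathbf{q}_t$ is taken as $\frac{\lambda}{2-\lambda}[1 - (1-\lambda)^{2t}]\boldsymbol{\widehat{\Sigma}}$, the exact (time-varying) form. The threshold $h$ is not computed here and must be chosen separately.

```python
import numpy as np

def mewma_statistics(x, mu_hat, sigma_hat, lam=0.1):
    """Return the MEWMA monitoring statistic for each row of x."""
    x = np.atleast_2d(x)
    q = np.zeros(x.shape[1])                 # q_0 = 0
    sigma_inv = np.linalg.inv(sigma_hat)
    stats = np.empty(x.shape[0])
    for i, row in enumerate(x):
        t = i + 1                            # time index starts at 1
        q = lam * (row - mu_hat) + (1 - lam) * q
        # Scale factor of the covariance of q_t; tends to lam / (2 - lam) as t grows.
        scale = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        stats[i] = q @ sigma_inv @ q / scale
    return stats

# Hypothetical data: one unusual observation followed by one at the IC mean.
x = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
stats = mewma_statistics(x, np.zeros(3), np.eye(3), lam=0.1)
stats_lam1 = mewma_statistics(x, np.zeros(3), np.eye(3), lam=1.0)
```

Because $\mathbf{q}_{t-1}$ carries over, the second statistic in `stats` stays well above zero even though the second observation sits exactly at the IC mean, whereas with `lam=1.0` it drops to zero.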

Notice that if we select $\lambda = 1$, the MEWMA chart is equivalent to the $T^2$ chart.
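A short check of this claim, using the covariance factor $\frac{\lambda}{2-\lambda}[1 - (1-\lambda)^{2t}]$ for $\mathbf{q}_t$: setting $\lambda = 1$ gives

\[\mathbf{q}_t = \mathbf{x}_t - \boldsymbol{\hat{\mu}}, \qquad \boldsymbol{\widehat{\Sigma}}_{\mathbf{q}_t} = \frac{1}{2-1}\left[1 - 0^{2t}\right]\boldsymbol{\widehat{\Sigma}} = \boldsymbol{\widehat{\Sigma}},\]

so the monitoring statistic reduces to

\[\mathbf{q}_t' \boldsymbol{\widehat{\Sigma}}^{-1}_{\mathbf{q}_t} \mathbf{q}_t = (\mathbf{x}_t - \boldsymbol{\hat{\mu}})' \boldsymbol{\widehat{\Sigma}}^{-1} (\mathbf{x}_t - \boldsymbol{\hat{\mu}}) = T^2_t.\]

The two charts then flag the same observations provided $h$ is set equal to the $T^2$ control limit.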

Note: The MEWMA monitoring statistic relies on the data both at time $t$ and at previous time points, allowing this method to better detect small accumulations over time than the $T^2$ chart.
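The threshold $h$ is typically taken from published tables or found numerically. As a rough sketch of the numerical route, the snippet below estimates the IC ARL for a candidate $h$ by Monte Carlo simulation. It assumes known IC parameters ($\boldsymbol{\mu} = \mathbf{0}$, $\boldsymbol{\Sigma} = \mathbf{I}$); the function names, replication count, and candidate thresholds are all illustrative choices.

```python
import numpy as np

def mewma_run_length(h, lam, p, rng, max_steps=5000):
    """Number of IC observations until the MEWMA statistic first exceeds h."""
    q = np.zeros(p)
    for t in range(1, max_steps + 1):
        x = rng.standard_normal(p)           # IC observation from N_p(0, I)
        q = lam * x + (1 - lam) * q
        scale = lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
        if q @ q / scale > h:                # Sigma = I, so no inverse is needed
            return t
    return max_steps                         # run censored at max_steps

def estimate_ic_arl(h, lam=0.1, p=3, reps=200, seed=0):
    """Monte Carlo estimate of the IC average run length for threshold h."""
    rng = np.random.default_rng(seed)
    return np.mean([mewma_run_length(h, lam, p, rng) for _ in range(reps)])

arl_low = estimate_ic_arl(4.0, reps=50)      # hypothetical small threshold
arl_high = estimate_ic_arl(9.0, reps=50)     # hypothetical larger threshold
```

Raising $h$ lengthens the IC ARL, so one simple approach is to search over $h$ (for example by bisection) until the estimated ARL matches the target. A real calibration would use many more replications than this sketch.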

Assumptions

The assumptions of the MEWMA chart are similar to the $T^2$ chart:

  • independence
  • multivariate normality

The MEWMA chart is fairly robust to violations of independence and can even perform well under moderate deviations from multivariate normality if $\lambda$ is sufficiently small [6].


Examples: Detecting Shifts in Means

Examples below demonstrate the performance of each chart when detecting three types of faults (shifts) in mean vectors during Phase II:

  • isolated faults
  • sustained faults
  • transient faults

Isolated faults occur when the means shift at isolated points in time (random outliers).

Sustained faults occur when the means shift at some point in time and remain shifted afterward.

Transient faults are brief periods during which the means shift and then return to the IC mean after a few observations.

For each example below, we first generate 500 Phase I observations from a 3-dimensional multivariate standard normal distribution. These observations are used to estimate the IC mean vector and covariance matrix.

Then, the $T^2$ chart and the MEWMA chart (with $\lambda = 0.1$) are both used to monitor 200 new observations during Phase II.

Example: Detecting Isolated Faults

To monitor isolated faults, we generate the following Phase II data:

  • $\mathbf{x}_t \sim \text{N}_3(\mathbf{0}, \mathbf{I}), \quad t \in \{1, 2, \ldots, 200\}$

Then, we randomly replace 5 of the 200 points with draws from a shifted distribution.

Here, the randomly selected points are $t = 34, 66, 135, 138,$ and $179$. So, we have the following draws for our isolated faults (outliers):

  • $\mathbf{x}_t \sim \text{N}_3(\mathbf{3}, \mathbf{I}), \quad t \in \{34, 66, 135, 138, 179\}$
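One way such data could be generated is sketched below. The seed is arbitrary, so the draws will not match the ones plotted in this post; only the fault times are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)                   # arbitrary seed
n2, p = 200, 3
fault_times = np.array([34, 66, 135, 138, 179])  # 1-based times from the text

phase2 = rng.standard_normal((n2, p))            # IC draws from N_3(0, I)
# Replace the selected rows with draws from N_3(3, I); subtract 1 for 0-based rows.
phase2[fault_times - 1] = 3.0 + rng.standard_normal((fault_times.size, p))
```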

We want to see if the $T^2$ and MEWMA charts can detect these random isolated outliers among the rest of the observations, all of which are IC.

The $T^2$ and MEWMA charts for this data are given in the figure below.

Notice the following:

  • all outliers are immediately detected by the $T^2$ chart and are flagged as OC
  • not all outliers are detected by the MEWMA chart
    • the outlier at observation 138 was detected because another outlier recently occurred at 135

Recall that the $T^2$ monitoring statistic relies solely on the values of the data at time $t$, while the MEWMA chart relies on data at time $t$ and previous time points. As a result:

  • isolated faults will often go undetected by the MEWMA chart unless $\lambda$ is selected to be large
    • (small values of $\lambda$ give less weight to the outlier when computing the monitoring statistic)
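A back-of-the-envelope calculation illustrates the effect. Suppose, as simplifying assumptions, that noise is ignored, $\mathbf{q}_{t-1} = \mathbf{0}$, $\boldsymbol{\widehat{\Sigma}} = \mathbf{I}$, and the asymptotic covariance $(\lambda/(2-\lambda))\boldsymbol{\widehat{\Sigma}}$ is used. A single outlier with shift $\boldsymbol{\delta} = (3, 3, 3)'$ then gives $\mathbf{q}_t = \lambda\boldsymbol{\delta}$ and

\[\mathbf{q}_t' \boldsymbol{\widehat{\Sigma}}^{-1}_{\mathbf{q}_t} \mathbf{q}_t = \frac{\lambda^2\, \boldsymbol{\delta}'\boldsymbol{\delta}}{\lambda/(2-\lambda)} = \lambda(2-\lambda)\,\boldsymbol{\delta}'\boldsymbol{\delta} = 0.1 \times 1.9 \times 27 = 5.13,\]

compared with $T^2_t = \boldsymbol{\delta}'\boldsymbol{\delta} = 27$ for the same observation. If a second outlier arrives three observations later, the first one has decayed by only $(1-\lambda)^3 = 0.729$, so $\mathbf{q}_t = (1 + 0.729)\lambda\boldsymbol{\delta}$ and the statistic grows to roughly

\[1.729^2 \times 5.13 \approx 15.3,\]

about three times larger. This is consistent with the outlier at observation 138 being flagged while the others are missed.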

Example: Detecting a Sustained Fault

To introduce a sustained fault, we generate the following data:

  • $\mathbf{x}_t \sim \text{N}_3(\mathbf{0}, \mathbf{I}), \quad t \in \{1, 2, \ldots, 50\}$
  • $\mathbf{x}_t \sim \text{N}_3(\mathbf{2}, \mathbf{I}), \quad t \in \{51, 52, \ldots, 200\}$

So, we are trying to detect the change in the means from 0 to 2, where all means have shifted the same magnitude in the same direction.
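A minimal sketch of generating such data, assuming NumPy and an arbitrary seed (so the draws differ from those plotted in this post):

```python
import numpy as np

rng = np.random.default_rng(2)           # arbitrary seed
p, shift_start = 3, 51                   # the first shifted observation is t = 51

phase2 = rng.standard_normal((200, p))   # N_3(0, I) noise for every t
phase2[shift_start - 1:] += 2.0          # add 2 to every variable for t = 51, ..., 200
```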

The $T^2$ and MEWMA charts for this data are given in the figure below. Notice that the monitoring statistics for both methods quickly jump above the threshold once the shift occurs after observation 50.

We see that

  • both charts detect the shift almost instantly
  • the $T^2$ chart does not consistently remain above the threshold after the shift
    • observations with $T^2$ values below the threshold are not flagged even though the mean vector has shifted
  • the MEWMA chart remains above the threshold after the shift
    • all observations after the shift are flagged as OC

In summary, both the $T^2$ and MEWMA charts detect the shift in the mean vector almost immediately. However, the $T^2$ chart does not classify all observations after the shift as OC, whereas the MEWMA chart does.
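The behavior of the $T^2$ chart can be quantified under a simplifying assumption. If the IC parameters were known, $T^2_t$ after the shift would follow a noncentral $\chi^2$ distribution with $p$ degrees of freedom and noncentrality $\boldsymbol{\delta}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\delta}$, where $\boldsymbol{\delta}$ is the shift in the mean vector. The sketch below uses that approximation with an assumed `alpha = 0.005` to estimate how often a single shifted observation is flagged.

```python
from scipy.stats import chi2, ncx2

p, alpha = 3, 0.005                 # alpha is an assumed false-alarm rate
limit = chi2.ppf(1 - alpha, df=p)   # known-parameter control limit

delta_sq = p * 2.0**2               # noncentrality: delta' Sigma^{-1} delta = 12
prob_flag = ncx2.sf(limit, df=p, nc=delta_sq)   # P(T^2_t > limit) after the shift
expected_t2 = p + delta_sq          # mean of the noncentral chi-square
```

Under these assumptions each shifted observation is flagged with a probability well below 1, so gaps in the $T^2$ chart after the shift are expected rather than surprising.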

Example: Detecting Transient Faults

To introduce transient faults, we generate the following data:

  • $\mathbf{x}_t \sim \text{N}_3(\mathbf{0}, \mathbf{I}), \quad t \in \{1, 2, \ldots, 50, 54, 55, \ldots, 73, 77, 78, \ldots, 130, 141, 142, \ldots, 200\}$
  • $\mathbf{x}_t \sim \text{N}_3(\mathbf{2}, \mathbf{I}), \quad t \in \{51, 52, 53, 74, 75, 76, 131, 132, \ldots, 140\}$

In other words, the mean vector shifts from $(0, 0, 0)'$ to $(2, 2, 2)'$ at $t = 51$ and at $t = 74$, each time for 3 observations, and again at $t = 131$ for 10 observations. All other observations are drawn from the IC distribution.
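A sketch of how this data could be generated, taking the fault windows from the OC set listed above ($t = 51$ to $53$, $74$ to $76$, and $131$ to $140$). The seed is arbitrary, so the draws differ from those plotted in this post.

```python
import numpy as np

rng = np.random.default_rng(3)           # arbitrary seed
p = 3
# 1-based fault windows (start, end), both inclusive.
fault_windows = [(51, 53), (74, 76), (131, 140)]

phase2 = rng.standard_normal((200, p))   # N_3(0, I) noise for every t
is_fault = np.zeros(200, dtype=bool)
for start, end in fault_windows:
    is_fault[start - 1:end] = True       # convert to a 0-based slice
phase2[is_fault] += 2.0                  # shift the mean to (2, 2, 2)' inside the windows
```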

Plots of the $T^2$ and MEWMA charts for the data described above are shown below.

We see that

  • the $T^2$ chart detects each transient fault immediately, but not every observation during the fault starting at $t = 131$ is flagged as OC
  • the MEWMA chart detects the first and third transient faults, but not the second (the shift at $t = 74$)
    • the chart flags all observations as OC during the shift at $t = 131$

Conclusion

In summary, both the $T^2$ and MEWMA charts are great tools for detecting faults in multivariate processes. The $T^2$ chart performs better at quickly detecting large or isolated faults, while the MEWMA chart performs better at detecting small to moderate faults over time.

Both the $T^2$ and MEWMA charts assume independence and multivariate normality. In the examples above, observations were generated according to these assumptions. However, if either of these assumptions is violated, adjustments are required to achieve desirable fault detection performance.

Some examples of adjustments include:

  • (dependence in the data): fitting the $T^2$ and MEWMA charts to detrended values from a regression, machine learning, or time series model fit
  • (non-normality): using a nonparametric control chart or a nonparametric estimate for the threshold (control limit).
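As one possible illustration of a nonparametric threshold, the sketch below sets the $T^2$ control limit to an empirical quantile of the Phase I statistics instead of an $F$ quantile. The heavy-tailed Phase I data (Student $t$ with 5 degrees of freedom), the seed, and `alpha = 0.005` are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(4)           # arbitrary seed
n, p, alpha = 500, 3, 0.005              # alpha is an assumed false-alarm rate

# Hypothetical heavy-tailed Phase I data (Student t with 5 degrees of freedom).
phase1 = rng.standard_t(df=5, size=(n, p))
mu_hat = phase1.mean(axis=0)
sigma_inv = np.linalg.inv(np.cov(phase1, rowvar=False))

# T^2 for every Phase I observation: diff_i' Sigma^{-1} diff_i.
diff = phase1 - mu_hat
t2_phase1 = np.einsum("ij,jk,ik->i", diff, sigma_inv, diff)

# Empirical (1 - alpha) quantile of the Phase I statistics as the control limit.
limit = np.quantile(t2_phase1, 1 - alpha)
```

With only 500 observations, an extreme quantile like this rests on a handful of points, so in practice a larger Phase I sample or a smoothed estimate (for example, kernel density estimation or the bootstrap) may be preferable.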

  1. Hotelling, H. (1947). Multivariate quality control. In Eisenhart, C., Hastay, M. W., and Wallis, W., editors, Techniques of Statistical Analysis, pages 111–184. McGraw-Hill, New York.

  2. Sullivan, J. H. and Woodall, W. H. (1996). A comparison of multivariate control charts for individual observations. Journal of Quality Technology, 28(4):398–408.

  3. Bersimis, S., Psarakis, S., and Panaretos, J. (2007). Multivariate statistical process control charts: An overview. Quality and Reliability Engineering International, 23(5):517–543.

  4. Alfaro, J.-L. and Ortega, J.-F. (2012). Robust Hotelling’s $T^2$ control charts under non-normality: The case of $t$-Student distribution. Journal of Statistical Computation and Simulation, 82(10):1437–1447.

  5. Lowry, C. A., Woodall, W. H., Champ, C. W., and Rigdon, S. E. (1992). A multivariate exponentially weighted moving average control chart. Technometrics, 34(1):46.

  6. Stoumbos, Z. G. and Sullivan, J. H. (2002). Robustness to non-normality of the multivariate EWMA control chart. Journal of Quality Technology, 34(3):260–276.