Identifying Non-Normal Data

In the world of Lean Six Sigma, a methodology aimed at improving business processes by minimizing waste and variability, understanding the nature of your data is crucial. When it comes to hypothesis testing, the assumption of normality underpins many statistical methods. However, real-world data often deviate from this idealized normal distribution and exhibit non-normal characteristics. Identifying non-normal data is a pivotal step in selecting the appropriate tools and techniques for analysis. This article delves into the characteristics of non-normal data and explains how to identify such distributions effectively within the context of Lean Six Sigma.


Understanding Non-Normal Data

Before identifying non-normal data, it's essential to comprehend what normal data looks like. A normal distribution, often depicted as a bell curve, is symmetrical with most observations clustering around the mean, and the frequencies gradually decreasing as you move away from the center.

Non-normal data, by contrast, do not follow this pattern. They may exhibit skewness, where the distribution leans towards the left or right, or kurtosis, where the data show peakedness or flatness relative to a normal distribution. Recognizing these characteristics is the first step in identifying non-normal data.


Characteristics of Non-Normal Data


1. Skewness

Skewness measures the asymmetry of a distribution. A positively skewed distribution has a tail that stretches towards the right, indicating that lower values occur more frequently. Conversely, a negatively skewed distribution has a tail extending to the left, indicating that higher values are more common.

The chart above illustrates skewness in distributions alongside a normal distribution for comparison:

  • The Blue curve represents a Normal Distribution, which is symmetrical around its mean, showing no skewness.

  • The Green curve represents a Right-Skewed Distribution, where the tail extends more to the right, indicating that there are more values on the lower end of the scale.

  • The Red curve represents a Left-Skewed Distribution, with the tail stretching towards the left, suggesting a concentration of higher values with fewer lower values.
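
As a quick numerical check, skewness can be computed directly from a sample. The sketch below is a minimal illustration in Python, assuming NumPy and SciPy are available; the simulated data sets and variable names are purely illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Simulated samples: symmetric, right-skewed, and left-skewed
symmetric = rng.normal(loc=50, scale=5, size=1000)         # skewness near 0
right_skewed = rng.exponential(scale=5, size=1000)         # long right tail
left_skewed = 100 - rng.exponential(scale=5, size=1000)    # long left tail

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name:>12}: skewness = {stats.skew(data):+.2f}")
```

A value near zero indicates a roughly symmetric distribution; clearly positive values point to a right tail and clearly negative values to a left tail.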


2. Kurtosis

Kurtosis describes the peakedness or flatness of a distribution relative to the normal curve. High kurtosis means the data have heavy tails and a sharper peak, indicating a greater likelihood of outliers. Low kurtosis, on the other hand, suggests a flatter distribution with lighter tails and fewer extreme values.

The chart above illustrates the concept of kurtosis on different distributions. It features three curves:

  • The blue curve represents a Normal Distribution, which has an excess kurtosis close to 0 (a raw kurtosis of 3), indicating a distribution that closely follows the bell curve with no heavy tails.

  • The red curve is for a Leptokurtic Distribution (High Kurtosis), characterized by heavy tails and a sharper peak compared to the normal distribution. This indicates more data in the tails and a higher likelihood of outliers.

  • The green curve shows a Platykurtic Distribution (Low Kurtosis), which has lighter tails and a flatter peak. This suggests less extreme values (outliers) than a normal distribution.
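
Excess kurtosis can be checked in the same way. The minimal sketch below, again assuming NumPy and SciPy, reports excess kurtosis (0 for a normal distribution) for three simulated samples.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

normal_data = rng.normal(size=2000)             # excess kurtosis near 0
leptokurtic = rng.standard_t(df=3, size=2000)   # heavy tails -> positive
platykurtic = rng.uniform(-1, 1, size=2000)     # light tails -> negative

for name, data in [("normal", normal_data),
                   ("leptokurtic (t, df=3)", leptokurtic),
                   ("platykurtic (uniform)", platykurtic)]:
    # fisher=True (the default) returns excess kurtosis, i.e. kurtosis - 3
    print(f"{name:>22}: excess kurtosis = {stats.kurtosis(data, fisher=True):+.2f}")
```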


3. Multimodality

Multimodality refers to distributions with more than one peak (mode). It suggests the presence of multiple subgroups within the data set, each with its own central tendency.

Here are three charts illustrating different distributions:

  • The first chart shows a unimodal normal distribution: a classic bell curve centered around a single peak.

  • The second chart shows a bimodal distribution, built by combining two normal distributions with different means, resulting in two distinct peaks.

  • The third chart illustrates a trimodal distribution, created by combining three normal distributions, each with its own mean and standard deviation, producing three separate peaks.
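
A simple way to see multimodality is to simulate a mixture of subgroups and plot its histogram. The sketch below, assuming NumPy and Matplotlib, builds a bimodal sample from two normal components; the means, spreads, and sample sizes are arbitrary illustration values.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=42)

# Bimodal sample: two normal subgroups with different centers
group_a = rng.normal(loc=40, scale=3, size=500)
group_b = rng.normal(loc=60, scale=3, size=500)
bimodal = np.concatenate([group_a, group_b])

plt.hist(bimodal, bins=40, edgecolor="black")
plt.title("Bimodal distribution (mixture of two normal subgroups)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
```

Two separate peaks in the histogram usually signal that the data come from distinct subgroups (for example, two machines, shifts, or suppliers) that should be stratified before analysis.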


4. Outliers

Outliers are data points that deviate significantly from the rest of the data. They can distort the distribution shape, making it appear non-normal.

The chart above illustrates a distribution with outliers. The main body of the distribution appears normal, centered around zero, representing the bulk of the data. However, you can see a distinct set of outliers far from this central cluster, around the value of 10. These outliers significantly deviate from the rest of the data, demonstrating how they can distort the overall shape of the distribution and potentially mislead analysis if not properly accounted for or handled.
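
A common rule of thumb for flagging outliers is the 1.5 × IQR rule used in box plots. The sketch below, assuming NumPy, applies it to a simulated sample with a few injected extreme values; the data and cutoffs are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Mostly normal data centered at 0, plus a few extreme values near 10
data = np.concatenate([rng.normal(loc=0, scale=1, size=500),
                       np.array([9.5, 10.0, 10.5])])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(f"IQR fences: [{lower:.2f}, {upper:.2f}]")
print(f"Flagged outliers: {np.sort(outliers)}")
```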


Identifying Non-Normal Data


Visual Inspection

One of the simplest methods to identify non-normal data is through visual inspection of graphs such as histograms, box plots, or Q-Q plots. These graphical representations can reveal skewness, kurtosis, multimodality, and outliers at a glance.
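
As a minimal sketch, assuming Matplotlib and SciPy are available, the code below draws a histogram and a normal Q-Q plot for a right-skewed sample; systematic curvature away from the reference line in the Q-Q plot is a visual sign of non-normality.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(seed=42)
data = rng.lognormal(mean=0, sigma=0.6, size=500)  # right-skewed sample

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.hist(data, bins=30, edgecolor="black")
ax1.set_title("Histogram")

stats.probplot(data, dist="norm", plot=ax2)  # normal Q-Q plot
ax2.set_title("Normal Q-Q plot")

plt.tight_layout()
plt.show()
```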


Statistical Tests

Several statistical tests can help confirm the non-normality of data. The Shapiro-Wilk, Anderson-Darling, and Kolmogorov-Smirnov tests are commonly used to assess the normality of a distribution. Each yields a p-value; a low p-value (typically < 0.05) provides evidence that the data do not follow a normal distribution.
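
A minimal sketch using scipy.stats is shown below; it runs the three tests on the same simulated, clearly non-normal sample. Note that SciPy's Anderson-Darling implementation reports critical values rather than a p-value, so its result is read slightly differently.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
data = rng.exponential(scale=2.0, size=200)  # clearly non-normal sample

# Shapiro-Wilk: a low p-value suggests rejecting normality
w_stat, p_shapiro = stats.shapiro(data)
print(f"Shapiro-Wilk:       W = {w_stat:.3f}, p = {p_shapiro:.4f}")

# Kolmogorov-Smirnov against a normal with parameters estimated from the sample
# (the p-value is approximate when parameters come from the same data)
ks_stat, p_ks = stats.kstest(data, "norm", args=(data.mean(), data.std(ddof=1)))
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {p_ks:.4f}")

# Anderson-Darling: compare the statistic against the critical values
ad = stats.anderson(data, dist="norm")
print(f"Anderson-Darling:   A^2 = {ad.statistic:.3f}")
for cv, sl in zip(ad.critical_values, ad.significance_level):
    verdict = "reject normality" if ad.statistic > cv else "fail to reject"
    print(f"  {sl:>4.1f}% level: critical value = {cv:.3f} -> {verdict}")
```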


Descriptive Statistics

Examining descriptive statistics such as the mean, median, mode, skewness, and kurtosis coefficients can offer insights into the distribution's shape. Significant differences between the mean and median, or high skewness and kurtosis values, suggest non-normality.
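
A minimal sketch, assuming NumPy and SciPy, that prints these summary statistics for a simulated right-skewed sample:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
data = rng.lognormal(mean=0, sigma=0.8, size=1000)  # right-skewed sample

print(f"mean            : {np.mean(data):.3f}")
print(f"median          : {np.median(data):.3f}")
print(f"skewness        : {stats.skew(data):.3f}")
print(f"excess kurtosis : {stats.kurtosis(data):.3f}")

# A mean well above the median, together with positive skewness,
# points to a right-skewed, non-normal distribution.
```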


Implications for Lean Six Sigma Projects

Identifying non-normal data is crucial in Lean Six Sigma projects for several reasons:

  • Appropriate Analysis Techniques: Non-normal data often require non-parametric statistical methods, which do not assume a normal distribution (a worked example follows this list).

  • Process Improvement Strategies: Understanding data distribution helps in diagnosing process issues and tailoring improvement strategies effectively.

  • Reliable Decision Making: Correctly identifying the nature of the data ensures that decisions based on statistical analyses are valid and reliable.
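
For example, when comparing two groups of skewed data, the Mann-Whitney U test can stand in for the two-sample t-test. The sketch below uses scipy.stats on simulated cycle times; the data and scale parameters are hypothetical illustration values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)

# Hypothetical, skewed cycle times before and after a process change
before = rng.exponential(scale=12.0, size=60)
after = rng.exponential(scale=9.0, size=60)

# Mann-Whitney U test: a non-parametric alternative to the two-sample t-test
u_stat, p_value = stats.mannwhitneyu(before, after, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_value:.4f}")
# A p-value below 0.05 would suggest the two distributions differ
```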


In conclusion, the ability to identify non-normal data is fundamental in Lean Six Sigma for ensuring the accuracy and effectiveness of hypothesis testing and data analysis. By recognizing the characteristics of non-normal data and employing the right tools for identification, practitioners can navigate the complexities of real-world data, leading to more informed decision-making and successful process improvement initiatives.
