Basic Statistical Concepts

In the world of Lean Six Sigma, understanding basic statistical concepts is crucial for analyzing data, making informed decisions, and driving continuous improvement in processes. Statistics provides the mathematical foundation to quantify variability, assess performance, and identify areas for improvement. This article explores the essential statistical concepts fundamental to Lean Six Sigma practice.

1. Mean (Average)

The mean is a basic statistical measure that represents the central tendency or average of a set of numbers. It is calculated by summing all the values in a dataset and then dividing by the number of observations. In Lean Six Sigma, the mean is used to determine the central performance of a process.
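As a quick illustration, here is a minimal Python sketch that computes the mean of a small, made-up sample of process cycle times (the values are illustrative only):

```python
# Minimal sketch: mean of a hypothetical sample of cycle times (minutes).
cycle_times = [4.2, 5.1, 4.8, 5.5, 4.9]

mean = sum(cycle_times) / len(cycle_times)
print(f"Mean cycle time: {mean:.2f} minutes")  # -> 4.90
```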

2. Median

The median is the middle value in a dataset when it is ordered from smallest to largest. If there is an even number of observations, the median is the average of the two middle numbers. Unlike the mean, the median is not affected by extremely high or low values, making it a useful measure of central tendency when dealing with skewed distributions.


The chart above illustrates the Mean and Median within a skewed distribution, emphasizing their roles in identifying the central tendency of a process in Lean Six Sigma:

  • The histogram shows the skewed distribution of the dataset, which is common in real-world data. This skewness affects how we interpret the central tendency.

  • The Mean (red dashed line) is influenced by the skewness and outliers in the distribution, pulling it towards the tail. This illustrates how the mean can sometimes give a misleading representation of "central" in skewed distributions.

  • The Median (green dash-dot line), on the other hand, marks the middle value of the dataset, effectively splitting it into two equal halves. It is less affected by skewness and outliers, providing a more accurate reflection of the dataset's central tendency in skewed distributions.
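The effect is easy to reproduce. Below is a minimal sketch, using an invented right-skewed sample, that shows a single outlier pulling the mean toward the tail while the median stays at the middle of the data:

```python
import statistics

# Made-up, right-skewed sample: one large outlier (illustrative only).
data = [4, 5, 5, 6, 6, 7, 30]

print(statistics.mean(data))    # 9.0 -> pulled toward the long tail
print(statistics.median(data))  # 6   -> middle value, robust to the outlier
```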


3. Mode

The mode refers to the most frequently occurring value in a dataset. There can be one mode (unimodal), two modes (bimodal), or more modes (multimodal) in a dataset. The mode is particularly useful in Lean Six Sigma for identifying common defects or the most frequent causes of process variations.
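A short sketch, using hypothetical defect codes, shows how the mode surfaces the most frequent defect:

```python
import statistics

# Made-up defect codes logged during inspection (illustrative only).
defects = ["scratch", "dent", "scratch", "misalign", "scratch", "dent"]

print(statistics.mode(defects))       # "scratch" -> most frequent defect
print(statistics.multimode(defects))  # ["scratch"] -> all modes, handles ties
```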


4. Range

The range is the simplest measure of variability. It is calculated by subtracting the smallest value in a dataset from the largest value. While easy to compute, the range can be sensitive to outliers and may not always provide a complete picture of a process's variability.
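For illustration, the computation on the same made-up cycle times is a single line:

```python
# Range = largest value minus smallest value (hypothetical data).
cycle_times = [4.2, 5.1, 4.8, 5.5, 4.9]

data_range = max(cycle_times) - min(cycle_times)
print(f"Range: {data_range:.1f} minutes")  # 5.5 - 4.2 = 1.3
```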



5. Standard Deviation

Standard deviation is a more comprehensive measure of variability. It quantifies the amount of dispersion or spread in a dataset relative to the mean. A low standard deviation indicates that the data points are close to the mean, while a high standard deviation suggests a wide spread of values. In Lean Six Sigma, standard deviation plays a crucial role in process capability and control chart analysis.


6. Variance

Variance measures the average of the squared deviations from the mean, providing another perspective on data dispersion. It is the square of the standard deviation. Although variance is less intuitive than standard deviation, it is important in statistical analysis for comparing variability between different datasets.
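Both measures are straightforward to compute. The sketch below, on the same invented cycle times, uses Python's statistics module and confirms that the variance equals the squared standard deviation:

```python
import statistics

cycle_times = [4.2, 5.1, 4.8, 5.5, 4.9]  # hypothetical data

s = statistics.stdev(cycle_times)       # sample standard deviation (n - 1)
var = statistics.variance(cycle_times)  # sample variance
print(f"Std dev:  {s:.3f}")
print(f"Variance: {var:.3f}")
print(abs(var - s**2) < 1e-9)           # True: variance is the square of s
```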


7. Probability Distributions

Probability distributions describe how the values of a random variable are distributed. Common distributions in Lean Six Sigma include the normal distribution (bell curve), binomial distribution, and Poisson distribution. Understanding these distributions helps in modeling process behaviors and making predictions.
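As a rough sketch (assuming SciPy is available, and with purely illustrative parameter values), each of these distributions can be evaluated directly:

```python
from scipy import stats

# Normal: probability density at the mean of N(mu=10, sigma=2).
print(stats.norm.pdf(10, loc=10, scale=2))

# Binomial: P(exactly 2 defectives in n=20 units at a 5% defect rate).
print(stats.binom.pmf(2, n=20, p=0.05))

# Poisson: P(exactly 3 defects per unit when the average rate is 1.5).
print(stats.poisson.pmf(3, mu=1.5))
```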



8. Hypothesis Testing

Hypothesis testing is a statistical method used to make decisions using data. It involves testing an assumption (hypothesis) about a population parameter. In Lean Six Sigma, hypothesis testing is used to determine whether a process change has statistically significant effects on performance.
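A minimal sketch of a two-sample t-test, assuming SciPy and using made-up before/after cycle times, illustrates the idea:

```python
from scipy import stats

# Hypothetical cycle times before and after a process change (minutes).
before = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 5.4]
after  = [4.6, 4.5, 4.9, 4.7, 4.4, 4.8, 4.6, 4.7]

# H0: the two means are equal; H1: they differ.
t_stat, p_value = stats.ttest_ind(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the change has a statistically significant effect.")
```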


9. Confidence Intervals

A confidence interval gives an estimated range of values that is likely to include an unknown population parameter, based on a given sample of data. It provides a measure of the uncertainty around a sample estimate such as the mean. Confidence intervals are pivotal in Lean Six Sigma for assessing the reliability of process improvements.


The chart overlays the 95% confidence interval on the normal curve representing the sampling distribution of the mean.
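A minimal sketch (made-up data; SciPy assumed for the t critical value) computes such an interval:

```python
import math
import statistics
from scipy import stats

sample = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.7, 5.2]  # hypothetical data
n = len(sample)
mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# 95% CI using the t distribution (appropriate for small samples).
t_crit = stats.t.ppf(0.975, df=n - 1)
print(f"95% CI: ({mean - t_crit * sem:.2f}, {mean + t_crit * sem:.2f})")
```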

10. Control Charts

Control charts are used to monitor process performance over time. They help identify trends, shifts, or any unusual patterns in the process, facilitating timely interventions before defects occur. Understanding variability through standard deviation is key to constructing and interpreting control charts.
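As a simplified sketch, the snippet below computes three-sigma control limits from made-up sample means and flags points outside them; note that production control charts typically estimate sigma from subgroup ranges rather than the overall standard deviation:

```python
import statistics

# Hypothetical sample means from a process, in time order.
samples = [5.0, 4.9, 5.1, 5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3]

center = statistics.mean(samples)
sigma = statistics.stdev(samples)
ucl = center + 3 * sigma  # upper control limit
lcl = center - 3 * sigma  # lower control limit

for i, x in enumerate(samples, start=1):
    flag = "ok" if lcl <= x <= ucl else "OUT OF CONTROL"
    print(f"Sample {i}: {x:.1f} ({flag})")
```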


11. Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) is a statistical technique used to compare the means of three or more samples and determine whether at least one sample mean differs significantly from the others. In Lean Six Sigma, it is useful for analyzing the impact of different factors or treatments on a process and identifying which factors significantly affect the outcome.

The chart displays an ANOVA analysis using boxplots for three groups (A, B, C), highlighting differences in their means and variances. ANOVA tests if these differences are statistically significant, crucial for identifying impactful factors in Lean Six Sigma processes.
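A minimal one-way ANOVA sketch, assuming SciPy and using invented yields from three machine settings:

```python
from scipy import stats

# Hypothetical yields from three machine settings (illustrative only).
group_a = [85, 86, 88, 87, 85]
group_b = [90, 91, 89, 92, 90]
group_c = [86, 85, 87, 86, 88]

# One-way ANOVA: H0 is that all group means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one group mean differs.
```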

12. Non-parametric Tests

In situations where data do not meet the assumptions required for parametric tests (such as normal distribution), non-parametric tests provide an alternative for hypothesis testing. These tests are useful in Lean Six Sigma for analyzing ordinal or nominal data, or when the sample size is too small to reliably estimate the population parameters.
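For example, the Mann-Whitney U test is a common non-parametric alternative to the two-sample t-test. A minimal sketch, reusing the made-up before/after data from above:

```python
from scipy import stats

# Same hypothetical before/after data, without assuming normality.
before = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 5.4]
after  = [4.6, 4.5, 4.9, 4.7, 4.4, 4.8, 4.6, 4.7]

u_stat, p_value = stats.mannwhitneyu(before, after, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```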

13. Parametric Tests

Parametric tests are statistical techniques that assume the dataset has a specific distribution, usually a normal distribution. These tests are used when the data meet certain conditions or assumptions, such as the scale of measurement, distribution, and homogeneity of variances. Parametric tests are more powerful than their non-parametric counterparts when these assumptions hold true, as they are more likely to detect true effects or differences when they exist. This increased power comes from utilizing the parameters (mean and standard deviation) of the distribution they assume the data to follow.
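A sketch of the usual workflow, assuming SciPy and reusing the made-up data above: check the assumptions first, then run the parametric test if they hold:

```python
from scipy import stats

before = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3, 5.2, 5.4]  # hypothetical data
after  = [4.6, 4.5, 4.9, 4.7, 4.4, 4.8, 4.6, 4.7]

# Normality assumption for each sample (Shapiro-Wilk test).
print(stats.shapiro(before).pvalue)  # p > 0.05 -> no evidence against normality
print(stats.shapiro(after).pvalue)

# Homogeneity of variances (Levene's test).
print(stats.levene(before, after).pvalue)

# If both assumptions hold, the parametric t-test is the more powerful choice.
print(stats.ttest_ind(before, after).pvalue)
```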


Conclusion

Basic statistical concepts form the backbone of Lean Six Sigma methodology, enabling practitioners to analyze data effectively, understand process behavior, and make decisions based on empirical evidence. Mastery of these concepts leads to better problem-solving, process improvements, and, ultimately, enhanced organizational performance. Lean Six Sigma's emphasis on data-driven decision-making underscores the importance of statistical knowledge in achieving operational excellence.
