Basic Concepts and Terminology

Lean Six Sigma, a methodology focused on improving process efficiency and quality, often employs statistical tools to understand and solve problems. Hypothesis testing is one of these tools, playing a crucial role in decision-making processes. When data does not follow a normal distribution, which is common in real-world applications, specific approaches and considerations are necessary. Here, we'll explore the basic concepts and terminology related to hypothesis testing with non-normal data, under the umbrella of Lean Six Sigma.

Introduction to Hypothesis Testing

Hypothesis testing is a statistical method used to make inferences about a population based on sample data. The goal is to determine whether there is enough evidence to support a specific claim or hypothesis about the population characteristics. It is a critical tool in Lean Six Sigma projects for validating improvement efforts and making data-driven decisions.

Basic Concepts

1. Hypothesis

  • Null Hypothesis (H0): This is a statement of no effect or no difference. It's the hypothesis that researchers aim to test against and is assumed true until evidence suggests otherwise.

  • Alternative Hypothesis (H1 or Ha): This is the claim the researcher seeks evidence for. It is a statement of effect or difference, asserting that the observed outcomes reflect more than chance variation.

2. Type I and Type II Errors

  • Type I Error: Occurs when a null hypothesis that is actually true is rejected. It's also known as a "false positive," and its probability is denoted α.

  • Type II Error: Occurs when a null hypothesis that is actually false is not rejected. It's referred to as a "false negative," and its probability is denoted β.

The balance between these errors is crucial in hypothesis testing: for a fixed sample size, lowering the risk of one error generally raises the risk of the other. In Lean Six Sigma, decisions on acceptable levels of these errors depend on the project's specific context and the potential impact of decisions.

3. P-Value

The p-value is the probability of observing sample results at least as extreme as those actually observed, assuming the null hypothesis is true. A p-value at or below the chosen significance level (typically 0.05) is treated as sufficient evidence against the null hypothesis, leading to its rejection in favor of the alternative hypothesis.
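
To make the definition concrete, here is a minimal Python sketch that computes an exact two-sided p-value by permutation, a distribution-free approach. The function name `permutation_p_value` and the cycle-time values are hypothetical and chosen only for illustration; the sketch assumes samples small enough that every possible regrouping can be enumerated.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(a) + list(b)
    indices = range(len(pooled))
    observed = abs(mean(a) - mean(b))
    extreme = 0
    total = 0
    # Enumerate every way of assigning len(a) of the pooled values to group A.
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Count regroupings at least as extreme as the observed difference.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical cycle times (minutes) before and after a process change.
before = [12, 15, 14, 19, 22]
after = [9, 11, 10, 13, 12]
print(permutation_p_value(before, after))
```

The returned fraction is the share of regroupings that look at least as extreme as the observed data if group labels carry no information, which is exactly what the p-value expresses under the null hypothesis.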

4. Test Statistics

A test statistic is a single value calculated from sample data and compared against a reference distribution to decide whether to reject the null hypothesis. The choice of test statistic depends on the data type and on whether the data follow a normal distribution.

Hypothesis Testing with Non-Normal Data

When dealing with non-normal data, traditional parametric tests (which assume a normal distribution) might not be appropriate. Instead, Lean Six Sigma practitioners might use non-parametric tests that do not require normality assumptions. Examples include:

1. Mann-Whitney U Test

  • Purpose: Compares two independent (unrelated) groups to determine whether they come from the same distribution. It is especially useful for ordinal data or non-normally distributed interval data.
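
As a rough illustration, the U statistic can be obtained by counting, over all cross-group pairs, how often a value from one group exceeds a value from the other. The sketch below is a simplified version under stated assumptions: the function names and data are hypothetical, and the p-value uses a plain normal approximation without tie or continuity corrections, so it will differ slightly from what statistical software reports.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U for x, U for y) by pairwise comparison; ties count as 0.5."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


def normal_approx_p(u, n1, n2):
    """Two-sided p-value from a normal approximation (no corrections applied)."""
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return erfc(abs(z) / sqrt(2))


# Hypothetical defect counts per shift on two production lines.
line_a = [3, 5, 4, 8, 6, 7]
line_b = [9, 12, 10, 15, 11, 14]
u_a, u_b = mann_whitney_u(line_a, line_b)
print(u_a, u_b, normal_approx_p(min(u_a, u_b), len(line_a), len(line_b)))
```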



2. Wilcoxon Signed-Rank Test

  • Purpose: Tests differences between two related samples. This test is applied to paired samples to assess whether their population mean ranks differ; it's the non-parametric counterpart to the paired t-test.
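
The ranking step behind this test can be sketched in a few lines of Python. The version below is a minimal sketch under common conventions: it drops pairs with zero difference and gives tied absolute differences their average rank. The function name and the paired measurements are hypothetical, and only the rank sums are computed, not a p-value.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W+, W-): rank sums of the positive and negative differences."""
    # Pairs with no change carry no sign information and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    abs_diffs = [abs(d) for d in diffs]
    ordered = sorted(abs_diffs)
    # Average rank = (count of smaller values) + (count of equal values + 1) / 2.
    ranks = [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in abs_diffs]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical setup times (minutes) for the same machines before and after training.
before = [30, 28, 35, 40, 33, 31]
after = [27, 29, 30, 34, 33, 26]
print(wilcoxon_signed_rank(before, after))
```

The smaller of the two rank sums is usually taken as the test statistic; a very small value suggests the differences lean consistently in one direction.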



3. Kruskal-Wallis H Test

  • Purpose: Compares three or more independent groups. Serving as the non-parametric version of the one-way ANOVA, it tests whether the groups come from the same distribution; when the group distributions have similar shapes, a significant result is commonly interpreted as a difference in medians.
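
A minimal sketch of the H statistic follows, assuming no tie correction is applied. The function name and data are hypothetical. H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom; the closed-form p-value used in the example is valid only for exactly three groups (2 degrees of freedom).

```python
from math import exp


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic computed from pooled ranks (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ordered = sorted(pooled)
    n_total = len(pooled)
    weighted = 0.0
    for g in groups:
        # Rank each value within the pooled data, averaging ranks for ties.
        rank_sum = sum(ordered.index(v) + (ordered.count(v) + 1) / 2 for v in g)
        weighted += rank_sum ** 2 / len(g)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical lead times (days) from three suppliers.
supplier_a = [4, 6, 5, 7]
supplier_b = [8, 10, 9, 12]
supplier_c = [11, 14, 13, 15]
h = kruskal_wallis_h(supplier_a, supplier_b, supplier_c)
p_value = exp(-h / 2)  # chi-square survival function, valid only for 2 df
print(h, p_value)
```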



4. Friedman Test

  • Purpose: Compares three or more paired groups. It acts as the non-parametric alternative to the one-way ANOVA with repeated measures, suitable for analyzing matched or paired samples across multiple groups.
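
The Friedman statistic ranks the conditions within each subject (or block) and then compares the rank sums across conditions. The sketch below assumes no tie correction; the function name and the ratings are hypothetical. The result is compared against a chi-square distribution with (number of conditions - 1) degrees of freedom.

```python
def friedman_statistic(blocks):
    """blocks: one row per subject, one column per condition."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        ordered = sorted(row)
        for j, v in enumerate(row):
            # Rank within the row, averaging ranks for tied values.
            rank_sums[j] += ordered.index(v) + (ordered.count(v) + 1) / 2
    return 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical ratings of three work methods by the same four operators.
ratings = [
    [7, 5, 8],
    [6, 4, 9],
    [8, 6, 7],
    [5, 3, 9],
]
print(friedman_statistic(ratings))
```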



5. Spearman's Rank Correlation Coefficient

  • Purpose: Assesses the strength and direction of association between two ranked variables. It's applied when the data do not meet Pearson's correlation coefficient assumptions, providing insight into monotonic relationships.
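
One way to see what the coefficient measures is to compute it as an ordinary Pearson correlation applied to the ranks of the data. The following is a minimal sketch with hypothetical function names and data; it assumes neither variable is constant, since that would make the denominator zero.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: operator experience (years) vs. rework rate (%).
experience = [1, 3, 4, 6, 8, 10]
rework = [9.5, 8.0, 8.4, 5.1, 4.0, 2.2]
print(spearman_rho(experience, rework))
```

Because only ranks enter the calculation, any strictly increasing relationship yields a coefficient of 1 even when it is far from linear, which is why the coefficient is described as capturing monotonic association.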



6. Kendall's Tau

  • Purpose: Measures the correlation between two ranked variables, offering an alternative to Spearman's rank correlation coefficient. It's particularly effective for data sets with many tied ranks. This test is not included in the body of knowledge for Black Belt.



7. Chi-Square Test of Independence

  • Purpose: Evaluates whether two categorical variables are independent, using the counts in a contingency table. Because it works with frequencies rather than measurements, it makes no normality assumption, although it does require adequately large expected cell counts.
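
The calculation compares each observed count with the count expected if the two variables were independent. The sketch below is illustrative only: the function name and the counts are hypothetical, no continuity correction is applied, and the closed-form p-value shown is valid only for a 2 x 2 table (1 degree of freedom). It also assumes every expected count is greater than zero.

```python
from math import erfc, sqrt


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = shift (day, night), columns = (defective, good).
table = [[10, 20], [20, 10]]
stat, dof = chi_square_independence(table)
p_value = erfc(sqrt(stat / 2))  # chi-square survival function, valid only for 1 df
print(stat, dof, p_value)
```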



8. Fisher's Exact Test

  • Purpose: Examines the independence of two categorical variables, similar to the Chi-Square Test, but computes an exact p-value. It is preferred when sample sizes are small or when some cells of the table have very low expected counts.
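
For a 2 x 2 table, the exact p-value can be sketched directly from the hypergeometric distribution. The version below assumes the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts: rows = supplier (A, B), columns = (pass, fail).
print(fisher_exact_two_sided(8, 2, 3, 7))
```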



9. Mood's Median Test

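  • Purpose: Compares the medians of two or more groups. It provides an alternative to the Kruskal-Wallis H Test that is more robust to outliers and to differences in distribution shape, at the cost of lower statistical power, because it uses only whether each value lies above or below the overall median.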
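
In outline, the test counts how many values in each group fall above the overall (grand) median and then applies a chi-square calculation to those counts. The sketch below is a simplified illustration with hypothetical names and data; it assumes values equal to the grand median are counted as "not above", which is one of several conventions, and that every expected count is greater than zero.

```python
from statistics import median


def moods_median_table(*groups):
    """Return (grand median, 2 x k table of counts above / not above it)."""
    grand = median([v for g in groups for v in g])
    above = [sum(1 for v in g if v > grand) for g in groups]
    not_above = [len(g) - a for g, a in zip(groups, above)]
    return grand, [above, not_above]


def chi_square_statistic(table):
    """Pearson chi-square statistic for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, obs in enumerate(row)
    )


# Hypothetical call-handling times (minutes) from two teams.
team_a = [3, 4, 4, 5, 6, 30]
team_b = [6, 7, 8, 8, 9, 10]
grand, table = moods_median_table(team_a, team_b)
print(grand, table, chi_square_statistic(table))
```

The statistic is compared against a chi-square distribution with (number of groups - 1) degrees of freedom. Note that the extreme value of 30 in the first group affects the result no more than any other value above the median would.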



10. Cochran's Q Test

  • Purpose: Tests differences in proportions across three or more matched or paired groups. This test is a non-parametric alternative to the repeated measures ANOVA for binary data. This test is not included in the body of knowledge for Black Belt.



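11. Runs Test

  • Purpose: Tests for randomness in a data sequence. It counts the runs (uninterrupted stretches of similar elements, e.g., highs and lows, positives and negatives) and assesses whether their number is consistent with a random ordering, which is useful for detecting non-random patterns such as trends, clustering, or oscillation.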
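
A minimal sketch of this idea, assuming values are classified as above or below the sample median and that the number of runs is judged with a normal approximation. The function name and the measurements are hypothetical. Values equal to the median are dropped, and the sketch assumes at least one value falls on each side of it.

```python
from math import erfc, sqrt
from statistics import median


def runs_test(sequence):
    """Return (runs, expected runs, z, two-sided p) for runs about the median."""
    center = median(sequence)
    # True = above the median, False = below; values equal to it are dropped.
    signs = [v > center for v in sequence if v != center]
    n_above = sum(signs)
    n_below = len(signs) - n_above
    n = n_above + n_below
    # A new run starts every time the sign changes.
    runs = 1 + sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)
    expected = 2 * n_above * n_below / n + 1
    variance = (2 * n_above * n_below * (2 * n_above * n_below - n)
                / (n ** 2 * (n - 1)))
    z = (runs - expected) / sqrt(variance)
    return runs, expected, z, erfc(abs(z) / sqrt(2))


# Hypothetical fill weights (grams) in production order.
weights = [50.1, 50.3, 50.4, 50.6, 50.2, 49.7, 49.5, 49.8, 49.6, 49.9]
print(runs_test(weights))
```

Too few runs point to clustering or a trend, while too many point to oscillation; either shows up as a large absolute z value.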


Terminology Specific to Non-Normal Data

  • Distribution-Free: Refers to tests that do not assume any specific distribution of the data.


  • Rank-Based Methods: Many non-parametric tests rank the data and compare these ranks instead of the actual data values.


  • Significance Level (α): The threshold used to decide whether the p-value indicates a statistically significant result. It is chosen before conducting the hypothesis test and often set at 0.05.



Conclusion

In Lean Six Sigma, understanding the basics of hypothesis testing with non-normal data is essential for analyzing processes and making improvements. By grasping these concepts and terminologies, practitioners can effectively apply statistical methods to drive quality and efficiency, even when dealing with challenging data distributions. This knowledge enables the selection of appropriate tests and the interpretation of results, ensuring that decisions are both data-driven and robust.
