Goodness-of-Fit Test
The Chi-Square Goodness-of-Fit test is a non-parametric hypothesis test used within the Lean Six Sigma framework to compare an observed distribution to an expected probability distribution. This test is instrumental in determining how well theoretical distributions, such as the normal, binomial, or Poisson distributions, fit the observed data. Its application spans various industries and processes, making it a versatile tool in the Lean Six Sigma toolkit for process improvement and quality control.
Purpose of the Chi-Square Goodness-of-Fit Test
The primary purpose of the Chi-Square Goodness-of-Fit test is to assess the discrepancy between observed frequencies and the frequencies expected under a specific hypothesis. It serves as a critical tool for validating process changes, distribution assumptions, and adherence to customer specifications in quality management projects.
When to Use the Chi-Square Goodness-of-Fit Test
Distribution Testing: To verify whether data follow a specific distribution, such as a normal, exponential, or uniform distribution. Continuous data must first be grouped into intervals so that observed and expected counts can be compared.
Categorical Data Analysis: To check whether the counts observed in each category follow a hypothesized distribution.
Fit for Proportions: To test if the observed proportions of categories match the expected proportions.
How to Conduct a Chi-Square Goodness-of-Fit Test
1. Define Hypotheses:
Null Hypothesis (H0): The data follow the hypothesized distribution; any difference between the observed and expected frequencies is due to chance.
Alternative Hypothesis (H1): The data do not follow the hypothesized distribution; the difference between the observed and expected frequencies is too large to be explained by chance.
2. Collect Data: Gather observed frequency data across different categories or groups.
3. Determine Expected Frequencies: Calculate the expected frequencies based on the hypothesized distribution or proportion.
4. Calculate the Chi-Square Statistic: The Chi-Square statistic χ² is calculated using the formula:
χ² = Σ (Oi − Ei)² / Ei
where Oi is the observed frequency for category i, Ei is the expected frequency for category i, and the sum runs over all n categories.
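5. Determine Degrees of Freedom: The degrees of freedom (df) for the test are df = n − 1, where n is the number of categories. If parameters of the hypothesized distribution are estimated from the sample, one further degree of freedom is subtracted for each estimated parameter.
6. Find the Critical Value: Refer to the Chi-Square distribution table to find the critical value for the given degrees of freedom and significance level (α, typically 0.05).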
7. Make a Decision: If the Chi-Square statistic is greater than the critical value, reject the null hypothesis. Otherwise, do not reject the null hypothesis.
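The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name chi_square_gof and the three-category counts are hypothetical, and the critical value is passed in by the caller after looking it up in a Chi-Square table.

```python
def chi_square_gof(observed, expected, critical_value):
    """Return (statistic, degrees_of_freedom, reject_h0) for a goodness-of-fit test.

    observed and expected are sequences of frequencies in the same category order.
    critical_value is taken from a Chi-Square table for the chosen alpha and df.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected frequencies must be positive")

    # Step 4: sum of (O_i - E_i)^2 / E_i over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 5: df = n - 1 when no distribution parameters are estimated from the data.
    df = len(observed) - 1
    # Step 7: reject H0 only if the statistic exceeds the critical value.
    return statistic, df, statistic > critical_value


# Hypothetical counts for three categories (60 observations in total).
observed = [18, 22, 20]
expected = [20, 20, 20]
# 5.991 is the tabulated critical value for df = 2 at alpha = 0.05.
statistic, df, reject_h0 = chi_square_gof(observed, expected, critical_value=5.991)
print(statistic, df, reject_h0)
```

With these made-up counts the statistic is 0.4 on 2 degrees of freedom, well below 5.991, so the null hypothesis is not rejected.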
Applications in Lean Six Sigma Projects
Process Improvement: Identifying discrepancies between current and desired process performance.
Quality Control: Testing if product defects follow an expected distribution, aiding in root cause analysis.
Customer Satisfaction: Analyzing customer feedback categories against expected outcomes.
Conclusion
The Chi-Square Goodness-of-Fit test is a powerful statistical tool in Lean Six Sigma projects for testing hypotheses about the distribution of observed data. By comparing observed frequencies against expected frequencies, practitioners can make informed decisions about process improvements, quality control, and customer satisfaction strategies. Understanding and applying this test correctly can significantly contribute to the success of Lean Six Sigma initiatives.
Scenario
The Chi-Square Goodness-of-Fit test is a non-parametric test used to determine whether sample data are consistent with a hypothesized population distribution. In this example, a company wants to test whether the distribution of colors of the marbles it produces matches its target distribution.
A company produces marbles in four colors: Red, Blue, Green, and Yellow. The target production distribution is 25% for each color. After one week of production, the company sampled 100 marbles and observed the following counts:
Red: 20
Blue: 30
Green: 25
Yellow: 25
Objective: To test if the observed distribution matches the expected (target) distribution.
Step 1: Set the Hypotheses
Null Hypothesis (H0): The color distribution of the marbles follows the target distribution.
Alternative Hypothesis (H1): The color distribution of the marbles does not follow the target distribution.
Step 2: Calculate Expected Frequencies
Given the target distribution is equal for all colors (25% each), the expected frequency for each color in a sample of 100 marbles is:
Expected frequency = Total sample size * Probability of each category
Expected frequency for each color = 100 * 25% = 25
So, the expected frequencies are:
Red: 25
Blue: 25
Green: 25
Yellow: 25
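As a quick check, the expected frequencies can be computed directly from the target proportions. The variable names below are illustrative; the sample size and proportions are the ones given in the scenario.

```python
import math

sample_size = 100
target_proportions = {"Red": 0.25, "Blue": 0.25, "Green": 0.25, "Yellow": 0.25}

# The hypothesized proportions must account for every marble.
assert math.isclose(sum(target_proportions.values()), 1.0)

# Expected frequency = total sample size * probability of each category.
expected = {color: sample_size * p for color, p in target_proportions.items()}
print(expected)  # each color maps to 25.0
```

The same dictionary comprehension works for unequal targets, for example 40% of one color and 20% of each of the others.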
Step 3: Compute the Chi-Square Statistic
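The Chi-Square statistic (χ²) is calculated using the formula:
χ² = Σ (Oi − Ei)² / Ei
where Oi is the observed frequency and Ei is the expected frequency for each category.
The contribution of each color is:
Red: (20 − 25)² / 25 = 1
Blue: (30 − 25)² / 25 = 1
Green: (25 − 25)² / 25 = 0
Yellow: (25 − 25)² / 25 = 0
Summing these values gives the total Chi-Square statistic:
χ² = 1 + 1 + 0 + 0 = 2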
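The arithmetic for this step can be reproduced with a short Python sketch. The observed and expected counts come from the scenario; the variable names are chosen only for illustration.

```python
observed = {"Red": 20, "Blue": 30, "Green": 25, "Yellow": 25}
expected = {color: 25 for color in observed}

# Contribution of each category: (O_i - E_i)^2 / E_i
contributions = {
    color: (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
}
chi_square = sum(contributions.values())

print(contributions)  # Red and Blue contribute 1.0 each, Green and Yellow 0.0
print(chi_square)     # 2.0
```

Keeping the per-category contributions, rather than only the total, makes it easy to see which categories drive the statistic; here only Red and Blue deviate from the target.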
Step 4: Determine the Critical Value
The critical value depends on the significance level (α) and the degrees of freedom (df). Degrees of freedom for the goodness-of-fit test are calculated as: df=Number of categories−1=4−1=3
Assuming a significance level of α=0.05, you would consult a Chi-Square distribution table to find the critical value for df=3.
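The table lookup can be mimicked with a small hard-coded excerpt of the Chi-Square table. The dictionary below holds the standard critical values at α = 0.05 for 1 to 5 degrees of freedom only; the names CRITICAL_VALUES_ALPHA_05 and critical_value are hypothetical helpers, and a statistics library would normally be used for other significance levels or larger df.

```python
# Excerpt of the Chi-Square table: critical values at alpha = 0.05, keyed by df.
CRITICAL_VALUES_ALPHA_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def critical_value(df):
    """Look up the alpha = 0.05 critical value for the given degrees of freedom."""
    if df not in CRITICAL_VALUES_ALPHA_05:
        raise ValueError(f"no tabulated value for df={df} in this excerpt")
    return CRITICAL_VALUES_ALPHA_05[df]


num_categories = 4        # Red, Blue, Green, Yellow
df = num_categories - 1   # no parameters are estimated from the data
print(df, critical_value(df))  # 3 7.815
```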
Step 5: Make the Decision
If the calculated χ² statistic is greater than the critical value from the Chi-Square distribution table, we reject the null hypothesis. Otherwise, we do not have enough evidence to reject it.
For df=3 and α=0.05, the critical value from the Chi-Square table is approximately 7.815 (see table below). Since our calculated χ² value of 2 is less than 7.815, we do not reject the null hypothesis.
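The decision can also be cross-checked with a p-value, which is an equivalent way of stating the same rule: reject the null hypothesis when the p-value is below α. The sketch below is an addition to the table-based method used in this example. It relies on the closed-form upper-tail probability of the Chi-Square distribution that holds specifically for 3 degrees of freedom; the helper name chi2_sf_df3 is hypothetical.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a Chi-Square variable with df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi_square = 2.0        # statistic from Step 3
critical_value = 7.815  # table value for df = 3, alpha = 0.05
alpha = 0.05

reject_by_table = chi_square > critical_value
p_value = chi2_sf_df3(chi_square)
reject_by_p_value = p_value < alpha

print(reject_by_table, reject_by_p_value)  # False False
print(round(p_value, 4))                   # 0.5724
```

Both routes agree: a statistic of 2 is well inside the range expected under the target distribution, so the null hypothesis is not rejected.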
Conclusion
Based on the Chi-Square Goodness-of-Fit test, there is not enough evidence to conclude that the color distribution of the marbles produced by the company does not follow the target distribution.