The "Square Q Test"

A different approach to visualizing the Tukey HSD Post-Hoc Test

Background

T-Test

Say you have two samples and you want to determine whether they come from the same population, i.e. whether they are "different". You could just compare their means, and if they differ then you are good to go... right? Well, what if the means are pretty close? How close is close enough?

This is what the t-test is for: it tests whether two samples are significantly different from one another.

In this example we see that the two-sample t-test confirms that the two samples $x$ and $y$ are different!

ANOVA

This is great, but what if there are more than two samples? Then we can use Analysis of Variance (ANOVA) to determine whether at least one of the samples is different.

The result of this ANOVA shows that there is a significant difference somewhere in the data, but it does not say where: $y$ and $z$ were generated with the same mean! We need post-hoc analysis to determine which groups are different.

Experiment-Wise Error Rate

You may be tempted to use the t-test to compare each pair of groups. However, with an error rate of 0.05 specified for each test, after just 3 comparisons the overall error rate increases to 0.143. This is a big problem. As we add more groups and need to make more comparisons, the error rate of all of the tests together increases very quickly. This is called the experiment-wise error rate (EWER). What we need is a test that maintains a specified error rate across all of the tests. For $m$ independent comparisons, each made at significance level $\alpha$:

$$ EWER = 1 - (1 - \alpha)^{m} $$
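
The 0.143 figure quoted above follows directly from this formula. Here is a minimal sketch in Python; the function name `ewer` is only an illustrative choice, and the formula assumes the individual tests are independent.

```python
def ewer(alpha: float, m: int) -> float:
    """Experiment-wise error rate for m independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** m


# How quickly the overall error rate grows with the number of comparisons.
for m in (1, 3, 6, 10):
    print(m, round(ewer(0.05, m), 3))
```

With `alpha = 0.05` and `m = 3` this gives roughly 0.143, and the value keeps climbing as more comparisons are added.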

Tukey Honest Significant Differences

We can use the Tukey Honest Significant Difference (HSD) test to determine which groups are different while maintaining a specified EWER. The following plot shows which groups are different. The comparisons are denoted on the y-axis and the difference in the means is denoted on the x-axis. The error bars on either side are determined by the Q-distribution. If the error bars cross the 0 line, then there is not a significant difference. If they do not, then there is.

As we can see above, $x$ and $y$ are different, $x$ and $z$ are different, but $z$ and $y$ are not (as expected).

The Problem.

Obviously we don't HAVE to use a plot to conduct the Tukey HSD, but personally I like to be able to visualize the differences; I think it helps to understand the data. The issue with this plot can be seen below.

The problem should be clear in the Tukey HSD plot. The number of comparisons is $\binom{k}{2} = \frac{k(k-1)}{2}$, where $k$ is the number of groups. As we add more groups, the number of comparisons grows quadratically and the plot becomes harder to read.
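
To get a feel for how fast the number of rows in the Tukey HSD plot grows, here is a small sketch that counts the pairwise comparisons for a few group counts. The function name `n_comparisons` is hypothetical, chosen only for this illustration.

```python
from math import comb


def n_comparisons(groups: int) -> int:
    """Number of distinct pairs that can be formed from the given number of groups."""
    return comb(groups, 2)


for groups in (3, 5, 10, 20):
    print(groups, n_comparisons(groups))
```

Three groups need only 3 comparisons, but 10 groups already need 45 and 20 groups need 190, which is far too many rows to read comfortably in one plot.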

The Solution: The Square-Q Plot

Here is my solution to the problem, the Square-Q plot.

How to Read It.

The above chart might look kind of intimidating at first: there is a lot going on, and there are a lot of colors.

Considerations

The bottom line for Tukey HSD is that the absolute difference between the means needs to be larger than the range given by the Q distribution, scaled by the Mean Squared Error (MSE) and the sample sizes of the groups being compared. With $k$ groups, $N$ total observations, and group sample sizes $n_i$ and $n_j$:

$$ | \bar{X}_i - \bar{X}_j | > Q_{\alpha,k,N-k} \sqrt{\frac{MSE}{2}(\frac{1}{n_i} + \frac{1}{n_j})} $$
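
The right-hand side of this inequality can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the critical value `q_crit` (that is, $Q_{\alpha,k,N-k}$) is assumed to be supplied by the caller, for instance from a studentized range table or, if a recent SciPy is available, from `scipy.stats.studentized_range.ppf(1 - alpha, k, N - k)`. The numbers in the usage lines are made up.

```python
from math import sqrt


def critical_range(q_crit: float, mse: float, n_i: int, n_j: int) -> float:
    """Right-hand side of the inequality: Q * sqrt(MSE/2 * (1/n_i + 1/n_j))."""
    return q_crit * sqrt(mse / 2 * (1 / n_i + 1 / n_j))


def is_different(mean_i: float, mean_j: float, q_crit: float,
                 mse: float, n_i: int, n_j: int) -> bool:
    """True when the absolute mean difference exceeds the critical range."""
    return abs(mean_i - mean_j) > critical_range(q_crit, mse, n_i, n_j)


# Illustrative values only: q_crit is NOT looked up from a real table here.
q_crit, mse = 3.5, 4.0
print(critical_range(q_crit, mse, 8, 8))
print(is_different(10.0, 13.0, q_crit, mse, 8, 8))
print(is_different(10.0, 11.0, q_crit, mse, 8, 8))
```

Keeping `q_crit` as an argument keeps the sketch free of external dependencies and makes it clear that only the scaling term changes from one comparison to the next.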

That is where this graph can run into problems: if the sample sizes are widely different, then the range will be different for each comparison. This test uses the minimum sample size for each comparison, which guarantees that any difference found by this test will also be found by a Tukey HSD. However, if the sample sizes are too widely different and the difference in means is close to the edge of the range, then there is potential for some differences to be missed.

$$ Q_{\alpha,k,N-k} \sqrt{\frac{MSE}{2}(\frac{1}{n_i} + \frac{1}{n_{min}})} \geq Q_{\alpha,k,N-k} \sqrt{\frac{MSE}{2}(\frac{1}{n_i} + \frac{1}{n_j})} $$

but if all sample sizes $n_i$ are the same, then these quantities are equal.
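
The inequality above can be checked numerically. The sketch below assumes that $n_{min}$ is the smallest sample size among all groups, which is one reading of the text; the group sizes, `q_crit`, and `mse` are made-up illustrative values, and `critical_range` is a hypothetical helper.

```python
from math import sqrt


def critical_range(q_crit: float, mse: float, n_a: int, n_b: int) -> float:
    """Q * sqrt(MSE/2 * (1/n_a + 1/n_b))."""
    return q_crit * sqrt(mse / 2 * (1 / n_a + 1 / n_b))


# Made-up group sizes; q_crit and mse are illustrative, not computed from data.
sizes = {"x": 30, "y": 12, "z": 8}
q_crit, mse = 3.5, 4.0
n_min = min(sizes.values())  # assumption: minimum taken over all groups

exact = critical_range(q_crit, mse, sizes["x"], sizes["y"])
conservative = critical_range(q_crit, mse, sizes["x"], n_min)
print(exact, conservative, conservative >= exact)

# With equal sample sizes the two ranges coincide.
equal = critical_range(q_crit, mse, 10, 10)
print(equal == critical_range(q_crit, mse, 10, min(10, 10, 10)))
```

Because the conservative range is never smaller than the exact one, a pair flagged as different with it is also different under the exact Tukey HSD range, while pairs whose mean difference falls between the two ranges are the ones that can be missed.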

Conclusion

This different approach to visualizing the Tukey HSD, while not quite as precise as the Tukey HSD itself, provides valuable insight into categorical data, enables conclusions to be drawn quickly, and identifies areas that require further analysis.