Tukey came up with a simple regression fit that involved removing the middle third of the data. So you have some precedent for this approach.

I would not be as critical as some of the others on the list. Sometimes a categorical variable is easier to interpret. A lot of dietary research, for example, looks at the highest quintile of fat consumption and compares it to the lowest quintile. I can visualize those two groups pretty well. Furthermore, categorization mitigates some of the problems caused by measurement error.

If I were doing it myself, I would almost never dichotomize. But I wouldn’t be too upset if someone else did it, especially if the data set were already quite large.

Steve Simon, ssimon@cmh.edu, Standard Disclaimer.

Related Article: Preacher, K. J., Rucker, D. D., MacCallum, R. C., & Nicewander, W. A. (2005). Use of the extreme groups approach: A critical reexamination and new recommendations. Psychological Methods, 10, 178-192.

The Extreme Groups Approach (EGA) involves the investigation of the relationship between a continuous variable X and a criterion variable Y. Although X is continuous, the researcher elects to obtain data on only those cases that have high or low values of X. Assuming Y is also continuous, Y is then correlated with those extreme values of X. When the relationship between X and Y is linear, this method can actually be more powerful than correlating the full range of X with Y, holding sample size constant. Note that if you obtain data on the full range of X and then throw out the middle scores, you are not holding sample size constant and are likely to lose power through the loss of cases. Past research has indicated that power is likely to be greatest if you select those cases in the upper and lower quartiles of X. Preacher et al. remind us that power is not everything. Unless we are uninterested in the relationship between Y and intermediate levels of X, it seems more sensible to relate the full range of X to Y.

The authors show that EGA will result in upwardly biased estimates of the size of the effect (the association between the full range of X and Y). Corrections exist that can remove some of this bias, but they are not likely to be applied. After all, what researcher wants to do more arithmetic just to make her findings appear less impressive? Only the honest researcher who isn't all that interested in getting published.
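That upward bias is easy to see in a quick simulation. The sketch below is illustrative (not from the article): it generates X and Y with a true full-range correlation of about .30, then recomputes the correlation after keeping only cases in the upper and lower quartiles of X, the selection rule mentioned above. The sample size, seed, and true correlation are arbitrary choices for the demonstration.

```python
import math
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation, computed from sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
rho = 0.30                     # true full-range correlation (illustrative)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * xi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for xi in x]

# Correlation using the full range of X.
r_full = pearson_r(x, y)

# Extreme Groups Approach: keep only cases in the upper and lower
# quartiles of X, then correlate as if nothing happened.
xs_sorted = sorted(x)
q1, q3 = xs_sorted[n // 4], xs_sorted[3 * n // 4]
kept = [(xi, yi) for xi, yi in zip(x, y) if xi <= q1 or xi >= q3]
r_ega = pearson_r([p[0] for p in kept], [p[1] for p in kept])

print("full range r:", round(r_full, 3))
print("extreme groups r:", round(r_ega, 3))
```

Discarding the middle of X inflates its variance among the retained cases, so the observed correlation comes out noticeably larger than the true full-range value; the same inflation is what makes EGA look more powerful, and what biases the effect size if it is reported uncorrected.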

Apparently some researchers try to justify EGA by arguing that it enhances the reliability of their measurement of X by eliminating the less reliable measurements in the middle of the distribution. It is not, however, generally true that measurements in the middle will be less reliable (quite the contrary is expected), and any apparent increase in the reliability of the measurement of X is an artifact of EGA.

After scolding colleagues for dichotomizing a continuous variable, I have sometimes been told that the dichotomization is justified because it estimates an underlying dichotomous characteristic (Type A versus Type B personality, for example).

EGA is sometimes justified because the researcher is interested in studying interaction or moderator effects and is not aware that these can be studied without categorizing continuous variables. I must confess that I have done this myself, not