Dichotomizing Continuous Variables: A Bad Idea
I posted the following query on the EDSTAT-L list early in 2003:
When interested in the relationship between two continuous variables, some researchers will dichotomize one of them prior to analysis. I generally discourage such dichotomization, but the practice is common. A colleague asked me today about the practice of dichotomizing by a median split (top half versus bottom half) versus the practice of using only the tails (bottom third versus top third, for example). That is, if you are going to dichotomize a continuous subject variable and compare the resulting two groups on a second continuous variable, even though that is not generally a good idea, is it more useful (less destructive) to use a median split (upper half vs lower half) or to compare the tails (such as upper third versus lower third)? I suggested to my colleague that this would depend, in part, on the form of the relationship between the two continuous variables (not necessarily strictly linear), and reminded him that throwing out the middle of the distribution would reduce N and thus might reduce power too. I vaguely recall having read an article or two on this matter long ago (not the recent articles on why not to dichotomize, but rather on how best to do it if you feel you must), but cannot put my finger on the article(s). Can any of you all?
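The trade-off raised in the query (a median split keeps every case, while a tails-only comparison drops the middle and shrinks N) can be explored with a small simulation. The sketch below is illustrative only and is not taken from the list discussion. It assumes a strictly linear, bivariate normal relationship, a population correlation of 0.3, and N = 60; the function names and the hardcoded two-tailed .05 critical t values (for 58 and 38 degrees of freedom) are choices made here for the example. It estimates how often each analysis rejects the null hypothesis: a test of the Pearson r on all cases, a pooled t test after a median split, and a pooled t test on the top versus bottom third.

```python
import math
import random
import statistics

N = 60  # cases per simulated study (assumed for this sketch)
# Two-tailed .05 critical values of t, hardcoded for the df used with N = 60.
T_CRIT = {58: 2.0017, 38: 2.0244}


def pearson_t(x, y):
    """t statistic for testing that the Pearson correlation is zero."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def pooled_t(g1, g2):
    """Pooled-variance independent-samples t statistic."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * statistics.variance(g1)
           + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    return (statistics.fmean(g1) - statistics.fmean(g2)) / math.sqrt(
        sp2 * (1 / n1 + 1 / n2))


def estimate_power(rho=0.3, reps=2000, seed=1):
    """Proportion of simulated studies in which each analysis rejects H0."""
    rng = random.Random(seed)
    hits = {"continuous": 0, "median_split": 0, "extreme_thirds": 0}
    half, third = N // 2, N // 3
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(N)]
        # y is linear in x plus normal noise, so corr(x, y) = rho.
        ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
        # y scores ordered by the case's x score, lowest x first.
        y_by_x = [y for _, y in sorted(zip(xs, ys))]

        # Keep x continuous: test r using all N cases (df = N - 2).
        hits["continuous"] += abs(pearson_t(xs, ys)) > T_CRIT[N - 2]
        # Median split on x: lower half vs upper half (df = N - 2).
        t_med = pooled_t(y_by_x[half:], y_by_x[:half])
        hits["median_split"] += abs(t_med) > T_CRIT[N - 2]
        # Tails only: top third vs bottom third, middle third discarded.
        t_ext = pooled_t(y_by_x[-third:], y_by_x[:third])
        hits["extreme_thirds"] += abs(t_ext) > T_CRIT[2 * third - 2]
    return {name: count / reps for name, count in hits.items()}


if __name__ == "__main__":
    for name, power in estimate_power().items():
        print(f"{name}: {power:.3f}")
```

Under these assumptions the full-sample correlation test should reject more often than the median split. How the two dichotomizing strategies compare with each other depends on N and on the form of the relationship, which is why the example should be rerun with a nonlinear link between x and y before drawing any general conclusion.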
Here are some of the interesting responses I got:
Dennis Roberts quickly made several comments disparaging the practice of such dichotomization, including:
Why toss away information from the data?
If you use top 1/3 and bottom 1/3 ... you are also throwing data away ... which is worse than just lowering the information value of it.
David Howell noted:
There is an excellent paper on median splits by MacCallum et al. in Psychological Methods, 2002, 7, 19-40.
There is also an equally good paper by Julie Irwin and Gary McClelland in a marketing journal. It either just came out or it is in press. (I couldn't find it with PsycInfo, so it may not be out yet.)
Both papers agree with Karl's advice. "When you think about a median split, DON'T."
Gary McClelland added much detail: