Dichotomizing Continuous Variables: A Bad Idea

psychometric stance. Median splits lead to an underestimate of the SD of the variable being split. That restriction of range leads to an underestimate of the effect size r. Using the upper and the lower thirds leads to enhancement of range (the SD of the split variable becomes larger than when all data points are used) and to an overestimate of the effect size r.
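The direction of both biases can be checked with a small simulation. The following is a minimal sketch, not the poster's own calculation: it assumes bivariate normal data with a true correlation of 0.3, codes the median split as -1/+1, and for the extreme-thirds case keeps the continuous x values of the retained cases. The function names `pearson_r` and `simulate` and all parameter values are chosen here for illustration only.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho=0.3, n=100_000, seed=1):
    """Return (r_full, r_median_split, r_extreme_thirds) for simulated data."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(x)
        # y has unit variance and correlates rho with x (assumed model)
        ys.append(rho * x + math.sqrt(1.0 - rho ** 2) * e)

    # 1) All information used: x stays continuous.
    r_full = pearson_r(xs, ys)

    # 2) Median split: x is replaced by a two-valued code.
    med = statistics.median(xs)
    split = [1.0 if x > med else -1.0 for x in xs]
    r_median = pearson_r(split, ys)

    # 3) Extreme thirds: the middle third of cases is dropped,
    #    the remaining cases keep their continuous x values.
    ordered = sorted(xs)
    lo, hi = ordered[n // 3], ordered[(2 * n) // 3]
    kept = [(x, y) for x, y in zip(xs, ys) if x < lo or x > hi]
    r_extreme = pearson_r([p[0] for p in kept], [p[1] for p in kept])

    return r_full, r_median, r_extreme


if __name__ == "__main__":
    r_full, r_median, r_extreme = simulate()
    print(f"all data:       r = {r_full:.3f}")
    print(f"median split:   r = {r_median:.3f}")
    print(f"extreme thirds: r = {r_extreme:.3f}")
```

Under these assumptions the median-split estimate should come out below the full-data estimate and the extreme-thirds estimate above it; the exact numbers depend on the seed and on the assumed true correlation.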

(BTW: Gary McClelland's calculation in his last posting, saying that "the extreme thirds will reduce the expected r^2 to 79% of what it would have been", therefore must be wrong, probably because it does not use the continuous information of all remaining data points.) The multiplier, say S, that biases the estimate is basically a function of the ratio of the restricted/enhanced SD to the original SD and of the correlation r (I don't have the exact equation handy, but it can be found in Hunter & Schmidt's textbook on meta-analysis or in the good old Gulliksen).
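For orientation, one common textbook form of this relationship is the formula for direct selection on x (often labeled Thorndike's Case II). It is given here as an assumption about which equation is meant, not as the poster's own; it presumes a linear regression of y on x with homoscedastic residuals. The symbols are introduced for this sketch: r is the correlation in the full distribution, s_x the SD of x after selection, S_x the SD of x in the full distribution, u their ratio, and r' the correlation observed in the selected cases.

```latex
u = \frac{s_x}{S_x}, \qquad
r' = \frac{u\,r}{\sqrt{1 + (u^{2} - 1)\,r^{2}}}, \qquad
S = \frac{r'}{r} = \frac{u}{\sqrt{1 + (u^{2} - 1)\,r^{2}}}
```

With u < 1 (restriction of range) the multiplier S is below 1 and r is underestimated; with u > 1 (enhancement of range, as in extreme-group designs) S exceeds 1 and r is overestimated. For small r, S is approximately u.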

Who wants confidence intervals around estimates that one knows are biased right from the start?

The extreme-group strategy (high and low tails) was, and unfortunately still is, very popular with experimental psychologists, because it leads to higher effect sizes, which, despite the loss of df's, often lead to significant results and the beloved significance stars in their papers. Hans Eysenck loved that strategy and used it often. Later, others using the full distributions found that his results could not be replicated or yielded much lower effect sizes; no wonder, given these psychometric facts.

This phenomenon can also be used to explain why qualitatively oriented researchers often do not believe our quantitative results, even when these rest on better measurement. Experienced practitioners in educational, clinical and other settings often contrast a couple of extreme cases against each other and thus get the impression of a large effect (in terms of Cohen's d or r). One can demonstrate, by contrasting a couple of cases above +2 to +3 SDs against a couple of cases below -2 to -3 SDs, that the biasing factor might be larger than 4. This means that where the true effect might be only a small one (r = .10), the practitioner's impression corresponds to a medium to large one (r > .40), which cannot be generalized. (I've published about that phenomenon recently, but in German only.)

So your colleague and others should always use all the information given, whenever it is available.

Steve Simon was less critical of dichotomization than were others:

There’s a trade-off here. By removing the middle third, you increase the separation of the two groups, which is good, while at the same time reducing the sample size, which is bad. Usually the trade-off is good.

It’s not too hard to show that the loss of information is related to the correlation between the original variable and a new variable which equals -1, 0, or +1 depending on which third of the data you are in. For most data sets, this is slightly better than the correlation between the original variable and a new variable which equals -1 for the first half and +1 for the second half.
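The two correlations mentioned here can be computed in closed form if one assumes, purely for illustration, that the original variable is standard normal. The sketch below uses only the Python standard library; the function names are hypothetical, and the thirds are cut at the 1/3 and 2/3 quantiles.

```python
import math
from statistics import NormalDist


def corr_with_median_split():
    """Correlation of a standard normal X with a -1/+1 median-split code.

    E[X * sign(X)] = E|X| = sqrt(2/pi), and the code has variance 1.
    """
    return math.sqrt(2.0 / math.pi)


def corr_with_thirds():
    """Correlation of a standard normal X with a -1/0/+1 thirds code T.

    With z the 2/3 quantile: E[X * T] = 2 * pdf(z) and Var(T) = 2/3.
    """
    nd = NormalDist()
    z = nd.inv_cdf(2.0 / 3.0)
    return 2.0 * nd.pdf(z) / math.sqrt(2.0 / 3.0)


if __name__ == "__main__":
    r2 = corr_with_median_split()
    r3 = corr_with_thirds()
    print(f"median split (-1/+1):  corr = {r2:.3f}, squared = {r2 ** 2:.3f}")
    print(f"thirds (-1/0/+1):      corr = {r3:.3f}, squared = {r3 ** 2:.3f}")
```

Under the normality assumption the thirds coding correlates about 0.89 with the original variable, against about 0.80 for the median split. The squared value for the thirds coding, roughly 0.79, appears to correspond to the 79% figure quoted from Gary McClelland's posting; for other distributions the numbers will differ.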