Splitting a Predictor at the Upper Quarter or Third and the Lower Quarter or Third

David K. Park and Andrew Gelman

American Statistical Association, November 2008

Abstract

A linear regression of y on x can be approximated by a simple difference: the average value of y among observations in the highest quarter or third of x, minus the average value of y among observations in the lowest quarter or third of x. A simple theoretical analysis, similar to analyses that have been done in psychometrics, shows this comparison to perform reasonably well, with 80%–90% efficiency compared to the regression if the predictor is uniformly or normally distributed. By discretizing x into three categories, we claw back about half the efficiency lost by the commonly used strategy of dichotomizing the predictor. We illustrate with the example that motivated our research: an analysis of income and voting, which we had originally performed for a scholarly journal but then wanted to communicate to a general audience.
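The comparison described above can be sketched in Python. This is a minimal illustration, not the authors' code. The helper names `split_difference`, `ols_slope`, and `simulated_efficiency` are hypothetical. The simulation settings are also assumptions: x drawn from a standard normal, y = beta * x plus normal noise, n = 300, and 2000 replications. Dividing the difference in mean y by the corresponding difference in mean x puts the comparison on the slope scale. Efficiency can then be estimated as the ratio of the sampling variance of the least-squares slope to that of the split-based slope.

```python
import numpy as np


def split_difference(x, y, frac=1/3):
    """Compare the top and bottom `frac` of x (hypothetical helper).

    Returns (diff_y, slope_scale): mean y in the highest-x group minus mean y
    in the lowest-x group, and that difference divided by the corresponding
    difference in mean x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if not 0 < frac <= 0.5:
        raise ValueError("frac must be in (0, 0.5]")
    k = int(np.floor(frac * len(x)))  # observations in each extreme group
    if k < 1:
        raise ValueError("too few observations for this fraction")
    order = np.argsort(x, kind="stable")  # indices of x from lowest to highest
    low, high = order[:k], order[-k:]
    diff_y = y[high].mean() - y[low].mean()
    diff_x = x[high].mean() - x[low].mean()
    return diff_y, diff_y / diff_x


def ols_slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    xc = x - x.mean()
    return np.dot(xc, y - y.mean()) / np.dot(xc, xc)


def simulated_efficiency(frac, n=300, n_sims=2000, beta=1.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of Var(OLS slope) / Var(split-based slope).

    Assumes x ~ Normal(0, 1) and y = beta * x + Normal(0, sigma^2) noise,
    with a fresh x and fresh noise drawn in every replication.
    """
    rng = np.random.default_rng(seed)
    ols = np.empty(n_sims)
    split = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.standard_normal(n)
        y = beta * x + sigma * rng.standard_normal(n)
        ols[i] = ols_slope(x, y)
        split[i] = split_difference(x, y, frac)[1]
    return ols.var() / split.var()


if __name__ == "__main__":
    # frac=0.5 is the median split (dichotomizing); 1/3 and 1/4 are the
    # upper/lower third and quarter comparisons.
    for frac in (0.5, 1/3, 1/4):
        print(f"frac={frac:.3f}: efficiency ~ {simulated_efficiency(frac):.2f}")
```

Setting `frac=0.5` corresponds to dichotomizing at the median, and `frac=1/3` or `frac=1/4` gives the three-category comparisons that drop the middle of x. The simulated efficiencies vary somewhat with the seed, sample size, and assumed distribution of x, so they illustrate the trade-off rather than reproduce the paper's analysis.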
