How can a regression model be any use if you don't know the function whose parameters you are trying to estimate?

I saw a piece of research that said that mothers who breast fed their children were less likely to suffer diabetes in later life. The research was based on a survey of some 1000 mothers. It controlled for miscellaneous factors and used a loglinear model.

Now does this mean that they reckon all the factors that determine the likelihood of diabetes fit in a nice function (exponential, presumably) that translates neatly into a linear model with logs, and that whether the woman breast fed turned out to be statistically significant?

I'm sure I'm missing something, but how the hell do they know the model?

Epidemiology and biostatistics are both large, sophisticated fields. There are many obstacles to making valid (but always uncertain) deductions, but a lot of smart people have thought in sophisticated ways about pitfalls, and ways around them, to find what's really going on. Of course, many people who don't understand the statistics and methodology publish their conclusions anyway, so there's no answer to your question without an expert appraisal of the actual study. If you just want to know how (probable) conclusions can be reached in principle, you need to study the subjects.
– Bill Thurston Jan 2 '11 at 23:04

I would add that there may be two distinct questions here: one about the functional form of their model, and the other concerning causal interpretations of the resulting estimates. On the first point there has been tons of really good work on nonparametric function estimation, which assumes only that the function to be estimated lives in some "nice" function space. The second point in general is much more subtle. A good place for a mathematically sophisticated take on these topics is Cosma Shalizi's blog, e.g. cscs.umich.edu/~crshalizi/notebooks/regression.html
– R Hahn Jan 2 '11 at 23:15
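
To make the first point in the comment above concrete, here is a minimal sketch of nonparametric function estimation. Everything in it is an assumption made for illustration: the data are simulated, the "true" function `true_f` is invented, and the estimator (Nadaraya-Watson kernel regression with a hand-picked bandwidth) is just one simple choice, not the method used in the breast-feeding study. It only shows that a curve can be estimated without writing down its functional form in advance, and that a straight-line fit can be compared against it.

```python
import numpy as np


def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: a Gaussian-weighted local average of y_train."""
    # Pairwise scaled distances between query points and training points.
    diffs = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    # Each prediction is a weighted mean of the observed responses.
    return (weights @ y_train) / weights.sum(axis=1)


def true_f(t):
    """Hypothetical 'unknown' function, used only to simulate data."""
    return np.sin(2.0 * t) + 0.5 * t


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, size=400)
y = true_f(x) + rng.normal(scale=0.2, size=x.size)

# Evaluate both fits on an interior grid, away from the boundaries.
grid = np.linspace(0.2, 2.8, 50)

# Parametric fit that assumes a straight line (a wrong model here).
slope, intercept = np.polyfit(x, y, deg=1)
linear_pred = slope * grid + intercept

# Nonparametric fit that assumes only smoothness (via the bandwidth).
kernel_pred = kernel_regression(x, y, grid, bandwidth=0.15)

linear_rmse = np.sqrt(np.mean((linear_pred - true_f(grid)) ** 2))
kernel_rmse = np.sqrt(np.mean((kernel_pred - true_f(grid)) ** 2))
print(f"straight-line RMSE: {linear_rmse:.3f}")
print(f"kernel RMSE:        {kernel_rmse:.3f}")
```

In real data the true function is of course not available, so the comparison would have to be made on held-out observations rather than against `true_f`. The sketch also says nothing about the second, causal question.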

As you can see from the link, there is a vast literature on log-linear models, and they appear to have some predictive power, since they are used in actuarial applications, where there is some money at stake. This does not mean that the particular model you are alluding to makes any sense.

PS: If you want to convince yourself of the dangers of using (or trusting) linear models, just look at the recent financial meltdown.