Syndicated content from Statalist, a discussion listserv run by users of StataCorp's Stata statistical analysis software.

Wednesday, March 08, 2006

Re: st: RE: how to choose between geographical identifiers??

I'm not a geographer, but I think this is an interesting question.
You could just regress wage on a full set of dummies twice, once for
LAD and once for TTWA, and compare the R-squared values, though that
alone is unlikely to convince you or anyone else that one division is
more useful than the other. I would start by calculating the mean and
standard deviation of log wage, and the population, for each LAD and
each TTWA. Then I would make two graphs, one per division, plotting
the standard deviations against the means with marker size given by
population, just to get a sense of what kind of variation in wages
each division captures. A picture can sometimes give you a better
sense of the data than numerous tabular results.
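
To make the comparison concrete, here is a minimal sketch in plain
Python (not Stata) of the two steps described above: the R-squared
from a regression on a full set of area dummies, and the per-area
mean, standard deviation and count of log wage. The data and the
names lnwage, lad and ttwa are made up for illustration. With only
dummies on the right-hand side, the fitted value for each worker is
just the mean of that worker's area, which is what the code relies on.

```python
from collections import defaultdict
from statistics import mean, stdev


def split_by_area(y, areas):
    """Group the y values by their area code."""
    cells = defaultdict(list)
    for yi, a in zip(y, areas):
        cells[a].append(yi)
    return cells


def r_squared_from_dummies(y, areas):
    """R-squared from regressing y on a full set of area dummies.

    With dummies as the only regressors, fitted values are the area
    means, so the residual sum of squares is the within-area sum of
    squares.
    """
    cells = split_by_area(y, areas)
    grand = mean(y)
    tss = sum((yi - grand) ** 2 for yi in y)
    rss = sum((yi - mean(v)) ** 2 for v in cells.values() for yi in v)
    return 1 - rss / tss


def adjusted_r_squared(y, areas):
    """Adjusted R-squared, penalising the number of area dummies."""
    n = len(y)
    k = len(set(areas))  # number of areas = parameters estimated
    return 1 - (1 - r_squared_from_dummies(y, areas)) * (n - 1) / (n - k)


def area_summary(y, areas):
    """Per-area (count, mean, sd) of y; sd is 0.0 for one-worker areas."""
    out = {}
    for a, v in split_by_area(y, areas).items():
        out[a] = (len(v), mean(v), stdev(v) if len(v) > 1 else 0.0)
    return out


# Hypothetical toy data: six workers, log wage and two area codings.
lnwage = [2.0, 2.2, 2.4, 2.6, 3.0, 3.2]
lad = ["A", "A", "B", "B", "C", "C"]
ttwa = ["X", "X", "X", "X", "Y", "Y"]

r2_lad = r_squared_from_dummies(lnwage, lad)
r2_ttwa = r_squared_from_dummies(lnwage, ttwa)
adj_lad = adjusted_r_squared(lnwage, lad)
adj_ttwa = adjusted_r_squared(lnwage, ttwa)
summary_lad = area_summary(lnwage, lad)

print("R-squared  LAD:", round(r2_lad, 3), " TTWA:", round(r2_ttwa, 3))
print("adjusted   LAD:", round(adj_lad, 3), " TTWA:", round(adj_ttwa, 3))
for area, (n, m, s) in sorted(summary_lad.items()):
    print(area, n, round(m, 3), round(s, 3))
```

The (count, mean, sd) triples are what would go into the graphs:
sd on one axis, mean on the other, and count as the marker size. In
Stata itself the R-squared values would come straight from -regress-
on the two sets of dummies.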

I think your criterion is really a kind of entropy-minimizing one,
because neither extreme is useful. Geocodes to 8 decimal places give
one category per worker: almost no variation within cells, but a huge
number of categories. A country identifier gives a single cell with
all of the variation inside it. So the size of the grid, in terms of
population in each LAD/TTWA, is important, not just how homogeneous
people are within each LAD/TTWA.
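
The two extremes can be shown with a small sketch in plain Python,
again on made-up data with hypothetical names (worker_id, lad,
country). It reports, for each way of cutting up the same workers,
the number of cells and the share of total variation in log wage that
is left within cells. It is only meant to illustrate the trade-off,
not to propose a particular criterion.

```python
from collections import defaultdict
from statistics import mean


def cells_and_within_share(y, areas):
    """Return (number of cells, within-cell share of total variation)."""
    cells = defaultdict(list)
    for yi, a in zip(y, areas):
        cells[a].append(yi)
    grand = mean(y)
    tss = sum((yi - grand) ** 2 for yi in y)
    wss = sum((yi - mean(v)) ** 2 for v in cells.values() for yi in v)
    return len(cells), wss / tss


# Hypothetical toy data: six workers and three codings of where they are.
lnwage = [2.0, 2.2, 2.4, 2.6, 3.0, 3.2]
worker_id = [1, 2, 3, 4, 5, 6]           # one cell per worker
lad = ["A", "A", "B", "B", "C", "C"]     # an intermediate grid
country = ["UK"] * 6                     # one cell for everyone

per_worker = cells_and_within_share(lnwage, worker_id)
per_lad = cells_and_within_share(lnwage, lad)
per_country = cells_and_within_share(lnwage, country)

for label, (k, share) in [("worker", per_worker),
                          ("LAD", per_lad),
                          ("country", per_country)]:
    print(label, "cells:", k, "within-cell share:", round(share, 3))
```

One cell per worker leaves nothing within cells but needs as many
categories as observations; one cell for the whole country needs a
single category but leaves everything within it. Any useful grid sits
between the two, which is why the number and size of the cells
matters alongside the fit.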

I'll be interested in what others with more experience in this area
have to say on how they would approach this problem. Nick--how would
you measure minimal structure in residuals here?

On 3/8/06, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I am a geographer but I don't know much about (what is
> usually called human) geography. I regarded it as my main field
> of interest between 1968 and 1969, but no longer. There aren't
> many geographers on this list, I think.
>
> However, your question is not really geographical. I guess
> from this that you are using lots of dummies in each case
> and for once the answer is whichever set of dummies gives
> you a better model, according to your criteria of model
> excellence (my favourite criterion is usually minimal
> structure in residuals).
>
> In broad terms both LADs and TTWAs are fairly heterogeneous
> as both spring from an idea of an area functioning together
> rather than formal similarity of anything. So knowing the
> area might not help enormously in predicting wage. But
> whichever spatial subdivision has a finer mesh should
> prove better.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ada Ma
>
> > I have a bunch of wage observations and all the observations are
> > attached with two geographical identifiers - local authority districts
> > (LADs) and travel to work areas (TTWAs). I want to find out how wages
> > vary across different areas in the UK.
> >
> > Now I can run wage estimations using either one of the two categorical
> > variables as explanatory variable. I would however like to find out
> > which categorical variable fits the data better. How do I compare the
> > two sets of results given that the explanatory variables are quite
> > different?
> >
> > Could you recommend what kind of tests I should use? And if you are
> > a geographer, could you tell me whether there are any criteria that
> > geographers use to choose between different definitions of
> > geographies (regions, as opposed to LADs, as opposed to TTWAs, etc.)?