Revolting Teachers :-)

Well, since you ask - according to my calcs a random sample of 700 drawn from a population of 3.8m gives a confidence level of almost 99.2%, so yes. Even if we only look at the half that responded, that still gives a confidence level of 94% - both calculated at an error margin of 5%.

Or if you prefer - at a confidence level of 99%, the error margins for samples of 350, 500, 600 and 700 are 6.88%, 5.76%, 5.26% and 4.87% respectively.
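Those margins can be reproduced with the standard worst-case formula for a proportion - a quick sketch, assuming p = 0.5 and ignoring the finite-population correction, which is negligible against a population of 3.8m:

```python
import math

# Margin of error for a sampled proportion at worst case p = 0.5:
#   e = z * sqrt(p * (1 - p) / n)
# z = 2.576 corresponds to 99% confidence. The finite-population
# correction for N = 3.8m is negligible at these sample sizes.
Z_99 = 2.576

def margin_of_error(n, z=Z_99, p=0.5):
    return z * math.sqrt(p * (1 - p) / n)

for n in (350, 500, 600, 700):
    print(n, round(100 * margin_of_error(n), 2))
```

Running that reproduces the 6.88%, 5.76%, 5.26% and 4.87% figures quoted above.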

Of course these numbers are actually rather silly and meaningless - all they are telling you is that any sample in the hundreds will be enough, and the focus should be placed on ensuring that it is random (ie that there are no systematic biases in the way it is collected that would influence the result). But the sample size is far more than adequate to the task at hand.

I guess Andy and I would have to agree to differ on that. In my environment it is regarded as a technical faux pas to use a Gaussian distribution to model any parameter which has truncated bounds.

That, I suppose, is the difference between social science and precision engineering.

In the social sciences you have to estimate or extrapolate most of your data in any case, so it probably doesn't matter much if you also guess at using a distribution which makes the math a bit easier. Even if your forecast of what is going to happen next year is wrong as a result, this doesn't actually matter very much either.

It's a bit different when you're using your data to invent a new kind of aeroplane. "It probably won't crash very often, on average" isn't really good enough for an aeroplane - but in the social sciences it's the best you're ever going to do and so you'll live with it.

Alfred E Neuman wrote:

Aren't there fees associated with being in a union? Around here there are, and if you're not earning a lot, that can be seen as a luxury.

Yes there are, but they are not large. A qualified teacher in full time employment pays £14.47 per month to belong to the union that I do, and two thirds of that amount is allowable against tax. (As a sort of quid pro quo for the government abolishing the professional body for teachers, HMRC now accepts that the teaching unions are in part professional bodies. Anecdotally, a lot of teachers don't know this and so pay more tax than they need to.)

The lowest possible salary for a qualified teacher in full time employment is £22,467 per annum, and rather few actually earn precisely that figure. So that £14.47 per month really isn't a lot of money.

crissdee wrote:

If only 2 or 3 of the workforce are in the union, and the other 47 are not, wouldn't it make your bargaining position somewhat weaker?

It might, but that is the fault of the 47, not the two or three.

It is unheard of for a company in Britain to offer different pay raises to union members and non-union members solely for that reason. Whether it would even be legal to do so is a grey area.

Inevitably it's a bit of a guess - social sciences statistics again - but it has been estimated (Blanchflower, 2002) that wages in unionized workplaces in Britain are about 7% higher than in otherwise comparable non-union workplaces. Given that union fees are unlikely ever to be anything like 7% of wages, surely it makes sense to join up.
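The back-of-envelope arithmetic behind "surely it makes sense to join up" is simple enough to write out, using the fee and salary figures quoted above (the 7% is the Blanchflower estimate for workplaces on average, not a promise to any individual):

```python
# Fee vs estimated premium, using figures from the thread.
monthly_fee = 14.47       # union subscription, per month
salary = 22467.0          # lowest full-time qualified-teacher salary

annual_fee = 12 * monthly_fee        # = 173.64 per year
fee_share = annual_fee / salary      # well under 1% of salary
premium = 0.07 * salary              # Blanchflower's ~7% estimate

print(f"fees: £{annual_fee:.2f}/yr ({100 * fee_share:.2f}% of salary)")
print(f"estimated premium: £{premium:.2f}/yr")
```

So even on the lowest salary point, the fee is under 0.8% of pay against an estimated 7% premium - and two thirds of the fee is tax-allowable on top.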

OK, so there is another option. Don't join the union, but happily accept the pay raise that the union won its members all the same. You actually can't do this in Canada because of something called the Rand Formula, which means in effect that in a unionized workplace you pay union dues whether or not you belong to the union. You can do it in Britain, but there are words which I shall not use here for people who do.

It's a bit different when you're using your data to invent a new kind of aeroplane. "It probably won't crash very often, on average" isn't really good enough for an aeroplane - but in the social sciences it's the best you're ever going to do and so you'll live with it.

Actually, that's exactly how airworthiness is for aeroplanes.

I don't see anything wrong with using a normal distribution for salary in the way you did - I'm not sure it's true, but the method is fine. You are never going to have 0.5 people (unless you want to bring job sharing into it), so you can set the lower bound where 1 whole person = 0%, which then means there is also an upper bound. You won't get 0.001 of a person getting a 300% raise or whatever.

The collection of the data on average pay raises was nothing like so scientific. Rather, a bunch of companies were asked what pay raises they had given in the last year. Half of those companies didn't reply, and the half that did reply were taken at their word.

They probably only considered those employees who had actually been with the company throughout the year, and probably didn't take any notice of whether they were full time or part time.

Yes, you can poke the methodology full of as many holes as you care to. Many companies don't like talking about pay, and in the absence of any compulsion for them to do so this is the best we can do. In these circumstances, Andy and I are fairly comfortable with the slight mathematical liberty of assuming a normal distribution. It would - I am told - be possible to post up several pages of mathematical justification which would mean nothing to most of us and bore the small remainder, but the bottom line is that this assumption won't make the poor data very much worse.

You can model any data with whatever distribution you choose; all that will vary is the validity of the predictions you draw from that model. Gaussian distributions are popular in many fields because sums of Gaussians are themselves Gaussian (a property most other distributions lack), making data collation simpler, but they have limitations.

A Gaussian distribution is only truly valid when it is symmetrical, and it becomes progressively less valid as the asymmetry or "skewness" increases (in fact I *think* I remember my maths prof saying disdainfully that the whole concept of "skewness" as a parameter was an applied maths bodge to which no true mathematician would give house room!). One of the core parameters people extract from a distribution model is the "standard deviation" - Suze did it above where she referred to remaining within a number of SDs of a mean - and therein lies the rub. If there is significant skewness it actually means the variance characteristics are not the same for the upper and lower tails of the distribution, so they should have *different* SDs.

In the statistical process control (SPC) world they get around this with a fudge by calculating the upper and lower process capability coefficients as separate values (CPKu and CPKl). For SPC purposes this fudge works, but it again requires significant care when one is massively different from the other.
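For anyone who hasn't met them, the two coefficients are just one-sided capability ratios - a minimal sketch with made-up figures (the CPKu/CPKl names follow this post; textbooks often write Cpu/Cpl):

```python
# One-sided process capability indices, as used in the SPC "fudge"
# described above. USL/LSL are the upper/lower specification limits.
def cpk_upper(mean, sd, usl):
    return (usl - mean) / (3 * sd)

def cpk_lower(mean, sd, lsl):
    return (mean - lsl) / (3 * sd)

# Illustrative (made-up) figures: a process sitting much closer to
# its lower limit than its upper one, i.e. a skewed situation.
print(cpk_upper(3.3, 1.0, 10.0))   # lots of headroom above
print(cpk_lower(3.3, 1.0, 0.0))    # much tighter below
```

When the two values diverge like that, a single symmetric SD is clearly telling you less than it appears to.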

If you look at this case the mean is around 3.3% and the median is probably (a guess) slightly lower, but there would probably be a negligible number below zero. OTOH the upper tail will still have a significant number of entries at 8% or even 10% (another guess, based on what I observe around me and the idea that even nowadays people still get promoted). So the variance in the upper tail is likely to be three or more times that in the lower tail, rendering the predictions of the model rather dubious - it's not a good fit. But a *log-normal* distribution probably fits the data very well.
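The "log-normal fits better" intuition can be made concrete: a log-normal is always positively skewed, with a closed-form skewness that grows with its shape parameter, whereas a Gaussian is always symmetric. A quick sketch (the sigma values are illustrative, not fitted to any pay data):

```python
import math

# Closed-form skewness of a log-normal distribution with shape sigma:
#   skew = (w + 2) * sqrt(w - 1), where w = exp(sigma^2).
# A Gaussian has skewness 0 regardless of its parameters; a
# log-normal is always positively skewed, increasingly so as
# sigma grows - i.e. it naturally has the long upper tail and
# hard lower bound described in the post.
def lognormal_skewness(sigma):
    w = math.exp(sigma ** 2)
    return (w + 2) * math.sqrt(w - 1)

for sigma in (0.25, 0.5, 1.0):
    print(sigma, round(lognormal_skewness(sigma), 3))
```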

Not sure which part of "airworthiness" Cornixt is referring to - failure rates (and thus fault trees and safety calcs) use exponential distributions or Weibull (or Weibull-Duane hybrids) where a possible part-life issue is being explored, maintenance tasks and availabilities use Poisson. The only bit I can think of that uses Gaussian would be the SPC in manufacturing/test - am I missing something? (not having a pop - genuinely interested because we operate in different parts of the aviation field).
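For readers outside the field, the exponential and Weibull reliability functions mentioned here have simple standard forms - a hedged sketch with illustrative numbers, not any particular certification calc:

```python
import math

# Standard reliability (survival) functions used in failure-rate work.
def exponential_reliability(t, failure_rate):
    # Constant hazard rate lambda: R(t) = exp(-lambda * t)
    return math.exp(-failure_rate * t)

def weibull_reliability(t, eta, beta):
    # Scale eta, shape beta: R(t) = exp(-(t / eta) ** beta).
    # beta > 1 models wear-out (the "part-life issue" case);
    # beta = 1 reduces exactly to the exponential model.
    return math.exp(-((t / eta) ** beta))

# Illustrative figures: 1000 hours at a 1e-4/hr constant rate,
# and a wear-out item with characteristic life 10,000 hours.
print(round(exponential_reliability(1000, 1e-4), 4))   # 0.9048
print(round(weibull_reliability(1000, 10000, 3.5), 4))
```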

Not sure which part of "airworthiness" Cornixt is referring to - failure rates (and thus fault trees and safety calcs) use exponential distributions or Weibull (or Weibull-Duane hybrids) where a possible part-life issue is being explored, maintenance tasks and availabilities use Poisson. The only bit I can think of that uses Gaussian would be the SPC in manufacturing/test - am I missing something? (not having a pop - genuinely interested because we operate in different parts of the aviation field)

I wasn't meaning that Gaussian distributions are used in airworthiness (I can't think of a case where they would be, but they might), just that "statistically unlikely to crash" is the measure used rather than any firm "won't crash because we calculated the science".

I wasn't meaning that Gaussian distributions are used in airworthiness (I can't think of a case where they would be, but they might), just that "statistically unlikely to crash" is the measure used rather than any firm "won't crash because we calculated the science".

We use both, surely? The ultimate safety case is based on an ALARP argument around an acceptable probability of death, serious injury or significant property damage ("they are unlikely"). But the various supporting pillars of this argument are a mixture of structural and fatigue calcs ("it won't break because the tested structural margin is "x" and the parameter variance is only <<x") and operational calcs ("we won't run out of fuel because our worst-case fuel consumption says we need 10,000lbs of fuel and we're carrying 12,000lbs", or "we have airspace management that keeps gaps of several feet between aeroplanes at all times") etc.

The RCM methodology uses the statistical (probability) calcs to predict a component life, but also uses the P-F interval (time between degradation becoming observable and the component becoming unable to do its job - calculating the science) to calculate the inspection intervals that ensure degradation doesn't reach dangerous extents before it is seen and rectified.
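The P-F logic reduces to a very small calculation: pick an inspection interval short enough that a chosen number of inspections are guaranteed to fall inside the P-F window. A minimal sketch (the figures and the safety factor of two are hypothetical, not a mandated RCM value):

```python
# On-condition inspection interval from a P-F interval: ensure at
# least n inspections fall between the point where degradation
# becomes detectable (P) and the point of functional failure (F).
def inspection_interval(pf_interval_hours, inspections_in_window=2):
    return pf_interval_hours / inspections_in_window

# Hypothetical example: degradation detectable 600 flying hours
# before functional failure, with two chances to catch it.
print(inspection_interval(600))   # 300.0
```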

So it's a bit of both as appropriate to the particular element really, isn't it?

Some engineering safety stuff is based on making failures very unlikely (reliability calcs, fault-tree analysis etc), but other parts are about engineering failure modes out of the system (fail-safe items, operating limits, intrinsically fire-safe design etc). Having multiple sources of compressed air reduces the probability of losing cabin pressure at altitude. Having a "plug" door design prevents decompression due to a door latch or hinge failure*.
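The redundancy half of that can be sketched in one line: with n independent sources each failing with probability p, all of them fail together with probability p**n. The figures below are purely hypothetical, and a real fault tree would also have to model common-cause failures, which this simplification ignores:

```python
# Simplest fault-tree AND gate: probability that all n independent
# redundant sources fail on the same flight. Purely illustrative;
# real analyses also account for common-cause failures.
def all_sources_fail(p_single, n_sources):
    return p_single ** n_sources

print(all_sources_fail(1e-3, 1))   # one source: 0.001
print(all_sources_fail(1e-3, 3))   # three independent sources: far rarer
```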