>On Mon, 4 Feb 2013 12:55:36 -0800 (PST), em.derenne@gmail.com wrote:
>
>>Hi-
>>
>>I haven't taken stats in a few years and recently there have been a
>>lot thrown around my work place, including the attached graph (and
>>raw data). I realize that low R2 mean that the linear regression is
>>not a good fit, but

[snip, rest of post; start of my previous reply]

>The highest 5 outcome scores are all in the first half of the graph,
>and the very highest one is near the beginning. Does that
>seem important? That's most of the effect.
>

I discovered that I can re-format the downloaded chart, even
though it is read-only, in order to show a logarithmic spacing
for the Y axis. The data are pretty well distributed, from a
minimum of 2 to a maximum of 17000.

That gives charts that look a lot like charts in the report that
David Jones gives a link to.... so, taking the log of the measures
does give a *statistical* model that has errors that are much
better behaved, and one that someone else seems to be using
(presuming these are the same data).

>On the other hand, very few people would say that any
>time series is properly tested by a simply linear regression
>when there are autocorrelation effects... which there almost
>always are.
>
>As to the size of the effect, and how few cases it depends
>on -- I'm "pretty sure" that the trend becomes n.s. if you
>remove the top 5 points; "probably" for the top 3, and
>"maybe" for removing the top one alone.

I can imagine that it is a useful statement, to be able to say
that certain REALLY high levels are no longer being reached.
But it needs, I think, a regression on the log-transformed
values in order to make a proper statement on the (lack of)
trend in that pollution control.
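
For anyone who wants to try that, here is a rough sketch in Python (numpy only) of what I have in mind: regress log10 of the measure on time, and then see how much the slope moves when the top few points are dropped. The function names and the demo numbers are my own inventions for illustration; they are NOT the data from the posted chart. This is plain least squares, so it still ignores the autocorrelation issue mentioned above.

```python
import numpy as np

def log_trend(t, y):
    """Ordinary least squares of log10(y) on t.

    Returns (slope, intercept, r_squared). y must be strictly positive.
    """
    t = np.asarray(t, dtype=float)
    log_y = np.log10(np.asarray(y, dtype=float))
    slope, intercept = np.polyfit(t, log_y, 1)
    resid = log_y - (slope * t + intercept)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot

def drop_top(t, y, k):
    """Remove the k largest y values, keeping the rest in time order."""
    t = np.asarray(t)
    y = np.asarray(y)
    order = np.argsort(y)                 # indices from smallest to largest y
    keep = np.sort(order[: len(y) - k])   # drop the k largest, restore order
    return t[keep], y[keep]

# Made-up demo series (hypothetical, roughly spanning 2 to 17000 in scale).
rng = np.random.default_rng(0)
t_demo = np.arange(60, dtype=float)
y_demo = 10 ** rng.normal(2.0, 0.8, size=60)

for k in (0, 1, 3, 5):
    tk, yk = drop_top(t_demo, y_demo, k)
    slope, intercept, r2 = log_trend(tk, yk)
    print(f"dropped top {k}: slope={slope:.4f} log10 units per step, R^2={r2:.3f}")
```

A slope on the log10 scale reads as a multiplicative change per time step, which is why I think it is the fairer summary for data ranging over four orders of magnitude. A proper test would also need standard errors that allow for serial correlation, which this sketch does not attempt.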