I wonder if anyone can suggest good sources of basic information on how to calculate meaningful statistics. Let me explain:

I work in human-resources data analysis. I don't have a particularly strong statistics background, and I generally don't need one -- we mostly report on counts, sums, averages, and percentages. However, I occasionally run into trouble selecting meaningful numerators and denominators. For example, in calculating the percentage of employees who attended classes last year, should the denominator be the average number of employees over the year, the total number of employees at any point in the year, or some other figure?

I see lots of books explaining the math behind statistics, and of course this forum provides copious sources on information design, but I'm not sure where to look for this sort of practical information. I'd appreciate any suggestions.

Instead of just a single number that answers a single question, it will often be useful to show several standardizations (denominators in this case) answering several questions. Make it clear what each number means.
Zeisel is particularly helpful on this matter.
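
As a rough illustration of showing several denominators side by side, here is a minimal Python sketch applied to the class-attendance question. The employee records, the 2023 reporting year, and the three denominators chosen are all made up for the example; they are assumptions, not anything from an actual HR system.

```python
from datetime import date

# Hypothetical records: (id, start date, end date or None, attended a class in the year)
employees = [
    ("a", date(2020, 1, 1), None, True),
    ("b", date(2023, 7, 1), None, False),
    ("c", date(2021, 3, 1), date(2023, 6, 30), True),
    ("d", date(2023, 10, 1), None, True),
]

YEAR_START, YEAR_END = date(2023, 1, 1), date(2023, 12, 31)


def employed_on(record, day):
    """True if the employee was on staff on the given day."""
    _, start, end, _ = record
    return start <= day and (end is None or end >= day)


def attendance_rates(records):
    """Return the same attendance data standardized three different ways."""
    # Everyone on staff at any point during the year.
    ever = [r for r in records
            if r[1] <= YEAR_END and (r[2] is None or r[2] >= YEAR_START)]
    attendees = sum(1 for r in ever if r[3])

    # Headcount on the first of each month, averaged over twelve months.
    month_starts = [date(YEAR_START.year, m, 1) for m in range(1, 13)]
    avg_headcount = sum(
        sum(employed_on(r, d) for r in records) for d in month_starts
    ) / len(month_starts)

    # Only those still on staff on the last day of the year.
    at_year_end = [r for r in ever if employed_on(r, YEAR_END)]

    return {
        "share of everyone employed during the year who attended":
            attendees / len(ever),
        "attendees per average monthly headcount":
            attendees / avg_headcount,
        "share of year-end staff who attended":
            sum(1 for r in at_year_end if r[3]) / len(at_year_end),
    }


rates = attendance_rates(employees)
for label, value in rates.items():
    print(f"{label}: {value:.2f}")
```

With this toy data the three figures are 0.75, about 1.33, and about 0.67. The middle one exceeds 1 because people who left mid-year count fully in the numerator but only partly in the average headcount, so it is a ratio rather than a percentage and should be labeled as such.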

So your reading assignment is Zeisel, Hill, and Fisher! And, of course, other reports using these types of data.

-- Edward Tufte

Mr. Levy mentions several indicators of interest to his organization. If these are the right indicators, and "right" changes with circumstances, what is important is how they change over time. I suspect he also encounters managers who like to compare this month to last month, or this month this year to this month last year, rather than examining all the data points. I have learned much about statistical calculations useful to business and other organizations from the works of Dr. Donald J. Wheeler. I am not a statistician nor even mathematically inclined, but I do need to understand how process data change and whether the change is due to special causes or to variation inherent in the process. The fundamental question is "How are we doing?"
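
One technique covered in Dr. Wheeler's books for separating special causes from routine variation is the individuals and moving range (XmR) chart, whose natural process limits sit at the mean plus or minus 2.66 times the average moving range. The sketch below is a minimal Python version; the monthly figures and the function names are invented for illustration, and it applies only the simplest detection rule (a point outside the limits).

```python
def xmr_limits(values):
    """Return (lower limit, mean, upper limit) for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / len(values)
    # Moving range: absolute difference between each pair of successive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_moving_range = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard scaling factor for XmR charts.
    spread = 2.66 * avg_moving_range
    return mean - spread, mean, mean + spread


def signals(values):
    """Return (index, value) pairs that fall outside the natural process limits."""
    lower, _, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly counts (for example, separations per month).
monthly_counts = [12, 14, 11, 13, 12, 15, 13, 12, 14, 13, 30, 12]

lower, mean, upper = xmr_limits(monthly_counts)
print(f"limits: {lower:.2f} to {upper:.2f}, mean {mean:.2f}")
print("possible special causes:", signals(monthly_counts))
```

Here only the eleventh month (value 30) falls outside the limits; the month-to-month wobble between 11 and 15 is the kind of variation that a "this month versus last month" comparison tends to over-interpret.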

For Mr. Levy, I would suggest Wheeler's "Understanding Variation" first, followed by "Making Sense of Data". I wish I could walk down the hall and hand him my copies! But he will have to go to www.spcpress.com and check out the goods. Lots of free articles are available so you can get the feel of Dr. Wheeler's style before buying a book. I also recommend a paper by Davis Balestracci called "Data Sanity - Statistical Thinking Applied to Everyday Data", initially published and sold by the Statistics Division of the American Society for Quality but downloadable now at the Deming Electronic Network web page.

I could recommend a number of other books on process measurement and quality improvement that Mr. Levy might find illuminating, if he has not encountered them already. I will gladly do so directly via email if he (or anyone else) is interested (sbyers@wirb.com).

These seven principles of a well-constructed rate might be helpful. They represent an ideal, and trade-offs need to be made. Some of the text reflects my work at a state education agency.

1. It includes in the denominator only those items that can show up in the numerator.

2. It includes in the numerator only those items that are also in the denominator.

3. It is simple. It can be easily explained to the public, legislators, and board members (has face validity). You can explain why this rate differs from another one.

4. It is technically sound. It has the support of researchers and statisticians. It accounts for sources of bias (unusual conditions that skew the rate if not accounted for).

5. It is valid in the eyes of those for whom the rate is produced. It is accepted by them as reflecting what they do (the items included in the numerator and denominator represent what the rate is said to measure).

6. It can be aggregated to higher levels of organization in a way that makes sense.

7. It is neutral in its effect. It measures an event fairly and does not have a subjective value judgment built in.
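
Principles 2 and 6 lend themselves to a mechanical check. The Python sketch below is one possible reading, not a standard method: it rejects a rate whose numerator contains items missing from the denominator, and it aggregates by pooling numerators and denominators rather than averaging the unit-level rates. The school names, student ids, and the `rate` helper are hypothetical.

```python
def rate(numerator_ids, denominator_ids):
    """Compute a rate, refusing numerator items that are not in the denominator."""
    numerator, denominator = set(numerator_ids), set(denominator_ids)
    stray = numerator - denominator
    if stray:
        raise ValueError(f"numerator items missing from denominator: {sorted(stray)}")
    if not denominator:
        raise ValueError("denominator is empty")
    return len(numerator) / len(denominator)


# Hypothetical schools: (ids of graduates, ids of the full cohort)
schools = {
    "north": ({"n1", "n2", "n3"}, {"n1", "n2", "n3", "n4"}),
    "south": ({"s1"}, {f"s{i}" for i in range(1, 11)}),
}

school_rates = {name: rate(num, den) for name, (num, den) in schools.items()}

# Aggregate by pooling the underlying items, so each student counts once.
pooled_numerator = set().union(*(num for num, _ in schools.values()))
pooled_denominator = set().union(*(den for _, den in schools.values()))
district_rate = rate(pooled_numerator, pooled_denominator)

# Averaging the school rates instead gives small schools too much weight.
mean_of_rates = sum(school_rates.values()) / len(school_rates)

print(school_rates)
print(f"pooled district rate: {district_rate:.3f}")
print(f"mean of school rates: {mean_of_rates:.3f}")
```

In this toy case the pooled district rate is 4 of 14 (about 0.286), while the simple mean of the two school rates is 0.425. Principle 1 cannot be checked this way, since deciding which items could ever appear in the numerator is a judgment about the definition, not the data.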

Another way to think about "framing the right question" is to consider Dr. Deming's admonition (in Out of the Crisis and elsewhere) about operational definitions: "An operational definition is one that reasonable men can agree on. An operational definition is one that people can do business with. An operational definition of safe, round, reliable, or any other quality must be communicable, with the same meaning to vendor as to purchaser, same meaning yesterday and today to the production worker." Further, there is "No exact value; no true value." That is, every measurement process is subject to variation.

I am not sure about meaningful statistics, but I did see a good meaningless statistic the other night (ironically on the BBC's Test the Nation), and I quote, "1 in 50 people have an IQ in the top 2%..."