Students take a test, then they learn the material, then they take the test again. We want to show the improvements in the students who have pre- and post-test data.

This graph shows the results, but it's very misleading because in this case, only one student (of 27) actually took the post test, and he got a 100%. But I think most people looking at this would think that it means that the whole class got 100%.

So: how can I present this information while also showing that only 17 kids took the pre-test, and only 1 of them took the post-test? The graph thus needs to show three groups:

Kids who took neither test and have no data, really, other than to establish the population size

Kids who took the pre-test and how they scored on it

Kids who took the pre-test AND post-test and how they scored on each, in such a way that the comparison is indicated
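Before charting anything, the three groups above could be separated in code. This is only a minimal sketch: the `Student` record, the `split_groups` helper, and the sample scores are all made up for illustration, and it assumes nobody takes the post-test without first taking the pre-test.

```python
from typing import NamedTuple, Optional


class Student(NamedTuple):
    name: str
    pre: Optional[float]   # None means the pre-test was not taken
    post: Optional[float]  # None means the post-test was not taken


def split_groups(students):
    """Sort students into the three groups the graph needs to show."""
    groups = {"neither": [], "pre_only": [], "pre_and_post": []}
    for s in students:
        if s.pre is None:
            # Assumption: no pre-test score implies no post-test score either.
            groups["neither"].append(s)
        elif s.post is None:
            groups["pre_only"].append(s)
        else:
            groups["pre_and_post"].append(s)
    return groups


# Hypothetical sample data, not the real class
students = [
    Student("A", None, None),
    Student("B", 55.0, None),
    Student("C", 70.0, None),
    Student("D", 60.0, 100.0),
]

groups = split_groups(students)
for label, members in groups.items():
    print(f"{label}: {len(members)} of {len(students)}")
```

The group sizes give the population context, while the scores inside `pre_only` and `pre_and_post` are what would actually be plotted.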

It seems to me you're trying to show multiple pieces of information in one graph (how many students took the test and what their scores were). You might need to either show multiple graphs, or clearly outline how many people took the pre- and post-test (I know you have that, but make it more obvious).
– Majo0od Feb 8 '16 at 17:46

You may well be right. Any suggestions about how best to do that?
– Joshua Frank Feb 8 '16 at 17:52

You can split up the information into multiple graphs: one for how many students took the post test, and one for the average score of everyone who took it.
– Majo0od Feb 8 '16 at 17:55

Information regarding students that didn't take the test might be more suited to a report or textbox accompanying the chart, and not be charted themselves.
– Rick Henderson Feb 10 '16 at 15:35

I need to think about whether that captures the data, but it's a helpful start, thanks!
– Joshua Frank Feb 8 '16 at 22:13

+1 a good solution, and I think you can simplify it further by removing the colours because the category labels already differentiate the groups, whereas in the previous solution it needs to be labelled to avoid confusion
– Michael Lai♦ Feb 8 '16 at 22:32

@mikryz: just curious, what application did you use to make that image? Was it a dataviz sketching tool, or just something image oriented like Photoshop?
– Joshua Frank Feb 9 '16 at 19:11

Question... What would it look like if it's out of 20 people and that one person got, say... 10%? I don't think this method is scalable, and at a glance the user will not know what percentage a person got. Also, what if two people took the post test and both got 10%? What does that look like?
– Majo0od Feb 10 '16 at 10:49

"We want to show the improvements in the students who have pre- and post-test data."

If that is the ultimate aim of this graph, to show the difference over two tests, then there's no point in graphically showing those who've taken only one part of the test or none of the tests. These can be summarised in a corner or outside the chart with some text (e.g. not started - 17, not completed - 12). That in turn makes for less stuff to plot, which is good.

So take your 1 complete data point :-) (ok, this'll be better once you have more data) and then choose how to show it. One way is a scatterplot: pre-score on the y-axis, post-score on the x-axis, with the diagonal drawn in. Anyone to the right of the diagonal did better on the post-test, anyone to the left did worse. If you want to show the difference in the whole group then you're back to a bar chart, with one bar showing the average and s.d. of the pre-score and the other the post-score. Though when you're down to drawing only a couple of data points, it's time to think about whether it's better to just communicate the outcome with a number ("post is 5±2% better than pre", etc.).
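As a rough sketch of the numbers behind both options, the snippet below classifies each paired score by which side of the diagonal it would fall on, and computes the mean and standard deviation of the improvement for the "just report a number" case. The `pairs` data and the `side_of_diagonal` helper are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical (pre, post) percentage scores for students who took both tests
pairs = [(40, 65), (55, 50), (70, 90), (60, 60)]


def side_of_diagonal(pre, post):
    """With post on the x-axis and pre on the y-axis, post > pre lies right of the diagonal."""
    if post > pre:
        return "improved"   # right of the diagonal
    if post < pre:
        return "declined"   # left of the diagonal
    return "unchanged"      # on the diagonal


labels = [side_of_diagonal(pre, post) for pre, post in pairs]
diffs = [post - pre for pre, post in pairs]

mean_diff = mean(diffs)
# A sample standard deviation needs at least two paired results
sd_diff = stdev(diffs) if len(diffs) > 1 else None

print(labels)
if sd_diff is None:
    print(f"post is {mean_diff:.1f} points better than pre (only one paired result)")
else:
    print(f"post is {mean_diff:.1f} +/- {sd_diff:.1f} points better than pre")
```

With a single complete data point, as in the question, the spread is undefined, which is one more reason to state the result as plain text rather than chart it.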

The reason to show the kids is to indicate the population size and the proportion, even if small, of those who took the post test, in which case we also want to show how they did.
– Joshua Frank Feb 9 '16 at 19:12

You could try a scatterplot where the data points include a third value (the number of students who scored that grade) as the size of the points (a bubble chart) but it might become too confusing.
– Rick Henderson Feb 10 '16 at 15:48

The best tip about designing graphs that I can provide is to have a very simple message that you can see from looking at the visual representation of the data; otherwise it defeats the purpose of clarity in communication and becomes a visual design exercise. Having too much information often means you have to look at other graphs anyway, because the single graph is hard to understand or misleading.

Also think in terms of the most common scenarios of the data that you are likely to get, and the patterns/trends you want to look for (or that are important) when you are designing the graph, so that the most meaningful design can be implemented. When you have a lot of information, a compromise is often required to juggle the different types or volume of data. If the numbers of people who take the pre-test and the post-test are usually very similar, then it is not so much of a concern. The number of people who take the test is the issue when it comes to averages, regardless of what type of test it is, so the question is when it is meaningful to display averages and how to compare them.

Probably the best way to start is to create single views of the data, then decide which combinations give the most insightful visual representation, rather than working out how to combine them all from the start. Also keep in mind that you may want to make other types of comparisons once you start accumulating more data over time, including historical comparisons. The same graph may not be useful for that, so you may need to redesign to keep the visual presentation consistent.

I've been an instructor for over 18 years and have been doing pre-post test results for over 8 years. An instructor is most likely concerned with improvements between the pre-test and the post-test, and any other demographic data (how many took one or both, etc.) would be of secondary interest.
Here is a simple chart that I use when I explain the importance to students:

After thinking about it for about a year, I realize that a simple average doesn't provide me with enough information. For instance, I am interested in how many students score below the average, fail, perform to mastery, etc. So I would want a more complete grade distribution chart.
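A grade distribution of that kind might be tallied along these lines. This is only a sketch: the cut-offs `FAIL_BELOW` and `MASTERY_AT`, the band names, and the sample scores are assumptions, not values from my own classes.

```python
from collections import Counter
from statistics import mean

# Assumed thresholds; substitute whatever your grading scheme defines
FAIL_BELOW = 60
MASTERY_AT = 90


def band(score):
    """Map a percentage score to a coarse grade band."""
    if score < FAIL_BELOW:
        return "fail"
    if score >= MASTERY_AT:
        return "mastery"
    return "pass"


# Hypothetical post-test scores
scores = [45, 58, 62, 75, 88, 93, 97]

distribution = Counter(band(s) for s in scores)
average = mean(scores)
below_average = sum(1 for s in scores if s < average)

print(dict(distribution))
print(f"average {average}, {below_average} of {len(scores)} below average")
```

Running the same tally on pre-test and post-test scores gives two distributions that can be placed side by side or shown on drill-down.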

You don't mention the format of your charting, but if drilling down is available, then the distribution could be shown by clicking on either bar.

If drilling down isn't available, then the grade distributions could be displayed below the chart you first display or in another section of the report.

Also of interest could be the number of students who perform to a mastery level on the pre-test.

In our current report, we do have categories for grade distribution (Fluent, Proficient, etc.) but I think that showing these is part of why the chart is so confusing. It may be trying to do too much, or maybe it's just doing it poorly.
– Joshua Frank Feb 10 '16 at 18:56

I got some great suggestions from everyone here, but I wanted to post my own answer about my final version because I think it's substantially different from the suggested solutions:

It uses a stacked bar chart to show the categories, and it also shows the entire population, so it's easy to see the relative sizes of the categories, along with the breakdown at each of the second and third phases.
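The idea of keeping every bar the full population wide can be sketched in text form. The counts (27 students, 17 pre-tests, 1 post-test) come from the question; the phase labels and the `bar_segments` and `render` helpers are hypothetical, and the per-category breakdown (Fluent, Proficient, etc.) is left out because those counts aren't given here.

```python
POPULATION = 27   # whole class, from the question
TOOK_PRE = 17     # students with pre-test data
TOOK_POST = 1     # students with post-test data


def bar_segments(taken, population):
    """Split one full-width bar into 'taken' and 'not taken' segments."""
    return {"taken": taken, "not_taken": population - taken}


def render(label, segments):
    """Draw one stacked bar as text: '#' for taken, '.' for not taken."""
    return f"{label:<10}" + "#" * segments["taken"] + "." * segments["not_taken"]


# Hypothetical phase names; every bar spans the entire population
phases = {"Enrolled": POPULATION, "Pre-test": TOOK_PRE, "Post-test": TOOK_POST}
rows = [render(label, bar_segments(n, POPULATION)) for label, n in phases.items()]
print("\n".join(rows))
```

Because each bar has the same total length, the single post-test result reads as 1 of 27 rather than as the whole class.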