"Ofsted's comparison of results at 11 and 16 is an arbitrary one, apparently designed to put comprehensives in the worst possible light." Photograph: David Cheskin/PA

More than four in 10 (41%) of the most able children who go to grammar schools fail to achieve their full potential. Is that a scandal? It sounds deplorable, since grammar schools are designed to get the best out of the cleverest. They don't have much excuse for failing such children. But it depends on what you mean by "most able", "achieve" and "potential".

The equivalent figure for comprehensive schools, 65%, was highlighted this week by Michael Wilshaw, the chief inspector of schools, launching an Ofsted report on whether "the most able" pupils are "doing as well as they should in our non-selective secondary schools". In Wilshaw's view, that 65% "failure rate" is certainly a scandal. Comprehensives, he declared, "are failing to nurture scholastic excellence" and this outcome, largely attributable to mixed-ability teaching, was "unacceptable in an increasingly competitive world".

Ofsted's definition of "most able" is children who achieve level 5 or above for both reading and maths in Sats at age 11. To fulfil "potential", they should get an A* or A in both English and maths at GCSE five years later. Its "unacceptable" 65% is the proportion of those achieving the first who don't achieve the second. Ofsted does not tell us what proportion of those who got top results at GCSE had done badly in their Sats at 11. We therefore know how often comprehensives supposedly make sows' ears out of silk purses, but not how often they achieve the reverse.

Ofsted's comparison of children's results at 11 and 16 is an arbitrary one, apparently designed to put comprehensives in the worst possible light. The evidence is simply not robust enough to support Wilshaw's sweeping claims.

First, level 5 and above in Sats is achieved by nearly half of 11-year-olds in reading, and well over a third in maths. These children may be above average, but to describe them all as high flyers is a stretch. Grammar schools usually pick out, from within this group, those with the highest scores. Any comparison between grammar schools' outcomes and the apparently inferior ones in comprehensives is therefore spurious.

Second, the Sats levels are notoriously unreliable since they depend on a test taken on a particular day, covering a limited number of items. A test taken on a different day, covering different items, may well produce a different result. Some experts reckon that as many as one child in three is given the wrong grade. To argue, as Ofsted seems to do, that comprehensives should stream children into ability groups on the basis of Sats is therefore outrageous. Ofsted claimed to be shocked that some schools couldn't even identify their most able pupils, but if the schools used Sats results to do so they would get it badly wrong.

Third, the GCSE, in its present form, is a quite different test from Sats, and a more sophisticated one. Thanks to reforms of the past 30 years, it uses several methods to grade children, including tests taken at intervals throughout the course, projects, practicals, orals and coursework, as well as the traditional end-of-course written exam. Whatever its shortcomings, it is likely to give a more accurate snapshot of a pupil's attainments at 16 than Sats do of attainments at 11.

Alas, that will no longer be so when Michael Gove's counter-reformation is complete. The new GCSE, which the education secretary announced this week, will rely almost entirely on traditional exams. When the first results emerge in 2017 it may be reasonable to make comparisons between performance in Sats and GCSEs, but it will not be reasonable to assume that pupils are being properly educated or, in either test, accurately graded.

Wilshaw has played to Gove's agenda. The education secretary wants children to move along a conveyor belt following a narrow curriculum geared to the academic requirements of elite universities. In his view, schools need to focus on the brightest pupils lest we fail to match other countries in economic success.

But the evidence is mixed. Finland consistently comes at or near the top of international educational performance measures. Most league tables of GDP per head put Finland roughly equal to Britain, or even a little higher. Yet it has no grammar schools, no fee-charging schools, no ability grouping until age 16 (by law), no external exams or tests until 18, no talk of failing schools, and none of the constant pressure that makes our teachers' lives such a misery. Before he berates English comprehensives, Wilshaw should raise his own game by looking at more varied and reliable evidence.