Regents Chancellor Merryl Tisch and state Education Commissioner David Steiner have soberly pledged that New York testing will get more rigorous, with higher standards and honest information. They failed to meet those objectives with last year’s exams.

Yes, they raised the “cut scores” on the math and English tests so that 25 percent fewer kids were deemed proficient in 2010. But the tests themselves weren’t intrinsically harder than before. The exams — still designed by test vendor CTB/McGraw-Hill — remain very soft.

Why does this matter? Simply put, it means that the tests can’t do a good job of measuring the proficiency of the 1.2 million students who take them.

An exam with too many easy questions is much less likely to accurately gauge a given child’s proficiency. It does a worse job of answering the questions behind the test: Is Johnny reading well? What kind of reading material can Jenny handle? What kind of help does Jimmy need?

The best tests feature questions (“items”) that vary in difficulty, allowing us to distinguish among students. Exam designers use field testing to try out questions to determine where items fall on the difficulty scale, then use those items in the “real” (operational) tests.

Field-test data obtained via a Freedom of Information request show clearly that the 2010 tests were no more difficult than the year before. CTB and state testing experts had enough information to anticipate this outcome — but apparently failed to warn the chancellor.

Consider the multiple-choice items. These account for two-thirds of the English and half the math results. CTB says “p-values should range between .30 and .90.” The “p-value” describes how easy the question is: Items with p-values of .90 are answered correctly by 90 percent of the test population.

In 2010, 20 percent of the fourth-grade math items had p-values greater than .90 — falling outside of CTB’s desired range. No items had values below .50 — not a formula for a challenging test. The eighth-grade math exams were nearly as weak.

On the fourth- and eighth-grade English exams, more than a quarter of the items had p-values above .90. Only one item was difficult.

By contrast, consider the p-values for items on the 2009 National Assessment of Educational Progress reading and math exams taken by New York students. In math, 6 percent of the fourth-grade and 1 percent of the eighth-grade multiple-choice items had values above .90. A third of the math items had p-values between .30 and .50.

The reading NAEP also had only a few easy “ice-breakers” and a solid cluster of moderately difficult items. Clearly, New York has yet to match the NAEP’s testing model.

Despite his promises of greater rigor, Steiner has failed to apply higher standards within his own house — the state Education Department.

He still uses CTB, left the Office of State Assessment intact and relied on the same test advisers. That is, having rightly declared New York’s tests to be a fiasco, he placed reform in the hands of the failure’s architects.

But New York’s test mess won’t truly end until there’s genuine accountability for the failures of the past. Instead, we’ll only get more failures.

Fred Smith, a retired Board of Education senior analyst, worked in test research and development.