Bad model + high stakes = gaming

Specifically, I want to discuss a New York Times article from 2011 (hat tip Suresh Naidu) entitled “Eager for Spotlight, but Not if It Is on a Testing Scandal”.

When she was Chancellor of the DC public schools, Michelle Rhee was a huge backer of the standardized testing approach to locating “bad teachers”. She did obnoxious stuff like carry around a broom to illustrate her “cleaning out the trash” approach. She fired a principal on camera.

She also enjoyed taking credit when scores went up, and the system paid bonuses to teachers whose students’ scores rose. So it was very high stakes: you get a cash incentive to improve your students’ scores and the threat of a broom if they go down.

And guess what, there was good evidence of cheating. If you want to read more details, read the article, then read this and this: short version is that a pseudo-investigation came up with nothing (surprise!) but then again scores went way down when they changed leadership and added security.

My point isn’t that we should put security in every school, though. My point is that when you implement a model which is both gameable and high stakes, you should expect it to be gamed. Don’t be surprised by that, and don’t give yourself credit that everyone is suddenly perfect by your measurement in the meantime.

Another way of saying it is that if you go around trusting the numbers, you have to be ready to trust the evidence of gaming too. You can’t have it both ways. We taxpayers should remember that next time we give the banks gameable stress tests or when we discover off-shore tax shelters by corporations.

You do understand that “money” is already a gamed system, right? There is no basis for its value other than “trading results”. The Fed, in the U.S.A., is manufacturing better results, right now, by monkeying with its interest rate to encourage more borrowing so that it can “print” more money to correct the imbalance between “value” in the economy and “cash on hand”!

The U.S. Congress is arguing about “budget”, when they really should be focused on how to get the economy big enough that the taxes we are collecting will pay for what we are spending them on.

They don’t know how to run a “business”, so they don’t know how to improve their own “profits” in the taxation business. So, instead, they have the “family” argument over who is eating too much instead of bringing their own lunch, or who is going to the vending machine 4 times a day and getting the $5.00 water bottle.

Gaming is everywhere… Our world is all about gaming the “money” system so that you get more than you spend…

And the pressure on teachers is enormous. If you are a high school teacher in NYC, after Regents exams have been graded, you and your colleagues expect to sit down with those exams which have grades in the lower 60s (below passing) and “scrub” them up to 65. “Scrub” is the term of art which is used everywhere. Here are the stakes: some students won’t be able to graduate if they don’t pass the exam, so if you say, “You know, he really didn’t pass this test,” you are keeping a kid from graduating who then may or may not drop out. It is on you. Your department, and the school, are judged and evaluated by how many students pass the Regents, and how many graduate. So if you keep saying, “This isn’t passing,” you are seen as someone who is not only holding kids back, but jeopardizing the whole school: not a team player or good colleague, big time. So teachers don’t need to be leaned on by administrators (although sometimes they are) or offered bonuses for test results. The ordinary pressures connected to tests and how many graduate and whether or not schools are closed provide plenty of motivation to even the most honorable people.

See also the outrage in the sportswriting community whenever there’s a PED scandal. We are shocked, SHOCKED that people would try to get around rules to gain a (perhaps only perceived, in some cases) advantage in an industry where you can make millions of dollars.

“Another way of saying it is that if you go around trusting the numbers, you have to be ready to trust the evidence of gaming too. You can’t have it both ways. We taxpayers should remember that next time we give the banks gameable stress tests or when we discover off-shore tax shelters by corporations.”

By means of corrupt, bribed, and broadly conspired deregulation of the investment banking and financial instrument systems, much of the financial “services” sector has been gamed, with unparalleled risk and reward involved. With dissolute Ivy League academia reinforcing the enshrinement of bad mathematics and the viewpoint that financial fraud no longer exists in such “efficient” markets, there is little prospect for eliminating endless Black Swans, the further erosion of the Middle Class, and taxpayer-funded Bankster Bailouts.

Welcome to Crony Capitalism and all the economic, social, and political corruption it establishes as the cost of doing latter-day world business.

I was very disappointed in Rhee. She intentionally offended and alienated people, often people who had less education and less opportunity than she did. Then when people responded badly to her initiatives, she attempted to brand them as enemies of progress. She’s still popular with some, but she was never popular with me. It was always a lengthy talk to explain how what she was doing was harmful, and often those who were inspired by her easy solutions and antipathy toward some blacks didn’t want to hear it anyway.

Strategically, the harm she did lives on. Remember Bill Clinton’s mantra “Make Change Our Friend”? Michelle Rhee too often made change the enemy, and that is not progress anywhere. Her bold and ineffective leadership also granted some organizations a chance to walk away from the difficult problems, but I don’t follow DC schools closely enough to know who seized upon the opportunity to exit that she so dramatically presented them.

If you offer people more money for more results but do not require honest reporting of those results, then you’ll get cheating every time, everywhere. It is duly noted that Rhee herself was not harmed by the positive news that the cheating initially generated. In fact, it came at a critical time, but ultimately wasn’t enough. Mayor Gray and DC’s citizens should be commended for their patience and calm in surviving Rhee’s provocative but ineffective tenure.

I don’t think people behaved the way they did for the sake of extravagant or unfair gain. They were forced to subvert the newly imposed system simply to preserve the minimum daily requirements of keeping their daily bread. Beyond that baseline, what they did is analogous to the jury nullification of an unjust law or a Hogan’s Heroes approach to authoritarian rule. It’s what people just naturally do when they see that the Dictators In Charge (DICs) don’t really care about the realities of the touted mission, and so they feel justified in undermining it any way they can.

Cathy, that is a fascinating perspective on a major education issue. You offer expert quantitative analysis that arrives at the same conclusion that laypeople in education intuitively understand. Another layer to this issue is the willingness of the general public and politicians to rely on a single numeric outcome as a comprehensive measure for a complex situation.

I’ve been looking at scholarship on chronic dysfunction in other U.S. social services bureaucracies that are structurally similar to the public education system. Some people bluntly argue that fixes like the recent standardized testing fad are cosmetic and flatter the intentions of the officials who initiate them, but as a rule fail miserably on outcomes. They overhaul procedural performance standards, which demoralizes the staff doing the case work (or in this instance, teaching in classrooms), and they lead to administrative bloat because implementation has to be micromanaged to meet the procedural performance improvement criteria (or at least to report on attempts to improve that follow the new procedures). The volume of busy-work grows when these types of reforms are underway, but it is entirely normal for them to have no overall beneficial effect on outcomes.

It ties into one of the points in the book “Trust in Numbers”, that quantitative standards of validity for evaluating performance are often a great convenience for people responsible for making rhetorical use of summary statistics, and a great inconvenience for everyone else. In it, Theodore Porter takes a sympathetic look at the history of establishing quantitative standards of validity in medical research and other fields of applied biochemistry, and argues that the choice of statistical tests as gold standards was motivated by “problems of trust, which have been most acute in the context of regulatory and disciplinary confrontations.” But he also quotes Ian Hacking’s pithy side-swipe at rational positivism: “It is no metaphysics that makes the word ‘true’ so handy, but wit, whose soul is brevity.”

Porter describes the long-standing confident and uncontentious use of accounting standards of cost-effectiveness as key decision-making criteria in civil engineering as somewhat symbolic in value. Although prioritizing cost-effectiveness criteria when comparing various planning options in supply chain decisions or project design did help impose a “moral distance” from other qualitative criteria that could theoretically be debated among stakeholders, there was a professional culture in the engineering field that was very comfortable with accounting as a core curriculum science known to all practitioners, and hence one that added transparency to the negotiation process. When procedures are relatively predictable, as accounting criteria and cost-effectiveness priorities can be when one staunchly refuses to take competing priorities seriously, this predictability itself has a lot of appeal. Predictability of professional standards helps people get comfortable in a given professional culture and make themselves at home as specialists cultivating a career’s worth of practical experience, and even if bean-counters’ priorities are narrow, their predictability probably made a lot of people happy during the heyday of civil engineering.

This suggests, to me, that when the numbers being used for evaluation refer to something more ambiguous or debatable than prices for materials, the symbolic role of quantitative evaluation criteria is undermined. If the content and structure of standardized tests don’t agree with teachers’ priorities as practicing experts in pedagogy, the use of test scores for evaluation will be far more demoralizing and there will be far more temptation to game the system. And when “bad models” are also convoluted enough that many people expected to rely on them for analytical feedback worry about the lack of transparency, then consensus around the validity of quantitative decision criteria deteriorates even further.

Porter argues that in Western political culture there is a long-standing rhetorical tradition of treating algebraic math and geometric proofs as symbols of an “ideal of open knowledge” accessible to everyone. And this fascination with numbers as oracles of political fairness that tell no lies is something he thinks Americans are especially susceptible to, because so many of our populist politicians use what he calls “anti-rhetorical rhetoric”, always looking for ways to sound like plain-spoken concrete thinkers precisely when they’re trying hardest to be persuasive.

For people who aren’t directly impacted by the pursuit of quantitative performance measures in a given field of work, judging by the numbers sounds great. It’s easy to imagine whoever is supposed to interpret the numbers will know how to go about it, and the rest of us can sit back and wait for that person’s report “when the numbers come in.” Only insiders will be griping about the unreliable quality of the data in question, to say nothing of the quality of the analytical models. (Unless we’re placing bets during drinking games over a horse-race election cycle’s television coverage, picking which pundits to bet on among those who supposedly extrapolate from early voting results and other sources of data for their predictive models.)