
In his 2003 book, The Soul of Capitalism, William Greider wrote, “If capitalism were someday found to have a soul, it would probably be located in the mystic qualities of capital itself” (p. 94). The recurring theme in the book is that the resolution of capitalism’s deep conflicts must grow out as organic changes from the roots of capitalism itself.

In the book, Greider quotes Innovest’s Michael Kiernan as suggesting that the goal has to be re-engineering the DNA of Wall Street (p. 119). He says the key to doing this is good reliable information that has heretofore been unavailable but which will make social and environmental issues matter financially. The underlying problems of exactly what solid, high quality information looks like, where it comes from, and how it is created are not stated or examined, but the point, as Kiernan says, is that “the markets are pretty good at punishing and rewarding.” The objective is to use “the financial markets as an engine of reform and positive change rather than destruction.”

This objective is, of course, the focus of multiple postings in this blog (see especially this one and this one). From my point of view, capitalism indeed does have a soul and it is actually located in the qualities of capital itself. Think about it: if a soul is a spirit of something that exists independent of its physical manifestation, then the soul of capitalism is the fungibility of capital. Now, this fungibility is complex and ambiguous. It takes its strength and practical value from the way market exchanges are represented in terms of currencies, monetary units that, within some limits, provide an objective basis of comparison useful for rewarding those capable of matching supply with demand.

But the fungibility of capital can also be dangerously misconceived when the rich complexity and diversity of human capital is unjustifiably reduced to labor, when the irreplaceable value of natural capital is unjustifiably reduced to land, and when the trust, loyalty, and commitment of social capital is completely ignored in financial accounting and economic models. As I’ve previously said in this blog, the concept of human capital is inherently immoral so far as it reduces real human beings to interchangeable parts in an economic machine.

So how could it ever be possible to justify any reduction of human, social, and natural value to a mere number? Isn’t this the ultimate in the despicable inhumanity of economic logic, corporate decision making, and, ultimately, the justification of greed? Many among us who profess liberal and progressive perspectives seem to have an automatic and reactionary prejudice of this kind. This makes these well-intentioned souls as much a part of the problem as those among us with sometimes just as well-intentioned perspectives that accept such reductionism as the price of entry into the game.

There is another way. Human, social, and natural value can be measured and made manageable in ways that do not necessitate totalizing reduction to a mere number. The problem is not reduction itself, but unjustified, totalizing reduction. Referring to all people as “man” or “men” is an unjustified reduction dangerous in the way it focuses attention only on males. The tendency to think and act in ways privileging males over females that is fostered by this sense of “man” shortchanges us all, and has happily been largely eliminated from discourse.

Making language more inclusive does not, however, mean that words lose the singular specificity they need to be able to refer to things in the world. Any given word represents an infinite population of possible members of a class of things, actions, and forms of life. Any simple sentence combining words into a coherent utterance then multiplies infinities upon infinities. Discourse inherently reduces multiplicities into texts of limited lengths.

Like any tool, reduction has its uses. Also like any tool, problems arise when the tool is allowed to occupy some hidden and unexamined blind spot from which it can dominate and control the way we think about everything. Critical thinking is most difficult in those instances in which the tools of thinking themselves need to be critically evaluated. To reject reduction uncritically as inherently unjustified is to throw the baby out with the bathwater. Indeed, it is impossible to formulate a statement of the rejection without simultaneously enacting exactly what is supposed to be rejected.

We have numerous ready-to-hand examples of how all reduction has been unjustifiably reduced to one homogenized evil. But one of the results of experiments in communal living in the 1960s and 1970s, as well as of the fall of the Soviet Union, was the realization that the centralized command and control of collectively owned community property cannot compete with the creativity engendered when individuals hold legal title to the fruits of their labors. If individuals cannot own the results of the investments they make, no one makes any investments.

In other words, if everything is owned collectively and is never reduced to individually possessed shares that can be creatively invested for profitable returns, then the system is structured so as to punish innovation and reward doing as little as possible. But there’s another way of thinking about the relation of the collective to the individual. The living soul of capitalism shows itself in the way high quality information makes it possible for markets to efficiently coordinate and align individual producers’ and consumers’ collective behaviors and decisions. What would happen if we could do that for human, social, and natural capital markets? What if “social capitalism” is more than an empty metaphor? What if capital institutions can be configured so that individual profit really does become the driver of socially responsible, sustainable economics?

And here we arrive at the crux of the problem. How do we create the high quality, solid information markets need to punish and reward relative to ethical and sustainable human, social, and environmental values? Well, what can we learn from the way we created that kind of information for property and manufactured capital? These are the questions taken up and explored in the postings in this blog, and in my scientific research publications and meeting presentations. In the near future, I’ll push my reflection on these questions further, and will explore some other possible answers to the questions offered by Greider and his readers in a recent issue of The Nation.

Though he attributes his insight to a colleague (George Baker), Michael Jensen has once more succinctly stated a key point I’ve repeatedly tried to convey in my blog posts. As Jensen (2003, p. 397) puts it,

…any activity whose performance can be perfectly measured objectively does not belong inside the firm. If its performance can be adequately measured objectively it can be spun out of the firm and contracted for in a market transaction.

YES!! Though nothing is measured perfectly, my message has been a series of variations on precisely this theme. Well-measured property, services, products, and commodities in today’s economy are associated with scientific, legal and financial structures and processes that endow certain representations with meaningful indications of kind, amount, value and ownership. It is further well established that the ownership of the products of one’s creative endeavors is essential to economic advancement and the enlargement of the greater good. Markets could not exist without objective measures, and thus we have the central commercial importance of metric standards.

The improved measurement of service outcomes and performances is going to create an environment capable of supporting similar legal and financial indications of value and ownership. Many of the causes of today’s economic crises can be traced to poor quality information and inadequate measures of human, social, and natural value. Bringing publicly verifiable scientific data and methods to bear on the tuning of instruments for measuring these forms of value will make their harmonization much simpler than it ever could be otherwise. Social and environmental costs and value have been relegated to the marginal status of externalities because they have not been measured in ways that made it possible to bring them onto the books and into the models.

But the stage is being set for significant changes. Decades of research calibrating objective measures of a wide variety of performances and outcomes are inexorably leading to the creation of an intangible assets metric system (Fisher, 2009a, 2009b, 2011). Meaningful and rigorous individual-level universally available uniform metrics for each significant intangible asset (abilities, health, trustworthiness, etc.) will

(a) make it possible for each of us to take full possession, ownership, and management control of our investments in and returns from these forms of capital,

(b) coordinate the decisions and behaviors of consumers, researchers, and quality improvement specialists to better match supply and demand, and thereby

(c) increase the efficiency of human, social, and natural capital markets, harnessing the profit motive for the removal of wasted human potential, lost community coherence, and destroyed environmental quality.

Jensen’s observation emerges in his analysis of performance measures as one of three factors in defining the incentives and payoffs for a linear compensation plan (the other two being the intercept and the slope of the bonus line relating salary and bonus to the performance measure targets). The two sentences quoted above occur in this broader context, where Jensen (2003, pp. 396-397) states that,

…we must decide how much subjectivity will be involved in each performance measure. In considering this we must recognize that every performance measurement system in a firm must involve an important amount of subjectivity. The reason, as my colleague George Baker has pointed out, is that any activity whose performance can be perfectly measured objectively does not belong inside the firm. If its performance can be adequately measured objectively it can be spun out of the firm and contracted for in a market transaction. Thus, one of the most important jobs of managers, complementing objective measures of performance with managerial subjective evaluation of subtle interdependencies and other factors is exactly what most managers would like to avoid. Indeed, it is this factor along with efficient risk bearing that is at the heart of what gives managers and firms an advantage over markets.

Jensen is here referring implicitly to the point Coase (1990) makes regarding the nature of the firm. A firm can be seen as a specialized market, one in which methods, insights, and systems not generally available elsewhere are employed for competitive advantage. Products are brought to market competitively by being endowed with value not otherwise available. Maximizing that value is essential to the viability of the firm.

Given conflicting incentives and the mixed messages of the balanced scorecard, managers have plenty of opportunities for creatively avoiding the difficult task of maximizing the value of the firm. Jensen (2001) shows that attending to the “managerial subjective evaluation of subtle interdependencies” is made impossibly complex when decisions and behaviors are pulled in different directions by each stakeholder’s particular interests. Other research shows that even traditional capital structures are plagued by the mismeasurement of leverage, distress costs, tax shields, and the speed with which individual firms adjust their capital needs relative to leverage targets (Graham & Leary, 2010). The objective measurement of intangible assets surely seems impossibly complex to those familiar with these problems.

But perhaps the problems associated with measuring traditional capital structures are not so different from those encountered in the domain of intangible assets. In both cases, a particular kind of unjustified self-assurance seems always to attend the mere availability of numeric data. To the unpracticed eye, numbers seem to always behave the same way, no matter if they are rigorous measures of physical commodities, like kilowatts, barrels, or bushels, or if they are currency units in an accounting spreadsheet, or if they are percentages of agreeable responses to a survey question. The problem is that, when interrogated in particular ways with respect to the question of how much of something is supposedly measured, these different kinds of numbers give quite markedly different kinds of answers.

The challenge we face is one of determining what kind of answers we want to the questions we have to ask. Presumably, we want to ask questions and get answers pertinent to obtaining the information we need to manage life creatively, meaningfully, effectively and efficiently. It may be useful then, as a kind of thought experiment, to make a bold leap and imagine a scenario in which relevant questions are answered with integrity, accountability, and transparency.

What will happen when the specialized expertise of human resource professionals is supplanted by a market in which meaningful and comparable measures of the hireability, retainability, productivity, and promotability of every candidate and employee are readily available? If Baker and Jensen have it right, perhaps firms will no longer have employees. This is not to say that no one will work for pay. Instead, firms will contract with individual workers at going market rates, and workers will undoubtedly be well aware of the market value of their available shares of their intangible assets.

A similar consequence follows for the social safety net and a host of other control, regulatory, and policing mechanisms. But we will no longer be stuck with blind faith in the invisible hand and market efficiency, following the faith of those willing to place their trust and their futures in the hands of mechanisms they only vaguely understand and cannot control. Instead, aggregate effects on individuals, communities, and the environment will be tracked in publicly available and critically examined measures, just as stocks, bonds, and commodities are tracked now.

Previous posts in this blog explore the economic possibilities that follow from having empirically substantiated, theoretically predictable, and instrumentally mediated measures embodying broad consensus standards. What we will have for human, social, and natural capital will be the same kind of objective measures that have made markets work as well as they have thus far. It will be a whole new ball game when profits become tied to human, social, and environmental outcomes.

References

Coase, R. (1990). The firm, the market, and the law. Chicago: University of Chicago Press.

One of the ironies of life is that we often overlook the obvious in favor of the obscure. And so one hears of huge resources poured into finding and capitalizing on opportunities that provide infinitesimally small returns, while other opportunities—with equally certain odds of success but far more profitable returns—are completely neglected.

The National Institute of Standards and Technology (NIST) reports returns on investment ranging from 32% to over 400% in 32 metrological improvements made in semiconductors, construction, automation, computers, materials, manufacturing, chemicals, photonics, communications and pharmaceuticals (NIST, 2009). Previous posts in this blog offer more information on the economic value of metrology. The point is that the returns obtained from improvements in the measurement of tangible assets will likely also be achieved in the measurement of intangible assets.

How? With a little bit of imagination, each stage in the development of increasingly meaningful, efficient, and useful measures described in this previous post can be seen as implying a significant return on investment. As those returns are sought, investors will coordinate and align different technologies and resources relative to a roadmap of how these stages are likely to unfold in the future, as described in this previous post. The basic concepts of how efficient and meaningful measurement reduces transaction costs and market frictions, and how it brings capital to life, are explained and documented in my publications (Fisher, 2002-2011), but what would a concrete example of the new value created look like?

The examples I have in mind hinge on the difference between counting and measuring. Counting is a natural and obvious thing to do when we need some indication of how much of something there is. But counting is not measuring (Cooper & Humphry, 2010; Wright, 1989, 1992, 1993, 1999). This is not some minor academic distinction of no practical use or consequence. It is rather the source of the vast majority of the problems we have in comparing outcome and performance measures.

Imagine how things would be if we couldn’t weigh fruit in a grocery store, and all we could do was count pieces. We can tell when eight small oranges possess less overall mass of fruit than four large ones by weighing them; the eight small oranges might weigh .75 kilograms (about 1.6 pounds) while the four large ones come in at 1.0 kilo (2.2 pounds). If oranges were sold by count instead of weight, perceptive traders would buy small oranges and make more money selling them than they could if they bought large ones.

But we can’t currently arrive so easily at the comparisons we need when we’re buying and selling intangible assets, like those produced as the outcomes of educational, health care, or other services. So I want to walk through a couple of very down-to-earth examples to bring the point home. Today we’ll focus on the simplest version of the story, and tomorrow we’ll take up a little more complicated version, dealing with the counts, percentages, and scores used in balanced scorecard and dashboard metrics of various kinds.

What if you score eight on one reading test and I score four on a different reading test? Who has more reading ability? In the same way that we might be able to tell just by looking that eight small oranges are likely to have less actual orange fruit than four big ones, we might also be able to tell just by looking that eight easy (short, common) words can likely be read correctly with less reading ability than four difficult (long, rare) words can be.

So let’s analyze the difference between buying oranges and buying reading ability. We’ll set up three scenarios for buying reading ability. In all three, we’ll imagine we’re comparing how we buy oranges with the way we would have to go about buying reading ability today if teachers were paid for the gains made on the tests they administer at the beginning and end of the school year.

In the first scenario, the teachers make up their own tests. In the second, the teachers each use a different standardized test. In the third, each teacher uses a computer program that draws questions from the same online bank of precalibrated items to construct a unique test custom tailored to each student. Reading ability scenario one is likely the most commonly found in real life. Scenario three is the rarest, but nonetheless describes a situation that has been available to millions of students in the U.S., Australia, and elsewhere for several years. Scenarios one, two and three correspond with developmental levels one, three, and five described in a previous blog entry.

Buying Oranges

When you go into one grocery store and I go into another, we don’t have any oranges with us. When we leave, I have eight and you have four. I have twice as many oranges as you, but yours weigh a kilo, about a third more than mine (.75 kilos).

When we paid for the oranges, the transaction was finished in a few seconds. Neither one of us experienced any confusion, annoyance, or inconvenience in relation to the quality of information we had on the amount of orange fruits we were buying. I did not, however, pay twice as much as you did. In fact, you paid more for yours than I did for mine, in direct proportion to the difference in the measured amounts.

No negotiations were necessary to consummate the transactions, and there was no need for special inquiries about how much orange we were buying. We knew from experience in this and other stores that the prices we paid were comparable with those offered in other times and places. Our information was cheap, as it was printed on the bag of oranges or could be read off a scale, and it was very high quality, as the measures were directly comparable with measures from any other scale in any other store. So, in buying oranges, the cost of obtaining high quality information added so little to the overall cost of the transaction as to be negligible.
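The arithmetic behind this comparison can be sketched in a few lines of Python. The counts and weights are the ones in the example above; the price per kilo is a hypothetical figure chosen only for illustration, as are the names `bags` and `price_by_weight`.

```python
# Counts and weights come from the example; PRICE_PER_KILO is a made-up figure.
PRICE_PER_KILO = 4.00  # hypothetical currency units per kilogram

bags = {
    "mine": {"count": 8, "kilos": 0.75},
    "yours": {"count": 4, "kilos": 1.00},
}

def price_by_weight(bag, price_per_kilo=PRICE_PER_KILO):
    """Price a bag in proportion to its measured mass, ignoring the piece count."""
    return bag["kilos"] * price_per_kilo

count_ratio = bags["mine"]["count"] / bags["yours"]["count"]
weight_ratio = bags["yours"]["kilos"] / bags["mine"]["kilos"]
mine_cost = price_by_weight(bags["mine"])
yours_cost = price_by_weight(bags["yours"])

print(f"I have {count_ratio:.0f} times as many oranges as you")
print(f"Your oranges weigh {weight_ratio:.2f} times what mine do")
print(f"I pay {mine_cost:.2f}, you pay {yours_cost:.2f}")
```

Whatever price per kilo is assumed, the ratio of the two payments follows the ratio of the weights, not the ratio of the counts.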

Buying Reading Ability (Scenario 1)

So now you and I go through third grade as eight year olds. You’re in one school and I’m in another. We have different teachers. Each teacher makes up his or her own reading tests. When we started the school year, we each took a reading test (different ones), and we took another (again, different ones) as we ended the school year.

For each test, your teacher counted up your correct answers and divided by the total number of questions; so did mine. You got 72% correct on the first one, and 94% correct on the last one. I got 83% correct on the first one, and 86% correct on the last one. Your score went up 22 percentage points, much more than the 3 points mine went up. But did you learn more? It is impossible to tell. What if both of your tests were easier (not just for you or for me but for everyone) than both of mine? What if my second test was a lot harder than my first one? On the other hand, what if your tests were harder than mine? Perhaps you did even better than your scores seem to indicate.

We’ll just exclude from consideration other factors that might come to bear, such as whether your tests were significantly longer or shorter than mine, or if one of us ran out of time and did not answer a lot of questions.
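One way to see why percent-correct scores from different tests cannot be compared is to sketch what a single, fixed reading ability would be expected to score on an easy test and on a hard one. The sketch below assumes a Rasch model, which is an assumption made here for illustration and not something this post specifies. The ability and item difficulty values are invented, and `p_correct` and `expected_percent` are hypothetical helper names.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: probability of a correct answer, both values in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def expected_percent(ability, difficulties):
    """Expected percent correct for one reader on one set of items."""
    probs = [p_correct(ability, d) for d in difficulties]
    return 100.0 * sum(probs) / len(probs)

# Hypothetical item difficulties in logits; not taken from any real test.
easy_test = [-2.0, -1.5, -1.0, -0.5, 0.0]
hard_test = [0.0, 0.5, 1.0, 1.5, 2.0]

reader = 0.5  # one reader with one fixed ability
easy_score = expected_percent(reader, easy_test)
hard_score = expected_percent(reader, hard_test)

print(f"easy test: {easy_score:.0f}% correct")
print(f"hard test: {hard_score:.0f}% correct")
```

The same reader is expected to score far higher on the easy test than on the hard one, so a change in percent correct between two tests of unknown difficulty says little about how much was learned.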

If our parents had to pay the reading teacher at the end of the school year for the gains that were made, how would they tell what they were getting for their money? What if your teacher gave a hard test at the start of the year and an easy one at the end of the year so that you’d have a big gain and your parents would have to pay more? What if my teacher gave an easy test at the start of the year and a hard one at the end, so that a really high price could be put on very small gains? If our parents were to compare their experiences in buying our improved reading ability, they would have a lot of questions about how much improvement was actually obtained. They would be confused and annoyed at how inconvenient the scores are, because they are difficult, if not impossible, to compare. A lot of time and effort might be invested in examining the words and sentences in each of the four reading tests to try to determine how easy or hard they are in relation to each other. Or, more likely, everyone would throw their hands up and pay as little as they possibly can for outcomes they don’t understand.

Buying Reading Ability (Scenario 2)

In this scenario, we are third graders again, in different schools with different reading teachers. Now, instead of our teachers making up their own tests, our reading abilities are measured at the beginning and the end of the school year using two different standardized tests sold by competing testing companies. You’re in a private suburban school that’s part of an independent schools association. I’m in a public school along with dozens of others in an urban school district.

For each test, our parents received a report in the mail showing our scores. As before, we know how many questions we each answered correctly but, unlike before, we don’t know which particular questions we got right or wrong. We also don’t know how easy or hard your tests were relative to mine, but we do know that the two tests you took were equated, and so were the two I took. That means your tests will show how much reading ability you gained, and so will mine.

We have one new bit of information we didn’t have before, and that’s a percentile score. Now we know that at the beginning of the year, with a percentile ranking of 72, you performed better than 72% of the other private school third graders taking this test, and at the end of the year you performed better than 76% of them. In contrast, I had percentiles of 84 and 89.

The question we have to ask now is if our parents are going to pay for the percentile gain, or for the actual gain in reading ability. You and I each learned more than our peers did on average, since our percentile scores went up, but this would not work out as a satisfactory way to pay teachers. Averages being averages, if you and I learned more and faster, someone else learned less and slower, so that, in the end, it all balances out. Are we to have teachers paying parents when their children learn less, simply redistributing money in a zero sum game?
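The zero-sum character of percentile gains can be checked directly. In this sketch, which uses invented scores and a hypothetical `percentile_rank` helper, every student in a small cohort gains the same amount over the year, and yet no one's percentile moves.

```python
def percentile_rank(score, cohort):
    """Percent of the cohort scoring strictly below the given score."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)

# Hypothetical start-of-year scores for a cohort of ten students.
start = [200, 220, 240, 260, 280, 300, 320, 340, 360, 380]
# Every student gains the same 50 points by the end of the year.
end = [s + 50 for s in start]

ranks_start = [percentile_rank(s, start) for s in start]
ranks_end = [percentile_rank(s, end) for s in end]

# Compare each student's standing before and after the uniform gain.
print(ranks_start == ranks_end)
```

Percentiles report standing within the group, so they can stay flat while everyone learns, and they can rise for one student only if they fall for another.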

And so, additional individualized reports are sent to our parents by the testing companies. Your tests are equated with each other, and they measure in a comparable unit that ranges from 120 to 480. You had a starting score of 235 and finished the year with a score of 420, for a gain of 185.

The tests I took are comparable and measure in the same unit, too, but not the same unit as your tests measure in. Scores on my tests range from 400 to 1200. I started the year with a score of 790, and finished at 1080, for a gain of 290.

Now the confusion in the first scenario is overcome, in part. Our parents can see that we each made real gains in reading ability. The difficulty levels of the two tests you took are the same, as are the difficulties of the two tests I took. But our parents still don’t know what to pay the teacher because they can’t tell if you or I learned more. You had lower percentiles and test scores than I did, but you are being compared with what is likely a higher scoring group of suburban and higher socioeconomic status students than the urban group of disadvantaged students I’m compared against. And your scores aren’t comparable with mine, so you might have started and finished with more reading ability than I did, or maybe I had more than you. There isn’t enough information here to tell.
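The numbers in this scenario show how easily the comparison flips. The sketch below uses the scale ranges and scores given above, and adds one naive rescaling, dividing each gain by the width of its reporting scale. That rescaling is an assumption introduced only to make the point. It is not an equating of the two tests, and the names `raw_gain` and `share_of_range` are hypothetical.

```python
# Scale ranges and scores are the figures given in the scenario.
yours = {"low": 120, "high": 480, "start": 235, "end": 420}
mine = {"low": 400, "high": 1200, "start": 790, "end": 1080}

def raw_gain(t):
    """End-of-year score minus start-of-year score, in that test's own unit."""
    return t["end"] - t["start"]

def share_of_range(t):
    """Gain as a fraction of the reporting scale's width (not an equating)."""
    return raw_gain(t) / (t["high"] - t["low"])

for name, t in (("yours", yours), ("mine", mine)):
    print(f"{name}: raw gain {raw_gain(t)}, {share_of_range(t):.0%} of scale range")
```

The raw gains favor me (290 against 185), while the share of scale range favors you (about 51% against about 36%). Neither ordering can be trusted, because nothing links the unit of one test to the unit of the other.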

So, again, the information that is provided is insufficient to the task of settling on a reasonable price for the outcomes obtained. Our parents will again be annoyed and confused by the low quality information that makes it impossible to know what to pay the teacher.

Buying Reading Ability (Scenario 3)

In the third scenario, we are still third graders in different schools with different reading teachers. This time our reading abilities are measured by tests that are completely unique. Every student has a test custom tailored to their particular ability. Unlike the tests in the first and second scenarios, however, now all of the tests have been constructed carefully on the basis of extensive data analysis and experimental tests. Different testing companies are providing the service, but they have gone to the trouble to work together to create consensus standards defining the unit of measurement for any and all reading test items.

For each test, our parents received a report in the mail showing our measures. As before, we know how many questions we each answered correctly. Now, though we don’t know which particular questions we got right or wrong, we can see typical items ordered by difficulty lined up in a way that shows us what kind of items we got wrong, and which kind we got right. And now we also know your tests were equated relative to mine, so we can compare how much reading ability you gained relative to how much I gained. Now our parents can confidently determine how much they should pay the teacher, at least in proportion to their children’s relative measures. If our measured gains are equal, the same payment can be made. If one of us obtained more value, then proportionately more should be paid.

In this third scenario, we have a situation directly analogous to buying oranges. You have a measured amount of increased reading ability that is expressed in the same unit as my gain in reading ability, just as the weights of the oranges are comparable. Further, your test items were not identical with mine, and so the difficulties of the items we took surely differed, just as the sizes of the oranges we bought did.
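How two students who answer entirely different questions can still be measured in one unit can be sketched as follows. The sketch assumes a Rasch model with maximum likelihood estimation, which this post does not specify as the method used by any particular testing program. The item difficulties and answers are invented, and `estimate_ability` is a hypothetical helper.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: probability of a correct answer, both values in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(difficulties, responses, iterations=25):
    """Maximum-likelihood ability in logits, given calibrated item difficulties.

    responses are 1 (right) or 0 (wrong); at least one of each is required.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("all-right or all-wrong scores have no finite estimate")
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, d) for d in difficulties]
        gradient = raw - sum(probs)                     # observed minus expected score
        information = sum(p * (1.0 - p) for p in probs)
        theta += gradient / information                 # Newton-Raphson step
    return theta

# Hypothetical calibrated difficulties; each student sees different items.
your_items = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
my_items = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
your_answers = [1, 1, 1, 1, 0, 0]  # 4 of 6 correct on easier items
my_answers = [1, 1, 1, 1, 0, 0]    # 4 of 6 correct on harder items

your_measure = estimate_ability(your_items, your_answers)
my_measure = estimate_ability(my_items, my_answers)

print(f"your measure: {your_measure:.2f} logits")
print(f"my measure:   {my_measure:.2f} logits")
```

Both students count four correct answers, but because my items sit 1.5 logits higher on the shared scale, my measure comes out 1.5 logits higher as well. The count is the same, the measured amount is not, which is the orange example carried over to reading.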

This third scenario could be made yet more efficient by removing the need for creating and maintaining a calibrated item bank, as described by Stenner and Stone (2003) and in the sixth developmental level in a prior blog post here. Also, additional efficiencies could be gained by unifying the interpretation of the reading ability measures, so that progress through high school can be tracked with respect to the reading demands of adult life (Williamson, 2008).

Comparison of the Purchasing Experiences

In contrast with the grocery store experience, paying for increased reading ability in the first scenario is fraught with low quality information that greatly increases the cost of the transactions. The information is of such low quality that, of course, hardly anyone bothers to go to the trouble to try to decipher it. The effort costs too much to be worthwhile. So, no one knows how much gain in reading ability is obtained, or what a unit gain might cost.

When a school district or educational researchers mount studies to try to find out what it costs to improve reading ability in third graders in some standardized unit, they find so much unexplained variation in the costs that they, too, raise more questions than answers.

In grocery stores and other markets, we don’t place the cost of making the value comparison on the consumer or the merchant. Instead, society as a whole picks up the cost by funding the creation and maintenance of consensus standard metrics. Until we take up the task of doing the same thing for intangible assets, we cannot expect human, social, and natural capital markets to obtain the efficiencies we take for granted in markets for tangible assets and property.

One of the ironies of life is that we often overlook the obvious in favor of the obscure. And so one hears of huge resources poured into finding and capitalizing on opportunities that provide infinitesimally small returns, while other opportunities—with equally certain odds of success but far more profitable returns—are completely neglected.

The National Institute for Standards and Technology (NIST) reports returns on investment ranging from 32% to over 400% in 32 metrological improvements made in semiconductors, construction, automation, computers, materials, manufacturing, chemicals, photonics, communications and pharmaceuticals (NIST, 2009). Previous posts in this blog offer more information on the economic value of metrology. The point is that the returns obtained from improvements in the measurement of tangible assets will likely also be achieved in the measurement of intangible assets.

How? With a little bit of imagination, each stage in the development of increasingly meaningful, efficient, and useful measures described in this previous post can be seen as implying a significant return on investment. As those returns are sought, investors will coordinate and align different technologies and resources relative to a roadmap of how these stages are likely to unfold in the future, as described in this previous post. But what would a concrete example of the new value created look like?

The examples I have in mind hinge on the difference between counting and measuring. Counting is a natural and obvious thing to do when we need some indication of how much of something there is. But counting is not measuring (Cooper & Humphry, 2010; Wright, 1989, 1992, 1993, 1999). This is not some minor academic distinction of no practical use or consequence. It is rather the source of the vast majority of the problems we have in comparing outcome and performance measures.

Imagine how things would be if we couldn’t weigh fruit in a grocery store, and all we could do was count pieces. We can tell when eight small oranges possess less overall mass of fruit than four large ones by weighing them; the eight small oranges might weigh 0.75 kilograms (about 1.65 pounds) while the four large ones come in at 1.0 kilo (2.2 pounds). If oranges were sold by count instead of weight, perceptive traders would buy small oranges and make more money selling them than they could if they bought large ones.

But we can’t currently arrive so easily at the comparisons we need when we’re buying and selling intangible assets, like those produced as the outcomes of educational, health care, or other services. So I want to walk through a couple of very down-to-earth examples to bring the point home. Today we’ll focus on the simplest version of the story, and tomorrow we’ll take up a little more complicated version, dealing with the counts, percentages, and scores used in balanced scorecard and dashboard metrics of various kinds.

What if you score eight on one reading test and I score four on a different reading test? Who has more reading ability? In the same way that we might be able to tell just by looking that eight small oranges are likely to have less actual orange fruit than four big ones, we might also be able to tell just by looking that eight easy (short, common) words can likely be read correctly with less reading ability than four difficult (long, rare) words can be.

So let’s analyze the difference between buying oranges and buying reading ability. We’ll set up three scenarios for buying reading ability. In all three, we’ll imagine we’re comparing how we buy oranges with the way we would have to go about buying reading ability today if teachers were paid for the gains made on the tests they administer at the beginning and end of the school year.

In the first scenario, the teachers make up their own tests. In the second, the teachers each use a different standardized test. In the third, each teacher uses a computer program that draws questions from the same online bank of precalibrated items to construct a unique test custom tailored to each student. Reading ability scenario one is likely the most commonly found in real life. Scenario three is the rarest, but nonetheless describes a situation that has been available to millions of students in the U.S., Australia, and elsewhere for several years. Scenarios one, two and three correspond with developmental levels one, three, and five described in a previous blog entry.

Buying Oranges

When you go into one grocery store and I go into another, we don’t have any oranges with us. When we leave, I have eight and you have four. I have twice as many oranges as you, but yours weigh a kilo, about a third more than mine (.75 kilos).

When we paid for the oranges, the transaction was finished in a few seconds. Neither one of us experienced any confusion, annoyance, or inconvenience in relation to the quality of information we had on the amount of orange fruits we were buying. I did not, however, pay twice as much as you did. In fact, you paid more for yours than I did for mine, in direct proportion to the difference in the measured amounts.

No negotiations were necessary to consummate the transactions, and there was no need for special inquiries about how much orange we were buying. We knew from experience in this and other stores that the prices we paid were comparable with those offered in other times and places. Our information was cheap, as it was printed on the bag of oranges or could be read off a scale, and it was very high quality, as the measures were directly comparable with measures from any other scale in any other store. So, in buying oranges, the impact of information quality on the overall cost of the transaction was so inexpensive as to be negligible.
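The arithmetic behind this transaction can be sketched in a few lines of Python. The weights and counts come from the example above; the price per kilo and the price per piece are arbitrary figures I have assumed purely for illustration, as are the function names.

```python
# Weights and counts from the example; both prices are made-up figures.
PRICE_PER_KG = 3.00      # hypothetical price per kilogram
PRICE_PER_PIECE = 0.50   # hypothetical price if oranges were sold by count

def price_by_weight(weight_kg, price_per_kg=PRICE_PER_KG):
    """Price proportional to the measured amount of fruit."""
    return weight_kg * price_per_kg

def price_by_count(n_pieces, price_per_piece=PRICE_PER_PIECE):
    """Price proportional to the number of pieces, ignoring their size."""
    return n_pieces * price_per_piece

my_oranges = {"count": 8, "weight_kg": 0.75}
your_oranges = {"count": 4, "weight_kg": 1.0}

# Sold by weight: you pay more than I do, in proportion to the measured amounts.
weight_price_ratio = (price_by_weight(your_oranges["weight_kg"])
                      / price_by_weight(my_oranges["weight_kg"]))

# Sold by count: I would pay twice what you pay, for less fruit.
count_price_ratio = (price_by_count(my_oranges["count"])
                     / price_by_count(your_oranges["count"]))

print(round(weight_price_ratio, 3), count_price_ratio)
```

Whatever price per kilo is chosen, the ratio of the two prices by weight stays at four to three, which is the ratio of the measured amounts. That constancy is what makes the information cheap to use.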

Buying Reading Ability (Scenario 1)

So now you and I go through third grade as eight year olds. You’re in one school and I’m in another. We have different teachers. Each teacher makes up his or her own reading tests. When we started the school year, we each took a reading test (different ones), and we took another (again, different ones) as we ended the school year.

For each test, your teacher counted up your correct answers and divided by the total number of questions; so did mine. You got 72% correct on the first one, and 94% correct on the last one. I got 83% correct on the first one, and 86% correct on the last one. Your score went up 22 percentage points, much more than the 3 points mine went up. But did you learn more? It is impossible to tell. What if both of your tests were easier—not just for you or for me but for everyone—than both of mine? What if my second test was a lot harder than my first one? On the other hand, what if your tests were harder than mine? Perhaps you did even better than your scores seem to indicate.

We’ll just exclude from consideration other factors that might come to bear, such as whether your tests were significantly longer or shorter than mine, or if one of us ran out of time and did not answer a lot of questions.

If our parents had to pay the reading teacher at the end of the school year for the gains that were made, how would they tell what they were getting for their money? What if your teacher gave a hard test at the start of the year and an easy one at the end of the year so that you’d have a big gain and your parents would have to pay more? What if my teacher gave an easy test at the start of the year and a hard one at the end, so that a really high price could be put on very small gains? If our parents were to compare their experiences in buying our improved reading ability, they would have a lot of questions about how much improvement was actually obtained. They would be confused and annoyed at how inconvenient the scores are, because they are difficult, if not impossible, to compare. A lot of time and effort might be invested in examining the words and sentences in each of the four reading tests to try to determine how easy or hard they are in relation to each other. Or, more likely, everyone would throw their hands up and pay as little as they possibly can for outcomes they don’t understand.
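To see how much room for confusion there is in percent-correct scores, here is a minimal sketch in Python. It assumes, purely for illustration, that responses follow a simple Rasch model, in which the chance of a correct answer depends on the difference between the reader's ability and the item's difficulty. Nothing says teacher-made tests actually behave this way, and the item difficulties and the ability value are invented.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: probability of a correct answer (both values in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def expected_percent(ability, item_difficulties):
    """Expected percent correct on a test made up of the given items."""
    probs = [p_correct(ability, d) for d in item_difficulties]
    return 100.0 * sum(probs) / len(probs)

# Hypothetical tests: the same reader, with the same ability, takes both.
easy_test = [-2.0, -1.5, -1.0, -0.5, 0.0]
hard_test = [0.0, 0.5, 1.0, 1.5, 2.0]
ability = 0.5  # unchanged between the two tests

easy_pct = expected_percent(ability, easy_test)
hard_pct = expected_percent(ability, hard_test)
print(round(easy_pct), round(hard_pct))
```

With no change at all in reading ability, the expected percent correct is far higher on the easy test than on the hard one. A teacher who chooses which test comes first can therefore produce a large "gain" or "loss" that has nothing to do with learning.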

Buying Reading Ability (Scenario 2)

In this scenario, we are third graders again, in different schools with different reading teachers. Now, instead of our teachers making up their own tests, our reading abilities are measured at the beginning and the end of the school year using two different standardized tests sold by competing testing companies. You’re in a private suburban school that’s part of an independent schools association. I’m in a public school along with dozens of others in an urban school district.

For each test, our parents received a report in the mail showing our scores. As before, we know how many questions we each answered correctly, and, as before, we don’t know which particular questions we got right or wrong. Finally, we don’t know how easy or hard your tests were relative to mine, but we know that the two tests you took were equated, and so were the two I took. That means your tests will show how much reading ability you gained, and so will mine.

But we have one new bit of information we didn’t have before, and that’s a percentile score. Now we know that at the beginning of the year, with a percentile ranking of 72, you performed better than 72% of the other private school third graders taking this test, and at the end of the year you performed better than 76% of them. In contrast, I had percentiles of 84 and 89.

The question we have to ask now is if our parents are going to pay for the percentile gain, or for the actual gain in reading ability. You and I each learned more than our peers did on average, since our percentile scores went up, but this would not work out as a satisfactory way to pay teachers. Averages being averages, if you and I learned more and faster, someone else learned less and slower, so that, in the end, it all balances out. Are we to have teachers paying parents when their children learn less, simply redistributing money in a zero sum game?

And so, additional individualized reports are sent to our parents by the testing companies. Your tests are equated with each other, so they measure in a comparable unit that ranges from 120 to 480. You had a starting score of 235 and finished the year with a score of 420, for a gain of 185.

The tests I took are comparable and measure in the same unit, too, but not the same unit as your tests measure in. Scores on my tests range from 400 to 1200. I started the year with a score of 790, and finished at 1080, for a gain of 290.

Now the confusion in the first scenario is overcome, in part. Our parents can see that we each made real gains in reading ability. The difficulty levels of the two tests you took are the same, as are the difficulties of the two tests I took. But our parents still don’t know what to pay the teacher because they can’t tell if you or I learned more. You had lower percentiles and test scores than I did, but you are being compared with what is likely a higher scoring group of suburban and higher socioeconomic status students than the urban group of disadvantaged students I’m compared against. And your scores aren’t comparable with mine, so you might have started and finished with more reading ability than I did, or maybe I had more than you. There isn’t enough information here to tell.

So, again, the information that is provided is insufficient to the task of settling on a reasonable price for the outcomes obtained. Our parents will again be annoyed and confused by the low quality information that makes it impossible to know what to pay the teacher.
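A short Python sketch shows why the two reported gains settle nothing. The scores are the ones given above. The two conversion assumptions are invented, and neither has any data behind it, which is exactly the problem.

```python
# Reported scores from the example above.
your_gain = 420 - 235   # on a scale running from 120 to 480
my_gain = 1080 - 790    # on a scale running from 400 to 1200

def to_common_unit(gain, units_per_common_unit):
    """Convert a gain into an imagined common unit at an assumed conversion rate."""
    return gain / units_per_common_unit

# Two made-up conversion assumptions; nothing in the reports supports either one.
assumptions = {
    "A": {"yours": 100.0, "mine": 100.0},  # treat the two units as equal
    "B": {"yours": 50.0, "mine": 100.0},   # one of your units is worth two of mine
}

verdicts = {}
for name, rate in assumptions.items():
    yours = to_common_unit(your_gain, rate["yours"])
    mine = to_common_unit(my_gain, rate["mine"])
    verdicts[name] = "you learned more" if yours > mine else "I learned more"

print(your_gain, my_gain, verdicts)
```

Under the first assumption I learned more; under the second you did. Without an empirically established link between the two scales, the choice between such assumptions is arbitrary, and so is any price based on it.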

Buying Reading Ability (Scenario 3)

In the third scenario, we are still third graders in different schools with different reading teachers. This time our reading abilities are measured by tests that are completely unique. Every student has a test custom tailored to their particular ability. Unlike the tests in the first and second scenarios, however, now all of the tests have been constructed carefully on the basis of extensive data analysis and experimental tests. Different testing companies are providing the service, but they have gone to the trouble to work together to create consensus standards defining the unit of measurement for any and all reading test items.

For each test, our parents received a report in the mail showing our measures. As before, we know how many questions we each answered correctly. Now, though we don’t know which particular questions we got right or wrong, we can see typical items ordered by difficulty lined up in a way that shows us what kind of items we got wrong, and which kind we got right. And now we also know your tests were equated relative to mine, so we can compare how much reading ability you gained relative to how much I gained. Now our parents can confidently determine how much they should pay the teacher, at least in proportion to their children’s relative measures. If our measured gains are equal, the same payment can be made. If one of us obtained more value, then proportionately more should be paid.

In this third scenario, we have a situation directly analogous to buying oranges. You have a measured amount of increased reading ability that is expressed in the same unit as my gain in reading ability, just as the weights of the oranges are comparable. Further, your test items were not identical with mine, and so the difficulties of the items we took surely differed, just as the sizes of the oranges we bought did.
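The way a shared, precalibrated item bank makes different tests comparable can be sketched as follows. This is a minimal illustration that assumes a simple Rasch model and uses invented item difficulties. Operational systems use far larger banks and more careful estimation, and the function names here are my own.

```python
import math

def p_correct(ability, difficulty):
    """Rasch model: probability of a correct answer (both values in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(raw_score, item_difficulties, iterations=50):
    """Maximum-likelihood ability estimate (in logits) for a raw score.

    Uses Newton-Raphson steps, each limited to one logit for stability.
    The raw score must lie strictly between 0 and the number of items.
    """
    n = len(item_difficulties)
    if not 0 < raw_score < n:
        raise ValueError("extreme scores have no finite estimate")
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, d) for d in item_difficulties]
        expected = sum(probs)
        variance = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - expected) / variance
        theta += max(-1.0, min(1.0, step))
    return theta

# Hypothetical calibrated difficulties, all expressed in the same unit.
easy_test = [-2.4 + 0.2 * i for i in range(10)]  # your ten items
hard_test = [0.6 + 0.2 * i for i in range(10)]   # my ten items

your_measure = estimate_ability(8, easy_test)  # eight correct on easy items
my_measure = estimate_ability(4, hard_test)    # four correct on hard items
print(round(your_measure, 2), round(my_measure, 2))
```

Because both sets of items are calibrated in one unit, a count of four on the hard items can stand for more reading ability than a count of eight on the easy ones, just as four large oranges can outweigh eight small ones.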

This third scenario could be made yet more efficient by removing the need for creating and maintaining a calibrated item bank, as described by Stenner and Stone (2003) and in the sixth developmental level in a prior blog post here. Also, additional efficiencies could be gained by unifying the interpretation of the reading ability measures, so that progress through high school can be tracked with respect to the reading demands of adult life (Williamson, 2008).

Comparison of the Purchasing Experiences

In contrast with the grocery store experience, paying for increased reading ability in the first scenario is fraught with low quality information that greatly increases the cost of the transactions. The information is of such low quality that, of course, hardly anyone bothers to go to the trouble to try to decipher it. Too much cost is associated with the effort to make it worthwhile. So, no one knows how much gain in reading ability is obtained, or what a unit gain might cost.

When a school district or educational researchers mount studies to try to find out what it costs to improve reading ability in third graders in some standardized unit, they find so much unexplained variation in the costs that they, too, raise more questions than answers.

But we don’t place the cost of making the value comparison on the consumer or the merchant in the grocery store. Instead, society as a whole picks up the cost by funding the creation and maintenance of consensus standard metrics. Until we take up the task of doing the same thing for intangible assets, we cannot expect human, social, and natural capital markets to obtain the efficiencies we take for granted in markets for tangible assets and property.

More and more states and nations around the world face the possibility of defaulting on their financial obligations. The financial crises are of epic historical proportions. This is a disaster of the first order. And yet, it is so odd: we have the solutions and preventative measures we need at our fingertips, but no one knows about them or is looking for them.

So, I am persuaded to once again wonder if there might now be some real interest in the possibilities of capitalizing on

instruments calibrated to measure in constant units (not ordinal ones) within known error ranges (not as though the measures are perfectly precise) with known data quality;

measures made meaningful by their association with invariant scales defined in terms of the questions asked;

adaptive instrument administration methods that make all measures equally precise by targeting the questions asked;

judge calibration methods that remove the person rating performances as a factor influencing the measures;

the metaphor of transparency by calibrating instruments that we really look right through at the thing measured (risk, governance, abilities, health, performance, etc.);

efficient markets for human, social, and natural capital by means of the common currencies of uniform metrics, calibrated instrumentation, and metrological networks;

the means available for tuning the instruments of the human, social, and environmental sciences to well-tempered scales that enable us to more easily harmonize, orchestrate, arrange, and choreograph relationships;

our understandings that universal human rights require universal uniform measures, that fair dealing requires fair measures, and that our measures define who we are and what we value; and, last but very far from least,

the power of love–the back and forth of probing questions and honest answers in caring social intercourse plants seminal ideas in fertile minds that can be nurtured to maturity and Socratically midwifed as living meaning born into supportive ecologies of caring relations.

How bad do things have to get before we systematically and collectively implement the long-established and proven methods we have at our disposal? It is the most surreal kind of schizophrenia or passive-aggressive avoidance pathology to keep on tormenting ourselves with problems for which we have solutions.

To properly pursue perfection, we need to parameterize it. That is, taking perfection as the ideal, unattainable standard against which we judge our performance is equivalent to thinking of it as a mathematical model. Organizations are intended to realize their missions independent of the particular employees, customers, suppliers, challenges, products, etc. they happen to engage with at any particular time. Organizational performance measurement (Spitzer, 2007) ought to then be designed in terms of a model that posits, tests for, and capitalizes on the always imperfectly realized independence of those parameters.

Lean thinking (Womack & Jones, 1996) focuses on minimizing waste and maximizing value. At every point at which resources are invested in processes, services, or products, the question is asked, “What value is added here?” Resources are wasted when no value is added, when they can be removed with no detrimental effect on the value of the end product. In their book, Natural Capitalism: Creating the Next Industrial Revolution, Hawken, Lovins, and Lovins (1999, p. 133) say

“Lean thinking … changes the standard for measuring corporate success. … As they [Womack and Jones] express it: ‘Our earnest advice to lean firms today is simple. To hell with your competitors; compete against perfection by identifying all activities that are muda [the Japanese term for waste used in Toyota’s landmark quality programs] and eliminating them. This is an absolute rather than a relative standard which can provide the essential North Star for any organization.’”

Further, every input should “be presumed waste until shown otherwise.” A constant, ongoing, persistent pressure for removing waste is the basic characteristic of lean thinking. Perfection is never achieved, but it aptly serves as the ideal against which progress is measured.

Lean thinking sounds a lot like a mathematical model, though it does not seem to have been written out in a mathematical form, or used as the basis for calibrating instruments, estimating measures, evaluating data quality, or for practical assessments of lean organizational performance. The closest anyone seems to have come to parameterizing perfection is in the work of Genichi Taguchi (Ealey, 1988), which has several close parallels with Rasch measurement (Linacre, 1993). But meaningful and objective quantification, as required and achieved in the theory and practice of fundamental measurement (Andrich, 2004; Bezruczko, 2005; Bond & Fox, 2007; Smith & Smith, 2004; Wilson, 2005; Wright, 1999), in fact asserts abstract ideals of perfection as models of organizational, social, and psychological processes in education, health care, marketing, etc. These models test the extent to which outcomes remain invariant across examination or survey questions, across teachers, students, schools, and curricula, or across treatment methods, business processes, or policies.
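The kind of invariance at issue can be illustrated with the simplest (dichotomous) form of the Rasch model, in which the log-odds of a successful response equals the person's ability minus the item's difficulty. The sketch below shows the ideal pattern the model posits; actual data are then examined for departures from it. The person and item values are invented for illustration.

```python
import math

def log_odds_correct(ability, difficulty):
    """Rasch log-odds of a correct response, recovered from the probability."""
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return math.log(p / (1.0 - p))

person_a, person_b = 1.2, -0.3   # hypothetical abilities, in logits
items = [-1.0, 0.0, 0.7, 2.0]    # hypothetical item difficulties, in logits

# The comparison of the two persons, made separately on each item.
differences = [log_odds_correct(person_a, d) - log_odds_correct(person_b, d)
               for d in items]
print([round(x, 6) for x in differences])
```

In the ideal case the comparison between the two persons comes out the same no matter which item is used to make it. That is the parameterized perfection against which real data can be judged.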

Though as yet implemented only to a limited extent in business (Drehmer, Belohlav, James, & Coye, 2000; Drehmer & Deklava, 2001; Lunz & Linacre, 1998; Salzberger, 2009), advanced measurement’s potential rewards are great. Fundamental measurement theory has been successfully applied in research and practice thousands of times over the last 40 years and more, including in very large scale assessments and licensure/certification applications (Adams, Wu, & Macaskill, 1997; Masters, 2007; Smith, Julian, Lunz, et al., 1994). These successes speak to an opportunity for making broad improvements in outcome measurement that could provide more coherent product definition, and significant associated opportunities for improving product quality and the efficiency with which it is produced, in the manner that has followed from the use of fundamental measures in other industries.

Of course, processes and outcomes are never implemented or obtained with perfect consistency. This would be perfectly true only in a perfect world. But to pursue perfection, we need to parameterize it. In other words, to raise the bar in any area of performance assessment, we have to know not only what direction is up, but we also need to know when we have raised the bar far enough. But we cannot tell up from down, we do not know how much to raise the bar, and we cannot properly evaluate the effects of lean experiments when we have no way of locating measures on a number line that embodies the lean ideal.

To think together collectively in ways that lead to significant new innovations, to rise above what Jaron Lanier calls the “global mush” of confused and self-confirming hive thinking, we need the common languages of widely accepted fundamental measures of the relevant processes and outcomes, measures that remain constant across samples of customers, patients, employees, students, etc., and across products, sales techniques, curricula, treatment processes, assessment methods, and brands of instrument.

We are all well aware that the consequences of not knowing where the bar is, of not having product definitions, can be disastrous. In many respects, as I’ve said previously in this blog, the success or failure of health care reform hinges on getting measurement right. The Institute of Medicine report, To Err is Human, of several years ago stresses the fact that system failures pose the greatest threat to safety in health care because they lead to human errors. When a system as complex as health care lacks a standard product definition, and product delivery is fragmented across multiple providers with different amounts and kinds of information in different settings, the system becomes dangerously cumbersome and over-complicated, with unacceptably wide variations and errors in its processes and outcomes, not to even speak of its economic inefficiency.

In contrast with the widespread use of fundamental measures in the product definitions of other industries, health care researchers typically implement neither the longstanding, repeatedly proven, and mathematically rigorous models of fundamental measurement theory nor the metrological networks through which reference standard metrics are engineered. Most industries carefully define, isolate, and estimate the parameters of their products, doing so in ways 1) that ensure industry-wide comparability and standardization, and 2) that facilitate continuous product improvement by revealing multiple opportunities for enhancement. Where organizations in other industries manage by metrics and thereby keep their eyes on the ball of product quality, health care organizations often manage only their own internal processes and cannot in fact bring the product quality ball into view.

In his message concerning the Institute for Healthcare Improvement’s Pursuing Perfection project a few years ago, Don Berwick, like others (Coye, 2001; Coye & Detmer, 1998), observed that health care does not yet have an organization setting new standards in the way that Toyota did for the auto industry in the 1970s. It still doesn’t, of course. Given the differences between the auto and health care industries’ uses of fundamental measures of product quality and associated abilities to keep their eyes on the quality ball, is it any wonder, then, that no one in health care has yet hit a home run? It may well be that no one will hit a home run in health care until reference standard measures of product quality are devised.

The need for reference standard measures in uniform data systems is crucial, and the methods for obtaining them are widely available and well-known. So what is preventing the health care industry from adopting and deploying them? Part of the answer is the cost of the initial investment required. In 1980, metrology comprised about six percent of the U.S. gross national product (Hunter, 1980). In the period from 1981 to 1994, annual expenditures on research and development in the U.S. were less than three percent of the GNP, and non-defense R&D was about two percent (NIST Subcommittee on Research, National Science and Technology Council, 1996). These costs, however, must be viewed as investments from which high rates of return can be obtained (Barber, 1987; Gallaher, Rowe, Rogozhin, et al., 2007; Swann, 2005).

For instance, the U.S. National Institute of Standards and Technology estimated the economic impact of 12 areas of research in metrology, in four broad areas including semiconductors, electrical calibration and testing, optical industries, and computer systems (NIST, 1996, Appendix C; also see NIST, 2003). The median rate of return in these 12 areas was 147 percent, and returns ranged from 41 to 428 percent. The report notes that these results compare favorably with those obtained in similar studies of return rates from other public and private research and development efforts. Even if health care metrology produces only a small fraction of the return rate produced in physical metrology, its economic impact could still amount to billions of dollars annually. The proposed pilot projects therefore focus on determining what an effective health care outcomes metrology system should look like. What should its primary functions be? What should it cost? What rates of return could be expected from it?

Metrology, the science of measurement (Pennella, 1997), requires 1) that instruments be calibrated within individual laboratories so as to isolate and estimate the values of the required parameters (Wernimont, 1978); and 2) that individual instruments’ capacities to provide the same measure for the same amount, and so be traceable to a reference standard, be established and monitored via interlaboratory round-robin trials (Mandel, 1978).
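The second requirement can be pictured with a deliberately simplified sketch of a round-robin comparison. The lab names, readings, and tolerance below are all invented, and real interlaboratory studies use more elaborate statistics than a simple mean deviation. The aim is only to show what "the same measure for the same amount" means in practice.

```python
from statistics import mean

# Hypothetical round-robin: three labs measure the same four reference samples.
readings = {
    "lab_1": [10.1, 20.0, 29.8, 40.2],
    "lab_2": [10.6, 20.5, 30.4, 40.7],
    "lab_3": [9.9, 19.9, 30.1, 39.9],
}
n_samples = 4

# Consensus value for each sample: the mean across labs.
consensus = [mean(lab[i] for lab in readings.values()) for i in range(n_samples)]

# Each lab's average deviation from consensus, taken as its estimated bias.
bias = {name: mean(r - c for r, c in zip(values, consensus))
        for name, values in readings.items()}

TOLERANCE = 0.25  # assumed acceptable bias, in the unit of measurement
flagged = [name for name, b in bias.items() if abs(b) > TOLERANCE]
print(flagged)
```

A lab whose instrument reads consistently high or low relative to the others is flagged for recalibration, so that every instrument in the network remains traceable to the same reference standard.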

Fundamental measurement has already succeeded in demonstrating the viability of reference standard measures of health outcomes, measures whose meaningfulness does not depend on the particular samples of items employed or patients measured. Though this work succeeds as far as it goes, it is being done in a context that lacks any sense of the need for metrological infrastructure. Health care needs networks of scientists and technicians collaborating not only in the first, intralaboratory phase of metrological work, but also in the interlaboratory trials through which different brands or configurations of instruments intended to measure the same variable would be tuned to harmoniously produce the same measure for the same amount.

Implementation of the two phases of metrological innovation in health care would then begin with the intralaboratory calibration of existing and new instruments for measuring overall organizational performance, quality of care, and patients’ health status, quality of life, functionality, etc. The second phase takes up the interlaboratory equating of these instruments, and the concomitant deployment of reference standard units of measurement throughout a health care system and the industry as a whole. To answer questions concerning health care metrology’s potential returns on investment, the costs for, and the savings accrued from, accomplishing each phase of each pilot will be tracked or estimated.

When instruments measuring in universally uniform, meaningful units are put in the hands of clinicians, a new scientific revolution will occur in medicine. It will be analogous to previous ones associated with the introduction of the thermometer and the instruments of optometry and the clinical laboratory. Such tools will multiply many times over the quality improvement methods used by Brent James, touted as holding the key to health care reform in a recent New York Times profile. Instead of implicitly hypothesizing models of perfection and assessing performance relative to them informally, what we need is a new science that systematically implements the lean ideal on industry-wide scales. The future belongs to those who master these techniques.

National Institute of Standards and Technology (NIST). (1996). Appendix C: Assessment examples. Economic impacts of research in metrology. In Subcommittee on Research, Committee on Fundamental Science (Ed.), Assessing fundamental science: A report from the Subcommittee on Research, Committee on Fundamental Science. Washington, DC: National Science and Technology Council [http://www.nsf.gov/statistics/ostp/assess/nstcafsk.htm#Topic%207; last accessed 18 February 2008].

Swann, G. M. P. (2005, 2 December). John Barber’s pioneering work on the economics of measurement standards [Electronic version]. Notes for Workshop in Honor of John Barber, held at University of Manchester. Retrieved from http://www.cric.ac.uk/cric/events/jbarber/swann.pdf