program management data

I operate in a well regulated industry focused on project management. What this means practically is that there are data streams that flow from the R&D activities, recording planning and progress, via control and analytical systems to both management and customer. The contract type in most cases is Cost Plus, with cost and schedule risk often flowing to the customer in the form of cost overruns and schedule slippages.

Among the methodologies used to determine progress and project eventual outcomes is earned value management (EVM). Of course, this is not the only type of data that flows in performance management streams, but oftentimes EVM is used as shorthand to describe all of the data captured and submitted to customers in performance management. Other planning and performance management data includes time-phased scheduling of tasks and activities, cost and schedule risk assessments, and technical performance.

Previously in my critique regarding the differences between project monitoring and project management (before Hurricane Irma created some minor rearranging of my priorities), I pointed out that “looking in the rear view mirror” was often used as an excuse for by-passing unwelcome business intelligence. I followed this up with an intro to the synergistic economics of properly integrated data. In the first case I answered the critique demonstrating that it is based on an old concept that no longer applies. In the second case I surveyed the economics of data that drives efficiencies. In both cases, new technology is key to understanding the art of the possible.

As I have visited sites in both government and private industry, I find that old ways of doing things still persist. The reason for this is multivariate. First, technology is developing so quickly that there is fear that one’s job will be eliminated with the introduction of technology. Second, the methodology of change agents in introducing new technology often lacks proper socialization across the various centers of power that inevitably exist in any organization. Third, the proper foundation to clearly articulate the need for change is not made. This last is particularly important when stakeholders perform a non-rational assessment in their minds of cost-benefit. They see many downsides and cannot accept the benefits, even when they are obvious. For more on this and insight into other socioeconomic phenomena I strongly recommend Daniel Kahneman’s Thinking Fast and Slow. There are other reasons as well, but these are the ones that are most obvious when I speak with individuals in the field.

The Past is Prologue

For now I will restrict myself to the one benefit of new technology that addresses the “looking in the rear window” critique. It is important to do so because the critique is correct in application (for purposes that I will outline) if incorrect in its cause-and-effect. It is also important to focus on it because the critique is so ubiquitous.

As I indicated above, there are many sources of data in project management. They derive from the following systems (in brief):

a. The planning and scheduling applications, which measure performance through time in the form of discrete activities and events. In the most sophisticated implementations, these applications will include the assignment of resources, which requires the integration of these systems with resource management. Sometimes simple costs are also assigned and tracked through time as well.

b. The cost performance (earned value) applications, which ideally are aligned with the planning and scheduling applications, providing cross-integration with WBS and OBS structures, but focused on work accomplishment defined by the value of work completed against a baseline plan. These performance figures are tied to work accomplishment through expended effort collected by and, ideally, integrated with the financial management system. It involves the proper application of labor rates and resource expenditures in the accomplishment of the work to not only provide an statistical assessment of performance to date, but a projection of likely cost performance outcomes at completion of the effort.

c. Risk assessment applications which, depending of their sophistication and ease of use, provide analysis of possible cost and schedule outcomes, identify the sensitivity of particular activities and tasks, provide an assessment of alternative driving and critical paths, and apply different models of baseline performance to predict future outcomes.

d. Systems engineering applications that provide an assessment of technical performance to date and the likely achievement of technical parameters within the scope of the effort.

e. The financial management applications that provide an accounting of funds allocation, cash-flow, and expenditure, including planning information regarding expenditures under contract and planned expenditures in the future.

These are the core systems of record upon which performance information is derived. There are others as well, depending on the maturity of the project such as ERP systems and MRP systems. But for purposes of this post, we will bound the discussion to these standard sources of data.

In the near past, our ability to understand the significance of the data derived from these systems required manual processing. I am not referring to the sophistication of human computers of 1960s and before, dramatized to great effect in the uplifting movie Hidden Figures. Since we are dealing with business systems, these methodologies were based on simple business metrics and other statistical methods, including those that extended the concept of earned value management.

With the introduction of PCs in the workplace in the 1980s, desktop spreadsheet applications allowed this data to be entered, usually from printed reports. Each analyst not only used standard methods common in the discipline, but also developed their own methods to process and derive importance from the data, transforming it into information and useful intelligence.

Shortly after this development simple analytical applications were introduced to the market that allowed for pairing back the amount of data deriving from some of these systems and performing basic standard calculations, rendering redundant calculations unnecessary. Thus, for example, instead of a person having to calculate multiple estimates to complete, the application could perform those calculations as part of its functionality and deliver them to the analyst for use in, hopefully, their own more extensive assessments.

But even in this case, the data flow was limited to the EVM silo. The data streams relating to schedule, risk, SE, and FM were left to their own devices, oftentimes requiring manual methods or, in the best of cases, cut-and-paste, to incorporate data from reports derived from these systems. In the most extreme cases, for project oversight organizations, this caused analysts to acquire a multiplicity of individual applications (with the concomitant overhead and complexity of understanding differing lexicons and software application idiosyncrasies) in order to read proprietary data types from the various sources just to perform simple assessments of the data before even considering integrating it properly into the context of all of the other project performance data that was being collected.

The bottom line of outlining these processes is to note that, given a combination of manual and basic automated tools, that putting together and reporting on this data takes time, and time, as Mr. Benjamin Franklin noted, is money.

By itself the critique that “looking in the rear view mirror” has no value and attributing it to one particular type of information (EVM) is specious. After all, one must know where one has been and presently is before you can figure out where you need to go and how to get there and EVM is just one dimension of a multidimensional space.

But there is a utility value associated with the timing and locality of intelligence and that is the issue.

Contributors to time

Time when expended to produce something is a form of entropy. For purposes of this discussion at this level of existence, I am defining entropy as availability of the energy in a system to do work. The work in this case is the processing and transformation of data into information, and the further transformation of information into usable intelligence.

There are different levels and sub-levels when evaluating the data stream related to project management. These are:

a. Within the supplier/developer/manufacturer

(1) First tier personnel such as Control Account Managers, Schedulers (if separate), Systems Engineers, Financial Managers, and Procurement personnel among other actually recording and verifying the work accomplishment;

(2) Second tier personnel that includes various levels of management, either across teams or in typical line-and-staff organizations.

b. Within customer and oversight organizations

(1) Reporting and oversight personnel tasks with evaluating the fidelity of specific business systems;

(2) Counterpart project or program officer personnel tasked with evaluating progress, risk, and any factors related to scope execution;

(3) Staff organizations designed to supplement and organize the individual project teams, providing a portfolio perspective to project management issues that may be affected by other factors outside of the individual project ecosystem;

(4) Senior management at various levels of the organization.

Given the multiplicity of data streams it appears that the issue of economies is vast until it is understood that the data that underlies the consumers of the information is highly structured and specific to each of the domains and sub-domains. Thus there are several opportunities for economies.

For example, cost performance and scheduling data have a direct correlation and are closely tied. Thus, these separate streams in the A&D industry were combined under a common schema, first using the UN/CEFACT XML, and now transitioning to a more streamlined JSON schema. Financial management has gone through a similar transition. Risk and SE data are partially incorporated into project performance schemas, but the data is also highly structured and possesses commonalities to be directly accessed using technologies that effectively leverage APIs.

Back to the Future

The current state, despite advances in the data formats that allow for easy rationalization and normalization of data that breaks through propriety barriers, still largely is based a slightly modified model of using a combination of manual processing augmented by domain-specific analytical tools. (Actually sub-domain analytical tools that support sub-optimization of data that are a barrier to incorporation of cross-domain integration necessary to create credible project intelligence).

Thus, it is not unusual at the customer level to see project teams still accepting a combination of proprietary files, hard copy reports, and standard schema reports. Usually the data in these sources is manually entered into Excel spreadsheets or a combination of Excel and some domain-specific analytical tool (and oftentimes several sub-specialty analytical tools). After processing, the data is oftentimes exported or built in PowerPoint in the form of graphs or standard reporting formats. This is information management by Excel and PowerPoint.

In sum, in all too many cases the project management domain, in terms of data and business intelligence, continues to party like it is 1995. This condition also fosters and reinforces insular organizational domains, as if the project team is disconnected from and can possess goals antithetical and/or in opposition to the efficient operation of the larger organization.

A typical timeline goes like this:

a. Supplier provides project performance data 15-30 days after the close of a period. (Some contract clauses give more time). Let’s say the period closed at the end of July. We are now effectively in late August or early September.

b. Analysts incorporate stove-piped domain data into their Excel spreadsheets and other systems another week or so after submittal.

c. Analysts complete processing and analyzing data and submit in standard reporting formats (Excel and PowerPoint) for program review four to six weeks after incorporation of the data.

Items a through c now put a typical project office at project review for July information at the end of September or beginning of October. Furthermore, this information is focused on individual domains, and given the lack of cross-domain knowledge, can be contradictory.

This system is broken.

Even suppliers who have direct access to systems of record all too often rely on domain-specific solutions to be able to derive significance from the processing of project management data. The larger suppliers seem to have recognized this problem and have been moving to address it, requiring greater integration across solutions. But the existence of a 15-30 day reconciliation period after the end of a period, and formalized in contract clauses, is indicative of an opportunity for greater efficiency in that process as well.

The Way Forward

But there is another way.

The opportunities for economy in the form of improvements in time and effort are in the following areas, given the application of the right technology:

In retrieving all data so that it is easily accessible to the organization at the level of detailed required by the task at hand;

In processing this data so that it can converted by the analyst into usable intelligence;

In properly accessing, displaying, and reporting properly integrated data across domains, as appropriate, to each level of the organization regardless of originating data stream.

Furthermore, there opportunities to realizing business value by improving these processes:

By extending expertise beyond a limited number of people who tend to monopolize innovations;

By improving organizational knowledge by incorporating innovation into the common system;

By gaining greater insight into more reliable predictors of project performance across domains instead of the “traditional” domain-specific indices that have marginal utility;

By developing a project focused organization that breaks down domain-centric thinking;

By developing a culture that ties cross-domain project knowledge to larger picture metrics that will determine the health of the overarching organization.

It is interesting that when I visit the field how often it is asserted that “the technology doesn’t matter, it’s process that matters”.

Wrong. Technology defines the art of the possible. There is no doubt that in an ideal world we would optimize our systems prior to the introduction of new technology. But that assumes that the most effective organization (MEO) is achievable without technological improvements to drive the change. If one cannot efficiently integrate all submitted cross-domain information effectively and efficiently using Excel in any scenario (after all, it’s a lot of data), then the key is the introduction of new technology that can do that very thing.

So what technologies will achieve efficiency in the use of this data? Let’s go through the usual suspects:

a. Will more effective use of PowerPoint reduce these timelines? No.

b. Will a more robust set of Excel workbooks reduce these timelines? No.

c. Will an updated form of a domain-specific analytical tool reduce these timelines? No.

d. Will a NoSQL solution reduce these timelines? Yes, given that we can afford the customization.

e. Will a COTS BI application that accepts a combination of common schemas and APIs reduce these timelines? Yes.

The technological solution must be fitted to its purpose and time. Technology matters because we cannot avoid the expenditure of time or energy (entropy) in the processing of information. We can perform these operations using a large amount of energy in the form of time and effort, or we can conserve time and effort by substituting the power of computing and information processing. While we will never get to the point where we completely eliminate entropy, our application of appropriate technology makes it seem as if effort in the form of time is significantly reduced. It’s not quite money for nothing, but it’s as close as we can come and is an obvious area of improvement that can be made for a relatively small investment.

“The past is never dead. It’s not even past.” — William Faulkner, Requiem for a Nun

Over the years I and others have briefed project managers on project performance using KPPs, earned value management, schedule analysis, business analytics, and what we now call predictive analytics. Oftentimes, some set of figures will be critiqued as being ineffective or unhelpful; that the analytics “only look in the rear view mirror” and that they “tell me what I already know.”

In approaching this critique, it is useful to understand Faulkner’s oft-cited quote above. When we walk down a street, let us say it is a busy city street in any community of good size, we are walking in the past. The moment we experience something it is in the past. If we note the present condition of our city street we will see that for every building, park, sidewalk, and individual that we pass on that sidewalk, each has a history. These structures and the people are as much driven by their pasts as their expectations for the future.

Now let us take a snapshot of our street. In doing so we can determine population density, ethnic demographics, property values, crime rate, and numerous other indices and parameters regarding what is there. No doubt, if we stop here we are just “looking in the rear view mirror” and noting what we may or may not know, however certain our anecdotal filter.

Now, let us say that we have an affinity for this street and may want to live there. We will take the present indices and parameters that noted above, which describe our geographical environment, and trend it. We may find that housing pricing are rising or falling, that crime is rising or falling, etc. If we delve into the street’s ownership history we may find that one individual or family possesses more than one structure, or that there is a great deal of diversity. We may find that a Superfund site is not too far away. We may find that economic demographics are pointing to stagnation of the local economy, or that the neighborhood is becoming gentrified. Just by time-phasing and delving into history–by mapping out the trends and noting the significant historical background–provides us with enough information to inform us about whether our affinity is grounded in reality or practicality.

But let us say that, despite negatives, we feel that this is the next up-and-coming neighborhood. We would need signs to make that determination. For example, what kinds of businesses have moved into the neighborhood and what is their number? What demographic do they target? There are many other questions that can be asked to see if our economic analysis is valid–and that analysis would need to be informed by risk.

The fact of the matter is that we are always living with the past: the cumulative effect of the past actions of numerous individuals, including our own, and organizations, groups of individuals, and institutions; not to mention larger economic forces well beyond our control. Any desired change in the trajectory of the system being evaluated must identify those elements that can be impacted or influenced, and an analysis of the effort that must be expended to bring about the change, is also essential.

This is a scientific fact, proven countless times by physics, biology, and other disciplines. A deterministic universe, which provides for some uncertainty at any given point at our level of existence, drives the possible within very small limits of possibility and even smaller limits of probability. What this means in plain language is that the future is usually a function of the past.

Any one number or index, no doubt, does not necessarily tell us something important. But it could if it is relevant, material, and prompts further inquiry essential to project performance.

For example, let us look at an integrated master schedule that underlies a typical medium-sized project.

We will select a couple of metrics that indicates project schedule performance. In the case below we are looking at task hits and misses and Baseline Execution Index, a popular index that determines efficiency in meeting baseline schedule planning.

Note that the chart above plots the performance over time. What will it take to improve our efficiency? So as a quick logic check on realism, let’s take a look at the work to date with all of the late starts and finishes.

Our bow waves track the cumulative effort to date. As we work to clear missed starts or missed finishes in a project we also must devote resources to the accomplishment of current work that is still in line with the baseline. What this means is that additional resources may need to be devoted to particular areas of work accomplishment or risk handling.

This is not, of course, the limit to our analysis that should be undertaken. The point here is that at every point in history in every system we stand at a point of the cumulative efforts, risk, failure, success, and actions of everyone who came before us. At the microeconomic level this is also true within our project management systems. There are also external constraints and influences that will define the framing assumptions and range of possibilities and probabilities involved in project outcomes.

The shear magnitude of the bow waves that we face in all endeavors will often be too great to fully overcome. As an analogy, a bow wave in complex systems is more akin to a tsunami as opposed to the tidal waves that crash along our shores. All of the force of all of the collective actions that have preceded present time will drive our trajectory.

This is known as inertia.

Identifying and understanding the contributors to the inertia that is driving our performance is important to knowing what to do. Thus, looking in the rear view mirror is important and not a valid argument for ignoring an inconvenient metric that may only require additional context. Furthermore, knowing where we sit is important and not insignificant. Knowing the factors that put us where we are–and the effort that it will take to influence our destiny–will guide what is possible and not possible in our future actions.

For those not in the know, these meetings are an essential coming together of policy makers, subject matter experts, and private industry practitioners regarding the practical and mundane state-of-the-practice in complex project management, particularly focused on the concerns of the the federal government and the Department of Defense. The end result of these meetings is to publish white papers and recommendations regarding practice to support continuous process improvement and the practical application of project management practices–allowing for a cross-pollination of commercial and government lessons learned. This is also the intersection where innovation among the large and small are given an equal vetting and an opportunity to introduce new concepts and solutions. This is an idealized description, of course, and most of the petty personality conflicts, competition, and self-interest that plagues any group of individuals coming together under a common set of interests also plays out here. But generally the days are long and the workshops generally produce good products that become the de facto standard of practice in the industry. Furthermore the control that keeps the more ruthless personalities in check is the fact that, while it is a large market, the complex project management community tends to be a relatively small one, which reinforces professionalism.

The “blues” in this case is not so much borne of frustration or disappointment but, instead, from the long and intense days that the sessions offer. The biggest news from an IT project management and application perspective was twofold. The data stream used by the industry in sharing data in an open systems manner will be simplified. The other was the announcement that the technology used to communicate will move from XML to JSON.

Human readable formatting to Data-focused formatting. Under Kendall’s Better Buying Power 3.0 the goal of the Department of Defense (DoD) has been to incorporate better practices from private industry where they can be applied. I don’t see initiatives for greater efficiency and reduction of duplication going away in the new Administration, regardless of what a new initiative is called.

In case this is news to you, the federal government buys a lot of materials and end items–billions of dollars worth. Accountability must be put in place to ensure that the money is properly spent to acquire the things being purchased. Where technology is pushed and where there are no commercial equivalents that can be bought off the shelf, as in the systems purchased by the Department of Defense, there are measures of progress and performance (given that the contract is under a specification) that are submitted to the oversight agency in DoD. This is a lot of data and to be brutally frank the method and format of delivery has been somewhat chaotic, inefficient, and duplicative. The Department moved to address this by a somewhat modest requirement of open systems submission of an application-neutral XML file under the standards established by the UN/CEFACT XML organization. This was called the Integrated Program Management Report (IMPR). This move garnered some improvement where it has been applied, but contracts are long-term, so incorporating improvements though new contractual requirements tends to take time. Plus, there is always resistance to change. The Department is moving to accelerate addressing these inefficiencies in their data streams by eliminating the unnecessary overhead associated with specifications of formatting data for paper forms and dealing with data as, well, data. Great idea and bravo! The rub here is that in making the change, the Department has proposed dropping XML as the technology used to transfer data and move to JSON.

XML to JSON. Before I spark another techie argument about the relative merits of each, there are some basics to understand here. First, XML is a language, JSON is simply data exchange format. This means that XML is specifically designed to deal with hierarchical and structured data that can be queried and where validation and fidelity checks within the data are inherent in the technology. Furthermore, XML is known to scale while maintaining the integrity of the data, which is intended for use in relational databases. Furthermore, XML is hard to break. It is meant for editing and will maintain its structure and integrity afterward.

The counter argument encountered is that JSON is new! and uses fewer characters! (which usually turns out to be inconsequential), and people are talking about it for Big Data and NoSQL! (but this happened after the fact and the reason for shoehorning it this way is discussed below).

So does it matter? Yes and no. As a supplier specializing in delivering solutions that normalize and rationalize data across proprietary file structures and leverage database capabilities, I don’t care. I can adapt quickly and will have a proof-of-concept solution out within 30 days of receiving the schema.

The risk here, which applies to DoD and the industry, is that the decision to go to JSON is made only because it is the shiny new thing used by gamers and social networking developers. There has also been a move to adapt to other uses because of the history of significant security risks that had been found in Java, so much so that an entire Wikipedia page is devoted to them. Oracle just killed off Java applets, though Java hangs on. JSON, of course, isn’t Java, but it was designed from birth as JavaScript Object Notation (hence the acronym JSON), with the purpose of handling relatively small bits of data across web servers in a number of proprietary settings.

To address JSON deficiencies relative to XML, a number of tools have been and are being developed to replicate the fidelity and reliability found in XML. Whether this is sufficient to be effective against a structured LANGUAGE is to be seen. Much of the overhead that technies complain about in XML is due to the native functionality related to the power it brings to the table. No doubt, a bicycle is simpler than a Formula One racer–and this is an apt comparison. Claiming “simpler” doesn’t pass the “So What?” test knowing the business processes involved. The technology needs to be fit to the solution. The purpose of data transmission using APIs is not only to make it easy to produce but for it to–you know–achieve the goals of normalization and rationalization so that it can be used on the receiving end which is where the consumer (which we usually consider to be the customer) sits.

At the end of the day the ability to scale and handle hierarchical, structured data will rely on the quality and strength of the schema and the tools that are published to enforce its fidelity and compliance. Otherwise consuming organizations will be receiving a dozen different proprietary JSON files, and that does not address the present chaos but simply adds to it. These issues were aired out during the meeting and it seems that everyone is aware of the risks and that they can be addressed. Furthermore, as the schema is socialized across solutions providers, it will be apparent early if the technology will be able handle the project performance data resulting from the development of a high performance aircraft or a U.S. Navy destroyer.

Neoclassical economics abhors inefficiency, and yet inefficiencies exist. Among the core issues that create inefficiencies is the asymmetrical nature of information. Asymmetry is an accepted cornerstone of economics that leads to inefficiency. We can see in our daily lives and employment the effects of one party in a transaction having more information than the other: knowing whether the used car you are buying is a lemon, measuring risk in the purchase of an investment and, apropos to this post, identifying how our information systems allow us to manage complex projects.

Regarding this last proposition we can peel this onion down through its various levels: the asymmetry in the information between the customer and the supplier, the asymmetry in information between the board and stockholders, the asymmetry in information between management and labor, the asymmetry in information between individual SMEs and the project team, etc.–it’s elephants all the way down.

This asymmetry, which drives inefficiency, is exacerbated in markets that are dominated by monopoly, monopsony, and oligopoly power. When informed by the work of Hart and Holmström regarding contract theory, which recently garnered the Nobel in economics, we have a basis for understanding the internal dynamics of projects in seeking efficiency and productivity. What is interesting about contract theory is that it incorporates the concept of asymmetrical information (labeled as adverse selection), but expands this concept in human transactions at the microeconomic level to include considerations of moral hazard and the utility of signalling.

The state of asymmetry and inefficiency is exacerbated by the patchwork quilt of “tools”–software applications that are designed to address only a very restricted portion of the total contract and project management system–that are currently deployed as the state of the art. These tend to require the insertion of a new class of SME to manage data by essentially reversing the efficiencies in automation, involving direct effort to reconcile differences in data from differing tools. This is a sub-optimized system. It discourages optimization of information across the project, reinforces asymmetry, and is economically and practically unsustainable.

The key in all of this is ensuring that sub-optimal behavior is discouraged, and that those activities and behaviors that are supportive of more transparent sharing of information and, therefore, contribute to greater efficiency and productivity are rewarded. It should be noted that more transparent organizations tend to be more sustainable, healthier, and with a higher degree of employee commitment.

The path forward where there is monopsony power, where there is a dominant buyer, is to impose the conditions for normative behavior that would otherwise be leveraged through practice in a more open market. For open markets not dominated by one player as either supplier or seller, instituting practices that reward behavior that reduces the effects of asymmetrical information, and contracting disincentives in business transactions on the open market is the key.

In the information management market as a whole the trends that are working against asymmetry and inefficiency involve the reduction of data streams, the construction of cross-domain data repositories (or reservoirs) that allow for the satisfaction of multiple business stakeholders, and the introduction of systems that are more open and adaptable to the needs of the project system in lieu of a limited portion of the project team. These solutions exist, yet their adoption is hindered because of the long-term infrastructure that is put in place in complex project management. This infrastructure is supported by incumbents that are reinforcing to the status quo. Because of this, from the time a market innovation is introduced to the time that it is adopted in project-focused organizations usually involves the expenditure of several years.

This argues for establishing an environment that is more nimble. This involves the adoption of a series of approaches to achieve the goals of broader information symmetry and efficiency in the project organization. These are:

a. Instituting contractual relationships, both internally and externally, that encourage project personnel to identify risk. This would include incentives to kill efforts that have breached their framing assumptions, or to consolidate progress that the project has achieved to date–sending it as it is to production–while killing further effort that would breach framing assumptions.

b. Institute policy and incentives on the data supply end to reduce the number of data streams. Toward this end both acquisition and contracting practices should move to discourage proprietary data dead ends by encouraging normalized and rationalized data schemas that describe the environment using a common or, at least, compatible lexicon. This reduces the inefficiency derived from opaqueness as it relates to software and data.

c. Institute policy and incentives on the data consumer end to leverage the economies derived from the increased computing power from Moore’s Law by scaling data to construct interrelated datasets across multiple domains that will provide a more cohesive and expansive view of project performance. This involves the warehousing of data into a common repository or reduced set of repositories. The goal is to satisfy multiple project stakeholders from multiple domains using as few streams as necessary and encourage KDD (Knowledge Discovery in Databases). This reduces the inefficiency derived from data opaqueness, but also from the traditional line-and-staff organization that has tended to stovepipe expertise and information.

d. Institute acquisition and market incentives that encourage software manufacturers to engage in positive signalling behavior that reduces the opaqueness of the solutions being offered to the marketplace.

In summary, the current state of project data is one that is characterized by “best-of-breed” patchwork quilt solutions that tend to increase direct labor, reduces and limits productivity, and drives up cost. At the end of the day the ability of the project to handle risk and adapt to technical challenges rests on the reliability and efficiency of its information systems. A patchwork system fails to meet the needs of the organization as a whole and at the end of the day is not “takin’ care of business.”

Been attending conferences and meetings of late and came upon a discussion of the means of reducing data streams while leveraging Moore’s Law to provide more, better data. During a discussion with colleagues over lunch they asked if asking for more detailed data would provide greater insight. This led to a discussion of the qualitative differences in data depending on what information is being sought. My response to more detailed data was to respond: “well there has to be a pony in there somewhere.” This was greeted by laughter, but then I finished the point: more detailed data doesn’t necessarily yield greater insight (though it could and only actually looking at it will tell you that, particularly in applying the principle of KDD). But more detailed data that is based on a hierarchical structure will, at the least, provide greater reliability and pinpoint areas of intersection to detect areas of risk manifestation that is otherwise averaged out–and therefore hidden–at the summary levels.

Not to steal the thunder of new studies that are due out in the area of data later this spring but, for example, I am aware after having actually achieved lowest level integration for extremely complex projects through my day job, that there is little (though not zero) insight gained in predictive power between say, the control account level of a WBS and the work package level. Going further down to element of cost may, in the words of the character in the movie Still Alice, where “You may say that this falls into the great academic tradition of knowing more and more about less and less until we know everything about nothing.” But while that may be true for project management, that isn’t necessarily so when collecting parametrics and auditing the validity of financial information.

Rolling up data from individually detailed elements of a hierarchy is the proper way to ensure credibility. Since we are at the point where a TB of data has virtually the same marginal cost of a GB of data (which is vanishingly small to begin with), then the more the merrier in eliminating the abuse associated with human-readable summary reporting. Furthermore, I have long proposed through this blog and elsewhere, that the emphasis should be away from people, process, and tools, to people, process, and data. This rightly establishes the feedback loop necessary for proper development and project management. More importantly, the same data available through project management processes satisfy the different purposes of domains both within the organization, and of multiple external stakeholders.

This then leads us to the concept of integrated project management (IPM), which has become little more than a buzz-phrase, and receives a lot of hand waves, mostly by technology companies that want to push their tools–which are quickly becoming obsolete–while appearing forward leaning. This tool-centric approach is nothing more than marketing–focusing on what the software manufacturer would have us believe is important based on the functionality baked into their applications. One can see where this could be a successful approach, given the emphasis on tools in the PM triad. But, of course, it is self-limiting in a self-interested sort of way. The emphasis needs to be on the qualitative and informative attributes of available data–not of tool functionality–that meet the requirements of different data consumers while minimizing, to the extent possible, the number of data streams.

Thus, there are at least two main aspects of data that are important in understanding the utility of project management: early warning/predictiveness and credibility/traceability/fidelity. The chart attached below gives a rough back-of-the-envelope outline of this point, with some proposed elements, though this list is not intended to be exhaustive.

In order to capture data across the essential elements of project management, our data must demonstrate both a breadth and depth that allows for the discovery of intersections of the different elements. The weakness in the two-dimensional model above is that it treats each indicator by itself. But, when we combine, for example, IMS consecutive slips with other elements listed, the informational power of the data becomes many times greater. This tells us that the weakness in our present systems is that we treat the data as a continuity between autonomous elements. But we know that the project consists of discontinuities where the next level of achievement/progress is a function of risk. Thus, when we talk about IPM, the secret is in focusing on data that informs us what our systems are doing. This will require more sophisticated types of modeling.

As many of my colleagues in project management know, I wrote a series of articles on the application of technical performance risk in project management back in 1997, one of which made me an award recipient from the institution now known as Defense Acquisition University. Over the years various researchers and project organizations have asked me if I have any additional thoughts on the subject and the response up until now has been: no. From a practical standpoint, other responsibilities took me away from the domain of determining the best way of recording technical achievement in complex projects. Furthermore, I felt that the field was not ripe for further development until there were mathematics and statistical methods that could better approach the behavior of complex adaptive systems.

But now, after almost 20 years, there is an issue that has been nagging at me since publication of the results of the project studies that I led from 1995 through 1997. It is this: the complaint by project managers in resisting the application of measuring technical achievement of any kind, and integrating it with cost performance, the best that anyone can do is 100%. “All TPM can do is make my performance look worse”, was the complaint. One would think this observation would not only not face opposition, especially from such an engineering dependent industry, but also because, at least in this universe, the best you can do is 100%.* But, of course, we weren’t talking about the same thing and I have heard this refrain again at recent conferences and meetings.

To be honest, in our recommended solution in 1997, we did not take things as far as we could have. It was always intended to be the first but not the last word regarding this issue. And there have been some interesting things published about this issue recently, which I noted in this post.

In the discipline of project management in general, and among earned value practitioners in particular, the performance being measured oftentimes exceeds 100%. But there is the difference. What is being measured as exceeding 100% is progress against both a time-based and fiscally-based linear plan. Most of the physical world doesn’t act nor can it be measured this way. When measuring the attributes of a system or component against a set of physical or performance thresholds, linearity against a human-imposed plan oftentimes goes out the window.

But a linear progression can be imposed on the development toward the technical specification. So then the next question is how do we measure progress during the development curve and duration.

The short answer, without repeating a summarization of the research (which is linked above) is through risk assessment, and the method that we used back in 1997 was a distribution curve that determined the probability of reaching the next step in the technical development. This was based on well-proven systems engineering techniques that had been used in industry for many years, particularly at pre-Lockheed Martin Martin Marietta. Technical risk assessment, even using simplistic 0-50-80-100 curves, provides a good approximation of probability and risk between each increment of development, though now there are more robust models. For example, the use of Bayesian methodology, which introduces mathematical rigor into statistics, as outlined in this post by Eliezer Yudkowsky. (As an aside, I strongly recommend his blogs for anyone interested in the cutting edge of rational inquiry and AI).

So technical measurement is pretty well proven. But the issue that then presents itself (and presented itself in 1997) was how to derive value from technical performance. Value is a horse of a different color. The two bugaboos that were presented as being impassible roadblocks were weight and test failure.

Let’s take weight first. On one of my recent trips I found myself seated in an Embraer E-jet. These are fairly small aircraft, especially compared to conventional commercial aircraft, and are lightweight. As such, they rely on a proper distribution and balance of weight, especially if one finds oneself at 5,000 feet above sea level with the long runway shut down, a 10-20 mph crosswind, and a mountain range rising above the valley floor in the direction of takeoff. So the flight crew, when the cockpit noted a weight disparity, shifted baggage from belly stowage to the overhead compartments in the main cabin. What was apparent is that weight is not an ad hoc measurement. The aircraft’s weight distribution and tolerances are documented–and can be monitored as part of operations.

When engineering an aircraft, each component is assigned its weight. Needless to say, weight is then allocated and measured as part of the development of subsystems of the aircraft. One would not measure the overall weight of the aircraft or end item without ensuring that the components and subsystems did not conform to the weight limitations. The overall weight limitation of an aircraft will very depending on mission and use. If a commercial-type passenger airplane built to takeoff and land from modern runways, weight limitations are not as rigorous. If the aircraft in question is going to takeoff and land from a carrier deck at sea then weight limitations become more critical. (Side note: I also learned these principles in detail while serving on active duty at NAS Norfolk and working with the Navy Air Depot there). Aside from aircraft weight is important in a host of other items–from laptops to ships. In the latter case, of which I am also intimately familiar, weight is important in balancing the ship and its ability to make way in the water (and perform its other missions).

So given that weight is an allocated element of performance within subsystem or component development, we achieve several useful bits of information. First off, we can aggregate and measure weight of the entire end item to track if we are meeting the limitations of the item. Secondly, we can perform trade-off. If a subsystem or component can be made with a lighter material or more efficiently weight-wise, then we have more leeway (maybe) somewhere else. Conversely, if we need weight for balance and the component or subsystem is too light, we need to figure out how to add weight or ballast. So measuring and recording weight is not a problem. Finally, we allocate and tie performance-wise a key technical specification to the work, avoiding subjectivity.

So how to do we show value? We do so by applying the same principles as any other method of earned value. Each item of work is covered by a Work Breakdown Structure (WBS), which is tied (hopefully) to an Integrated Master Schedule (IMS). A Performance Management Baseline (PMB) is applied to the WBS (or sometimes thought a resource-loaded IMS). If we have properly constructed our Integrated Management Plan (IMP) prior to the IMS, we should clearly have tied the relationship of technical measures to the structure. I acknowledge that not every program performs an IMP, but stating so is really an acknowledgement of a clear deficiency in our systems, especially involving complex R&D programs. Since our work is measured in short increments against a PMB, we can claim 100% of a technical specification but be ahead of plan for the WBS elements involved.

It’s not as if the engineers in our industrial activities and aerospace companies have never designed a jet aircraft or some other item before. Quite a bit of expertise and engineering know-how transfers from one program to the next. There is a learning curve. The more information we collect in that regard, the more effective that curve. Hence my emphasis in recent posts on data.

For testing, the approach is the same. A test can fail, that is, a rocket can explode on the pad or suffer some other mishap, but the components involved will succeed or fail based on the after-action report. At that point we will know, through allocation of the test results, where we are in terms of technical performance. While rocket science is involved in the item’s development, recording technical achievement is not rocket science.

Thus, while our measures of effectiveness, measures of performance, measures of progress, and technical performance will determine our actual achievement against a standard, our fiscal assessment of value against the PMB can still reflect whether we are ahead of schedule and below budget. What it takes is an understanding of how to allocate more rigorous measures to the WBS that are directly tied to the technical specifications. To do otherwise is to build a camel when a horse was expected or–as has been recorded in real life in previous programs–to build a satellite that cannot communicate, a Navy aircraft that cannot land on a carrier deck, a ship that cannot fight, and a vaccine that cannot be delivered and administered in the method required. We learn from our failures, and that is the value of failure.

*There are colloquial expressions that allow for 100% to be exceeded, such as exceeding 100% of the tolerance of a manufactured item or system, which essentially means to exceed its limits and, therefore, breaking it.

I’ve run into additional questions about scalability. It is significant to understand the concept in terms of assessing software against data size, since there are actually various aspect of approaching the issue.

Unlike situations where data is already sorted and structured as part of the core functionality of the software service being provided, this is in dealing in an environment where there are many third-party software “tools” that put data into proprietary silos. These act as barriers to optimizing data use and gaining corporate intelligence. The goal here is to apply in real terms the concept that the customers generating the data (or stakeholders who pay for the data) own the data and should have full use of it across domains. In project management and corporate governance this is an essential capability.

For run-of-the-mill software “tools” that are focused on solving one problem, this often is interpreted as just selling a lot more licenses to a big organization. “Sure we can scale!” is code for “Sure, I’ll sell you more licenses!” They can reasonably make this assertion, particularly in client-server or web environments, where they can point to the ability of the database system on which they store data to scale. This also comes with, usually unstated, the additional constraint that their solution rests on a proprietary database structure. Such responses, through, are a form of sidestepping the question, nor is it the question being asked. Thus, it is important for those acquiring the right software to understand the subtleties.

A review of what makes data big in the first place is in order. The basic definition, which I outlined previously, came from NASA in describing data that could not be held in local memory or local storage. Hardware capability, however, continues to grow exponentially, so that what is big data today is not big data tomorrow. But in handling big data, it then becomes incumbent on software publishers to drive performance to allow their customers to take advantage of the knowledge contained in these larger data sets.

The elements that determine the size of data are:

a. Table size

b. Row or Record size

c. Field size

d. Rows per table

e. Columns per table

f. Indexes per table

Note the interrelationships of these elements in determining size. Thus, recently I was asked how many records are being used on the largest tables being accessed by a piece of software. That is fine as shorthand, but the other elements add to the size of the data that is being accessed. Thus, a set of data of say 800K records may be just as “big” as one containing 2M records because of the overall table size of fields, and the numbers of columns and indices, as well as records. Furthermore, this question didn’t take into account the entire breadth of data across all tables.

Understanding the definition of data size then leads us to understanding the nature of software scaling. There are two aspects to this.

The first is the software’s ability to presort the data against the database in such as way as to ensure that latency–the delay in performance when the data is loaded–is minimized. The principles applied here go back to database management practices back in the day when organizations used to hire teams of data scientists to rationalize data that was processed in machine language–especially when it used to be stored in ASCII or, for those who want to really date themselves, EBCDIC, which were incoherent by today’s generation of more human-readable friendly formats.

Quite simply, the basic steps applied has been to identify the syntax, translate it, find its equivalents, and then sort that data into logical categories that leverage database pointers. What you don’t want the software to do is what used to be done during the earliest days of dealing with data, which was smaller by today’s standards, of serially querying ever data element in order to fetch only what the user is calling. Furthermore, it also doesn’t make much sense to deal with all data as a Repository of Babel to apply labor-intensive data mining in non-relational databases, especially in cases where the data is well understood and fairly well structured, even if in a proprietary structure. If we do business in a vertical where industry standards in data apply, as in the use of the UN/CEFACT XML convention, then much of the presorting has been done for us. In addition, more powerful industry APIs (like OLE DB and ODBC) that utilize middleware (web services, XML, SOAP, MapReduce, etc.) multiply the presorting capabilities of software, providing significant performance improvements in accessing big data.

The other aspect is in the software’s ability to understand limitations in data communications hardware systems. This is a real problem, because the backbone in corporate communication systems, especially to ensure security, is still largely done over a wire. The investments in these backbones is usually categorized as a capital investment, and so upgrades to the system are slow. Furthermore, oftentimes backbone systems are embedded in physical plant building structures. So any software performance is limited by the resistance and bandwidth of wiring. Thus, we live in a world where hardware storage and processing is doubling every 12-18 months, and software is designed to better leverage such expansion, but the wires over which data communication depends remains stuck in the past–constrained by the basic physics of CAT 6 or Fiber Optic cabling.

Needless to say, software manufacturers who rely on constant communications with the database will see significantly degraded performance. Some software publishers who still rely on this model use the “check out” system, treating data like a lending library, where only one user or limited users can access the same data. This, of course, reduces customer flexibility. Strategies that are more discrete in handling data are the needed response here until day-to-day software communications can reap the benefits of physical advancements in this category of hardware. Furthermore, organizations must understand that the big Cloud in the sky is not the answer, since it is constrained by the same physics as the rest of the universe–with greater security risks.

All of this leads me to a discussion I had with a colleague recently. He opened his iPhone and did a query in iTunes for an album. In about a second or so his query selected the artist and gave a list of albums–all done without a wire connection. “Why can’t we have this in our industry?” he asked. Why indeed? Well, first off, Apple iTunes has sorted its data to optimize performance with its app, and it is performed within a standard data stream and optimized for the Apple iOS and hardware. Secondly, the variables of possible queries in iTunes are predefined and tied to a limited and internally well-defined set of data. Thus, the data and application challenges are not equivalent as found in my friend’s industry vertical. For example, aside from the challenges of third party data normalization and rationalization, iTunes is not dealing with dynamic time-phased or trending data that requires multiple user updates to reflect changes using predictive analytics which is then served to different classes of users in a highly secure environment. Finally, and most significantly, Apple spent more on that system than the annual budget of my friend’s organization. In the end his question was a good one, but in discussing this it was apparent that just saying “give me this” is a form of magical thinking and hand waving. The devil is in the details, though I am confident that we will eventually get to an equivalent capability.

At the end of the day IT project management strategy must take into account the specific needs of classes of users in making determinations of scaling. What this means is a segmented approach: thick-client users, compartmentalized local installs with subsets of data, thin-client, and web/mobile or terminal services equivalents. The practical solution is still an engineered one that breaks the elephant into digestible pieces that lean forward to leveraging advances in hardware, database, and software operating environments. These are the essential building blocks to data optimization and scaling.