Energy Efficiency may be the fifth fuel, but unlike carbon, nuclear, wind, and solar, energy efficiency cannot be measured or metered - by definition, energy efficiency is the absence of the use of energy, which can only be calculated by comparing what might have been with what is. It is the delta between these values that comprises energy efficiency, which makes it a calculated value.

This complexity leaves energy efficiency open to interpretation. One can always debate what might have been if energy efficiency actions had not been taken. There are many complicated and interactive factors that can play into the resulting energy efficiency, including weather, building usage, building changes, changes to resource costs, and occupant behavior. One can compare past bills against future results, but the savings are often obscured by the significant noise that is common in building energy bills, and coming up with a bankable number that everyone can agree on is often not straightforward or clear.

Fortunately, there are many examples of markets investing in calculated metrics. In fact, the global derivatives market, estimated at $1.2 quadrillion, dwarfs world GDP. While energy efficiency is complicated, it is possible for markets to agree on a standard set of “weights and measures,” and this foundational step is underway - one project in particular to watch is the EDF Investor Confidence Project, a methodology that could be equally applied to residential and commercial energy efficiency (full disclosure, I run this project for EDF).

Energy Efficiency is expressed in every transaction as a prediction or estimate of savings to come. This estimate is part of a calculation made by investors (including building owners, financing firms, utilities, energy services companies, and insurance providers), and its validity and transparency are critical to creating investor confidence and ensuring real results are being delivered.

Given the complexity, I wanted to go over some of the key terms necessary to have a real conversation about energy efficiency. There are, in fact, a number of ways to express energy efficiency, and there is not a single right answer.

Realization Rate: Realization rate is a comparison between predicted and actual energy savings, generally corrected for weather, and sometimes for societal norms using a control group. A 100% realization rate means that, on average, savings were delivered as expected. A 60% realization rate would mean that, on average, for every 100 kWh of predicted savings, only 60 kWh were delivered. The words “on average” matter here, as there can be considerable differences between energy efficiency portfolios that have the exact same realization rate. We will discuss Variance more in the next section, but it is possible for two portfolios with the same realization rate to have substantially different numbers of winners and losers.
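As a rough illustration of the arithmetic, here is a minimal Python sketch. The function name and the sample numbers are made up for this example, and it assumes the savings figures have already been weather-corrected, which a real analysis would have to do first.

```python
def realization_rate(predicted_savings_kwh, actual_savings_kwh):
    """Return delivered savings as a fraction of predicted savings for a portfolio."""
    total_predicted = sum(predicted_savings_kwh)
    if total_predicted <= 0:
        raise ValueError("total predicted savings must be positive")
    return sum(actual_savings_kwh) / total_predicted


# Hypothetical portfolio: four projects, each predicted to save 100 kWh.
predicted = [100.0, 100.0, 100.0, 100.0]
# Hypothetical (already weather-corrected) savings seen in billing data.
actual = [60.0, 60.0, 60.0, 60.0]

rate = realization_rate(predicted, actual)
print(f"Realization rate: {rate:.0%}")
```

Note that the rate is computed on portfolio totals, so it says nothing about how individual projects performed.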

It is also important to recognize that realization rate always comes with a confidence interval. Sufficient pre- and post-retrofit data is required to calculate savings vs. the baseline. A single project or a small number of projects may not be indicative of overall results, and it is necessary to capture enough heating and cooling degree days to calculate realization rate for different end uses and fuel types.

Variance: Variance expresses not our average performance, but how widely individual results are spread around that average, including the number of outliers. This is often expressed as standard deviation.

Variance is critical in combination with Realization Rate to understand the actual performance of a portfolio. It is possible to get a 100% realization rate while having a lot of winners and losers. If 25% of projects were over by, say, 50%, and another 25% under by 50%, it is still possible to have a 100% realization rate.
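The 25% over / 25% under scenario can be sketched in a few lines of Python. The two portfolios below are hypothetical, and the spread is measured here as the population standard deviation of per-project realization rates, which is just one reasonable way to express it.

```python
from statistics import pstdev


def portfolio_summary(predicted, actual):
    """Return (portfolio realization rate, std dev of per-project rates)."""
    per_project = [a / p for p, a in zip(predicted, actual)]
    return sum(actual) / sum(predicted), pstdev(per_project)


# Hypothetical portfolios of eight projects, each predicted to save 100 kWh.
predicted = [100.0] * 8
steady = [100.0] * 8  # every project delivers exactly what was predicted
# 25% of projects over by 50%, 25% under by 50%, the rest on target.
mixed = [150.0, 150.0, 50.0, 50.0, 100.0, 100.0, 100.0, 100.0]

steady_rate, steady_spread = portfolio_summary(predicted, steady)
mixed_rate, mixed_spread = portfolio_summary(predicted, mixed)

print(f"steady: rate={steady_rate:.0%}, std dev={steady_spread:.2f}")
print(f"mixed:  rate={mixed_rate:.0%}, std dev={mixed_spread:.2f}")
```

Both portfolios report the same realization rate; only the standard deviation reveals that half of the customers in the second one did not get what they were told to expect.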

In the context of program design and markets, both realization rate and variance are important, though potentially for different stakeholders. Utilities and public programs, for example, may be most interested in average performance, as that is what reduces the need for power plants; however, they also have an interest in protecting ratepayers, so variance becomes important. Individual consumers may care about realization rate in a general sense; however, when it comes to what they should expect from their personal investment, variance may be just as important as realization rate in their decision making.

As anyone who has worked with real live clients knows, even if one is perfectly correct on average, if there is a lot of variance (winners and losers) your phone will still ring on the weekends. If a client saved $25 a month when they expected $50, they are no happier to learn their neighbor is saving $75.

What Kind of Savings Are We Talking About?

It turns out that in addition to how we measure savings, there are a number of ways we talk about the savings themselves:

Gross Savings: Gross savings is how utilities and regulators think about savings. This is the volume of savings in terms of units of energy saved. While this sounds simple, to consumers, who generally barely understand the difference between a therm of gas and a kWh of electricity, gross savings may be confusing.

Percentage Savings: Many programs (including federal legislation) look at savings in terms of percentage reduction for a specific building. This approach can be applied using site, source, or cost (more to come on that).

While this approach sounds really simple, it can have some interesting unintended consequences, such as rewarding smaller projects on homes with lower bills at the same level as projects for larger consumers that may in fact cost a lot more and save a lot more gross energy. This can result in a selection bias that favors projects that save less energy.
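To make the selection bias concrete, here is a small Python sketch with two hypothetical homes. The usage figures are invented for illustration and are not drawn from any program.

```python
def gross_and_percent_savings(baseline_kwh, post_kwh):
    """Return (gross kWh saved, fraction of baseline saved) for one building."""
    gross = baseline_kwh - post_kwh
    return gross, gross / baseline_kwh


# Hypothetical annual usage before and after a retrofit.
small_home = gross_and_percent_savings(5_000.0, 4_000.0)
large_home = gross_and_percent_savings(20_000.0, 16_000.0)

for name, (gross, pct) in (("small home", small_home), ("large home", large_home)):
    print(f"{name}: {gross:,.0f} kWh saved, {pct:.0%} reduction")
```

A program that pays on percentage reduction alone would treat these two projects identically, even though one delivers four times the gross savings.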

Site Savings: Site savings looks at savings in terms of reduced BTU and kWh at a specific building. While on the surface it is very simple, in reality, because different fuel types may have different costs, percent reduction in site energy may not have a lot to do with percentage reductions in bills. This measure can also encourage fuel switching when it may not always be in society's or even the customer's best interest to do so.

Source Savings: Source savings looks at energy savings based on reductions in generation, not end use consumption. In many cases and locations, there may be as many as three kWh generated for every one kWh that is ultimately consumed in a building (the other 66% or so being lost to grid inefficiencies). While source makes a lot of sense to utilities and policy makers, it is often very confusing to consumers, and hard to apply accurately due to varying fuel mixes around the country, and even between different places in a single region or different times of day.
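Here is a minimal Python sketch of how the same project can look different under site and source accounting. The electricity factor of 3.0 reflects the "three kWh generated for every one consumed" case above; the gas factor of 1.0 is a simplifying assumption for illustration, and the unit conversions are rounded. Real factors vary by region and time of day.

```python
BTU_PER_KWH = 3412.0        # rounded conversion
BTU_PER_THERM = 100_000.0

# Assumed site-to-source multipliers (illustrative, not regional data).
SOURCE_FACTOR = {"electricity": 3.0, "gas": 1.0}


def site_btu(kwh_saved, therms_saved):
    """Savings measured at the building, in BTU."""
    return kwh_saved * BTU_PER_KWH + therms_saved * BTU_PER_THERM


def source_btu(kwh_saved, therms_saved):
    """Savings measured at the point of generation or supply, in BTU."""
    return (kwh_saved * BTU_PER_KWH * SOURCE_FACTOR["electricity"]
            + therms_saved * BTU_PER_THERM * SOURCE_FACTOR["gas"])


# Two hypothetical projects with identical site savings.
electric_project = (1000.0, 0.0)   # 1,000 kWh saved, no gas
gas_project = (0.0, 34.12)         # 34.12 therms saved, no electricity

for name, project in (("electric", electric_project), ("gas", gas_project)):
    print(f"{name}: site={site_btu(*project):,.0f} BTU, "
          f"source={source_btu(*project):,.0f} BTU")
```

Under these assumptions the two projects are equal on a site basis, but the electric project saves three times as much on a source basis, which is why the choice of metric can change which projects a program favors.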

Cost Savings: Money saved is what customers care about most and understand best. It also tends to split the difference between site and source (as cost is also a reflection of energy production). This approach is not used widely, but is built into new federal legislation.

Making Energy Efficiency a Resource: This energy efficiency stuff is a lot harder than it would seem on the surface. While I fully support the basic notion of more transparency, registries, and public accounting, energy savings do not just pop out of the numbers. We need to define our terms and agree on the key metrics we are applying, and recognize that with every choice there is a host of often unintended consequences we must address.

Once we have agreed on a common metric and a standard approach to calculating and expressing energy efficiency and the accuracy of predictions, we can start using this data to drive the behavior of consumers, contractors, and regulators and, most importantly, create alignment that encourages private markets to emerge.

There is a raging and healthy (most of the time) debate about the future of home performance and energy efficiency more generally.

However, I have found that some of the key terms in energy efficiency and home performance are misused or have many different definitions, resulting in confusion and impeding our ability to move forward.

Here are some key terms and how I have come to understand them. Please feel free to debate and discuss - that is the point of this blog!

Home Performance: “Home performance” has come to mean a specific method of conducting whole-house energy efficiency retrofits on existing buildings, generally including BPI / RESNET, energy audits and modeling, combustion safety, etc.

However, I believe home performance is not about specific tactics, but is instead an umbrella term that refers to a systems-based approach, grounded in building science, that measures success in terms of results, not by individual measures. Home performance is not how you do it, but instead is defined by outcomes including energy, health, and comfort.

Performance Testing: We often confuse performance testing of a building with the overall performance of a project. Performance testing can provide more accurate and site-specific inputs into a prediction of results, or serve as a design tool; however, real performance of the overall system can only be calculated at the meter (or in comfort or air quality), not in a blower door test or whole-house assessment.

Prescriptive Programs: The current understanding of “prescriptive programs” tends to focus on programs that specify improvements and values, generally using population averages (deemed savings, product rebates, etc.) vs. whole-house, site-based analysis. However, I believe the real definition should be broader. Prescriptive approaches to energy efficiency, and to the prediction of savings, include all methods and models that are not calibrated to results at the meter. Employing a whole-house methodology, or a simulation model to estimate overall savings, does not make a performance project. In fact, simulation models and “home performance” as we know it are entirely prescriptive until such time as there is accountability and transparency to the results.

Performance Based: “Performance Based” approaches to energy efficiency are based on past performance, calculated from measured performance data (meter data).
However, incentives are driven by upfront predictions, and performance risk (savings) is still shouldered by ratepayers and homeowners and paid in advance of actual results.

Past performance may be used to calibrate incentive levels and provide feedback to contractors, vendors, and ultimately consumers to drive improvement in the system; however, fundamentally we are still betting with other people's money, which triggers regulation and oversight - contractors and industry are not sharing the actual performance risk.

Performance Contracting: “Performance Contracting” is a system where there is accountability directly to actual savings calculated based on the meter. Rather than ratepayers fronting incentives, and consumers taking the performance risk on whether their investment in EE pays off, private parties will front incentive values to customers and put their money where their mouth is - taking full performance risk. Utilities will procure energy efficiency through demand-side capacity contracts, and buy real and documented savings from aggregators or contractors directly. This will massively reduce program costs, as managing the risk becomes a private sector activity, and will result in dramatically increased value of the negawatt.

By paying for delivered savings, the roles of programs and utilities change from defining business models and micromanaging programs in an attempt to regulate good results, to providing simple consumer protections and ensuring that the savings being procured are real and calculated correctly. Delivery models, software, measures, training, and quality assurance, all traditional program roles, will become functions of the private sector instead.

Moving Toward Performance: Many of the internal Home Performance industry debates center on us all using the same terms, but in different ways.
In particular, there is a loud contingent focused on getting quickly to “performance.” However, they tend to want to avoid regulations in the process, when what they are proposing is in fact a “performance based” model, where there is scorekeeping on predictions, but risk still flows to ratepayers and homeowners. Personally, I believe in moving to Performance Contracting and an Energy Efficiency as a Resource model for home performance (lowercase home performance - performance, not a specific approach) and that there is a strategy and series of steps that will allow us to move from our current prescriptive approaches to home performance to a reasonable middle ground where incentives are calibrated to past performance and results are reported transparently on an ongoing basis. I believe that the dataset that emerges from this Performance Based approach will be critical in defining the actual energy efficiency yields for whatever models work for industry and customers, and will provide the critical actuarial data that will facilitate a move to Performance Contracting business models. We must convert Energy Efficiency from uncertainty to manageable risk before private markets can engage.

However, tracking performance is considerably more complicated than people tend to give it credit for, and it will take some time to make this transition. Energy Efficiency is not something you can meter or measure, and it is easy to confuse with a range of external changes like weather, building use, and even resource price changes. Energy Efficiency is in fact a calculation and derivative value that requires relatively significant amounts of data to be statistically valid, and is not nearly as easy to “see” through the noise as many would make it out to be.

Given all of this, I think there is actually a lot more alignment than one might believe reading some of the industry blogs and forums.
I would like to challenge everyone to move from high level theory by translating ideas into actionable steps and processes. Nothing happens overnight, and we have to make sure we have a solid foundation in place to support the transition from prescriptive programs to performance based markets.

There has been an ongoing debate about the use cases and technical differences between energy models designed to produce a rating or asset label, generally for use in code or as a way for consumers to compare the energy use of two buildings, and operational models, used to predict energy usage and savings on an individual building, either to inform an investment decision or to qualify a project for participation in an incentive program.

Recently, during an interesting conversation on the complexities of estimating r-value for assemblies in the BPI RESNET LinkedIn group, a point was made by David Butler that I found exceedingly interesting and important.

How one builds a conservative energy model for an asset rating is exactly the opposite of how one would do it in an operational setting. When building a model for a rating or an asset score, the definition of a conservative assumption is a low value. If you pick a lower r-value, there will be a lower and therefore more conservative score.

In an operational model, where the purpose is typically to estimate the delta between the base case (where the house started) and an improved scenario (post retrofit), the definition of a conservative assumption is a high value. If you estimate a higher r-value for the existing assembly, then there will be a more conservative estimate of savings. Conversely, if you use a low value (like you would in a conservative asset score), you will actually come up with a much more aggressive prediction of savings.

Software for asset ratings is designed with low values, which tends to drive lower scores for those homes that can use the most improvements, and raters are typically trained to default to low values if they are uncertain. This makes reasonable sense in the context of a label or for code enforcement, where the goal is to encourage folks to take action and lower scores are more likely to encourage folks to improve.
However, in an operational system where the goal is to give accurate predictions of savings to consumers, as well as to project savings that will drive ratepayer incentives, this tendency in an asset model leads to an underestimation of building performance that in turn results in often drastic overestimations of energy use, and therefore of potential savings.

This issue confirms the notion that asset and operational scoring tools should not be one and the same. It also explains some of the realization rate problems we see when attempting to use rating software to predict retrofit savings.
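The effect of a low r-value assumption on predicted savings can be sketched with a deliberately simplified model. The Python below assumes steady-state conduction through a single assembly using heating degree days (area / R x HDD x 24 hours); the area, climate, and r-values are hypothetical, and framing, air leakage, and equipment efficiency are ignored. It is not how any particular rating or modeling tool works.

```python
def annual_heat_loss_btu(r_value, area_sqft=1000.0, heating_degree_days=3000.0):
    """Simplified steady-state conduction loss: (area / R) * HDD * 24 hours."""
    return area_sqft / r_value * heating_degree_days * 24.0


def predicted_savings_btu(r_existing, r_improved):
    """Predicted savings = modeled baseline loss minus modeled post-retrofit loss."""
    return annual_heat_loss_btu(r_existing) - annual_heat_loss_btu(r_improved)


R_IMPROVED = 38.0  # hypothetical post-retrofit attic r-value

# Two hypothetical guesses at the same existing attic.
low_guess = predicted_savings_btu(5.0, R_IMPROVED)
high_guess = predicted_savings_btu(11.0, R_IMPROVED)

print(f"Assume R-5 existing:  {low_guess:,.0f} BTU/yr predicted savings")
print(f"Assume R-11 existing: {high_guess:,.0f} BTU/yr predicted savings")
print(f"Low guess predicts {low_guess / high_guess:.1f}x the savings")
```

Because heat loss scales with 1/R, the low guess inflates the modeled baseline far more than it changes the post-retrofit case, so the same physical attic yields a much larger predicted savings number.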

The accuracy of energy savings estimates is an important aspect of energy efficiency public policy. Billions of dollars of ratepayer and taxpayer funds are spent on the basis of energy savings that bring benefits to homeowners and our energy infrastructure as a whole. Nearly all of these investments are made in advance, in the form of a rebate or other incentive, based on a prediction of savings to come.

There are many factors that impact the accuracy of energy savings estimates; this post is focused primarily on software in general, and input accuracy in particular. A software tool’s "accuracy" breaks down to a combination of the validity of the software’s approach and algorithms, combined with the quality of data inputs. For a software product to deliver consistent and accurate results, there needs to be both a valid predictive model and reliable data.

If the values assigned to a building’s components are incorrect, then the predictions of savings for any given measure or combination of measures for that house are likely to be off. This is true even if the model has been calibrated using actual energy usage data.

For example, if a home energy auditor were to under-estimate the r-value (a measure of thermal resistance) of an existing insulated attic, then an improvement made to that attic would show a disproportionate level of savings. This input error will lead to incorrect expectations being set with customers, as well as to fewer public benefits than promised via utility efficiency portfolios that are approved by regulators and funded by ratepayers. For a program like Energy Upgrade California, where rebates are based on percentage reduction (similar to the Homes Act, and 25E in the US Congress), this issue is particularly pronounced and may result in incorrect rebate payments being made to homeowners.

Recent analysis on the Energy Upgrade California program has returned some surprising results related to contractor performance. While there was a high level of overall variance and an over-estimation of savings on average for the program, when the data was broken down by contractor, realization rates (billing analysis vs. predicted savings) were tightly clustered, with almost all contractor values overlapping when the confidence interval was taken into account. There was little apparent difference between contractors in the program.

The analysis tells us that the model being used in CA has a built-in propensity to over-predict the baseline and therefore savings; however, it also tells us that across all contractors, the way the model is used is consistent. Even though much has been made about how hard the current CA software tool is to use, it appears from analyzing over 1000 homes, including modeling data and actual bills, that two contractors using the same tool on the same set of houses are likely to have similar results, because assumptions such as the r-values of assemblies are consistent.

Like most software tools, the software used in CA (EnergyPro 5) uses look-up tables that translate assembly attributes an auditor sees in a home (e.g. 2x4 wall, no insulation, stucco exterior) into an r-value from a library of consistent values. The auditor or contractor inputs what they are looking at - which in some cases also includes attributes like quality - and an r-value is pulled. If those assumptions are wrong, then the overall model may have errors, but it would appear that the CA system is managing one of the major sources of error, which is a lack of consistency in input values.
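A look-up approach of this kind might be sketched as follows in Python. This is not EnergyPro's implementation; the table keys, the function name, and especially the r-values are hypothetical placeholders chosen only to show the mechanism.

```python
# Hypothetical library values for illustration only; a real tool would draw
# these from an established source rather than from this sketch.
ASSEMBLY_R_VALUES = {
    ("2x4 wall", "no insulation", "stucco exterior"): 4.0,
    ("2x4 wall", "R-13 batts", "stucco exterior"): 11.0,
    ("2x6 wall", "R-19 batts", "stucco exterior"): 16.0,
}


def lookup_r_value(framing, insulation, exterior):
    """Return the library r-value for an assembly described by its attributes."""
    key = (framing, insulation, exterior)
    if key not in ASSEMBLY_R_VALUES:
        # No silent guessing: an unknown assembly should be flagged for review.
        raise KeyError(f"no library r-value for assembly {key}")
    return ASSEMBLY_R_VALUES[key]


# Two auditors who record the same observed attributes get the same r-value.
auditor_a = lookup_r_value("2x4 wall", "no insulation", "stucco exterior")
auditor_b = lookup_r_value("2x4 wall", "no insulation", "stucco exterior")
print(auditor_a, auditor_b)
```

The auditor records what they see rather than what they think it is worth, so even if a library value is off, it is off consistently, and the recorded description can be checked later.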

The question is, how important is it to standardize input values such as the r-value of a wall assembly, and can contractors or auditors reliably or accurately estimate these values without the aid of look-up tables?

In an effort to get some data to answer these two questions, a simple poll was created (SEE POLL) that shows participants a picture of an assembly (wall, attic, floor, IR vault) and a simple description of the construction and insulation details of said assembly. Poll takers were then asked to give their estimate of the r-value for each assembly, much like they might have to do in the field.

The goal of this poll is to find out if contractors or auditors looking at the same wall, floor, attic, and vault come up with:

An effective assembly r-value

A consistent set of values

The poll was announced on the BPI / RESNET Group on LinkedIn, as well as sent to a listserv of about 190 home performance contractors in CA. This group is primarily comprised of fairly experienced auditors and contractors, and while this poll is not scientific, the results speak for themselves and should give us pause if we are relying on field estimates of r-values as the basis for a prediction. Answers were received from 40 participants over a 3-day period.

Question:

If you give 40 different auditors the same common four building assemblies (photos and descriptions), and ask them to give you what they think is the most correct r-value, will the results be consistent?

Figure 1: Predicted R-Value by Assembly

We can see from the range of these responses that there is a very significant variance in the results of this exercise.

To make it a bit easier to read, the graph below is reorganized and grouped by predicted r-value. Now one can see that there are some values where there is more agreement than others, but the standard deviation is pretty broad - not a lot of clear agreement, and a lot of outlying values.

Figure 2: Predictions Sorted By R-Value

This spread is likely not a function of inexperience or lack of knowledge in the participants of this poll. It is more likely a case where participants are being asked to do something that is very hard - essentially memorize tables of values and make complex judgement calls to generate a value that has many variables.

Based on this initial data, it is clear that there is substantial variance in auditors' estimates of r-value. While a more detailed analysis would be welcome, these results are extremely pronounced and argue for software tools to provide r-values for assemblies rather than leave an empty box for the contractor or auditor to fill in.

Until there is a model for home performance where actual performance is tracked and used to calibrate software results, programs that rely on software predictions to drive consumer decision-making or the use of ratepayer funds should require that input values are consistent, regardless of auditor, and based on embedded assumptions. It may be fine to allow contractors to override values, but this should be the exception and should trigger QA, or should be done through established methods like a quality rating that uniformly makes changes to the underlying assumption.

Additionally, if only an r-value is recorded without some sort of description of the assembly construction, it may be difficult or even impossible to QA these inputs after the fact, as there may not be any way to verify the attributes of the assembly being estimated.

Vetting a software vendor's model that does not include standardized assumptions through an engineering review conducted with reasonable or pre-established r-values is simply not sufficient, now that it is apparent how much variance we can expect in the values entered into those fields in the real world. This analysis strongly suggests that input quality in the field, if not constrained, will vary substantially from the inputs used in an upfront review, and will therefore render invalid any conclusions reached without factoring in the quality of inputs in the field.

For those tools that currently require an auditor to directly input r-values, vendors should be required to develop a set of look-up tables that generate a value based on a description of the assembly. There are multiple places a vendor could go to get established assumption values (NREL, ACCA Manual J, etc.), and the fix could be as easy as adding a drop-down menu to the software - which should also have little impact on an auditor in the field, and may in fact reduce the time it takes to complete the model's inputs.

Recommendation: All software used in programs to predict savings should use standard assumptions based on assembly characteristics in order to improve the consistency and quality of input values.

On that note, I wanted to leave everyone with something to discuss. Given the fact that in this exercise, there really is no "right" answer, here is a cut of how we did on average for each assembly type.

Question: How good is our collective intelligence, based on these average predicted values?

Average Estimated R-Value by Assembly

POLL RESULTS UPDATE - October 12, 2013

Here are some updated charts based on 75 people who have filled out the poll as of Oct 12, 2013. I think it reinforces the conclusions in the blog.