This article was originally published in the March/April 1993 issue of Home Energy Magazine.


METERED EVALUATION

Keeping a Running Score on Weatherization

by William W. Hill

Bill Hill is a senior researcher with the Ball State University Center for Energy Research/Education/Service in Muncie, Indiana.

An evaluation built around simple run-time metering of furnaces can tell a program manager if weatherization work is headed in the right direction.

Careful evaluations of low-income weatherization programs undertaken in the 1980s are in large part responsible for the impressive improvements many programs have achieved in recent years. But there is still much to learn, and all in the weatherization field can benefit from ongoing evaluations of their work.

One particular way to provide timely feedback on conservation measures uses elapsed timers--or run-time meters--which measure furnace run-time, and thus consumption, before and after weatherization. This type of short-term evaluation of weatherization stands as an alternative to PRISM (PRInceton Scorekeeping Method). (See box, Why Run-Time Meters Instead of PRISM?)

A Simple Approach

Short-term methods using run-time meters are very simple in concept. A run-time meter--either an elapsed timer wired in parallel to a circuit through which current flows when the furnace is firing, or a special recording thermostat--is installed 5-6 weeks prior to weatherization. Someone from the local weatherization agency calls the client once each week for the duration of the study and asks the client to read the meter--and thus the furnace run-time--over the phone.

The caller then multiplies this run-time by the furnace firing rate to get Btu consumed that week. Dividing these Btu by the heating degree-days (HDD) for the week and the heated floor area of the home produces the heating energy intensity for the home (Btu/ft2-HDD) for that week (see accompanying article `Read Me Your Thermostat': Short-Term Evaluation Tools, p. 33).
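The weekly arithmetic described above can be sketched in a few lines. The furnace size, floor area, and readings here are hypothetical, chosen only to make the units concrete:

```python
# Hypothetical example of the weekly calculation: run-time x firing rate
# gives Btu consumed, which is normalized by heating degree-days (HDD)
# and heated floor area to give the energy intensity (Btu/ft2-HDD).
def energy_intensity(run_time_hr, firing_rate_btu_per_hr, hdd, floor_area_ft2):
    """Weekly heating energy intensity in Btu per square foot per HDD."""
    btu = run_time_hr * firing_rate_btu_per_hr  # energy consumed this week
    return btu / (hdd * floor_area_ft2)

# Example: 30 hours of run-time on an 80,000 Btu/hr furnace, a week with
# 180 HDD, and a 1,200 ft2 heated area.
ei = energy_intensity(30, 80_000, 180, 1_200)
print(round(ei, 2))  # 11.11 Btu/ft2-HDD
```

The normalization by both weather (HDD) and house size (ft2) is what lets weeks, and homes, be compared with one another.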

Data are gathered for 5-6 weeks prior to weatherization (the pre-retrofit period) and then again for 5-6 weeks following weatherization (the post-retrofit period). The change in the pre- and post-retrofit energy intensities is a measure of the energy savings in the home, which is then compared to changes in energy consumption in homes in a control group to get net savings from the retrofit measures (see box Double Duty Control Groups).
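The net-savings comparison against a control group reduces to a simple subtraction. A minimal sketch, with hypothetical intensity values:

```python
# Net savings: the change in a weatherized home's energy intensity,
# corrected by the change observed in unweatherized control homes over
# the same period. All figures are in Btu/ft2-HDD; values are invented.
def net_savings(pre, post, control_pre, control_post):
    """Gross pre/post savings minus the change seen in the control group."""
    gross = pre - post
    control_change = control_pre - control_post
    return gross - control_change

# A home drops from 12.0 to 9.0 while controls drift from 11.5 to 11.0:
print(net_savings(12.0, 9.0, 11.5, 11.0))  # 2.5
```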

Design The Evaluation Carefully

The program manager who uses a run-time metered evaluation should keep some things in mind:

Research design requires careful thought. Why is the study being done and what are the goals?

Personnel shouldn't hang lots of run-time meters or fancy thermostats and only later think about what to do with the data.

Short-term methods utilizing run-time meters are great in some circumstances, and not appropriate in others. If the goal of the evaluation, for example, is to provide total annual program savings, this may not be the method of choice for two reasons. No matter how well it's done, this approach still measures only savings in heating; any savings in energy used for hot water will not be counted. And it is not at all obvious that the savings calculated using this method, using only a 10-15 week study period, translate easily into annual savings.

`Close Enough' is Not Good Enough

Some who use run-time metering for evaluation believe it lends itself to measurement and calculations which, though not precise, are close enough. If the purpose of an evaluation is only to provide feedback for field-level personnel, they reason, there's no need to get everything exactly right; it's not essential to calculate exact savings, anyway. They just want to know which homes save lots of energy, and which don't.

This logic is seductive, but it's wrong. If the objective is to determine savings of individual homes, the evaluation needs to be done even more carefully than if the objective were program savings. This paradox has as its basis the power of large sample sizes, which can compensate for a multitude of sins. Large sample sizes blur the variation in individual home energy consumptions, and they are also quite forgiving when it comes to errors in individual homes--just so long as the errors occur randomly. With a large enough sample size, the mean and median heating energy consumptions can be very accurate, even if the measures for some homes are very wrong.

Getting It Right

The fine points on data quality and procedure which follow are not terribly important if the goal is to attain average savings for all the homes in a sample. However, to learn something about each home with a high degree of certainty, the following points are critical.

Avoiding `Garbage In, Garbage Out'

Not only does the run-time approach rely on field measurements of the furnace firing rate and the heated area of the home, it also requires meter data provided by the client. On both counts, as well as in acquiring temperature data, high data quality should be a priority.

Field measurements. The furnace firing rate and the heated area of the home are both easy to measure, and easy to get wrong. It's a good idea to measure both of these things twice, and ask, after completing each measurement, Does it make sense? This is easy to do, but all too often it's not done. It's important to stress to field personnel the importance of getting it right, and to lay out exact procedures to follow.

Temperature Data. Local temperature data--whether from the local weather station, the newspaper, radio, or TV--should be used with caution. The quality of information from local stations is typically not as good as from the major National Weather Service stations. Local temperature data should be used to compute the heating degree-days used in the weekly phone calls. However, the evaluator should use temperature data from a major weather service station in the final savings computations (see Temperature Data Concerns in Short-term Metering, HE, Nov/Dec '92, p.7).
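Whatever the temperature source, the weekly HDD figure itself comes from the daily mean temperatures and a reference temperature. A minimal sketch (the daily temperatures are invented):

```python
# Heating degree-days for a week, from daily mean outdoor temperatures.
# Each day contributes (reference - mean) degrees, but never less than
# zero: days warmer than the reference add no heating load.
def weekly_hdd(daily_mean_temps_f, reference_temp_f=65.0):
    """Sum of positive (reference - daily mean) values over the week."""
    return sum(max(reference_temp_f - t, 0.0) for t in daily_mean_temps_f)

# A hypothetical week with one day (68 F) above the 65 F reference:
temps = [30, 35, 40, 50, 62, 68, 45]
print(weekly_hdd(temps))  # 128.0
```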

No matter how well designed the evaluation, there's no way to control the weather; this method requires cold weather to work. Weeks that are too warm cause problems. How warm is too warm? No one has a definitive answer, but we suggest that weeks with fewer than 140 HDD not be used unless absolutely necessary. While these warmer weeks may be fine, one should gather more data if at all possible.

High Quality Data--Lots of It. Compensating for warm weather isn't the only reason to gather more data. The cost of a few additional phone calls is pretty insignificant and it's a shame to lose homes from a study sample for lack of another week or two of data. This happened in a study we did of 53 homes weatherized in Indiana. The field personnel were told to get five weeks of data but not told why. They were not told that a week which included weatherization work would not count, nor that weeks with abnormal readings might be thrown out. They were not told that a week in which the client zeroed out the reading, erasing the week's data, would not count. They did as they were told, calling the client for the meter reading for each of exactly five weeks and then calling an end to the experiment. When in doubt, gather more data.

The Correct Reference Temperature

One of the principal, and perhaps unappreciated, strengths of PRISM is its ability to find the optimal reference temperature for a home. The reference temperature is the outside temperature at which a home thermostat calls for heat. The usual approach when using run-time meters instead of PRISM for evaluation is to calculate the heating degree-days using an assumed reference temperature of 65°F, or sometimes 60°F. The truth is, however, that the exact reference temperature can make a big difference when looking at individual homes.

Michael Blasnik of GRASP has come up with a straightforward and relatively easy method for determining the best reference temperature: he assumes the best reference temperature is the one for which the relative standard deviation of the energy intensities is a minimum. We adapted this method for the Indiana study by setting up a spreadsheet which calculated the pre- and post-retrofit energy intensities for a wide range of reference temperatures (50-75°F), using temperature data obtained from the nearest major weather service station. The temperature yielding the smallest relative standard error was used as the reference temperature for calculating energy intensities.
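The spreadsheet search can be sketched as a short program. This is a sketch of the minimum-relative-standard-deviation idea, not the study's exact spreadsheet; the data below are synthetic:

```python
# Reference-temperature search in the spirit of Blasnik's method: for
# each candidate reference temperature, compute the weekly energy
# intensities and their relative standard deviation (stdev / mean),
# then keep the temperature that minimizes it.
import statistics

def intensities(weekly_btu, weekly_daily_temps, floor_area_ft2, t_ref):
    """Weekly intensities (Btu/ft2-HDD) at a given reference temperature."""
    out = []
    for btu, temps in zip(weekly_btu, weekly_daily_temps):
        hdd = sum(max(t_ref - t, 0.0) for t in temps)
        if hdd > 0:  # skip weeks with no heating degree-days at this t_ref
            out.append(btu / (hdd * floor_area_ft2))
    return out

def best_reference_temp(weekly_btu, weekly_daily_temps, floor_area_ft2,
                        candidates=range(50, 76)):
    """Candidate (50-75 F) with the smallest relative standard deviation."""
    def rel_sd(t_ref):
        vals = intensities(weekly_btu, weekly_daily_temps,
                           floor_area_ft2, t_ref)
        if len(vals) < 2:
            return float("inf")  # not enough usable weeks
        return statistics.stdev(vals) / statistics.mean(vals)
    return min(candidates, key=rel_sd)

# Synthetic data built so the true reference temperature is 60 F:
weekly_btu = [1_400_000, 700_000, 2_100_000, 350_000]
weekly_temps = [[40] * 7, [50] * 7, [30] * 7, [55] * 7]
print(best_reference_temp(weekly_btu, weekly_temps, 1_000))  # 60
```

At the true reference temperature the intensity is the same every week, so its relative standard deviation collapses toward zero; any other assumed temperature spreads the weekly values apart.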

Figuring Savings

The usual approach for measuring savings is to average the weekly energy intensities of the homes in the pre- and post-retrofit periods. This method, referred to as the average ratio method, weights all weeks equally. However, since most heating energy consumption takes place in cold weather, and since the relationship between energy consumption and heating degree-days is more stable in cold weather, it makes sense to use an averaging method which weights degree-days equally, rather than weeks. The ratio of sums, or R-Sums, method does just this: for each period, pre-retrofit and post-retrofit, the total energy consumption (all the Btu) is divided by the total heating degree-days (all the HDD).
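The two averaging methods can be compared side by side. The weekly figures below are hypothetical, chosen to show how a warm, high-intensity week pulls the two estimators apart:

```python
# Average ratio vs. R-Sums. Average ratio weights every week equally;
# R-Sums weights every degree-day equally, so cold weeks dominate.
def average_ratio(weekly_btu, weekly_hdd, floor_area_ft2):
    """Mean of the weekly intensities (each week counts equally)."""
    ratios = [b / (h * floor_area_ft2)
              for b, h in zip(weekly_btu, weekly_hdd)]
    return sum(ratios) / len(ratios)

def r_sums(weekly_btu, weekly_hdd, floor_area_ft2):
    """Total Btu over total HDD (each degree-day counts equally)."""
    return sum(weekly_btu) / (sum(weekly_hdd) * floor_area_ft2)

# A cold week (200 HDD at 10 Btu/ft2-HDD) and a warm one (50 HDD at 14),
# for a hypothetical 1,000 ft2 home:
btu, hdd = [2_000_000, 700_000], [200, 50]
print(average_ratio(btu, hdd, 1_000))  # 12.0
print(r_sums(btu, hdd, 1_000))         # 10.8
```

The warm week's noisy 14 Btu/ft2-HDD drags the average ratio up to 12.0, while R-Sums, dominated by the cold week's 200 degree-days, stays near 10.8.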

If one uses a procedure which finds the best reference temperature, he or she probably doesn't need to worry about using the average ratio method to calculate savings. However, if one assumes a set reference temperature and must use data from warm weeks, it can be helpful to use the R-Sums calculation method.

Dealing with Outliers

Sometimes the energy intensity for a week appears to be way out of line compared to other weekly readings. One of the strengths of the run-time metered evaluation is that it is often possible to handle such variation in data--an outlier--in almost real time. Is the high reading because Mrs. Jones' grandchildren were visiting that week? Is the low reading because she was away from home for most of the week? Ideally, that week's energy intensity figure is calculated while Mrs. Jones is still on the phone, and the decision whether to keep or discard it can be made then and there.

Of course, an outlier is not noticeable until there are enough other data in hand; an abnormally high or low figure in the first few weeks of any period will not be apparent at the time. For this and other reasons, some outliers likely will have to be dealt with later in the analysis. The important thing is to have a consistent set of rules for dealing with them.
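One possible consistent rule is sketched below. The 50% deviation-from-median threshold is an arbitrary illustration, not a recommendation from the article; the point is that whatever the rule, it is written down once and applied to every home:

```python
# A simple, consistent outlier rule: flag any week whose energy
# intensity deviates from the median of the period by more than a fixed
# fraction (here 50%, an illustrative choice) of that median.
import statistics

def flag_outliers(weekly_intensities, threshold=0.5):
    """Return a True/False flag per week; True marks a suspect week."""
    med = statistics.median(weekly_intensities)
    return [abs(x - med) / med > threshold for x in weekly_intensities]

# The 24 stands out against the other weeks (hypothetical values):
print(flag_outliers([10, 11, 9, 24, 10]))
# [False, False, False, True, False]
```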

Extra Effort Pays Off

Is all the extra work worth the trouble? We calculated the percentage savings for all 53 homes in the Indiana study two ways--using the rigorous approach advocated here, and using an unadorned approach (see Table 1). Interestingly, the average savings from all 53 homes is the same for both approaches, though the mean standard error is somewhat smaller for the rigorous method. If the objective of the study was to find the average savings of these 53 homes, these data suggest that the extra work of the more rigorous approach is probably not worth the effort.

The differences in savings for individual homes, on the other hand, vary significantly depending on which method was used. The ten homes listed in the table are those for which savings computed by the two methods compared least well. The choice of temperature data and computational techniques makes a big difference.

A Great Method If Properly Used

Run-time metering is a great evaluation approach, capable of providing timely feedback on measured energy savings to weatherization crews and program managers. High-quality data and careful analysis make it successful. We recommend the following:

Impress upon data gatherers that careful measurement and recordings are crucial.

Avoid hanging lots of run-time meters without giving careful thought to the goals of the analysis.

Gather as much data as possible consistent with the weatherization work schedule.

Use locally available temperature data for weekly energy intensity computations, but use data from a major weather service station in final savings computations.

Use a computational approach which finds the best reference temperature rather than assuming the old standbys of 65°F and 60°F.

Establish criteria--and stick to them--for dealing with outliers, weeks for which results seem too high or too low.

Reference

A discussion of reference temperature concerns and the R-Sums method can be found in W. W. Hill, M. Blasnik, K. M. Greely, and J. Randolph. Short-Term Metering for Measuring Residential Energy Savings: Concerns and Recommendations from Comparison with PRISM, Proceedings of the ACEEE 1992 Summer Study on Energy Efficiency in Buildings, Washington, DC: American Council for an Energy-Efficient Economy, pp. 4.81-4.91.

Why Run-Time Meters Instead of PRISM?

For measuring fuel savings in residential retrofits, PRISM, the PRInceton Scorekeeping Method, is still considered by most to be the standard by which other evaluation methodologies are measured (see Now That I've Run PRISM, What Do I Do With the Results? HE, Sept/Oct '90, p. 27). PRISM is not without its shortcomings, however.

Though PRISM doesn't necessarily need metered utility data, that is the usual approach. (Some researchers have adapted it for use with deliveries of heating oil, for example.) If the home is heated with something other than natural gas or electricity, PRISM is not typically used, and run-time meters are a logical choice (though not necessarily short-term metering).

The time frame for a PRISM evaluation is long, due to the requirements for (ideally) one year of pre-retrofit data and one year of post-retrofit data. For immediate feedback to help crews improve their weatherization techniques, this is not the method of choice.

Finally, sample attrition can be significant. It can be the result of a number of factors, but the primary culprits are typically poor or missing utility data and movers--clients who relocate in the middle of the two-year evaluation period. Not only do movers result in losses from the sample, but there is evidence that these homes can drop out of the sample in a non-random way, resulting in attrition bias.1

Various short-term methods have been developed to circumvent these problems, ranging from variations on PRISM such as slash and burn,2 to very short-term approaches such as the STEM (Short-Term Energy Monitoring) method which is capable of providing results after only a few days using a combination of metered data and computer simulations. (See A New Method for Building Performance Audits, in HE, July/Aug '89, p.7). The furnace run-time method described in this article typically takes on the order of 5-10 weeks during the heating season.

Double Duty Control Groups

Control groups allow evaluators to correct the analysis for savings which would have occurred in the absence of the weatherization work. Using them is a relatively simple task in principle. All that's needed is to collect more data from the houses in the pre-retrofit period to use as the control group. One can collect 10 weeks of pre-retrofit data, instead of the usual five, then divide that 10-week period into dummy pre- and post-retrofit periods of five weeks each, and compute the savings. Assuming that the evaluation is an ongoing project, and weatherization is proceeding at a reasonably steady pace, the evaluator will be collecting control data during time periods parallel to the pre- and post-retrofit periods for homes being weatherized.
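The dummy-period control computation amounts to splitting the extended pre-retrofit series in half. A minimal sketch, with invented weekly intensities:

```python
# Control-group check: split an extended pre-retrofit series into dummy
# 'pre' and 'post' halves and compute the apparent savings. For a true
# control (no weatherization), this should come out near zero.
def control_savings(weekly_intensities):
    """Mean of the first half minus mean of the second half."""
    n = len(weekly_intensities) // 2
    pre = weekly_intensities[:n]
    post = weekly_intensities[n:2 * n]
    return sum(pre) / n - sum(post) / n

# Ten hypothetical pre-retrofit weeks (Btu/ft2-HDD) from an
# unweatherized control home:
print(control_savings([10, 11, 9, 10, 10, 10, 9, 11, 10, 10]))  # 0.0
```

Any apparent "savings" this calculation produces in control homes measures the background drift, weather-normalization error, and behavioral noise to subtract from the treated homes' savings.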

An Unobtrusive Measure It's Not

In behavioral research (and energy use in a home has a large behavioral component) unobtrusive measures--methods which measure behavior without the subject realizing he or she is being measured--are best. It is quite possible that this short-term evaluation method, in which the client is both the subject of the experiment and plays a major role in data acquisition, could itself affect the savings being measured. Given the very obtrusive nature of this research design, it is only prudent to question whether there might be a Hawthorne effect here.

The Hawthorne effect, named for the famous study in 1924 of workers at Western Electric's Hawthorne plant, found that subjects of an experiment may try to please researchers by doing what they believe is expected. Are clients on their conservation best behavior while data are being gathered? Mr. Jones liked the auditor and wanted the auditor to be pleased with the results of the weatherization. Is it possible that he is keeping the thermostat lower than he might otherwise until this little experiment is over? It's hard to tell for sure.

One way of dealing with the most obvious behavioral change is to ask Mr. Jones to supply the current temperature from the thermostat at the same time he reads the run-time each week. Again, the potential seriousness of the Hawthorne effect depends on the goals of the evaluation. If the object is to know whether the weatherization saved energy, one can probably ignore this. If the goal is measured program savings, and one is especially interested in the persistence of these savings, there may be cause for concern.