ABSTRACT

An opportunity is available for using home energy consumption and building description data to develop a standardized accuracy test for residential energy analysis tools, that is, a test of the ability of uncalibrated simulations to match real utility bills. Empirical data collected from around the United States have been translated into a uniform Home Performance Extensible Markup Language format that may enable software developers to create translators to their input schemes for efficient access to the data. This may make it possible to model many homes expediently, and thus to implement software accuracy test cases by applying the translated data. This paper describes progress toward, and issues related to, developing a usable, standardized, empirical data-based software accuracy test suite.

INTRODUCTION

Background: Why We Are But Not Where, or, Where We Are But Not Why?

Software accuracy tests play a vital role in the continuous improvement of residential building energy analysis [Judkoff and Neymark 2006, Judkoff et al. 2010, Polly et al. 2011, RESNET 2006]. Historically, established software accuracy tests are based on the Building Energy Simulation and Diagnostic Test (BESTEST) methodology [Judkoff and Neymark 2006, ASHRAE 2009]. These types of tests are included in ANSI/ASHRAE Standard 140, Method of Test for the Evaluation of Building Energy Analysis Computer Programs [ASHRAE 2011], and comprise idealized test suites where programs are compared to each other and/or to analytical or quasi-analytical solutions.
Such deterministically oriented test cases work well for finding and diagnosing software errors; however, without direct comparisons to empirical data there is no physical truth standard of comparison with respect to overall accuracy. So, BESTEST can tell us "why we are" (or at least help diagnose why we are having errors), but cannot evaluate true accuracy relative to how a real building performs as built and as occupied.

A carefully conceived laboratory-based empirical validation study can provide both prediction accuracy testing and diagnostic capability, i.e., it addresses both the "where" and the "why." However, such procedures have been developed with only limited success. This is because such tests are an order of magnitude more expensive to develop than BESTEST-type tests, requiring substantial dedicated multi-year funding. Because of the expense of constructing facilities, such tests can be accomplished in only a limited number of climates and configurations. Also, many previously published empirical validation studies failed to empirically determine fundamental inputs (in addition to the outputs), and therefore can contain substantial bias errors [Neymark et al. 2005].

This report is available at no cost from the National Renewable Energy Laboratory (NREL) at www.nrel.gov/publications.

Proposed new test cases with measured audit (not laboratory) data for multiple buildings, applying a stochastic approach, provide an as-built, as-occupied energy-use target, but not much precision. Figure 1 illustrates a preliminary example of the type of accuracy observable with current data. The blue solid line and the blue dashed lines represent perfect agreement and ±40% disagreement between predicted and measured data, respectively. Here we can discern some signal (correlation of predicted versus measured energy consumption) from the noise (data scatter related to bias and random error, e.g., occupant behavior). This type of test suite addresses the "where we are, but not why." That is, we see how well we can hit the target, but when disagreement between predictions and measured data occurs, there is only limited diagnostic capability based on statistical analysis for identifying causes of disagreements.

The remainder of the paper describes development of the new empirical data-based software accuracy test.

[Figure: plot titled "Predicted v. Measured Natural Gas Use, Data Set Y"]
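The predicted-versus-measured comparison described for Figure 1 can be sketched numerically. The following is a minimal illustration only, not the method used in this paper: it assumes the 40% disagreement bands are taken relative to measured use, and the function names and the sample values are hypothetical, invented purely to make the example runnable.

```python
def within_band(predicted, measured, band=0.40):
    """True if predicted lies within +/- band of measured.

    Assumption: the band is defined relative to the measured value.
    """
    return abs(predicted - measured) <= band * measured


def summarize(pairs, band=0.40):
    """Summarize agreement for a list of (predicted, measured) pairs."""
    n = len(pairs)
    inside = sum(1 for p, m in pairs if within_band(p, m, band))
    # Mean relative bias: positive means over-prediction on average.
    mean_bias = sum((p - m) / m for p, m in pairs) / n
    return {
        "n": n,
        "fraction_within_band": inside / n,
        "mean_relative_bias": mean_bias,
    }


# Hypothetical (predicted, measured) annual gas use in arbitrary but
# consistent units; these are made-up values, not data from the study.
homes = [(1000.0, 900.0), (1500.0, 1000.0), (700.0, 800.0), (2000.0, 2100.0)]
result = summarize(homes)
print(result)
```

A summary of this kind indicates how well the predictions hit the target across many homes (the "where"), but it says nothing about the cause of any individual disagreement (the "why").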