Solar Power Reliability and Balance-of-System Designs

Tulsa, Oklahoma, USA --
In order to compete more effectively with other energy sources, the solar industry is focused on decreasing the levelized cost of energy (LCOE), a term that refers to the price at which solar energy is valued taking into account all the lifetime costs of the solar power system. These include the initial investment, the cost of capital, and the cost of system operations, maintenance and repair. While much attention is paid to bringing down the purchase cost of solar cell technologies, the cost of maintenance and repairs represents the major variable cost over the lifetime of a photovoltaic (PV) system. In fact, although PV balance-of-system (BOS) components (inverters, trackers, junction boxes, combiners and transformers) represent only about 10 percent of system costs, they have historically been responsible for up to 70 percent of system failures [1,2,3,4]. The cost of repairs and the downtime associated with these failures can have a significant negative impact on solar power economics.
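As a rough illustration of how these cost terms combine, the sketch below computes LCOE as discounted lifetime cost divided by discounted lifetime energy, one common formulation. The function name, the discount rate and every plant figure are assumptions chosen for illustration; none of them comes from the project described in this article.

```python
def lcoe(capex, annual_om, annual_energy_kwh, discount_rate, years):
    """Discounted lifetime cost divided by discounted lifetime energy ($/kWh)."""
    total_cost = capex
    total_energy = 0.0
    for year in range(1, years + 1):
        discount = (1.0 + discount_rate) ** year
        total_cost += annual_om / discount
        total_energy += annual_energy_kwh / discount
    return total_cost / total_energy


# Hypothetical 1 MW plant; every figure below is an assumption.
baseline = lcoe(2_000_000, 20_000, 1_500_000, 0.07, 25)
# Same plant with higher repair spending and 3 percent of energy lost to downtime.
unreliable = lcoe(2_000_000, 35_000, 1_500_000 * 0.97, 0.07, 25)

print(f"baseline LCOE:   {baseline:.4f} $/kWh")
print(f"unreliable LCOE: {unreliable:.4f} $/kWh")
```

Repair spending raises the numerator and downtime shrinks the denominator, so poor reliability is penalized twice in this calculation.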

One of the most effective things that BOS component providers can do to help bring down the LCOE is to increase the reliability of their devices in solar plant applications. Such a program needs to start with an analysis of where failures can occur.

Sources and Likelihood of Failure

Some pioneering work to improve reliability was done by PV Powered Inc. in partnership with Boeing on a project funded by the U.S. Department of Energy (DOE) under the Solar America Initiative (SAI). As part of the project, a Boeing engineer performed a system-level failure analysis of a 10 MW plant. A mathematical model of the whole plant was constructed using reliability data provided by the component manufacturers for each subsystem (inverters, disconnects, fuses and so on) [5]. The model takes into account the estimated probability of failure of all components and the estimated cost and time to make repairs when a component fails.

Time to repair is a key element of the equation because it directly determines the amount of energy lost during the outage. In some cases, the cost of lost energy production can far outweigh the cost of the failed components.
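The kind of bookkeeping such a model performs can be sketched as follows. This is not the Boeing model, whose details are not given here; it is a minimal expected-value calculation in which the `Component` fields, the failure rates, the energy price and the capacity factor are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    count: int                # units installed in the plant
    failures_per_year: float  # assumed failure rate per unit
    repair_cost: float        # parts and labor per failure ($)
    hours_to_repair: float    # outage duration per failure
    power_lost_kw: float      # capacity offline while the unit is down


def expected_annual_cost(component, energy_price, capacity_factor):
    """Return (expected repair cost, expected lost-energy cost) per year."""
    failures = component.count * component.failures_per_year
    repair = failures * component.repair_cost
    # Capacity factor converts offline capacity into energy actually lost.
    lost_kwh = (failures * component.hours_to_repair
                * component.power_lost_kw * capacity_factor)
    return repair, lost_kwh * energy_price


# Hypothetical string fuse: cheap to replace, slow to get a technician on site.
fuse = Component("string fuse", 200, 0.02, 50.0, 48.0, 100.0)
repair, lost_energy = expected_annual_cost(fuse, energy_price=0.10,
                                           capacity_factor=0.20)
print(f"{fuse.name}: repair ${repair:.0f}/yr, lost energy ${lost_energy:.0f}/yr")
```

With these assumed numbers the lost energy costs more than the replacement parts, which is the effect the time-to-repair term is meant to capture.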

It is well known in reliability engineering that equipment typically goes through three phases during its fielded lifetime, and a complete system-level failure analysis needs to account for the type of failure that dominates each phase. The first type is "infant mortality," for which the probability of failure, as the name implies, decreases with time, as shown in the traditional "bathtub curve" (see Figure 2). Random failures, such as those due to lightning strikes, have the same probability of occurring at any time in the system's life. Finally, the probability of wear-out failures, such as those due to contact oxidation or moving parts wearing out, increases with time.
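One common way to reproduce the bathtub shape numerically is to add three Weibull hazard rates: a shape parameter below 1 gives a falling rate, exactly 1 gives a constant rate, and above 1 gives a rising rate. The sketch below takes that approach; the shape and scale values are assumptions picked only to make the curve visible, not data from any real inverter.

```python
def weibull_hazard(t_years, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t_years / scale) ** (shape - 1.0)


def bathtub_hazard(t_years):
    """Sum of three competing failure mechanisms (all parameters assumed)."""
    infant = weibull_hazard(t_years, shape=0.5, scale=200.0)   # decreasing
    random_ = weibull_hazard(t_years, shape=1.0, scale=50.0)   # constant
    wearout = weibull_hazard(t_years, shape=5.0, scale=25.0)   # increasing
    return infant + random_ + wearout


for year in (0.1, 1.0, 5.0, 15.0, 25.0):
    print(f"year {year:5.1f}: hazard = {bathtub_hazard(year):.4f} failures/yr")
```

The printed rate starts high, falls to a floor set by the random-failure term, and climbs again as the wear-out term takes over.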

Because the probability of failure of the system as a whole increases with each component, the number of parts in the system is a major variable contributing to system reliability. It stands to reason that, all other things being equal, systems with fewer components will be more reliable than those with more components. And in this regard, there's a balance between adding devices that can contribute to increasing the energy harvest and the contribution that the additional devices make to increasing the chance of failure. For example, DC-DC converters placed between the arrays and the inverters are being promoted as a way to increase power output from the arrays. But the additional system components must be factored into the reliability equation. A similar observation can be made regarding solar tracking systems that adjust the arrays for maximum irradiation.
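The effect of part count can be illustrated with the standard series-system formula, in which the system survives only if every component survives, so the system reliability is the product of the component reliabilities. The sketch assumes independent failures, and the per-part reliability and the part counts are hypothetical.

```python
import math


def series_reliability(reliabilities):
    """Probability that every component in a series system survives."""
    return math.prod(reliabilities)


# Assumed: each part has a 99.5 percent chance of surviving the period.
baseline_parts = [0.995] * 40
added_parts = [0.995] * 12   # e.g. extra converters or tracker hardware

r_base = series_reliability(baseline_parts)
r_more = series_reliability(baseline_parts + added_parts)
print(f"40 parts: {r_base:.3f}   52 parts: {r_more:.3f}")
```

Whether the added parts are worthwhile then depends on whether the extra energy they harvest outweighs the energy lost to the additional failures.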

For their part, inverter manufacturers have different design philosophies resulting in different component makeups and hence different reliability profiles for their products. Therefore, just as developing a reliability model for a solar power system involves taking into account the reliability profile of each system element, calculating the reliability of a particular system element (such as a solar inverter) requires constructing a reliability model of each of its components. Some components may be specially selected to withstand harsh environments and some may incorporate redundancy so that a failure will not affect the operation of the system as a whole.
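A nested model of this kind can be sketched with two building blocks: series blocks, where every element must survive, and parallel blocks, where one surviving element is enough. The example assumes independent failures and that either redundant unit can carry the full load alone; the block names and reliability values are hypothetical.

```python
import math


def series(*blocks):
    """All blocks must survive."""
    return math.prod(blocks)


def parallel(*blocks):
    """At least one block must survive (full redundancy assumed)."""
    return 1.0 - math.prod(1.0 - r for r in blocks)


def inverter_reliability(redundant_cooling):
    """Component-level model of one inverter (all values assumed)."""
    fan = 0.95
    cooling = parallel(fan, fan) if redundant_cooling else fan
    power_stage, control_board, contactor = 0.98, 0.99, 0.995
    return series(power_stage, control_board, contactor, cooling)


def plant_reliability(inverter_r, n_inverters):
    """System-level model: probability that no inverter in the plant fails."""
    return series(*[inverter_r] * n_inverters)


for redundant in (False, True):
    r_inv = inverter_reliability(redundant)
    print(f"redundant cooling={redundant}: inverter {r_inv:.3f}, "
          f"20-inverter plant {plant_reliability(r_inv, 20):.3f}")
```

The component-level result feeds directly into the plant-level calculation, so a small gain inside each inverter is multiplied across the whole plant.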

The operating environment that solar equipment is typically subjected to poses particular challenges to component reliability. To accurately predict component stresses and associated wear-out mechanisms that solar system electronics experience due to natural temperature cycles, a complex time-dependent thermal modeling approach is required. This type of modeling allows component temperature changes to be simulated over a long time period in a particular environment.

To accomplish this, factors such as solar heating, conduction and convection to and from each component must be considered. Any active cooling system control parameters must also be considered since they can affect component temperatures by, for example, turning on a fan. Therefore, many component temperatures do not track ambient conditions, but instead follow a more complex pattern that is a function of ambient temperature changes, the geometry of the inverter, its power profile and the cooling control system setpoints. There are a number of methods that can be used to perform thermal simulations. The preferred method at PV Powered is a custom MATLAB program that solves the heat transfer equations (convection, conduction and heating rate) given an input file containing a set of component properties, thermal interaction parameters and cooling control law parameters.
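The PV Powered program itself is not reproduced here. As a much simpler stand-in, the sketch below integrates a single-node, lumped-capacitance heat balance with explicit Euler steps and a fan that switches on and off with hysteresis. The heat capacity, conductances, setpoints, and the ambient and power profiles are all assumed values for a hypothetical component.

```python
import math


def simulate_component_temperature(hours=48.0, dt_s=10.0):
    """Return a list of (hour, ambient_c, component_c, fan_on) samples."""
    heat_capacity = 5000.0            # J/K, assumed thermal mass
    h_natural, h_forced = 2.0, 8.0    # W/K to ambient with fan off / on
    fan_on_c, fan_off_c = 60.0, 50.0  # assumed cooling setpoints
    temp, fan_on, history = 25.0, False, []

    for step in range(int(hours * 3600.0 / dt_s)):
        hour = step * dt_s / 3600.0
        hour_of_day = hour % 24.0
        # Ambient swings 10 C around 25 C and peaks at 15:00.
        ambient = 25.0 + 10.0 * math.sin(2.0 * math.pi * (hour_of_day - 9.0) / 24.0)
        # Half-sine daylight profile between 06:00 and 18:00, zero at night.
        sun = max(0.0, math.sin(math.pi * (hour_of_day - 6.0) / 12.0))
        heating = 150.0 * sun + 30.0 * sun   # W: electrical losses + solar gain

        # Cooling control law with hysteresis.
        if temp > fan_on_c:
            fan_on = True
        elif temp < fan_off_c:
            fan_on = False
        conductance = h_forced if fan_on else h_natural

        # Heat balance: C * dT/dt = heating - conductance * (T - T_ambient)
        temp += (heating - conductance * (temp - ambient)) / heat_capacity * dt_s
        history.append((hour, ambient, temp, fan_on))
    return history


history = simulate_component_temperature()
peak = max(sample[2] for sample in history)
print(f"peak component temperature: {peak:.1f} C")
```

Even this one-node model shows the component temperature departing from the ambient curve, because the fan switching changes the thermal path partway through the day.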

In calculating long-term component reliability under changing temperature conditions, simpler constant-hazard-rate and mean-time-between-failure (MTBF) calculations that might apply in other situations are not applicable. More advanced techniques such as cumulative damage modeling can be used, however.
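The article does not say which cumulative damage model is used. One widely used combination, shown here purely as an assumed example, is a Coffin-Manson relation for cycles to failure at a given temperature swing together with Miner's rule, which sums the fraction of life consumed at each swing size and predicts failure when the total reaches 1. The coefficient, the exponent and the cycle histogram are hypothetical.

```python
def cycles_to_failure(delta_t_c, coeff=1.0e8, exponent=2.5):
    """Coffin-Manson form N_f = coeff * delta_T ** (-exponent), values assumed."""
    return coeff * delta_t_c ** (-exponent)


def annual_damage(cycle_histogram):
    """Miner's rule: sum of (cycles experienced / cycles to failure)."""
    return sum(count / cycles_to_failure(delta_t)
               for delta_t, count in cycle_histogram.items())


# Hypothetical yearly thermal cycles: {temperature swing in C: cycles per year}.
# In practice these counts would come from a thermal simulation.
histogram = {10.0: 2000, 30.0: 365, 50.0: 50}

damage_per_year = annual_damage(histogram)
print(f"damage per year: {damage_per_year:.4f}")
print(f"projected life:  {1.0 / damage_per_year:.1f} years")
```

Unlike a constant-hazard-rate calculation, this approach lets a few large temperature swings count for more than many small ones.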

Improving Inverter Reliability

Because of the complexity of the problem and the need to optimize the results, PV Powered started with a clean sheet of paper in designing its commercial and utility-scale inverters. The resulting designs employ between 30 percent and 50 percent fewer components compared to other inverter designs. With fewer components to fail, the projected uptime of the system can be extended accordingly.

Another contributor to the PV Powered inverters' reliability is a redundant cooling system. The design principle behind this is to use a single airflow source with redundant fans that allow the inverter to continue to operate at full power if one should fail. Variable fan speed control is used to deliver the necessary amount of cooling air to assure long-term component reliability, but no more, thus minimizing fan power and maximizing fan lifetime. Fan speed, energy use and temperatures are remotely sensed, and alerts and faults are generated if problem conditions occur. High-capacity air intake filters are used to keep the entire inverter clean.

Protection against random events that affect solar system reliability is also a design consideration. For example, lightning strikes have been found to be a significant cause of inverter damage. PV Powered exceeds recommended lightning protection measures internal to the inverter by incorporating dual redundancy and failure detection. But lightning protection needs to be approached from a system perspective and requires much more than just internal protection. Standards such as IEC 62305 can be used as a guide for establishing system-level protection of PV systems from lightning strikes.

Validating System Reliability

The theoretical reliability projections for solar system components can be validated through testing. Two types of testing are performed at PV Powered: accelerated life testing of designs, which can improve reliability as part of the qualification process, and stress screening, which is performed during production testing to screen out infant mortality defects. Accelerated life testing is performed by operating inverters at elevated stress levels chosen to rapidly and quantifiably accelerate degradation mechanisms. Stress screening, also called burn-in, is performed at maximum inverter operating temperatures under careful monitoring for abnormalities.
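To show what "quantifiably accelerate" can mean, the sketch below uses the Arrhenius model, a common choice when temperature is the stress variable. The article does not state which acceleration model or stress variables PV Powered uses, so this is an assumed example; the activation energy and both temperatures are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(use_temp_c, stress_temp_c, activation_energy_ev=0.7):
    """Acceleration factor between use and stress temperatures (Arrhenius model)."""
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    return math.exp(activation_energy_ev / BOLTZMANN_EV_PER_K
                    * (1.0 / use_k - 1.0 / stress_k))


# Assumed: component runs at 45 C in the field and 85 C in the test chamber.
factor = arrhenius_acceleration(45.0, 85.0)
test_hours = 1000.0
print(f"acceleration factor: {factor:.1f}")
print(f"{test_hours:.0f} test hours ~ {test_hours * factor:.0f} field hours")
```

The factor is only meaningful if the elevated stress triggers the same degradation mechanism seen in the field, which is why stress levels have to be chosen with care.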

The most important thing that solar power component and system designers can do is understand potential failure modes that can arise so that designs may be produced that minimize the chances of those failures occurring. And by increasing the reliability of solar power systems, we're able to decrease system downtime, which is a major contributor to the levelized cost of energy.

L. M. Moore and H. N. Post, "Five Years of Operating Experience at a Large, Utility-scale Photovoltaic Generating Plant," Progress in Photovoltaics: Research and Applications, 2007.

Russell W. Morris and John M. Fife, "Using Probabilistic Methods to Define Reliability Requirements for High Power Inverters," Proceedings of the SPIE, vol. 7412, 74120G-2, 2009.

Dr. J. Michael Fife is director of Reliability for PV Powered. He has more than 15 years of experience in technology development and failure analysis. His experience includes serving as managing engineer at Exponent Inc. and holding aerospace research engineering roles at the Air Force Research Laboratory Electric Propulsion Group and NASA Dryden Flight Research Center. He is a licensed professional engineer and holds M.S. and Ph.D. degrees in Aeronautics and Astronautics from MIT and a B.S. from Texas A&M University.

