Thursday, January 22, 2015

A Consultant Criticizes NERC CIP

Last year, I wrote two posts (here and here) about what I see as a great sport engaged in by many in the press (and the consultants who egg them on): attacking the electric utility industry for real and imagined failings in their efforts to secure their infrastructure against cyber and physical attacks. I have now found another prime example of this sport, this time engaged in by a longtime practitioner, consultant Joe Weiss. I am referring to his recent blog post[i], which makes the case that the NERC CIP standards aren’t making the grid more secure or more reliable. More importantly, Mr. Weiss blames the industry for both developing and circumventing these standards.

I wish to say at the outset that I certainly don’t think all attacks on utilities for not having proper security in place are unjustified. And I certainly don’t think that attacks on the NERC CIP standards are unjustified; indeed, I think I’m listed in Guinness as the all-time leader in number of complaints about CIP version 5. But as I said in the two posts last year, the attacks need to be based on facts, and they need to make sense logically. Most of the points Mr. Weiss makes in his post don’t meet one or both of these criteria. Because these points are ones that have often been raised by others, and because they all have quite interesting implications, I will spend some time addressing all of them.

I also want to point out that Mr. Weiss bases his post in part on a doctoral thesis (publicly available and linked in the post) by Marlene Ladendorff, titled “The Effect of North American Electric Reliability Corporation Critical Infrastructure Protection Standards on Bulk Electric System Reliability”. Some of the “facts” cited by Mr. Weiss come from the thesis; others come from other sources (not all identified). I have not had the time to go through the thesis, so I will stipulate that Mr. Weiss has accurately represented Ms. Ladendorff’s findings.

...and eating it, too

My biggest problem with Mr. Weiss’ post is that he repeatedly tries to have his cake and eat it, too. That is, he bashes the utilities (or the CIP standards) for doing something, then turns around and bashes them for doing just the opposite. He is like the two ladies at a Catskills resort, in an old joke. The first says, “The food here is terrible.” The second says, “Yeah, and the portions are so small!”

1) The second paragraph of his post provides a perfect example of this. He says, “the exclusions in the NERC CIPs provide a road map to attackers as they identify what is in-scope, and just as important, what is out-of-scope and consequently not addressed.” Let’s break this down. First, he’s saying the CIP v5 bright-line criteria (for High or Medium impact assets) give attackers a “road map”. That is, they let them know what the most important assets are so they can presumably attack them. However, in the second part of the sentence Mr. Weiss complains about just the opposite. There, he says the criteria implicitly give attackers a list of assets that don’t meet these criteria, and are therefore not going to receive protection under CIP v5.

Do you see the problem here? He’s saying that attackers will use the BLC to find the best targets to attack (Highs and Mediums) – and will presumably attack them. But they’ll also use the BLC to find the targets that are easiest to attack (Lows - since the requirements that apply to them are much lighter) – and will also attack them. So the “road map” that NERC is giving to the attackers simply says, "Attack all BES assets!"[ii] Some road map.

2) Here’s a more important example. Mr. Weiss alludes at least three times to the fact that some entities literally removed routable connectivity (especially to substations) in order to reduce their compliance burden under CIP v1 – v3 (since Critical Assets that didn’t have external routable connectivity wouldn’t therefore have Critical Cyber Assets)[iii]. I don’t dispute this assertion at all; it is certainly true (although the number of entities that simply put off plans to implement routable connectivity was certainly much higher than the number that literally ripped it out). And it is also quite unfortunate, since there was probably some negative impact on reliability and security because of this practice.

However, later in the post he makes a completely different argument. He says that the requirements of NERC CIP (presumably v5) meant that “utilities with hundreds to thousands of substations will most likely connect their protective systems to external networks (usually over the Internet) to support a compliance requirement that can actually compromise security.” OK, so in the first case, CIP was bad because it gave utility companies an incentive to remove routable connectivity. Now it’s bad because it gives them an incentive to implement that connectivity! Can’t win for losin’, as they say.

3) A third example of having-your-cake-and-eating-it-too: Mr. Weiss complains “Depending on the cost of the fine compared to the cost to install NERC CIP compliance, some utilities have made the decision to pay the fine rather than make the security improvement.” I don’t doubt that there are some utilities who are doing just that, although I also doubt it’s very many and I’m sure in the long run it’s a very bad idea to do that.

Yet he later states, “Since the NERC CIP guidance requires anti-malware and anti-virus protection, some utilities are mandating protective relays to have malware protection even though adding this function will reduce the effectiveness and function of the relay.” So it seems these same utilities who are doing everything they can to avoid compliance are now going way overboard and actually jeopardizing their own operations by taking the requirements far too seriously[iv]! Now, that is devious. No wonder he’s outraged.

Other Items

Most of Mr. Weiss’ other arguments fall apart when you look at them closely:

1) Early in the post, he says “Electric distribution is excluded (majority of Smart Grid falls under this exclusion).” This is a common criticism of NERC CIP, from people who don’t know any better. But that doesn’t include Joe Weiss, so I’m surprised he’d say this. The CIP standards (and all the other NERC standards) only apply to the BES because that’s what FERC has authority over (of course, FERC’s authority is what makes the NERC standards more than just nice guidelines). Electric distribution is the domain of the state PUCs[v].

So what is Mr. Weiss advocating to fix this problem? Do we need to have a single central regulator for all electric generation, transmission and distribution? Lots of luck getting that through Congress. And should NERC and FERC just drop the idea of cyber security regulation altogether until this happens? At least then there would be consistency on both the BES and the Distribution sides: there would be no regulation at all.

2) Mr. Weiss cites an example from the thesis stating that “an exercise was cancelled by (a utility’s) compliance group, citing potential non-compliance issues with one of the CIP standards as the reason. The logic behind the compliance groups’ (sic) action was that if a potential weakness was found, it may (sic) need to be reported and the entity risked receiving a fine from NERC.” I know exactly what Mr. Weiss and Ms. Ladendorff are talking about, and I agree there are probably at least a few legal departments at utilities who take this attitude: we don’t want to find out what we’re doing wrong, because then we’d have to report it.

On the other hand, this is a very short-sighted strategy, not only from a cyber security but from a legal / compliance point of view. If an entity is out of compliance with a NERC requirement (not just CIP, of course), they need to self-report it immediately. If they don’t, and the NERC Regional Entity discovers this lack of compliance (either through an audit or perhaps as part of an Investigation), things will go much worse for the entity than if they had reported it in the first place. By deliberately not allowing non-compliance to be discovered, this legal team is setting their employer up for a much bigger fall further down the road.

I haven’t personally heard of any case where something like this has happened, although I certainly don’t dispute that it may have. This is certainly a strike against the NERC CIP standards, but it is also a strike against any mandatory regulations of any sort. If an entity has to report when it finds itself to be in violation of any regulation, there will always be a few misguided lawyers who think it’s in the entity’s interest not to know about a violation in the first place. This is an argument against any sort of regulation (or laws, for that matter); it is not an indictment of NERC CIP in particular. If I think I’ve misrepresented something on my taxes, should I investigate to find out if that is really the case, at the risk of then having to revise my filing, or should I not bother to look further and hope the IRS doesn’t either? I don’t have a ready answer for that question, but please don’t tell the IRS that.

3) Mr. Weiss summarizes some other examples from the thesis by saying “’some of the transmission owners….are gaming the system in order to prevent the application of the CIP standards.’ To accomplish this, some companies modified their networks to avoid compliance issues with CIP-003 through CIP-009.[vi]”

This sounds particularly devious, doesn’t it? TOs are modifying their networks to avoid CIP compliance issues! Hmmm…I thought that was what compliance was all about. For example, the standards say (by implication) that your control network(s) shouldn’t be directly connected to your corporate network – so you modify the network by breaking that connection. Is that a bad thing?[vii]

4) Mr. Weiss states (again referring to the thesis), “Participant 2 in her study found that a company had the most sophisticated network protection he had seen. However, NERC staff reviewed their architecture and wanted them to tear it out. It took the company 6 months to convince NERC that this was the best protection they could do for the control systems the company was operating.”

Here, it seems the NERC staff was getting a little carried away in their zeal to enforce strict compliance with the letter of the requirements, and was trying to get an entity to remove a network protection scheme that was the best that could be implemented under the circumstances. This of course is unfortunate, but clearly neither the utility nor NERC can be accused of lack of zeal for doing the right thing in this case. What fault there is seems to be in the CIP standards, and there the fault is that they are too prescriptive. I completely agree they are too prescriptive, but nothing in this quotation squares with the general tenor of Mr. Weiss’ post – namely, that NERC, the utilities, and the CIP standards themselves aren’t doing anything to increase security.

5) Mr. Weiss complains early on that “the ‘brightline’ criteria exclude smaller facilities.” The BLC apply to all BES facilities, as High, Medium or Low impact. I believe what he is trying to say is that the Low impact requirements aren’t rigorous enough for his tastes; if so, he certainly wouldn’t be the first to feel that way. But he needs to say it explicitly, and also say what would be an adequate set of requirements for Low facilities, consonant with the idea that we can’t devote the entire GNP to complying with NERC CIP.

6) There is one paragraph of the post that I simply don’t understand: “Another example of the inconsistency of the NERC CIP guidance is that when it comes to grid reliability (sic) is the use of ‘black start’ facilities. Black Start facilities are those necessary to restart the grid after a complete grid outage. This function is considered critical by grid planning and operations organizations as well as organizations within NERC. During the review of the NERC CIP Revision 5 process, ISO New England raised a concern that adopting a new requirement for specific controls for Low Impact assets could have unintended consequences, such as the withdrawal of black start resources. This would make the grid less reliable.”

What is Mr. Weiss trying to say here? I at first thought he was saying it was bad that blackstart facilities had been removed as Medium (and made Low) impact in the BLC. But it now seems to me that he may not know that they were removed (even though that happened three years ago, during the drafting process), and he seems to be arguing that forcing blackstart assets to meet Medium requirements means that more will be withdrawn, thus negatively impacting “reliability”. (Not having blackstarts doesn’t actually impact reliability, since blackstarts don’t prevent outages. It does impact resiliency, since blackstarts are needed to rapidly recover from a widespread outage.)

And if Mr. Weiss does know that blackstarts were removed from the Medium criteria (as I said, the wording is ambiguous) and made Lows, then I don't understand his reporting of what the New England ISO supposedly said: that placing too onerous requirements on Lows means that blackstarts will be withdrawn. The way CIP v5 works now, every BES asset (with at least one BES Cyber System) is in scope as either High, Medium or Low. If the Low requirements prove too onerous for blackstarts, then they will have to be removed for all Low assets - meaning we'll go back to just the Low requirement in the original CIP v5 (which FERC was so unhappy with): there must be four policies in place at each Low asset. Is this what Mr. Weiss is advocating?

7) Mr. Weiss states, “Some of the security hardware can affect control system performance. A NERC report identified that a device locking tool used to meet NERC CIP requirements caused a disturbance that resulted in the loss of SCADA services. This is obviously making the grid less reliable and secure.” What is this saying? It seems to be that some device manufacturer developed a device locking tool that actually had negative effects. OK, whose fault is this? The utility’s? NERC’s? The CIP standards’? It seems to me he should file his complaint with the company that made the device.

Alternatively, whatever requirement the locking tool was addressing could just be removed from the standards, along with every other requirement that might possibly lead to implementation of measures that could cause a "disturbance". This would probably result in 10-20% of the CIP v5 standards being removed. Is this what Mr. Weiss wants?

8) Mr. Weiss’ concluding argument states, “Perhaps the most important point is there have already been four major cyber-related electric outages in the US (more than 90,000 customers). If the NERC CIPs were fully implemented, they would not have prevented any of these outages.” First off, I would very much like to hear about these four outages. I certainly never have heard of them before, and Mr. Weiss doesn’t point to any further information.

Second, once Mr. Weiss has given us information on these outages, I would like to know how he draws his conclusion that NERC CIP wouldn’t have prevented these outages. Of course, when he says the outages are “cyber-related”, he’s not necessarily saying these were the results of actual cyber attacks or malware. For that matter, the 2003 Northeast blackout had a couple of “cyber-related” causes that NERC CIP wouldn’t have prevented either. This certainly doesn’t mean that CIP is ineffective.

Summing Up

You might get the idea that the only thing I like about Joe Weiss’ post is the font it appears in. Believe it or not, I regard the post as a flawed one that could actually have had some validity. He makes some perfectly legitimate points about entities removing connectivity to avoid having to comply, about Legal departments not wanting to see any evidence of non-compliance, about Distribution not being included, etc. But in his zeal to strike out against NERC, most utilities, and above all the CIP standards, he has simply thrown any and all arguments that come to mind into a single pot, with the hope that they’ll magically form a coherent stew. They don’t.

Note 1/23: This post originally had a sentence mentioning Senator Joe McCarthy. I realized this morning that, while it was not my intention to compare Mr. Weiss to McCarthy and the wording didn't state that, some readers might have drawn that inference. I sincerely apologize to Mr. Weiss for having included that sentence in the first place.

Note 1/25: I just modified the section marked "6)" above. It tries to make sense of Mr. Weiss' paragraph regarding blackstarts. When I wrote it, the only possible interpretation I could see was that Mr. Weiss didn't know blackstarts were no longer included in the criteria for Medium impact. However, I just realized this may not be the case, and Mr. Weiss was actually arguing for lesser requirements on Low impact assets. That doesn't make sense either (especially with what I have heard to be his opinion on the Low requirements), but I want to show I considered that possibility as well.

The views and opinions expressed here are my own and don’t necessarily represent the views or opinions of Honeywell.

[ii] Of course, the criteria don’t list individual assets, nor do NERC or the regions publish such lists; the attackers will presumably have to go elsewhere to find out where to direct their attacks.

[iii] Quoting Mr. Weiss, who quotes the thesis, “Some entities were trying so hard to keep equipment out of scope that they spent money to ‘rip out fiber and CAT-5 [networking cable] and replaced it with serial [cable] to get away from routable protocols’ that would have brought networks into the compliance scope. Entities calculated that it would be cheaper to replace fiber and CAT-5 network cable with serial cable in order to remove equipment from the CIPs scope. Doing so eliminated the requirement to comply with CIP standards for those networks and equipment.”

[iv] CIP v5 makes it very clear that there is no requirement to load anti-malware software on a device that isn’t capable of loading or using it. In fact, in v5 the entity doesn’t have to take a Technical Feasibility Exception for this, as they did in v3.

[v] Actually, the PUCs only have authority over the IOUs in their states, not the co-ops and municipals. So you could say that nobody regulates those entities, other than presumably their members or citizens.

[vi] The sentence in single quotes is presumably from the thesis. The second sentence is presumably Mr. Weiss’.

[vii] There theoretically could be network modifications that might be taken to serve no purpose other than avoiding having to comply. But Mr. Weiss doesn’t say that is the case, and my brief review of the other examples in the thesis that he cites didn’t turn up any such modifications other than two cases which he addresses separately (and which I also discuss in this post). However, my point remains: entities are supposed to modify their networks to comply with the CIP standards. There is nothing at all sinister about modifications per se.