Thanks to super-charged networks like the US Department of Energy's ESnet and the consortium known as Internet2, scientists crunching huge bodies of data finally have 10Gbps pipes at the ready to zap that information to their peers anywhere in the world. But what happens when firewalls and other security devices torpedo those blazing speeds?

That's what Joe Breen, assistant director of networking at the University of Utah's Center for High Performance Computing, asked two years ago as he diagnosed the barriers he found on his organization's $262,500-per-year Internet2 backbone connection. The network—used to funnel the raw data used in astronomy, high-energy physics, and genomics—boasted a 10Gbps connection, enough bandwidth in theory to share a terabyte's worth of information in 20 minutes. But there was a problem: "stateful" firewalls—the security appliances administrators use to monitor packets entering and exiting a network and to block those deemed malicious—brought maximum speeds down to just 500Mbps. In fact, it wasn't uncommon for the network to drop all the way to 200Mbps. The degradation was even worse when transfers used IPv6, the next-generation Internet protocol.
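As a back-of-the-envelope illustration (my arithmetic, not the article's), here is what those rates mean for moving a single terabyte:

```python
# Rough transfer times for 1 TB of data at the rates quoted above.
# Decimal units: 1 TB = 8e12 bits; real transfers add protocol overhead.
DATA_BITS = 8e12

for label, rate_bps in [("10 Gbps (line rate)", 10e9),
                        ("500 Mbps (through the stateful firewall)", 500e6),
                        ("200 Mbps (on a bad day)", 200e6)]:
    minutes = DATA_BITS / rate_bps / 60
    print(f"{label}: {minutes:,.0f} minutes")

# ~13 minutes at 10 Gbps (the article's "20 minutes" allows for overhead),
# ~4.5 hours at 500 Mbps, and ~11 hours at 200 Mbps.
```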

"You're impacting work at that point," Breen remembers thinking at the time. "So when you're trying to transport 200 gigabytes up to a terabyte of data, or even several terabytes of data, you can't do it. It becomes faster to FedEx the science than it does to transport it over the network, and we'd like to see the network actually used."

With technologies developed or funded by the National Energy Research Scientific Computing Center, ESnet, the National Science Foundation, and others, the University of Utah set out to find a new security design that wouldn't put a crimp on bandwidth. Called "Science DMZs," the architecture puts the routers and storage systems used in data-intensive computing systems into a "demilitarized zone" that is outside the network firewall and beyond the reach of many of the intrusion detection systems (IDSes) protecting the rest of the campus network.

Unconstrained bandwidth

"What we're trying to do with the Science DMZ concept is formalize the idea of: secure your campus, secure your student systems, secure your dorm networks, everything that you need to run the business of your network or your institution," said Chris Robb, director of operations and engineering at Internet2, an alternative Internet maintained by a consortium of universities, governmental organizations and private companies. "Lock that down as much as possible, but for the love of God, give your researchers access to unconstrained bandwidth."

The idea is simple. Move the gear storing and moving data as close as possible to the network edge, preferably into the data center itself. Unplug stateful firewalls and in-line IDSes. And install devices that give detailed information about the rate of data flows traversing the system so any bottlenecks that develop can be diagnosed and fixed quickly.
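The last point is the easiest to sketch in code. Here is a minimal illustration of that kind of flow-rate monitoring (a generic counter-polling loop against Linux's /proc/net/dev, not any particular product's implementation):

```python
import time

def rx_bytes(iface):
    """Read the received-bytes counter for an interface from /proc/net/dev."""
    with open("/proc/net/dev") as f:
        for line in f:
            if ":" in line:
                name, data = line.split(":", 1)
                if name.strip() == iface:
                    return int(data.split()[0])  # first field is RX bytes
    raise ValueError(f"interface {iface} not found")

def poll_rate(iface="eth0", interval=5.0):
    """Print the observed ingress rate; a sustained shortfall against the
    expected rate is the cue to go hunting for a bottleneck."""
    before = rx_bytes(iface)
    time.sleep(interval)
    after = rx_bytes(iface)
    gbps = (after - before) * 8 / interval / 1e9
    print(f"{iface}: {gbps:.2f} Gbps over the last {interval:.0f}s")

poll_rate()
```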

It may seem counterintuitive at first to run high-performance computing systems outside the firewall. It's tempting to compare the idea to medieval warfare, in which an army's equipment, archers, and other most-prized assets are kept outside the castle walls—a bad idea. But frequently, the threats facing high-bandwidth systems carrying gigabytes of data concerning the Bolivian tree frog differ dramatically from those facing the point-of-sale terminals that process student credit cards. If a typical enterprise or medium-sized business network is a bundle of drinking straws, science networks are three or four firehoses. The idea of Science DMZs isn't to ignore security, but to adapt it to an environment that's free of e-mail, Web servers, and e-commerce applications.

To 10Gbps... and beyond

After rebuilding the University of Utah's high-performance computing (HPC) network over the past 18 months, Breen said that bandwidth has shown dramatic improvements. The system now achieves overall rates of 10Gbps, with single end-to-end connections regularly reaching 5Gbps. The university is in the process of transitioning to a 100Gbps network, and Breen estimates that lofty goal could be accomplished in the next 18 months.

Indiana University Chief Network Architect Matt Davy has achieved similar results by following a similar path, one he embarked on more than ten years ago, before Science DMZs were even part of the engineering vernacular.

The segmented subnetwork that hosts his university's HPC and high-bandwidth storage systems has its own external connection to Internet2. It has non-stateful firewalls that run mostly on Linux servers, and it also relies on access control lists to block IP addresses or port numbers observed to foster abuse. The system relies on Cisco Systems' NetFlow analysis tool to spot patterns of attack.
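To make the distinction concrete, here is a hypothetical sketch of that kind of stateless blocking, expressed as iptables rules generated from an abuse list (the addresses, ports, and helper function are invented for illustration; a real deployment would feed the list from NetFlow analysis):

```python
import subprocess

# Hypothetical (source network, destination port) pairs observed to foster abuse.
BLOCKLIST = [("198.51.100.0/24", None),  # an abusive network
             (None, 23)]                 # telnet, blocked everywhere

def block(src=None, dport=None):
    """Append a stateless DROP rule to the FORWARD chain (requires root).

    Unlike a stateful firewall, these rules keep no per-connection state,
    so matching them adds essentially no per-packet overhead."""
    cmd = ["iptables", "-A", "FORWARD", "-j", "DROP"]
    if src:
        cmd[3:3] = ["-s", src]
    if dport:
        cmd[3:3] = ["-p", "tcp", "--dport", str(dport)]
    subprocess.run(cmd, check=True)

for src, dport in BLOCKLIST:
    block(src, dport)
```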

The network, which is able to completely fill its 10Gbps connection, puts a cluster of IDS devices into passive mode, as opposed to the much more common "in-line" mode (in which data passes directly through them). A passive IDS can't monitor every packet, but it digests enough that it can quickly instruct routers to drop connections that are judged to be malicious. More recently, Davy's team improved the architecture by building custom-designed load balancers that run on the OpenFlow switching specification. Their solution works with an IDS cluster of 16 10Gbps-connected servers.
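The property that makes such a load balancer work is that every packet of a given flow, in both directions, must land on the same IDS node so each sensor sees complete connections. A simplified, hypothetical model of that mapping (not Indiana University's actual OpenFlow rules):

```python
import hashlib

NUM_SENSORS = 16  # matching the cluster of 16 10Gbps-connected servers

def sensor_for_flow(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Map a flow to one IDS sensor.

    Hashing a symmetric form of the 5-tuple (sorted endpoints) guarantees
    both directions of a connection reach the same sensor, which is what
    lets each node reassemble and analyze complete streams."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a}{b}{proto}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_SENSORS

# Both directions of the same connection land on the same sensor:
assert (sensor_for_flow("10.0.0.1", "192.0.2.7", 40000, 443)
        == sensor_for_flow("192.0.2.7", "10.0.0.1", 443, 40000))
```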

"We greatly increased our ability to catch that traffic and analyze it," he said. "If you normally make all your traffic flow through the intrusion detection system, that single box has to scale up to that level, whereas we're using a passive system that's offline and then we're clustering it. So we can have a whole cluster of servers analyzing that data so we can scale it up."

During a recent proof-of-concept demonstration, he used the architecture to achieve 60Gbps data flows that traveled from Seattle to Indiana University's Bloomington data center. The university is also in the process of upgrading to a 100Gbps connection.

This is very interesting. I work in telecommunications, but only with small rural providers who, generally speaking, don't do something like place a firewall in front of all customers. However, I recently learned that a local cable ISP does indeed have a firewall between its subscribers and the public net.

Does anyone have any insight as to how much bandwidth traditional firewalls scale to, and what magnitude of cost this leads to?

With virtualized firewall platforms coming, I fail to see how this gains us much...other than another potentially exploitable path.

Who maintains these "automated scripts?" How do they respond to zero-day exploits in accessible services? Who runs and maintains these policies and procedures? Do we really want to be paying scientists to do our network eng/ops guys' jobs?

Yes, OpenFlow/SDNs can do some amazing things, but effectively you'll still need a network across which you actually send data. How are we going to define what science (tree frog example) is okay, versus what is not (say, nuclear fallout dispersion modeling)?

I've tested some commercial devices (IDS/IPS, Stateful firewalling, and DPI/filtering all-in-one) that were pushing 9+Gbps real throughput. Look at some of these new platforms, and you'll be able to exceed that quite easily. In other words, sometimes it *is* just a better idea to increase throughput on existing systems rather than parallelizing paths, tasks, and administrative domains.

I was hoping for something a bit more interesting than "take out the firewall." Meanwhile, stateful packet inspection is enormous overkill for something like this. Given that they likely have a small set of known IPs this traffic is going to, all that is needed is a simple IP filter, which can typically be implemented at wire speed in a Cisco MSFC or equivalent.

People throw stateful inspection at everything without considering whether that type of design (and the bookkeeping overhead and resulting throughput limitations) is truly necessary.

The OpenFlow-based IDS setup has me more interested, as until very recently I'd found it very difficult to scale intrusion detection solutions above a couple gigabits. The devices I've worked with didn't handle load balancing traffic across IDS devices well at all.

It looks to me like the people quoted in the article doing this stuff are net ops.

We're a long way off from having qualified SDN folks in every-day network ops positions. It's an entirely new skillset that has to be built on top of the knowledge of networking fundamentals.

Quote:

Were they available 2 years ago? Will they scale up to 100 Gbps in the next 18 months? What kind of cost will be associated with these devices compared to monitoring a science DMZ?

They were coming of age (read: under development) two years ago. As for 100G in 18 months? I'd say yes...check out Crossbeam. It may need to be LAGs rather than native 100G interfaces, but they won't be too terribly far behind.

Quote:

It just seems odd to see you so dismissive about the concept altogether.

I'm somewhat dismissive of anything that proposes bypassing 20 years of data security concepts. Honestly, SDN has a *lot* of promise, but going backwards to simple ACLs and just assuming SDN will solve all of these throughput problems seems, at best, handwavy. Simply because you can (or may be able to) do something, doesn't mean it's necessarily a good idea.

Quote:

The network, which is able to completely fill its 10Gbps connection, puts a cluster of IDS devices into passive mode

Sounds very similar to the "Great Firewall of China".

This is not really "passive mode" in the traditional sense. What it really means is that instead of using the IDS inline, where its hardware/software limitations impact the flow of traffic, you simply copy packets with a span port to a passive listening port on the IDS. Then, through a second dedicated interface on the IDS, you send SNMP or other control messages to routers and firewalls in response to a detected attack, telling them to black-hole certain traffic (route it to null0), drop it by dynamically adding a firewall rule or ACL statement, terminate the connection with a reset packet, or take whatever other response is configurable on the IDS and the router/firewall.

The GFoC is basically a big IDS that mostly is looking for text strings that the government wants to block.
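The reset-packet response described two paragraphs up can be sketched with the Scapy packet library (a hypothetical illustration; the function and its arguments are mine, and in practice the sequence number must be taken from the mirrored traffic so the endpoints will accept the RST):

```python
from scapy.all import IP, TCP, send

def tear_down(src_ip, dst_ip, src_port, dst_port, seq):
    """Forge a TCP RST to terminate a connection the passive IDS has
    judged malicious. seq must match what the receiver expects, as
    observed on the span port. Sending raw packets requires root."""
    rst = IP(src=src_ip, dst=dst_ip) / TCP(sport=src_port, dport=dst_port,
                                           flags="R", seq=seq)
    send(rst, verbose=False)
```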

Network perimeter security has always been a workaround. Fixing applications is the right way. Apparently part of what these folks do is exactly that; the article just puts the emphasis on the perimeter-defense part.

It is interesting to note a trend of less and less control over the traffic. First we had bastion hosts. Then we had real firewalls with application-level gateways. While everyone with a little clue on the topic knew that to deliver at least some of the promises of the firewall, they should go high in terms of OSI layers, the race instead turned toward bandwidth: first stateful packet filters, then IDS/IPS technology. Now we should be happy that these systems can detect some of the threats a decent firewall would never allow to happen in the first place.

It is also true that a really decent application-level firewall has never been created. Even the best ones (like Zorp and Alf) only reach layer 5 or 6, and you really have to know what happens on your network to properly use them.


"This is not really "passive mode" in the traditional sense." - are you talking about the GFoC or the systems described in the article?

This seems like a completely moot (and insecure) idea. Anything based on latest-gen ATCA (Check Point 61k, the to-be-announced Juniper SRX 5800 replacement / new xbeam chassis) will be capable of 1000gbps with simple stateful filters. 100gbps is a joke for this class of hardware. I'm not sure if this is the 'budget' way of coping with this amount of traffic "securely", but it doesn't seem to be that inexpensive.

Quote:

With virtualized firewall platforms coming, I fail to see how this gains us much...other than another potentially exploitable path.

Who maintains these "automated scripts?" How do they respond to zero-day exploits in accessible services? Who runs and maintains these policies and procedures? Do we really want to be paying scientists to do our network eng/ops guys' jobs?

Yes, OpenFlow/SDNs can do some amazing things, but effectively you'll still need a network across which you actually send data. How are we going to define what science (tree frog example) is okay, versus what is not (say, nuclear fallout dispersion modeling)?

I've tested some commercial devices (IDS/IPS, Stateful firewalling, and DPI/filtering all-in-one) that were pushing 9+Gbps real throughput. Look at some of these new platforms, and you'll be able to exceed that quite easily. In other words, sometimes it *is* just a better idea to increase throughput on existing systems rather than parallelizing paths, tasks, and administrative domains.

All good points...I feel like this allows them to start small and scale without the huge capital outlay in clustered security solutions/load-balancing appliances today...


Yeah, the hardware to handle these kinds of flows is crazy expensive. My work has an F5 unit that can handle 20Gb/s of stateful DPI while handling 450k HTTPS connections per second. It's a firewall+DPI+IDS+router+web-reverse-proxy all-in-one. Not cheap.

This looks like it's just an artifact of EDU IT, where IT is extremely far away from the $$. Universities have a tendency to operate as a loosely federated grouping of small businesses, since the dollars are typically directed straight to professors from grants. The individual schools and the university have pretty limited capability to redistribute them.

Individual professors don't want to, or don't feel the need to, pay for IT infrastructure, often even if they're directly using it. It's even less common that they buy into subsidizing a scalable platform that can be shared between different research groups, like a high-end FW. They typically don't have a grasp of security (why should they?) and the risk-management aspect that plays into it.

IT will often still get beat up for performance or security issues.

This sounds mostly like someone who couldn't find the dollars for a real solution coming up with a cute name to just shove crap in front of the FWs to shut professors up, offering reactive controls if something goes wrong. The exact type of solution you don't typically want to trumpet.

With all due respect to Frennzy, it seems like starting over, away from all those assumptions, is a component of the exercise.

The fundamental hypothesis seems to be that the workflow, risk domain, and attack surface of science data interchange is sufficiently distinct from typical campus networking not to make the same assumptions.

I'd say it never hurts to do a careful reappraisal of such things. Adhering to established procedures is all well and good, but if those procedures were established in a different environment, some of them may not apply. And clearly, while Breen &co started with those common assumptions, they quickly identified specific problems with those assumptions in terms of the workload at hand. I'm just rephrasing the article here.

You have to balance the benefit of the experience of others in a variety of situations against your own judgment of the truly unique aspects of your own immediate scenario. Being able to do that is why the best administrators get the big bucks, amirite?


I'm totally fine with challenging all assumptions...that's how we got where we are today. For example, we used to assume that it was okay to send data unencrypted over carrier networks. We don't assume that today.

It's not simply a matter of blind adherence to policies...I can *already* see issues with their proposed design. I have also already proposed an alternative that meets their requirements (higher speed data sharing) *without* introducing unintended consequences, opening up additional attack surfaces, or creating an additional failure domain. My solution *also* doesn't require additional training, knowledge or skillsets that SDN will.

Fundamentally, edu IT needs to reconsider how they manage their data networks. It's not the wild, wooly West out there any more...channel some of those record tuition increases into building out a robust, secure architecture instead of trying to work around the systems that have been put into place after much consideration and effort.

When discussing the kinds of data flowing over research High Performance Computing (HPC) networks such as those at TeraGrid (now or soon to be XSEDE) and Open Science Grid, the network and security engineers work with the researchers to accurately determine the requirements that the network must meet in order to satisfy the business need (which in this case is research and development of some sort). Not speaking for any site in particular, it isn't terribly uncommon for the higher-end HPC organizations to demand the ability to pass multi-gigabit speeds in a single flow, with a high sensitivity to what would normally be considered low latency.

Given these requirements, the virtualized platforms buy us little, if anything, since they depend on 'normal' network traffic patterns (many users, large numbers of flows, no single flow passing more than 10s or possibly low 100s of megabytes of data per second). I've been involved in tests of some of the devices mentioned by NetworkNubbin, and they haven't yet met the requirements because the traffic patterns within and between HPC networks deviate so wildly from the norm. I've witnessed flows among HPC networks that approach 10Gb/s in a single flow -- the idea that an inline IDS/IPS/IDP solution can truly examine that at line rate is absurd right now -- and I suspect when it is no longer absurd the requirement will be 100Gb/s. I'm not claiming Bro can analyze line-rate 10Gb/s individual flows, in fact they are often ignored based on the rules a local site configures in Bro, but since the boxes aren't inline, they presumably could analyze the traffic offline over time without affecting the traffic as it moves along the wire from source to destination.

My point here is that the depth of analysis that is possible with a Bro Cluster on a passive network tap rivals that of so-called "Next Generation Firewalls", but Bro keeps analyzing long after such inline systems fall over and/or increase latency to an unacceptable level.

The obvious downside to passive approaches such as the one discussed in this article is that significant data exfiltration can occur before a block is triggered and then put into effect. As an example: Adding BGP routes as described in the article takes time, sometimes up to 60 seconds, before they take effect (I've generally seen this implemented as a single (or pair for HA) blackhole router that advertises itself as accepting the 'bad' routes to the border devices, and generally uRPF must also be enabled on the border devices to cause ingress traffic to be blocked - otherwise BGP routes will simply blackhole egress traffic and allow the ingress traffic right through). Unfortunately, assuming 40Gb/s of capacity, that means up to 300 Gigabytes (GB with a big B) of data can be exfiltrated between the time a passive IDS system sets a BGP null route and the time the block actually takes effect on the exit routers. Certainly there are ways to speed up this process and there are methods to reduce the problem (crafting a RST packet to reset the connection, for example, though that doesn't work for UDP traffic), but some data exfiltration can still occur and must be an acceptable risk and/or have other compensating controls to manage this vulnerability.
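For what it's worth, the arithmetic behind that 300GB figure checks out:

```python
# Worst-case exfiltration window while a BGP null route propagates,
# using the figures from the paragraph above.
capacity_bps = 40e9   # 40 Gb/s of capacity
window_s = 60         # up to 60 seconds before the route takes effect

print(f"{capacity_bps * window_s / 8 / 1e9:.0f} GB")  # -> 300 GB
```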

Also, with regard to costs, my experience with Bro Clusters and SRX5800-like devices has shown that a Bro cluster is significantly (one or more orders of magnitude) less expensive, and Bro does dynamic protocol detection (similar to Palo Alto systems) rather than depending on ports.

Last, diamondsw, you may want to have a look at an Educause presentation by Keith Lehigh and Ali Khalfan (I'm not sure if links are allowed here, so you'll have to google). It specifically discussed a Bro installation at Indiana University that used Openflow switches to load balance traffic flowing to different nodes in the Bro cluster.

No disrespect intended here, but I did note that I was referring to 'latest-gen' ATCA, which the SRX 5800 is not. The only one on the market right now is the CP 61000, and that can absolutely handle 10gbps flows extremely easily, even with IPS inspection going on. This is of course assuming the policy is tuned correctly, and not simply created without knowing how the underlying technology works, which is generally what happens.

What we were initially touching on was not even related to IDS/IPS, it was the stateless firewall filters that were really bothering us. One generally may not even *need* inline IPS functionality if you've got a stateful firewall configured correctly. Stateless filters are terrifyingly insecure.

What you are saying regarding price doesn't surprise me though, as that is what I expected.

Also, it seems as though you're thinking HPC is in a realm of its own. Financial institutions (especially onsite STX DCs) have to deal with significantly stricter/higher-performance requirements than those posed in the article, and they cope just fine with 'normal' tech.

No offense taken, I didn't understand you were talking about post-5800 type devices... I've largely left the HPC world these days, so I haven't evaluated or even thought about the CP 61000 beyond downloading and reviewing the data sheet a few seconds ago. It seems like a fairly impressive system, but I suspect it costs significantly more than a Bro solution and may not provide significant benefits when other security controls are taken into account.

My point here wasn't to push Bro or slam vendor IDS/IPS, it was more to point out that the HPC/Research folks are pushing the cutting edge of what's possible in networking, and network security functionality often can't keep up.

I recognize the stateless firewall filter issues, and that is a clear weakness, but it's one that a good IDS can detect and good network engineers can manage if the IDS intelligently reconstructs and analyzes a traffic stream.

A solution that removes a stateful firewall is certainly non-traditional, but a non-traditional approach is increasingly necessary. HigherEd has been dealing with what is now being termed "BYOD" (I hate that acronym) for some time now, and many of us recognize that border security involving inline devices on really fat pipes is increasingly difficult to cost-justify for the research portions of our networks.

As an example, which yields a better net result:

A) Pay for a firewall/IDS/IPS product capable of 60Gb/s line rate filtering.

B) Pay for a Bro cluster, plus a Collective Intelligence Framework (CIF) cluster that can be used to collect and feed actionable security data into Bro and other tools, install OSSEC on the hosts and centrally log data back to a database that's indexed with elasticsearch, use data from OSSEC to further feed still more data into the CIF cluster, and site-license BigFix so we can do endpoint management correctly?

In my experience, option B can come in for as little as half the price of A when discussing hardware. Personnel costs are higher with B on the security side, but those increased costs are often offset in part or in whole by the increased efficiencies realized by doing good endpoint management, making B a much better holistic (though non-traditional) approach.

I have a question about networking and transfer speed: If data moves at about the speed of light, how come speeds aren't always insanely fast like they are working with here? I imagine it has something to do with the transfer medium. (first-year CS student who hasn't gotten into the heavy-duty stuff yet) What do they do to make data transfer faster?


Signals (bits) travel through typical network media at around .5 to .7c, depending on whether it's fiber or copper. However, it's not a 1-1 correlation with how fast data travels through the same media. That depends on your signaling rate and encoding mechanisms.

Think of it this way...I can hook up a 10Mbit interface to a Cat5 cable, and then do the same on the other end of that cable. When I send data, the signals travel down that wire at about 65% of the speed of light in a vacuum, and I can send somewhere around 1MByte of data per second, depending on packet size distribution and protocol overhead. Now let's say I upgrade those two machines to have 1Gbit NICs. The signals (bits) still travel over that Cat5 cable at roughly 65% of the speed of light, but the signaling rate is much faster (125Mbaud per pair) and the encoding is different. So even though the physics haven't changed, the sheer amount of data I can transmit in the same amount of time goes way up...in many cases to over 100MBytes of data per second.
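A worked example (my numbers, not the poster's) makes the split between propagation delay and serialization time concrete:

```python
C = 299_792_458          # speed of light in a vacuum, m/s
VELOCITY_FACTOR = 0.65   # signal speed in Cat5 as a fraction of c

def propagation_delay_s(length_m):
    """Time for a bit to traverse the cable; identical at any link speed."""
    return length_m / (C * VELOCITY_FACTOR)

def serialization_s(frame_bytes, rate_bps):
    """Time to clock a whole frame onto the wire; this is what the
    faster signaling rate and encoding actually improve."""
    return frame_bytes * 8 / rate_bps

print(f"propagation over 100 m:     {propagation_delay_s(100) * 1e6:.2f} us")
print(f"1500-byte frame at 10 Mb/s: {serialization_s(1500, 10e6) * 1e6:.0f} us")
print(f"1500-byte frame at 1 Gb/s:  {serialization_s(1500, 1e9) * 1e6:.1f} us")
# ~0.51 us of propagation either way; the gigabit NIC simply puts
# 100x more bits on the wire in the same amount of time.
```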

Quote:

The obvious downside to passive approaches such as the one discussed in this article is that significant data exfiltration can occur before a block is triggered and then put into effect. As an example: Adding BGP routes as described in the article takes time, sometimes up to 60 seconds, before they take effect (I've generally seen this implemented as a single (or pair for HA) blackhole router that advertises itself as accepting the 'bad' routes to the border devices, and generally uRPF must also be enabled on the border devices to cause ingress traffic to be blocked - otherwise BGP routes will simply blackhole egress traffic and allow the ingress traffic right through). Unfortunately, assuming 40Gb/s of capacity, that means up to 300 Gigabytes (GB with a big B) of data can be exfiltrated between the time a passive IDS system sets a BGP null route and the time the block actually takes effect on the exit routers. Certainly there are ways to speed up this process and there are methods to reduce the problem (crafting a RST packet to reset the connection, for example, though that doesn't work for UDP traffic), but some data exfiltration can still occur and must be an acceptable risk and/or have other compensating controls to manage this vulnerability.

And this is the handwavy stuff I'm referring to. Firstly, you can *easily* handle 10Gbit flows with today's hardware. 100G is on the near-term roadmap. In every case, you'll almost certainly be doing this with taps (like the ones from Anue) that are completely configurable...because you don't want to start stacking up inline filters/tools. Any system worth its salt isn't going to simply rely on null-routing to prevent data loss or C&C blocking...you'll simply start sending RST packets or actively reconfiguring the switch and/or router to just drop packets...L2 or L3, it doesn't matter. You can't just say "well, we may lose up to 300GB of data before this SDN kicks its DLP in...so we may need to accept that or "have other compensating controls." <---That...that right there is the waving of hands...and it sounds way too much like "oh well, we'll burn that bridge when we come to it."

In short, "passive approach" isn't the issue here...it's the fact that you are duplicating effort, creating additional attack surfaces/loss points, and adding additional failure domains. Why? To save maybe 50% of CapEx? The devices we're talking about run in the $200-$400k range. So this "Bro" system can do this for half of that? Maybe saving us $100-$200k up front? Guess how much it's going to cost to pay people to run and manage those Bro instances (sysadmins, IDS/IPS experts, SDN experts/scripters, etc.)

Again, I don't doubt you CAN do it...I simply doubt that a true, rational analysis of this system would reveal any actual ROI/cost savings...and I *know* it will increase your problem count in the realms of attack surface, efficiency, and failure domains.

editing for additional thought: Keep in mind, you aren't going to get rid of your regular NetOps folks. You still have to rack and stack gear, configure IPs, setup VLANs, routing, etc. SDN doesn't replace any of that.

But let's walk through a real, fundamental problem.

SuperSDNScientist is busily generating and sending data. He's so good he's sending a constant stream of 60Gbps to the federal government so they can crunch his numbers on their MASSIVE PUTAR! This data is vital to the future of our economy, as it is from a mortgage collapse modeling experiment. All of a sudden, SuperSDNScientist gets a call from his contact in Washington DC, who is screaming "WHERE IS ALL THE DATA? I'M NOT GETTING ANY DATA! THE PRESIDENT DEMANDS HIS DATA!"

SuperSDNScientist thinks "Hm, that's odd...where's that grad student I've got who knows how all this stuff works? Oh, right he graduated. Well, who took over for him? Oh yeah, LiL Johnny." He then goes to LiL Johnny and demands to know where all them datas went.

Lil Johnny is pretty good with scripting, and knows a little bit about SDN. He checks some things, and realizes that there is a lot going on in this SDN setup that he doesn't understand. "Hell with it," he sez, "let's blame the network guys. I mean, this shit worked yesterday!"

So he goes to NetOps, who check around and can find nothing wrong. I mean, their shit hasn't changed since yesterday...but wait...what's this shit? Some sort of script seems to be fucking with our configs!

I'll just stop there and let you imagine the continued finger pointing and jackassery that will come out of this...meanwhile, that guy in Washington has lost his job, and SuperSDNScientist has lost his funding and tenure. All because they *thought* they knew how to effectively run and manage a network.

Quote:

The fundamental hypothesis seems to be that the workflow, risk domain, and attack surface of science data interchange is sufficiently distinct from typical campus networking not to make the same assumptions.

And this is the handwavy stuff I'm referring to. Firstly, you can *easily* handle 10Gbit flows with today's hardware. 100G is on the near-term roadmap. In every case, you'll almost certainly be doing this with taps (like the ones from Anue) that are completely configurable...because you don't want to start stacking up inline filters/tools. Any system worth its salt isn't going to simply rely on null-routing to prevent data loss or C&C blocking...you'll simply start sending RST packets or actively reconfiguring the switch and/or router to just drop packets...L2 or L3, it doesn't matter. You can't just say "well, we may lose up to 300GB of data before this SDN kicks its DLP in...so we may need to accept that or "have other compensating controls." <---That...that right there is the waving of hands...and it sounds way too much like "oh well, we'll burn that bridge when we come to it."

Sounds like we're in agreement that inline tools can't solve this problem because they can't pass this data quickly enough. I already pointed out that things like RST packets are an option if agreeable to all parties, and I didn't say "we may lose 300GB of data before this SDN kicks its DLP in." I pointed out that as an individual component, this particular system has weaknesses that must be properly accounted for in other areas. I'm not sure how a comprehensive security plan is 'handwavy' stuff. The security plans with which I have worked contain a clearly articulated set of business needs, and result from hours and hours of work among IT architects (IT Security Architects, Network Architects, and Systems Architects) as well as other appropriate team members who work together to develop a complete plan, not just one piece of it. If that's 'handwavy,' I'm at a loss with regard to how to explain it further.

Quote:

In short, "passive approach" isn't the issue here...it's the fact that you are duplicating effort, creating additional attack surfaces/loss points, and adding additional failure domains. Why? To save maybe 50% of CapEx? The devices we're talking about run in the $200-$400k range. So this "Bro" system can do this for half of that? Maybe saving us $100-$200k up front? Guess how much it's going to cost to pay people to run and manage those Bro instances (sysadmins, IDS/IPS experts, SDN experts/scripters, etc.)

I refer back to my previous post, as I addressed this already.

Quote:

But let's walk through a real, fundamental problem.

SuperSDNScientist is busily generating and sending data. He's so good he's sending a constant stream of 60Gbps to the federal government so they can crunch his numbers on their MASSIVE PUTAR! This data is vital to the future of our economy, as it is from a mortgage collapse modeling experiment. All of a sudden, SuperSDNScientist gets a call from his contact in Washington DC, who is screaming "WHERE IS ALL THE DATA? I'M NOT GETTING ANY DATA! THE PRESIDENT DEMANDS HIS DATA!"

SuperSDNScientist thinks "Hm, that's odd...where's that grad student I've got who knows how all this stuff works? Oh, right he graduated. Well, who took over for him? Oh yeah, LiL Johnny." He then goes to LiL Johnny and demands to know where all them datas went.

Lil Johnny is pretty good with scripting, and knows a little bit about SDN. He checks some things, and realizes that there is a lot going on in this SDN setup that he doesn't understand. "Hell with it," he sez, "let's blame the network guys. I mean, this shit worked yesterday!"

So he goes to NetOps, who check around and can find nothing wrong. I mean, their shit hasn't changed since yesterday...but wait...what's this shit? Some sort of script seems to be fucking with our configs!

I'll just stop there and let you imagine the continued finger pointing and jackassery that will come out of this...meanwhile, that guy in Washington has lost his job, and SuperSDNScientist has lost his funding and tenure. All because they *thought* they knew how to effectively run and manage a network.

Some universities and some research projects use graduate students as grunts (maybe all?), but I have yet to experience one in the last 10 years that expects to use graduate students to protect and defend the kinds of projects described in this article. I've seen graduate students used, but none have been tasked with significant primary operations such as you discuss. I also haven't seen a situation where a netops team has allowed the security folks to modify router or switch configurations on the fly other than by setting up very carefully restricted null routes and/or carefully designed systems that interface with the L2 switches and meticulously audit changes. Experiences vary, I'm sure.

What I'm most bothered by in this exchange is that you seem to view your single solution as the only one that's viable. When one looks at the whole picture, you are concentrating on a very small part that doesn't account for a massive portion of the attacks HPC systems must repel. How does a firewall (or an IDS) detect stolen credentials or stolen ssh keys? It doesn't -- especially if they are being used from the same computer the 'normal' user of those keys uses. Other security measures can be deployed that deal with credential theft, but may not be affordable after a $400K firewall is purchased.

Information Security groups must now understand the value of the data in question, the types of individuals (or nation states) that may target the data as valuable, the processes (both physical and technical) that are used to process, store, and transfer the data, the capabilities of the individuals and systems being used in those processes, and must still have the business sense to recognize that the cost of implementing security is part of the cost of doing business and therefore can't exceed the value of the data being protected. If you're all about the firewall, I can understand how that sounds 'handwavy,' just as Unix seemed 'handwavy' to the VMS admins...

Quote:

What I'm most bothered by in this exchange is that you seem to view your single solution as the only one that's viable

If I gave you that impression, mea culpa. That is not my belief...what I'm suggesting is that the solution discussed in this particular article has some pretty glaring holes. We are in agreement that firewalls alone aren't a solution. What I am suggesting, however, is that the article as presented seems woefully misinformed about what current tech is capable of with respect to throughput. It also seems to gloss over many of these other concerns that you and I have both discussed.

Quote:

Information Security groups must now understand the value of the data in question, the types of individuals (or nation states) that may target the data as valuable, the processes (both physical and technical) that are used to process, store, and transfer the data, the capabilities of the individuals and systems being used in those processes, and must still have the business sense to recognize that the cost of implementing security is part of the cost of doing business and therefore can't exceed the value of the data being protected. If you're all about the firewall, I can understand how that sounds 'handwavy,' just as Unix seemed 'handwavy' to the VMS admins...

InfoSec groups that don't do all of those (and more) are more problem than solution. And no, I am not all about firewall...that is just one tool in the arsenal of a well constructed defense. The problem is, as posited in the article, they are bypassing pretty much all perimeter defense, with the sole excuse of "but but throughput!"

I currently have several high-end configurable taps on some of my larger egress points. Those taps are handling a *lot* of tasks and traffic...IDS/IPS, firewall, malware detection, even some URL filtering. If someone in my company came to me and said "look, we need to pass 'x' amount of traffic, and we think your stuff can't handle it, so we're bypassing your stuff"...I'd laugh them right out of my office. First, unless they've already been working with me, they are working without adequate knowledge to design a solution. Second, they need to understand that the whole is greater than the sum of its parts, and even though they might be a particularly pretty snowflake, they are neither unique nor exempt from data security and governance. I get that university folks tend to live way out there on the edge...but anything that has the potential to bring down *other* systems, much less leak what may be very sensitive or confidential data, falls under the larger domain. I'm going to assume they aren't experts on such matters...their point solution may meet their minimum requirements, but it certainly doesn't meet mine.

Quote:

InfoSec groups that don't do all of those (and more) are more problem than solution. And no, I am not all about firewall...that is just one tool in the arsenal of a well constructed defense. The problem is, as posited in the article, they are bypassing pretty much all perimeter defense, with the sole excuse of "but but throughput!"

It sounds as though we are in agreement on all but one point. If I understand you correctly, you believe a firewall is not just a tool in a well-constructed defense, but a mandatory one. I believe firewalls and IPS systems are very valuable in the vast majority of circumstances, and I recognize that there are some edge situations where cost-effective comprehensive security solutions can properly compensate for the loss of the firewall/IPS by using other tools, thereby enabling lower-latency, higher-bandwidth connections that would not otherwise be affordable.

I'm also skeptical about this. Until I see it demonstrated, benchmarked, and compared...I have strong doubts that you gain much in this sense. We're talking about inter-agency traffic...where the VAST majority of latency lies external to the agencies in question. Damned pesky physics.

Physics is gonna getchoo' as Frennzy noted. I highly doubt most issues are localized to the environment.

Also, I'm going back to stock exchanges here, but they have FAR lower tolerances for latency than HPC, and there has been existing tech to keep it under 4-5µs for a few years now. It's not cheap, but it does exist.

I'm still not sold on the "affordable" factor here. At least in my semi-faux calculations, additional salaries are going to far outweigh the total cost of multiple high-end devices after a few years easily.


I think I mentioned this before, but we have gear that can guarantee 600-nanosecond port-to-port latency, allowing you to design deterministic latency within fabrics. And yes, it is in high demand in trading applications.

Quote:

It's not simply a matter of blind adherence to policies...I can *already* see issues with their proposed design. I have also already proposed an alternative that meets their requirements (higher speed data sharing) *without* introducing unintended consequences, opening up additional attack surfaces, or creating an additional failure domain. My solution *also* doesn't require additional training, knowledge or skillsets that SDN will.

I've seen enough outages due to stateful filters and firewalls to know that you cannot honestly make that statement. Stateful inline devices can and do cause outages in the real world. In fact my experience is that it's one of the more common causes of campus-wide outages, whether it's the IDS deciding that it doesn't like some BGP options, or deciding it's OK to shuffle packets, or turning a single-host DoS into a campus-wide DoS because the device can't keep up. These and other types of middle boxes can and do introduce unintended consequences and create additional failure domains. I suspect you know as much as well.

I'm on board with Frennzy here. Lots of hand-waving and "if I'm ignorant of the situation, I can't be responsible" thinking going on.

Sure, it sounds like no one would target purely scientific information from, let's say, the LHC or Arecibo for manipulation or theft. Then consider all of the pharmaceutical and medical research done at universities. Or semiconductor research. Or materials science. Or global finance. The list goes on and on of material that could either be stolen or easily manipulated on such an ignorantly built network. Heck, half of the economically valuable research at universities is already walking out the door with regularity on USB flash drives. At least that way, some grad student is getting paid off with a job.