Over the Internet of Things Hovers the Specter of Legacy Code

The move from IPv4 to IPv6 means that small memory footprints are back, along with a nightmare encounter with legacy code.

With the advent of the Internet of Things (IoT), small memory footprints have returned, and the need to deal with the requirements of a resource-constrained environment is back with a vengeance. Unfortunately, the understanding that “memory is important” is part of a skill set that has largely been lost because, as far as the typical Java developer is concerned, memory is infinite.

Memory management will be key in the conversion of the massive legacy code base made necessary by the transition from the now-depleted IPv4 network address space to IPv6, which provides a vast inventory of new IP addresses.

Whether we are talking about a web server, a data collection application, or something else that is aware of the network, there are probably billions of existing lines of code that will see an impact.

The language that the code is written in -- whether Java, Python, Perl, or C -- is relatively immaterial because the core data structures will have to change. This means that all code that manipulates those data structures has to change. That is not a trivial exercise. It isn’t something that you can fix by saying, “Just run this macro against this code and it will magically convert everything from IPv4 to IPv6.”

The issue is that with IPv6, unlike IPv4, there is a function call for address lookup and resolution that dynamically allocates memory in order to return a linked list of all the potential addresses. Since most people have never seen the new APIs, they don’t really understand the nature of the function call and forget they need to free the memory that was allocated. This creates a memory leak.

In some cases, it’s not a big deal to leak 20 or 30 bytes. But any memory leakage at all is a horror show for safety-critical systems and real-time operating systems that have no dynamic memory allocation at all. If I were to tell an expert on safety-critical systems that I was interested in doing dynamic memory allocation or reallocation in a safety-critical system, his or her head would explode.

It’s also problematic for any operating system that has a fixed pool of memory available.

When you have a user application that is allocating memory and not freeing it, that application continues to run until all the free memory is used up. The next time the OS needs to do something -- guess what? It will fail.

Compounding the problem is the fact that much of this legacy code could be decades old. The guy who wrote that code is long departed and any documentation is probably way out of date -- including the documentation that is in the code. So now you have what I call a software archaeology project, in which you have to go in very diligently with toothbrushes and air puffers and try to uncover what the code actually does, as opposed to what it was originally intended to do.

You have to tread carefully, because you don’t know what sorts of bugs are in that code. If you start making significant changes, the chance of your introducing new errors is off the charts.

You could simply say, "Well, screw it. I am going to write everything from scratch." Very few choose this path, however, since it is extremely expensive to rewrite two million lines of code. It also means you will introduce your own new coding errors, with no mileage on that code.

Given how the new functions work, errors are almost a sure thing. The question is how long it will take to find and fix them. Moreover, the errors in the code pale in comparison to the potential security vulnerabilities that you just introduced.

The tricky part in all of this is that you will only need to change pieces of the code. Even if we consider only the networking code, the re-examination of billions of lines of code will be required -- and the skill set required to do this is going to be found among people who speak C and C++.

The issue described may or may not be a problem, but it has little to do with the big picture of IoT, the success of which will be dominated by the availability of effective bridging technologies.

Bridging will be used to link protocols that are much more space- and processor-efficient than IP with the rest of the world. This is happening already, and it causes problems not because of memory leaks but because there are few standards, and those that do exist are exceptionally poor at the job (Zigbee, for example).

Much in the same way that there never was an IPv4 crisis -- because it is a trivial thing for gateways to bridge between IPv4 and IPv6, allowing IPv4 addresses to be mapped and reused -- effective bridging will empower the IoT.

Jack, you are right about bad programming having been there all along.

My concern now is the huge number of discrete addresses that will be assigned to inanimate things with short lifetimes, which will result in a bloat of addresses that are taken but useless. And somebody is going to find some way to do something bad with them, although I have not figured out what just yet. But when we get to having a 256-character internet address, it will probably be inconvenient and take up lots of memory just to store addresses. That is the biggest danger that I see.

This is true, but the article makes it seem like memory leaks are a new issue, and they are not. It is something we have been dealing with for literally decades. Good programming practices deal with it; bad ones do not. There is no doubt lots of bad code in the public domain with memory leaks, and lots without. The article (blog) practically implies it's a done deal that it is going to happen all the time, and I do not think that is true.

Jack, the problem that I see is that when there are huge amounts of memory available, bad code that constantly consumes memory will be able to run for a long time before it crashes. The result will be that a lot of it will enter the real-world public domain and cause problems there. And unfortunately there is no mechanism in place to prevent this kind of problem.

Memory leaks always have been and always will be an issue with programming. The fact that a memory leak can occur in an IPv6 implementation is just that: an implementation detail. We have to ask: is it any worse than any other memory leak that causes a crash? I do not think so. There will always be mistakes made.

As pointed out, there are ways to encapsulate IPv4 within subnetworks, and one has to expect that much of the time, this is exactly what will happen. I do expect a lot of upgrading of routers and the like.

Most of the time, if a device is implemented with an 8051 or the like, it will just not be feasible to "upgrade" the firmware to support IPv6. That said, how many internet-enabled, 8051-powered devices a) exist and b) can be remotely/field upgraded? To that end, talking about the issue is almost a non-starter in most cases. The device will either be encapsulated by the router or upgraded to newer hardware. No other solution will exist.

The bad news is that a lot of legacy code will have to be rewritten. The good news is (also) that a lot of legacy code will have to be rewritten. If you look at modern versions of C/C++ and other languages mentioned in this article there have been tremendous advances in controlling memory leaks by default rather than depending on fastidious programmers to do so. The younger programmers in my group are much more comfortable with these new features, which is a good sign for the code quality of these rewrites in the future.

The real advantage is that we are now pretty much assuming that we are working with at least a 32-bit architecture. Trying to do this kind of code with a Z-80 or 6800 would be an exercise in futility. We are also generally working with megabytes of memory rather than kilobytes. One of the first systems I worked on back in the dark ages had 32 kbytes of RAM. The hardware engineer that I was working with made the comment that he couldn't imagine why anyone would need more than that. This was, of course, before Windows... :-)

The use of all three techniques is more than likely what we'll be encountering in the next several years. However, this will require that developers become aware of the new protocol and its different requirements/operation from that found in IPv4. Fortunately, most of the differences can be found in the set-up and tear-down of the sockets. Once they are configured, the actual sending and receiving of data across the sockets is identical between the protocols.

Where the use of IPv6 becomes more tricky is that the protocol itself is rather complex. Router and neighbor discovery, the lack of a broadcast capability (IPv6 uses multicast heavily), and even just the size of the addresses will put a strain on systems with limited memory. For example, you won't find an IPv6 stack that fits in 4K or even 64K of RAM like you can with IPv4 implementations. This means that processors like the venerable 8051 need not apply if IPv6 is included.

Even with processors like the ARM Cortex M0/3/4, memory usage will be an important consideration. This means that when it comes to IPv6, we will need to go back to an earlier time, when saving bytes in a program implementation may be the difference between an application that fits and one that doesn't.

There are several transition techniques that already exist for bridging from IPv4 to IPv6. First, there is a way to encapsulate an IPv4 address inside of an IPv6 address. Additionally, there is a 6-to-4 NAT that allows for the use of IPv4 on one side of a given router device and IPv6 on the other. This is in addition to several tunneling techniques for encapsulating packets of one type within the other.

Nonetheless, we will likely need to be able to support simultaneous v4 and v6 addressing in our applications for the next decade. So, unlike Y2K, where there was a single "drop-dead" date, the transition to v6 will be more of a battle of attrition.

For networks that are completely closed, there's no need to convert other than to lessen long-term support costs. For networks that have Internet access, though, the need to support v6 will likely become more important as v6 adoption increases.

The threats and worries discussed in the article are worth discussing, but at the same time we will have to allow ample room for new developments and advancements. This is not the first time modifications have happened in the internet era. The entire code base is always being modified by a global community of programmers.

Also, the architecture of the network protocol suite is modular and layered, so if someone comes up with perfectly running code at the network layer, it will be accepted in all the variants of that particular OS.