Coming Soon: Autonomous Nano Code Generators

August 29, 2005

The term nano means very small. For example, a nanometer is one billionth of a meter and a nanosecond is one billionth of a second. The term reasonably applies to code because code runs digitally, as almost ephemeral appearances of electrons, which themselves are each at least as small as a nanometer. In fact, we soon should have autonomous nano code generators: code generators that automatically write additional code based on patterns of use, operator behavior, changing environmental dynamics, or really, any reason their programmers desire. A nano code generator is a very small bit of code that generates a fragment, the smallest useful bit of code.

Code generation is already in wide use today. A code generator is simply metacode that generates executable code. We can write code generators quickly and easily because of research in refactoring and patterns, and because of languages like C# and VB.NET that support parametric template classes, reflection, and CodeDOM code generators. Millions of programmers use this technology every day. For example, if you have ever used Microsoft .NET's XML Schema Designer, you have used a code generator. In fact, simple code generators have existed for a couple of decades.
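As a minimal sketch of "metacode that generates executable code," the following shows a generator that writes the source text of a small class. It is written in Python purely for brevity (the article itself discusses C# and VB.NET), and the names generate_record_class, Customer, name, and email are hypothetical, chosen only for illustration.

```python
def generate_record_class(class_name, field_names):
    """Return Python source text for a simple class.

    Assumption for this sketch: field_names contains at least one
    valid identifier.
    """
    params = ", ".join(field_names)
    lines = [f"class {class_name}:"]
    lines.append(f"    def __init__(self, {params}):")
    for name in field_names:
        # One generated assignment statement per requested field.
        lines.append(f"        self.{name} = {name}")
    return "\n".join(lines) + "\n"


# The metacode above emits ordinary source code as a string.
source = generate_record_class("Customer", ["name", "email"])
print(source)
```

The output is plain source text that could be saved to a file or compiled, which is roughly what design-time generators do on a larger scale.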

So why no autonomous nano code generators yet? Despite the many tools for generating code, a few unmet technical requirements prevent these generators from working autonomously. This article describes where nano code generators are today, discusses some of their likely benefits and hurdles, and explains how such an evolution may impact the day-to-day lives of computer users and programmers.

How Would Autonomous Nano Code Generators Work?

A nano code generator generates a small fragment of code. For example, you might have a single generator that simply generates a conditional test. An autonomous, nano code generator would contain both the metacode for generating the desired output and the logic for deciding whether the code should be generated. Nano code generators could write small, whole fragments based on simple logical conditions: If A then B, where the predicate condition A must exist before the code B is emitted.
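The "If A then B" idea can be sketched as a small object that pairs a predicate (A) with the metacode that emits a fragment (B). This is only an illustration in Python under assumed names: NanoGenerator, null_guard, and the context keys variable and null_failures are all hypothetical and do not come from any existing tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class NanoGenerator:
    predicate: Callable[[dict], bool]  # condition A: should code be generated?
    emit: Callable[[dict], str]        # metacode that produces fragment B

    def run(self, context: dict) -> Optional[str]:
        """Emit the fragment only when the predicate holds."""
        if self.predicate(context):
            return self.emit(context)
        return None


# Hypothetical generator: emit a null check once null failures were observed.
null_guard = NanoGenerator(
    predicate=lambda ctx: ctx.get("null_failures", 0) > 0,
    emit=lambda ctx: f"if {ctx['variable']} is None:\n    return",
)

fragment = null_guard.run({"variable": "order", "null_failures": 3})
no_fragment = null_guard.run({"variable": "order", "null_failures": 0})
```

Here fragment holds a two-line conditional test, while no_fragment is None because the predicate was not satisfied.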

Next, molecular assemblers could be designed from successively larger and more complex aggregates of nano generators. These assemblers could collaborate as temporary, cohesive smart mobs to solve algorithmic problems or be composed further to solve problems of increasing complexity and scale. For this to work, generators would have to be built from progressively larger and more complex rules of collaboration and orchestration.

Understanding Generative Code

Software can be grown organically and generatively. Generatively grown code is code that increases in capability and scope as programmers add to it over time. For a practical line programmer, this might be as simple as building critical sub-systems first and then adding additional sub-systems over time.

In this concept, patterns and refactoring technologies are used to ensure that code reaching an arbitrary complexity of n is continually capable of growing in scope and complexity. Patterns are sound solutions that have been demonstrated to resolve classes of problems, and refactoring is a predictable means of changing and improving code.

A few additional technologies help greatly in generative code too:

Parametric templates (or Generics in .NET)

Code generators

Components

Aspects (most recently)

Parametric Templates

Generics are whole algorithmic solutions that are known to work. Simply fill in the data type and the code works every time, reliably.
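A rough illustration of "fill in the data type": the stack below is written once and then used with two different element types. The sketch uses Python's typing module rather than .NET generics; note that Python type parameters are checked by static analysis tools and are not enforced at runtime. The Stack class and variable names are hypothetical.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # placeholder for the element type


class Stack(Generic[T]):
    """A last-in, first-out container written once for any element type."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        return self._items.pop()


# The same algorithm, parameterized with two different data types.
numbers: Stack[int] = Stack()
numbers.push(1)
numbers.push(2)

words: Stack[str] = Stack()
words.push("nano")

top_number = numbers.pop()
top_word = words.pop()
```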

Code Generators

Code generators write code automatically but at present are initiated manually. If the generator is correct, then the generated code is always correct.

Components

Components are sub-systems big and small, which have been around for a couple of decades. The biggest obstacle to components is that too many companies suffer from the not-invented-here (NIH) syndrome. These companies are easily identifiable because their developers and managers march around saying, "We don't use third-party code."

Almost every aspect of modern life is invented by a third party, yet many developers would prefer to roll their own. This is a byproduct of a keen intellect and is understandable from the programmer's point of view, but it is definitely not in the self-interest of companies.

Aspects

Aspect-oriented programming (AOP) has to do with the separation of responsibilities into distinct, non-overlapping entities. AOP addresses modularization and the encapsulation of cross-cutting concerns; dividing and orchestrating solutions is critical to managing generatively grown code.
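One way to picture a cross-cutting concern is call logging that is kept out of the business logic and attached from the outside. The sketch below approximates this with a Python decorator, which is a simplification of what AOP frameworks do; the names logged, call_log, and transfer are hypothetical.

```python
import functools

call_log = []  # stand-in for a real logging sink


def logged(func):
    """Wrap func so entry and exit are recorded without editing its body."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append(f"enter {func.__name__}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returns normally or raises.
            call_log.append(f"exit {func.__name__}")

    return wrapper


@logged
def transfer(balance, amount):
    # Business logic only; it knows nothing about logging.
    return balance - amount


remaining = transfer(100, 30)
```

The logging concern lives in one place and can be applied to any number of functions, which is the modularization the paragraph above describes.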

Code Generators Are the Tools

Generative code is not generated code, but code generators are its tools because they can generate code reliably and quickly every time. As most manufacturers know, time to market and reliability are huge factors for any successful endeavor, and they are hugely missing in our industry.

At this time in history, we are just beginning to depend on code generators. Most code is generated at design time. However, it is already possible to invoke code generators dynamically at runtime and load and execute them post-deployment. A functional example is the beta tool CodeDominator that my firm offers. While the tool is imperfect, it contains a huge library of atomic-, molecular-, and component-level generators, and it is capable of creating working (albeit generic) applications.
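To make "generate, load, and execute at runtime" concrete, here is a small sketch in Python using the built-in compile and exec functions. It does not represent how CodeDominator or CodeDOM work; generate_scaler_source, load_generated, and scale are hypothetical names, and executing generated source is only safe when the generator is trusted.

```python
def generate_scaler_source(factor):
    """Metacode: return source for a function that multiplies by factor."""
    return f"def scale(value):\n    return value * {factor!r}\n"


def load_generated(source, name):
    """Compile generated source at runtime and return the named object."""
    namespace = {}
    code = compile(source, "<generated>", "exec")
    exec(code, namespace)  # assumption: the source comes from a trusted generator
    return namespace[name]


# Generated, compiled, and called while the program is already running.
scale = load_generated(generate_scaler_source(3), "scale")
result = scale(14)
```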

Surmounting Hurdles

At present, code generators have practical limitations. They lack the following technical requirements:

Positional assembly: The ability to put code where we want after it has been written and compiled

Massive parallelism: The ability to work together to solve bigger problems

Positional Assembly

Because reflection attaches code generators into existing applications at large, granular intersections, the code is not fully integrated and may not be integratable after deployment. A better solution would be a trusted generator injecting code precisely where needed. Such an injection could occur at the fragment, statement, algorithm, class, namespace, or assembly level, intertwined or attached to existing code. This is the concept of positional assembly.

Suppose that over a period of time an application keeps raising an exception that is unhandled and crashing as a result. This predicament highlights the need for positional assembly. Theoretically, an autonomously generated try..catch block could be wrapped around the crashing block of code and at a minimum the exception could be written to the event log. Better yet, the problem could be diagnosed at an earlier juncture to prevent the crash-inducing bad input.
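The exception scenario can be approximated today at function granularity, which is much coarser than the statement-level placement described above. In this Python sketch (try/except plays the role of try..catch), a hypothetical guard helper replaces a crashing function with a wrapped version that records the exception. The names guard, event_log, app, and parse_quantity are illustrative assumptions, and the list stands in for a real event log.

```python
import functools
import types

event_log = []  # stand-in for the system event log


def guard(owner, name):
    """Replace owner.<name> with a version wrapped in an exception handler."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def guarded(*args, **kwargs):
        try:
            return original(*args, **kwargs)
        except Exception as exc:
            # At a minimum, record the failure instead of crashing.
            event_log.append(f"{name}: {type(exc).__name__}: {exc}")
            return None

    setattr(owner, name, guarded)


def parse_quantity(text):
    return int(text)  # raises ValueError on bad input


# A tiny stand-in for a deployed application's namespace.
app = types.SimpleNamespace(parse_quantity=parse_quantity)

guard(app, "parse_quantity")              # inject the handler after the fact
good = app.parse_quantity("12")
bad = app.parse_quantity("twelve")        # logged instead of crashing
```

An autonomous generator would additionally have to decide on its own that the handler is needed, for example by observing repeated crashes at the same location.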

To support positional assembly, software language vendors would need to implement what I refer to as sleeves. Sleeves are wrappers that permit the positional placement of dynamically generated code. At present, code is four-dimensional. Code comprises lines, modules, assemblies, and execution over time. Sleeves are fifth-dimensional entities in a space wrapped around four-dimensional code, and they are permitted to interact at any of the five dimensional points, including across sleeve boundaries.

Note: An alternate way to think of code is in terms of four states: creation, execution instance, execution over time, and rest. Think of a sleeve as a fifth dynamic state that can interact with code during any of its other states, including inter-sleeve communications.

Current implementations such as .NET support the isolation of applications by application domain in order to prevent contamination. Sleeves would act like layers around application domains and play the role of boundary entities that can interact at any point in an application domain by injecting generated code and across application domains in order to work in parallel and convergently. Obviously, this capability would be greatly dependent on trust and reliability.

Positional assembly also would rely on programming languages and generators supporting the two other concepts as well, massive parallelism and convergent assembly (which were identified by Dr. Ralph Merkle).

Massive Parallelism

Massive parallelism is many autonomous nano code generators working together, perhaps like Howard Rheingold's Smart Mobs. Parallelism is presently supported in hardware and software, but it hasn't been applied to code generation.

Like a colony of ants, code generators would assemble for a period of time to accomplish a common goal, with each code generator performing a very simple task.

Convergent Assembly

All of these small code generators would need to converge and work in orchestrated groups to make small bits of generated code into bigger bits. For small, algorithm-sized problems, a few dozen or maybe as many as a few hundred generators would converge and piece together a solution. As the problem increased in complexity and scale, nano generators would assemble small pieces of code, and molecular (or larger) generators would converge to convert the small fragments of code into increasingly larger chunks of the solution. In this way, fragments, algorithms, classes, components, aspects, sub-systems, and possibly whole systems could be generated autonomously.
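A toy version of parallel generators converging on a larger result might look like the following Python sketch. Three hypothetical nano generators each emit one fragment in a thread pool, a "molecular" step joins the fragments into a function, and a further step joins functions into a module. All names here (gen_signature, gen_guard, gen_body, assemble_function, assemble_module) are assumptions for illustration, and real autonomous generators would need far richer orchestration rules.

```python
from concurrent.futures import ThreadPoolExecutor


# Nano generators: each performs one very simple task.
def gen_signature(spec):
    return f"def {spec['name']}(values):"


def gen_guard(spec):
    return "    if not values:\n        return 0"


def gen_body(spec):
    return f"    return {spec['op']}(values)"


NANO_GENERATORS = [gen_signature, gen_guard, gen_body]


def assemble_function(spec):
    """Molecular step: run nano generators in parallel, join their fragments."""
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the order of NANO_GENERATORS.
        fragments = list(pool.map(lambda gen: gen(spec), NANO_GENERATORS))
    return "\n".join(fragments) + "\n"


def assemble_module(specs):
    """Larger step: converge several generated functions into one module."""
    return "\n".join(assemble_function(spec) for spec in specs)


module_source = assemble_module([
    {"name": "total", "op": "sum"},
    {"name": "largest", "op": "max"},
])

namespace = {}
exec(module_source, namespace)  # assumption: the generators are trusted
```

After this runs, namespace contains two generated functions, total and largest, each built from three independently produced fragments.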

Getting There from Here Without Stumbling

Implications, both good and bad, will need to be weighed as this technology matures. Let's take a moment to consider some of these.

Benefits of Generators

Several benefits of code generators come to mind. An obvious one is that when the metacode is correct, all subsequently generated code will be correct. For example, if you write a code generator for managing threads in WinForms, then every generated use instance will work correctly. This means that even if a developer hasn't mastered the nuances of Invoke and delegates, he or she can still use multithreaded behaviors. Further, a nano code generator could discover an optimization point and inject the threaded behavior correctly itself.

Some areas are ready to explore code generators right now. Of course, software development would benefit from the availability of more code generators because this would enhance reliability. We programmers could also define plug-in points that look like assemblies, permitting us to dynamically add assemblies later. This is how internationalization works. (Internationalization keys off culture codes. If the current culture changes, the .NET Framework looks for the matching satellite resource assembly.)

Implications of Automated Code Generators

In the face of automated code generators, programmers might worry about their financial future. Don't worry. For the foreseeable future, software will continue to grow in complexity, and code generators more than likely will only alleviate tedium because the easy tasks will be automatable. A bigger concern is how we cope in a world of software with diverging behaviors.

The end result may be many more support people for all of these slightly different variations, as well as developers searching for unique evolutions. Perhaps even a whole cottage industry will crop up where programmers are paid to test and distribute new evolutions. In addition, someone will have to write all of these generators, and the ratio between metacode and generated code is about 10-to-1. (That is, it takes about ten lines of code-generating code to generate one line of code.) This might mean that we will need even more developers than ever to write all of these generators.

Many problems remain to be solved, including diverging evolutions, and security and trust between code generators and applications. The technology isn't quite there yet. But in this world, software could grow in capability, value, and complexity at geometric rates, resulting in some very valuable and interesting derivatives.

Science Fiction Becomes Fact

Like pieces of a puzzle, the technologies and pragmatic problems involved in designing autonomous nano code generators are slowly taking shape. I think it was Stephen Hawking who said, "Yesterday's science fiction is today's science fact."

We already have the science fiction in Michael Crichton's novel Prey, in which he captures an interesting and perverse scenario based on nanotechnology and intelligent nanomachines, and we have most of the science facts in refactoring, patterns, generics, reflection, code generators, and parallel computing. For science fiction to become science fact, we need the capability to generate code wherever it is needed; we need autonomous code generators. Alan Turing demonstrated that complex machines could be created through lengthy binary decision making, and I don't see any reason why simple binary rules couldn't be used to support autonomy.

Collectively, this means that science fact is hinging on positional assembly capabilities in languages like C# and the ingenuity of programmers like you and me.

About the Author

Paul Kimmel has written several books on object-oriented programming and .NET. Look for his upcoming books UML DeMystified from McGraw-Hill/Osborne (Spring 2005) and Expert One-on-One Visual Studio 2005 from Wiley (Fall 2005). Paul is also the founder and chief architect for Software Conceptions, Inc, founded 1990. He is available to help design and build software worldwide. You may contact him for consulting opportunities or technology questions at pkimmel@softconcepts.com.

If you are interested in joining or sponsoring a .NET Users Group, check out www.glugnet.org.