Security

One of the primary concerns people raise about hyperlinked code is that a small program could call libraries that in turn call others that contain malicious instructions. If this system is built incorrectly, this is a real risk, but if it is done right, hyperlinked programs will actually be more secure than programs that are distributed as a single package.

While a detailed discussion of security is beyond the scope of this article, there are two measures we can use to make this a trustworthy system: a validation mechanism that lets a runtime environment confirm that a module it fetched from a repository has the same checksum or certificate the author intended, and a trusted code repository.

In the examples in this article, I imagine that the system uses a simple validation technique--a 32-bit checksum--to detect changes. The author would include the checksums for every hyperlinked module in the program's main module. At runtime, if a module were swapped out in one of the code repositories, the hyperlinked import would fail because the checksum of the replacement would no longer match. There are more secure ways to do this, but a simple checksum keeps the examples easy to follow.
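A check like that takes only a few lines of Python, since `zlib.crc32` conveniently produces a 32-bit value. The module source below is invented for the demo, and the "recorded" checksum is computed in place only so the sketch is self-contained:

```python
import zlib

# Source text of a module as fetched from a repository (contents invented).
fetched_source = b"def greet():\n    return 'hello'\n"

# The 32-bit checksum the author recorded in the program's main module;
# computed here only to keep the demo self-contained.
author_crc = zlib.crc32(fetched_source)

def verify(source: bytes, expected: int) -> bool:
    """Return True only when the fetched module matches the author's checksum."""
    return zlib.crc32(source) == expected
```

Here `verify(fetched_source, author_crc)` succeeds, while any altered copy of the source fails the comparison--exactly the condition that should make a hyperlinked import abort.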

The second and more important security measure is a trusted code repository. One of its jobs is to remove modules that are flagged as malicious, hopefully before they can inflict damage. The ability to disable modules will help halt the spread of malicious code and neutralize the dangerous parts of programs that are already in circulation. Imagine, for example, a simple Trojan program that invokes an apparently harmless module that deletes files at a certain date. If this module is identified and reported before that time, the code repository can disable it and flag it as harmful. When D-Day arrives, the runtime environment either cannot retrieve this module or learns that it has been flagged and refuses to run it. Even if millions of people have downloaded this program, the harmful portion will be defunct. With a conventional program distributed as a single package, there is no efficient way to recall harmful components once they're in the wild.
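The runtime's side of that refusal could be as simple as consulting a flag list published by the repository before anything executes. A minimal sketch, with invented module ids and a hard-coded flag set standing in for data the repository would serve:

```python
# Module ids the trusted repository has marked as harmful (invented).
FLAGGED_MODULES = {"filetools-1.3"}

def run_module(module_id: str) -> str:
    """Refuse to execute any module the trusted repository has flagged."""
    if module_id in FLAGGED_MODULES:
        raise RuntimeError(f"{module_id} is flagged as harmful; refusing to run")
    # A real runtime would fetch, verify and execute the module here.
    return f"running {module_id}"
```

Once `filetools-1.3` lands in the flag set, every runtime that checks the list leaves that portion of the program defunct, no recall of the installed copies required.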

One of the reasons I chose to use Google as an example in this article is that it is uniquely qualified to address these issues, having both the intellectual and technical resources required to build a system that is reliable, trusted, and also easy to deal with. The security issues surrounding hyperlinked code aren't trivial, but they are all solvable, and if this is done right, it will be a more efficient way to build and distribute software.

Dynamically Loaded Modules

Unless you need to talk to a low-level hardware API or do something especially CPU-intensive, you can do quite a lot without leaving Python. At least, that's been my experience. Where possible, I always tried to write programs without going outside the core libraries, mainly so that I could distribute them to other computers without running into a rat's nest of configuration and admin issues.

With this system, it will be easy for developers to share code and for people to use shared code without creating a lot of version-control and distribution headaches for themselves. Sharing a module will be as easy as uploading it to a trusted repository and then referencing it in a program. This will look something like the following:
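Something like this--the syntax is a sketch of the proposed import directive, and the repository URL, class name, and CRC value are all invented for illustration:

```
# Hypothetical hyperlinked import directive (invented syntax)
import http://repository.example.com/wellhithere/2.0.1 as wellhithere, crc=0x1C2F9E41

greeter = wellhithere.WellHiThere()
greeter.say("Hello from a hyperlinked program!")
```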

As you might have guessed, this fetches a copy of wellhithere.py (version 2.0.1) and refers to it locally as wellhithere. The program instantiates this object and tells it to say something. It's pretty basic stuff, with the twist that the import directive makes it easy to load external libraries on the fly. The CRC option allows the program to do checksum verification.

When executing this application, the runtime engine will use a procedure like this one:

1. Look for import statements that point to modules that have not yet been cached locally; download modules that have not been cached and perform checksum verification if requested.

2. Assign a random filename to each locally cached module; encrypt it in a local file store (to prevent malicious alterations to cached modules from going undetected).

3. Hide steps (1) and (2) from the application, so the process of caching modules is transparent; prevent the application from calling modules that either could not be downloaded or that failed checksum verification (essentially the same condition as failing to load a locally stored library because of a file path error).
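The steps above can be sketched roughly as follows. The repository contents are invented, an in-memory dict stands in for the on-disk file store, and the encryption step is reduced to a placeholder XOR purely to keep the sketch self-contained:

```python
import uuid
import zlib

# Simulated remote repository: module id -> source text (invented).
REPOSITORY = {"wellhithere-2.0.1": "def greet():\n    return 'Well, hi there!'\n"}

# Checksums the author shipped with the main module; computed here only
# so the demo is self-contained.
AUTHOR_CRCS = {mid: zlib.crc32(src.encode()) for mid, src in REPOSITORY.items()}

class ModuleCache:
    """Sketch of the three-step caching procedure."""

    KEY = 0x5A  # placeholder key; a real system would use actual encryption

    def __init__(self):
        self._index = {}  # module id -> random local filename
        self._store = {}  # filename -> scrambled bytes (stands in for disk)

    def _scramble(self, data: bytes) -> bytes:
        # NOT real encryption -- a reversible XOR placeholder for the demo.
        return bytes(b ^ self.KEY for b in data)

    def load(self, module_id: str, verify_crc: bool = True) -> str:
        # Step 1: download modules that are not yet cached, verifying checksums.
        if module_id not in self._index:
            source = REPOSITORY.get(module_id)
            if source is None:
                raise ImportError(f"cannot download {module_id}")
            data = source.encode()
            if verify_crc and zlib.crc32(data) != AUTHOR_CRCS[module_id]:
                raise ImportError(f"checksum mismatch for {module_id}")
            # Step 2: random filename plus (placeholder) encryption at rest.
            filename = uuid.uuid4().hex
            self._index[module_id] = filename
            self._store[filename] = self._scramble(data)
        # Step 3: the caller just gets source back; the bookkeeping is hidden.
        return self._scramble(self._store[self._index[module_id]]).decode()
```

Step 3's transparency shows up in the interface: `load()` returns plain module source, while the store only ever holds scrambled bytes under random names, and a module that cannot be downloaded or fails verification raises the same kind of error as a missing local library.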