Application virtualization, past and future

An introduction to application virtualization

When you hear the phrase virtual machine today,
you probably think of virtualization and hypervisors.
But the VM is really an application of an older idea, abstraction: a common method of separating one entity from another.
This article explores two of the many newer open source VM technologies:
Dalvik (the VM core of the Android operating system) and Parrot (an open source VM technology
for efficiently executing dynamic languages).

M. Tim Jones is an embedded firmware architect and the author of
Artificial Intelligence: A Systems Approach, GNU/Linux Application
Programming (now in its second edition), AI Application Programming (in
its second edition), and BSD Sockets Programming from a Multilanguage
Perspective. His engineering background ranges from the development of
kernels for geosynchronous spacecraft to embedded systems architecture and
networking protocols development. Tim works at Intel
and resides in Longmont, Colorado.

Platform virtualization vs. application virtualization

Virtual machines (VMs), in their first incarnation, were created by IBM in the 1960s as a way to share
large and expensive mainframe systems. And although the concept is still applied in
current IBM systems, the popular concept of a VM has broadened and been
applied to a number of areas outside of virtualization.

Virtual machine origins

The first operating system to support full virtualization for VMs was the
Conversational Monitor System (CMS). CMS supported both full virtualization
and paravirtualization. In the early 1970s, IBM introduced the VM family of
systems, which ran multiple single-user operating systems on top of their
VM Control Program—an early type-1 hypervisor.

The area of virtualization that
IBM popularized in the 1960s is known as platform (or system)
virtualization. In this form of virtualization, the underlying hardware
platform is virtualized to share it with a number of different operating systems and
users.

Another application of the VM is to provide the property of machine independence. This
form, called application (or process) virtualization,
creates an abstracted environment (for an application), making it independent of
its physical environment.

Aspects of application virtual machines

In the application virtualization space, VMs are used to provide a hardware-independent
environment for the execution of applications. For example, consider
Figure 1. At the top is the high-level language, which
developers use to construct applications. Through a compilation process, this
high-level code is compiled into an intermediate representation called
object code. In a non-virtualized environment, this object code
(which is machine independent) is compiled into the native machine code for
execution on the physical platform. But in an application virtualization
environment, the object code is interpreted within an abstract machine to provide
the execution. The key advantage here is that the same object code can be
executed on any hardware platform that supports the abstract machine (the
interpreter).

Figure 1. Application VM for platform independence

In addition to creating a portable environment in which to execute the object code,
application virtualization provides an environment in which to isolate the VM
from other applications running on the host. This setup has a number of advantages,
such as detailed resource management and security.

The object code for a VM is also called bytecode, specifically defining an
instruction set that an interpreter executes. The term bytecode evolved
from implementations that efficiently implemented their virtual instruction sets
as single bytes for simplicity and performance.
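To make the idea of single-byte instructions concrete, here is a minimal sketch of a stack-based bytecode interpreter in Python. The opcode values and the names run_bytecode and PROGRAM are assumptions invented for this illustration; they do not come from any real VM.

```python
# Hypothetical one-byte opcodes, invented for illustration only.
OP_PUSH = 0x01  # the next byte is an unsigned 8-bit literal to push
OP_ADD = 0x02   # pop two values, push their sum
OP_MUL = 0x03   # pop two values, push their product
OP_HALT = 0xFF  # stop and return the value on top of the stack


def run_bytecode(bytecode: bytes) -> int:
    """Interpret a bytecode program and return the value left on the stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to decode
    while True:
        op = bytecode[pc]
        pc += 1
        if op == OP_PUSH:
            stack.append(bytecode[pc])  # operand byte follows the opcode
            pc += 1
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OP_HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc - 1}")


# (2 + 3) * 4 encoded in nine bytes
PROGRAM = bytes([OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT])
```

Calling run_bytecode(PROGRAM) evaluates (2 + 3) * 4. The same nine bytes would run unchanged on any host that has a Python interpreter, which is the portability argument in miniature.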

Now, let's look at some of the historical uses for application virtualization and explore
some of its modern uses.

Virtual machine history

One of the earliest uses of application virtualization occurred in the 1960s for the
Basic Combined Programming Language (BCPL). BCPL was an imperative language
developed by Martin Richards at the University of Cambridge and was a precursor
to the B language that evolved into the
C language we use today.

BCPL, then and now

Although BCPL originated in 1966, it's still under active development today by its
creator, Martin Richards. BCPL's first compiler was written for the IBM 7094
system under the Compatible Time Sharing System, one of the
first time-sharing operating systems developed. Today, you can use BCPL on
a variety of systems, including Linux.

BCPL was a high-level language (similar to C), and
the intermediate code that its compiler generated was called O-code
(object code). The O-code could be interpreted by a VM running on a physical
machine or compiled to the native machine language of the host. This
functionality provided two advantages in the context of machine
independence. First, because the O-code was abstracted from the physical machine, it
could easily be interpreted on a variety of hosts. Second, the O-code could be
compiled to the native machine, which permitted the development of one compiler
that generates O-code and multiple compilers that translate O-code to native
machine instructions (a simpler task). This machine independence made the language
portable across machines and therefore popular because of its availability.

In the early 1970s, the University of California at San Diego implemented the VM
approach for execution of compiled Pascal. They called the intermediate
representation p-code, which sought independence from the underlying
hardware to simplify the development of the Pascal compiler (relying instead on
an abstract pseudo-machine architecture). The Forth language also applied
VMs, namely, zero-address or stack-based architectures.

In 1972, Xerox PARC introduced the Smalltalk language, which relied on a VM for
execution. Smalltalk was one of the first languages built around the concept of
objects. Both Smalltalk and p-code heavily influenced one of the most prominent
VM-based languages in existence today: the Java language. Java first
appeared in 1995, developed by Sun Microsystems, and advanced the idea of
platform-independent programming through the Java Virtual Machine. Since
then, Java technology has become a building block of web applications. From
server-side applications to client-side applets, Java technology raised awareness of VM
technologies and introduced newer techniques that bridged interpretation and
native execution using just-in-time (JIT) compilation.

Many other languages include the concept of VMs. The Erlang language (developed
by Ericsson) uses a VM to execute Erlang bytecodes and also
to interpret Erlang from the source's abstract syntax tree. The lightweight Lua
language (developed at the Pontifical Catholic University of Rio de Janeiro in
Brazil) includes a register-based VM. When a Lua program is executed, it is
translated into bytecodes, and then executed in the VM. Later, this article looks
at a bytecode standard that can be used for any language.

Virtual machines today

The use of VMs to abstract applications from the physical host is historically a common
method, and today it continues to evolve and find new applications. Let's look at some of the newer
open source solutions that push the concept of VMs into the future.

Dalvik VM

Dalvik is an open source VM technology developed by Google for the Android
operating system. Android combines a modified Linux kernel with
a software stack for mobile devices (see Figure 2). Unlike
many VM technologies that rely on stack-based architectures, the Dalvik VM
is a register-based virtual architecture (see Resources
for more information on the architecture and instruction set). Although
stack-based architectures are conceptually simple and efficient, they can
introduce new inefficiencies, such as larger program sizes (because of stack
maintenance).
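The difference between the two styles can be sketched in Python with two toy instruction sets for the statement c = a + b. Both instruction sets and all names here (run_stack, run_register, STACK_PROGRAM, REGISTER_PROGRAM) are assumptions made up for this comparison; they are not Dalvik or Java bytecode. The sketch counts instructions only and says nothing about encoded size in bytes, because register instructions carry explicit operands and are typically wider.

```python
def run_stack(program, variables):
    """Execute stack-style instructions; operands are implicit (top of stack)."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "load":      # push a variable's value onto the stack
            stack.append(variables[instr[1]])
        elif op == "add":     # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "store":   # pop the top of the stack into a variable
            variables[instr[1]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return variables


def run_register(program, registers):
    """Execute register-style instructions; every operand is named explicitly."""
    for op, dest, src1, src2 in program:
        if op == "add":       # dest = src1 + src2 in a single instruction
            registers[dest] = registers[src1] + registers[src2]
        else:
            raise ValueError(f"unknown register op: {op}")
    return registers


# c = a + b in each style
STACK_PROGRAM = [("load", "a"), ("load", "b"), ("add",), ("store", "c")]
REGISTER_PROGRAM = [("add", "c", "a", "b")]
```

The stack version spends three of its four instructions moving values on and off the stack, while the register version needs one instruction. That extra traffic is the stack maintenance cost the text refers to.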

Figure 2. Simple architecture of a Dalvik software stack

Because Dalvik is only the VM architecture, it relies on a high-level language compiled
into the bytecodes that the VM understands. Rather than reinvent the wheel,
Dalvik relies on the Java language as the high-level language for application
development. Dalvik also relies on a special tool called dx
to convert Java class files into Dalvik VM executables. For performance, the VM
may apply additional optimizations to a Dalvik executable (dex),
including JIT compilation, which translates the dex instructions into native
instructions for native performance. This process is also known as dynamic
translation and is a popular technique for increasing the performance of
VM technologies.
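The following Python sketch shows dynamic translation by analogy. It does not describe how Dalvik's JIT works. Python cannot emit native instructions directly, so a compiled Python function stands in for native code here. The instruction names (arg, push, add, mul) and the translate function are assumptions invented for this example.

```python
def translate(program):
    """Translate (op, arg) stack instructions into a Python function, once.

    Returns the callable and the generated source. After translation, the
    callable no longer decodes instructions one at a time.
    """
    exprs = []  # translation-time stack holding Python expression strings
    for op, arg in program:
        if op == "push":      # literal operand
            exprs.append(repr(arg))
        elif op == "arg":     # the function's single input, named x
            exprs.append("x")
        elif op in ("add", "mul"):
            right, left = exprs.pop(), exprs.pop()
            symbol = "+" if op == "add" else "*"
            exprs.append(f"({left} {symbol} {right})")
        else:
            raise ValueError(f"unknown op: {op}")
    source = f"lambda x: {exprs.pop()}"
    # compile() plays the role of the code generator in this analogy
    return eval(compile(source, "<translated>", "eval")), source


# x * x + 1 expressed as stack instructions
PROGRAM = [("arg", None), ("arg", None), ("mul", None), ("push", 1), ("add", None)]
```

Calling translate(PROGRAM) returns a function that can be invoked repeatedly without re-reading the instruction list. The one-time translation cost is paid back when the code is executed many times.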

As shown in Figure 2, a Dalvik executable (along with an
instance of the VM) is isolated as a single process in Linux user space. The Dalvik
VM has been designed to support execution of multiple VMs (in independent
processes) simultaneously.

The Dalvik VM is not built on the standard Java runtime and therefore does
not inherit that runtime's licenses. Instead, Dalvik is a clean-room implementation
published under the Apache 2.0 license.

Parrot

Another interesting open source VM project is Parrot. Parrot is another register-based
VM technology that was designed to efficiently execute dynamic languages
(languages that perform certain operations at run time that are commonly
performed at compile time, such as altering the type system).

Parrot was originally designed as a run time for Perl 6, but it is a flexible
environment for execution of bytecodes for many languages (see
Figure 3). Parrot supports several input forms, including
the Parrot Abstract Syntax Tree (PAST), which is useful for compiler
writers; the Parrot Intermediate Representation (PIR), which is a high-level
representation that can be written by people or automatically by compilers; and
the Parrot Assembly (PASM), which is below the intermediate representation
but useful both for people and for compilers. Each form is translated into
Parrot bytecode and executed on the Parrot VM.

Figure 3. Simple architecture of the Parrot VM

Parrot supports a large number of languages, but one aspect that makes it so
interesting is its support for both static and dynamic languages, including
specific support for functional languages. Listing 1
shows a simple use of PASM. To install Parrot on Ubuntu, use
apt-get:

sudo apt-get install parrot

The following session illustrates a simple string manipulation program in Parrot.
Note that although Parrot implements this code as assembly, it's much more
feature rich than the assembly you may be used to. Instructions in Parrot use
the dest,src syntax, so Listing 1
shows a string register being loaded with text. The length
instruction determines the length of the string and loads it into an integer
register. The print instruction emits the argument
to standard output (stdout), and concat implements
string concatenation.

Listing 1. PASM example

You'll find a rich set of instructions within Parrot (see Resources
for more details). The authors chose richness of features over minimalism,
making it easy to code and build compilers for the Parrot VM.
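The destination-first operand order and the string operations described above can be modeled with a few lines of Python. This is a toy model, not Parrot's implementation or its real instruction encoding. The function run_register_program, the tuple format, and the register names are assumptions chosen to mirror the description.

```python
def run_register_program(program):
    """Run a list of (op, dest, *srcs) tuples against a dict of named registers.

    Returns the final registers and a list of everything 'print' emitted.
    """
    regs = {}
    output = []
    for op, dest, *srcs in program:
        if op == "set":        # dest = literal value
            regs[dest] = srcs[0]
        elif op == "length":   # dest = length of the string in the source register
            regs[dest] = len(regs[srcs[0]])
        elif op == "concat":   # dest = first source register + second source register
            regs[dest] = regs[srcs[0]] + regs[srcs[1]]
        elif op == "print":    # the only operand names the register to emit
            output.append(str(regs[dest]))
        else:
            raise ValueError(f"unknown op: {op}")
    return regs, output


# Load two string registers, measure one, then concatenate them
PROGRAM = [
    ("set", "S1", "Parrot"),
    ("set", "S2", " VM"),
    ("length", "I1", "S1"),
    ("print", "I1"),
    ("concat", "S3", "S1", "S2"),
    ("print", "S3"),
]
```

In every instruction except print, the first operand is the register that receives the result. Each operation also does far more work than a typical hardware instruction, which is the sense in which such an assembly is feature rich.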

Even with the high-level abstraction that PASM provides, PIR is even more
comfortable for high-level programmers. Listing 2 provides
an example program written in PIR and executed by the Parrot VM. This example
declares a subroutine called square that squares a
number and returns the result. The main subroutine (labeled with
:main to tell Parrot to run it first) calls square and prints the result.

Listing 2. PIR example

Parrot provides a rich application virtualization environment for the development
of machine-independent applications that also seek high efficiency. Compiler
front ends designed for Parrot exist for a large number of languages,
including C, Lua, Python, Scheme,
Smalltalk, and many others.

Other uses of application virtual machines

So far, you've seen the historical uses of application virtualization, including two recent
examples. Dalvik is powering application development within current handsets, and
Parrot provides an efficient framework for compiler writers for static and dynamic
languages. But the concept of application virtualization is being implemented in a
number of other areas outside of the approaches explored thus far.

One particularly interesting use is likely running on the computer you're using right now.
Systems that use the new Extensible Firmware Interface (EFI), which is a BIOS
replacement, can implement firmware drivers in what's called the EFI Byte
Code (EBC). The system firmware includes an interpreter that is invoked
when an EBC image is loaded. This concept was also implemented in Open Firmware
by Sun Microsystems using Forth (a language that includes its own VM).

In the game world, the use of application virtualization is not new. Many modern
games include scripting of nonplayer-character behaviors and other game aspects
using languages that execute bytecodes (such as Lua). But the concept of
application virtualization in games actually goes back much farther.

Infocom, the company that introduced text-based adventures such as Zork,
saw the value in machine independence in 1979. Infocom created a VM called the
Z-machine (named after Zork). The Z-machine was a VM that permitted
an adventure game to be more easily ported to other architectures. Rather than
having to port the entire adventure to a new system, an interpreter would be
ported that represented the Z-machine. This functionality simplified the porting
process to other systems that may have different language support and entirely
different machine architectures. Although Infocom's goal was to ease the pain in
porting between the architectures of their day, their work continues to simplify
porting and results in making these games accessible to a new generation (even
on mobile platforms).

Other game applications of VMs include ScummVM, which provides a VM environment
for the Script Creation Utility for Maniac Mansion (SCUMM) scripting language
(created in 1987). SCUMM was developed by LucasArts to simplify development of
a graphical adventure game. ScummVM is now used to play a large number of text
and graphical adventure games on a variety of platforms.

Going further

Just as platform (or system) virtualization has changed the way we provision and
manage both servers and desktops, application virtualization continues to provide
efficient mechanisms to abstract an application from its host system. Given the
popularity of this approach, it will be interesting to see an evolution of both
software and hardware to make application virtualization even more flexible and
efficient.

Resources

Learn

Wikipedia provides a great set of resources to learn more about VMs (both platform
and application). Check out the page on virtual machines in addition to a page
specifically devoted to p-code machines.

BCPL, the precursor to the B and subsequently C languages, originated in 1966
with Martin Richards. You can read the first BCPL reference manual online as part
of Project MAC. You can also download the latest version of BCPL from its home site.

Although Forth has been around since the 1970s, it continues to find applications as
a VM language. You'll find Forth applied in space sciences, embedded systems,
BIOSes, and other applications that must run with scarce resources. Learn more
about Forth at the Forth Interest Group.

Get products and technologies

Dalvik is the VM environment for the
Android operating system. Dalvik was developed by Dan Bornstein and is maintained
by Google as part of Android. Learn more about the Dalvik machine through its
bytecodes (available in user documentation). You can also learn more about Dalvik in
Introduction to Android development (Frank Ableson, developerWorks, May 2009).

Parrot is a VM designed to efficiently execute
static and dynamic languages through a variety of intermediate representations
over Parrot bytecode. Parrot is available as open source and can be used with a
number of languages. To learn more about Parrot's instruction set, check out the
opcodes available within it.

Application VMs are popular in the game development world. One of the earliest uses
was by Infocom in its text adventure games (such as Zork). You can
learn more about the Infocom VM, called the Z-machine, as well as interpreters
that exist for various platforms. Another application of VMs was
SCUMM, used in graphical adventures by LucasArts. SCUMM has been implemented
as open source as ScummVM and is bringing older games back to life on new hardware.
