My "Computing Platform"

Since perhaps 2005,
I have been working on ideas for a "computing platform" that would help
people do their computer work faster and better.
It consists of programming language, desktop/UI, and operating system components
inspired by the common principles of maximizing expressive power,
assisting the user, and overcoming the limitations and tedium of today's tools.
A possible name for the platform would be "Utop", a play on "Universal Desktop"
and "Utopia"; as of 2008-05-27, I couldn't find any other software by this name
on Google.

My ideas for this platform are largely
inspired by my experiences with the strengths and weaknesses of existing tools;
they have already evolved significantly and I expect that they will continue to
evolve as I gain more experience.
Here, for the sake of inspiration and discussion, I'm presenting my ideas as
they currently stand, written in present tense for readability even though
everything is tentative and speculative.
If you have comments or can direct me to past or present work along similar lines,
I would be much obliged if you would email me.

Components

The platform is to have the following components and characteristics:

Language.
A language for specifying and manipulating data and
functions as a special kind of data, in lambda-calculus style. It is more than
a programming language: it is also suitable for writing configuration and
specification files of various kinds
(which are really just specially structured data)
and spreadsheets (which blend data and computation).

The focus on data originated in my work on "constructive-type C++", a
modification of C++ in which I tried to remove idiosyncrasies in the type
system, beginning with the replacement of T *ptr; with the
"constructive" syntax &T ptr;.
Later I noticed that an Eclipse "search for references" to a Java element
finds references in plugin.xml files too. I imagine the Eclipse developers
had to put that in as a special case, but if the language covered
specification files like plugin.xml, such references would be found with no
extra effort.

Formal verification.
The programming language is integrated with a theorem prover so that
developers can formally verify their code as much or as little as they wish.

Side-effect containment.
The language allows code to be written in imperative style but contains its
side effects as much as possible. Side-effect containment in combination with a
proper heap implementation makes it safe and practical to store multiple
execution snapshots/traces and re-run portions of the program as desired.
I foresee two major applications for these capabilities:

The following incredibly powerful form of debugging becomes practical:
run the program from beginning to end, capturing an execution trace,
and let the user browse the trace. But instead of storing the
entire trace (which would require an unreasonable amount of memory), we cache a
few snapshots of the program's state and fill in the gaps as the user browses
by re-running portions of the program.
With good browsing support (the Eclipse debug views provide some inspiration),
this debugging model should help developers
very quickly trace incorrect behavior to its cause.
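As a sketch of how the snapshot-and-replay scheme could work, consider the
following toy (all names here are hypothetical; a deterministic step function
stands in for a program with contained side effects):

```python
import copy

class ReplayDebugger:
    """Time-travel debugging over a deterministic step function.

    Caches a snapshot of the program state every `interval` steps; any
    intermediate state is recovered by re-running from the nearest earlier
    snapshot. Determinism (i.e., contained side effects) is what makes the
    replay faithful to the original run.
    """

    def __init__(self, initial_state, step, interval=100):
        self.step = step
        self.interval = interval
        self.snapshots = {0: copy.deepcopy(initial_state)}

    def run(self, total_steps):
        """Run to completion, caching a snapshot every `interval` steps."""
        state = copy.deepcopy(self.snapshots[0])
        for i in range(1, total_steps + 1):
            state = self.step(state)
            if i % self.interval == 0:
                self.snapshots[i] = copy.deepcopy(state)
        return state

    def state_at(self, n):
        """Recover the state after n steps by replaying from a snapshot."""
        base = max(k for k in self.snapshots if k <= n)
        state = copy.deepcopy(self.snapshots[base])
        for _ in range(n - base):
            state = self.step(state)
        return state
```

Browsing to step 37 of a 100-step run then touches only one cached snapshot
plus a short replay, rather than a full stored trace.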

Developers can code "in light of" one or more example inputs and see the
values that their code produces on those inputs as they work and even as they
autocomplete function calls. (This feature is a seriously souped-up version
of the capability of most spreadsheet formula-editing dialogs to show the
result of a formula as you edit it.) My feeling, based on my own experience, is
that immediate feedback from examples will make it many times easier for
developers to identify the API calls that give them the values they want and
the conditions needed to tease apart tricky cases.
Depending on the nature of the programming task,
the examples may make the meaning of the code much easier to see.
This development method, which takes test-driven development one step further,
could be called example-driven development.
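The feedback loop might look like the following sketch, where a hypothetical
preview helper plays the role of the IDE re-evaluating a formula against the
example inputs on every edit (the data and names are invented):

```python
# Example inputs the developer is coding "in light of".
EXAMPLES = [{"price": 10.0, "qty": 3}, {"price": 2.5, "qty": 0}]

def preview(formula, examples=EXAMPLES):
    """Evaluate a candidate formula against every example input,
    standing in for the IDE's immediate feedback on each edit."""
    results = []
    for example in examples:
        try:
            results.append(formula(**example))
        except Exception as exc:   # a bad edit shows its error inline
            results.append(repr(exc))
    return results

# Each time the developer edits the formula, the results update:
preview(lambda price, qty: price * qty)   # [30.0, 0.0]
```

The second example input (qty of 0) is exactly the kind of tricky case whose
result one wants to see while still editing, not after a test run.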

Side-effect containment is easy for programs that exclusively do computation
(epitomized by programming contest solutions) but a serious challenge for
interactive applications, systems, and especially networks.
I think the challenge is worthwhile and will deepen our
understanding of the issues involved.

Environment.
Each process has an environment through which it discovers its configuration
settings and the libraries, programs, and servers it is to rely on. A
subprocess's environment is inherited from its parent's by default, but the
parent can override parts of it. Actually, the environment itself specifies
the subenvironment to be taken by each kind of subprocess. In this way, one
can pass a special option to a small part of a large system simply by
appropriately configuring the master environment, without having to modify the
entire system to pass the option along. For example, even if my Web server and
rsync use the same environment option to enable debugging, I can pass a
specially tweaked environment to the Web server that causes debugging to be
enabled only for any rsync processes that the Web server directly or indirectly
executes.
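A minimal sketch of such an environment, with invented names; the point is that
overrides are keyed by subprocess kind and are inherited down the process tree,
so the rsync override survives passing through the Web server:

```python
class Environment:
    """A process environment: settings plus per-subprocess-kind overrides.

    A subprocess's environment is its parent's by default; the environment
    itself says which parts to override for each kind of subprocess, and
    the overrides propagate to that subprocess's own descendants.
    """

    def __init__(self, settings=None, overrides=None):
        self.settings = dict(settings or {})
        # Maps a subprocess kind (e.g. "rsync") to partial settings.
        self.overrides = dict(overrides or {})

    def for_subprocess(self, kind):
        """Derive the environment a subprocess of this kind would see."""
        child = Environment(self.settings, self.overrides)
        child.settings.update(self.overrides.get(kind, {}))
        return child

# Enable debugging only for rsync, anywhere in the process tree:
master = Environment(
    settings={"debug": False},
    overrides={"rsync": {"debug": True}},
)
web = master.for_subprocess("httpd")   # debug stays off for the Web server
rsync = web.for_subprocess("rsync")    # ...but turns on for its rsync child
```

Configuring the master environment once is enough; no component in between has
to know about, or pass along, the option.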

A software package manager combining the best ideas of systems such as RPM and
OSGi maintains the software portion of the environment. It can be used at many
levels simultaneously, e.g., for system-wide and user-specific software and for
individual test sandboxes.

Scripting.
All the desktop's an IDE, and all the applications merely expose their APIs
to the user through it. The APIs are integrated with traditional application
GUIs so that a user who wants to manipulate an object in a way not provided for
in the GUI can start calling the API on it, interactively or in code
that can be saved for future use. For example, if I have a bunch of emails in
my drafts folder and I want to add someone to the To field of every email, I
can select the emails in question, write a quick loop that appends the desired
string to the To field of each email, preview the results to see that they are
correct, and then confirm the change. In this way, the IDE provides a single
macro/scripting facility for all applications with a smooth upgrade path to
solid, reusable code.
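The email example might look like this sketch; the Draft class and its "to"
field are stand-ins for whatever API the mail application would actually
expose through the IDE:

```python
# Hypothetical objects standing in for the mail application's API.
class Draft:
    def __init__(self, to):
        self.to = list(to)

drafts = [Draft(["alice@example.com"]), Draft(["bob@example.com"])]

# The quick loop the user would write in the desktop IDE:
def with_recipient(emails, address):
    """Compute the new To fields without touching the drafts yet."""
    return [email.to + [address] for email in emails]

proposed = with_recipient(drafts, "carol@example.com")
# ...the user previews `proposed`, sees it is correct, and confirms:
for email, to in zip(drafts, proposed):
    email.to = to
```

Separating the computation from the commit is what makes the preview-then-confirm
step natural, and the loop itself can be saved as a reusable script.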

Minimum-overhead bug-fixing.
The process of finding, reproducing, and fixing bugs (in free software at
least) is streamlined. When a UI element malfunctions, the user can pick it
out onscreen using a special keystroke and obtain a precise identifier for the
element to use when searching for known bugs or submitting a new bug report.
Furthermore, she can send a "reproduction bundle" that encapsulates the actions
she took so the developers can see the exact problem she is talking about.

Additionally, the user can jump from a failing API or UI element to its source
code in a single step (and a wait for the system to download the
source code). If the problem is something obvious, she can modify the code and
start using the fixed software in a single step (and a wait for
the system to recompile the software). I currently maintain a custom version of
Fedora's evolution-data-server with the one-line fix for
this minor bug,
and I find the process of patching the RPMs very tedious;
there is plenty of room for better tooling/automation to reduce the overhead.

Search.
Pervasive use of context- and history-sensitive search to help the user
quickly pick out the items she wants without having to go and find them
manually.

Virtualization.
The operating system supports an unlimited number of levels of efficient
virtualization, making it more practical for users to test things in sandboxes.
(Ideally it would be based on a microkernel, but if necessary I'll compromise
to get acceptable performance.)

Permission model.
The operating system has a sane permission model based on the principles
that (1) each user has unfettered access to his own objects (files, devices,
etc.) and (2) a user can share some of his access with others by effectively
running a miniature server to mediate that access. A server enforces
limitations based on a snippet of code in the programming language, so the
limitations can be as simple (e.g., read-only or read/write access to a certain
object) or complex as the user hosting the server wishes. Under this model,
it's very hard for a user to lock himself out of his own stuff (as I have done
three times on real systems), and there are no per-object permissions to get
out of whack (a recurring headache on unix-like systems). The model of
permission sharing via servers might seem dangerously powerful in that it's
hard to anticipate all the ways in which an object might be accessed, but
really, it just reflects what is already happening and will continue to happen
through Web servers; people should get used to it.
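A toy sketch of such a mediating server, with the policy expressed as an
ordinary callable (all names here are hypothetical):

```python
class AccessServer:
    """A miniature server through which a user shares access to her objects.

    The policy is an ordinary snippet of code, so the limitations it
    enforces can be as simple or as complex as the owner wishes.
    """

    def __init__(self, objects, policy):
        self._objects = objects
        self._policy = policy          # policy(user, name, op) -> bool

    def read(self, user, name):
        if not self._policy(user, name, "read"):
            raise PermissionError(f"{user} may not read {name}")
        return self._objects[name]

    def write(self, user, name, value):
        if not self._policy(user, name, "write"):
            raise PermissionError(f"{user} may not write {name}")
        self._objects[name] = value

# A "simple" policy: read-only access for everyone but the owner.
server = AccessServer(
    {"notes.txt": "draft"},
    policy=lambda user, name, op: op == "read" or user == "owner",
)
```

Because the owner always goes through her own unrestricted access, no
combination of policies can lock her out of her own objects.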

OS capabilities.
The OS is free of arbitrary restrictions on the use of its capabilities by
unprivileged users. For example, on Linux, only root is allowed to use the
device mapper, mount filesystems, and exercise superpowers on them. On my
platform, the user inserting removable media would own its device node and
therefore could use the device mapper on it or mount filesystems present on it,
enjoying the same superpowers on those filesystems that the administrator has
on the system ones.

Deployment.
The IDE can be used to deploy servers and Web sites by composing and configuring
components. A Web server is a single object that the user can instantiate
persistently and configure with a document root and possibly other settings.
It exposes a port that the user can access privately with a Web browser or,
with appropriate privileges, attach to a machine's IP port 80. A Web
application like MediaWiki is likewise a single object that can be wired up to part
of the Web server's URL namespace. The application automatically seeks a
DBMS (such as MySQL) in the environment to host its database; the database is
stored by the DBMS for efficiency but logically belongs to the application and
is copied/deleted with the application. This approach makes a performance-tuned
client-server DBMS as convenient to use as SQLite.

Many deployable components will provide separate administrative and
access-controlled interfaces meant for use by the sysadmin and
attachment to a public port, respectively; the administrative interface
avoids the need for special modes like mysqld's --skip-grant-tables.
A user setting up an application for his own use might just use the administrative
interface for everything. Thanks to the platform's virtualization capabilities,
all deployments are fully portable and can be done in unprivileged accounts,
except to the extent that they need access to privileged services (such as SMTP
and attachment to port 80) to perform their functions. One can always use
"pretend" versions of those services (here, an SMTP server that just accumulates
sent messages in a maildir, and a private port) for testing.
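A sketch of this style of composition, using invented component classes rather
than any real API; the wiring, not the components, is the point:

```python
# Invented component classes; a real platform would supply these.
class DBMS:
    """A performance-tuned client-server DBMS, found via the environment."""
    def __init__(self):
        self.databases = {}
    def create_database(self, name):
        self.databases[name] = {}
        return self.databases[name]

class PretendSMTP:
    """A 'pretend' SMTP service that just accumulates sent messages."""
    def __init__(self):
        self.outbox = []
    def send(self, message):
        self.outbox.append(message)

class WebServer:
    """A Web server as a single object with a configurable URL namespace."""
    def __init__(self, document_root):
        self.document_root = document_root
        self.apps = {}                 # URL prefix -> application object
    def mount(self, prefix, app):
        self.apps[prefix] = app

class Wiki:
    """A Web application that seeks its DBMS in the environment."""
    def __init__(self, environment):
        # Stored by the DBMS for efficiency, but logically this database
        # belongs to the application instance.
        self.db = environment["dbms"].create_database("wiki")

env = {"dbms": DBMS(), "smtp": PretendSMTP()}
server = WebServer(document_root="/srv/www")
server.mount("/wiki", Wiki(env))       # wire the app into the URL namespace
```

Swapping PretendSMTP for a real SMTP service, or attaching the server's port to
a machine's port 80, changes the wiring but not the components, which is what
makes deployments portable to unprivileged accounts.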

Potentially related work

Here are some existing projects that I or others have identified as
potentially relevant to the platform. The list is not exhaustive,
but I intend for it to grow.

Umut Acar's self-adjusting computation
(techniques for redoing the parts of a computation affected by a small change in the input).

Sage, a programming language that
supports types with expressive constraints on the values.

Acute, a programming
language with good support for type-safe exchange of data between separate
processes.

Epigram, a dependently typed
programming language with an IDE that is generally humble but has some great
interactivity features that I hope to see in the platform.
To program in Epigram, one progressively fills holes ("sheds") in the program's AST,
possibly leaving smaller holes behind.
While one is in a hole, the IDE shows a helpful summary of the variables in
scope and their types as well as the necessary result type. In addition, one
can start using a partially implemented function and go back to fill the holes
as they are actually reached. The platform would extend this capability to
show the values in scope when the program reaches the hole, to help
one fill it correctly. It would be cool if lazy hole-filling worked even for
Web applications so that, when the client hits a hole, the request blocks and the
administrator gets email with the client's values to fill the hole
so the request can be completed.

GNU Hurd and
Coyotos have some similar ideas to the
platform's operating system. (TODO: elaborate.)

The git object database and
Eclipse's ElementTree data structure
give some ideas for the heap implementation.

The Eclipse Java development tooling
gives the developer just about as much help as one can hope for without the
value-based capabilities made possible by side-effect containment.

More broadly, Eclipse is "precedent" for the
platform in that it is a large project that is relatively new but has achieved
great success and popularity for a surprisingly wide range of uses. I hope to
achieve the same with the platform.

Mac OS X's Core Data is
a nice system for specifying data and getting the UI and persistence for free;
all of those capabilities will be useful for the platform.

Additional notes

From Principles of Data Mining by Hand, Mannila, and Smyth,
page 152: "In an ideal world, the data miners would have available a software
environment within which they could compose components (from a library
of model structures, score functions, search methods, etc.) to synthesize an
algorithm customized for their specific applications."
Once the data mining library is written,
I believe the platform's IDE would be ideal for such composition and experimentation.
Realistically, composing all but the most perfectly fitted pieces
will require writing some amount of glue code,
but the IDE is designed to make that as easy as possible.