This paper discusses some relationships between
high assurance software (for security or safety) and
free-libre / open source software (FLOSS).
In particular, it shows that many tools for developing high assurance
software have FLOSS licenses, by identifying FLOSS tools for
software configuration management,
testing,
formal methods,
analysis, implementation, and
code generation.
It particularly focuses on formal methods, since formal methods are
rarely encountered outside of high assurance.
However, while high assurance components are rare, FLOSS high assurance
components are even rarer.
This is in contrast with medium assurance, where there are a vast number
of FLOSS tools and FLOSS components, and the security record of FLOSS
components is quite impressive.
The paper then examines why this is the case.
The most likely reason for this appears to
be that decision-makers for high assurance components
are not even considering the possibility of FLOSS-based approaches.
The paper concludes that in the future,
those who need high assurance components should
consider FLOSS-based approaches as a possible strategy.
The paper suggests that
government-funded software development in academia
normally be released under a GPL-compatible FLOSS license
(not necessarily the GPL), to enable others to build on what
tax dollars have paid for, and to prevent the vast waste of effort
caused by current processes.
Finally, developers who want to start new FLOSS projects should
consider developing new high-assurance components or tools;
given the increasing attacks and
dependence on computer systems, having more high assurance
programs available will be vital to everyone’s future.

This paper discusses some relationships between
high assurance software (for security or safety) and
free-libre / open source software (FLOSS).
First, let’s define these key terms.

Definitions

Free-libre / open source software (FLOSS) is software
whose license gives users the freedom to run the program
for any purpose, to study and modify the program, and to
redistribute copies of either the original or modified program
(without having to pay royalties to previous developers).
It’s also called libre software, Free Software, Free-libre software,
Free-libre / Open source (FLOS) software, or open source software /
Free Software (OSS/FS).
The term “Free Software” can be confusing, because
there may be a fee for the Free Software; the term
“Free” is derived from “freedom” (libre),
not from “no price” (gratis).
More formal definitions of
open source
software
and
free software (in the sense of libre software)
are available on the web.
Examples include the Linux kernel, the gcc compilation suite,
the Apache web server, and the Firefox web browser.
Many FLOSS programs are commercial, while many others are not.

For purposes of this paper,
let’s define “high assurance software”
as software where there’s an argument
that could convince skeptical parties
that the software will always perform or never perform
certain key functions without fail.
That means you have to show convincing evidence that there are
absolutely no software defects
that would interfere with the software’s key functions.
Almost all software built today is not high assurance;
developing high assurance software is currently a specialist’s field
(though I think all software developers should know a little about it).
To develop high assurance software you must apply many development
techniques much more rigorously, such as
configuration management and
testing.
You need to use
implementation tools you can trust your life to.
And in practice, I believe that you
need to use mathematical techniques called
“formal methods”
for a product to be high assurance, for the simple reason
that it’s usually hard to create truly convincing arguments otherwise.
A significant fraction of this paper covers formal methods, since they are
rarely encountered outside of high assurance.
There isn’t a single universal definition of the term high assurance,
and products have been labelled “high assurance”
without having any formal methods applied to them.
But this definition should be sufficient for my purpose.
Other terms used for this kind of software are “high integrity”
and “high confidence” software.

Usually high assurance software is developed because of serious
safety or security concerns.
Strictly speaking, software by itself has no safety or security
properties -- it can only be safe or secure in the context
of a larger system.
A nice discussion of this issue from the safety point of view
is in Nancy Leveson’s book Safeware (see section 8.3).
But software tends to control safety and security systems, and such
software is often called
“safe software” or “secure software”.
For this paper I’ll talk
about the security or safety of the software,
with the understanding that this only makes
sense if you understand the system that the software will be part of.

For purposes of this paper, the identity of the software’s supplier
is not part of the definition of high assurance.
By supplier, I mean the
provenance (origin) and pedigree (lineage -- who the software passed through)
of the software.
By keeping the supplier identity out of the definition of high assurance,
I can concentrate on technological issues.
In reality, there may be some people who you wouldn’t
trust even if they’d “proved” their code correct...
so in practice it’s quite reasonable to ask questions like,
“Who developed or modified the software? Can I trust them?”
For both FLOSS and proprietary software, provenance and pedigree can be
considered in exactly the same way --
in both cases, you’d consider who originally developed the
software (in terms of each change), and who controlled the software
from development through deployment to you.
In particular, you’d consider who has rights to modify the software
repository, and whether or not you trusted them.
You might also consider how well the development environment itself
is protected from attack.
Don’t be fooled into thinking that FLOSS is “riskier”
than proprietary software because it can be legally modified by anyone.
Anyone can modify a proprietary program with a hex editor, too --
but that doesn’t mean you’ll use that modified version.
The issue with suppliers is who controls your supply chain,
and FLOSS often has an advantage in provenance and pedigree
(because it is often easier with FLOSS to determine
exactly who did what, and who has modification rights).
But provenance and pedigree issues have to be handled on a case-by-case basis,
and trying to cover those issues as well would over-complicate this paper.
For example, the whole issue of
“who trusts who” varies depending on the organizations
and the circumstances.
In an ideal world this wouldn’t
matter, because the proofs would be true and could be
rechecked everywhere.
Given the massive move to globalization,
I think it would be worth trying to make
who created the software irrelevant.
In any case,
let’s concentrate on the technical aspects in this paper.

Contrasting levels of assurance

More generally, assurance is simply the amount of
confidence we have that the software will do and not do
the things it should and should not.
Sometimes the things it should not do, what I call
"negative requirements", are the most important.
Any particular piece of software can
be considered by someone to be low, medium, or high assurance.
This is obviously a qualitative difference; two products could be
in the same assurance category, yet one be more secure than another.

For purposes of this paper,
let’s define medium assurance to be software
which doesn’t reach high
assurance levels, but where
there has been significant effort expended to find and remove important
flaws through review, testing, and so on.
Note that when creating medium assurance software,
there’s no significant effort to prove
that there are no flaws in it,
merely an effort to find and fix the flaws.

Medium assurance software must undergo testing and/or
peer review to reduce the number of flaws.
Such mechanisms can be really valuable in reducing flaws, and
eliminate a great many of them, but the normal method
of using these mechanisms won’t guarantee their absence.
You can eliminate some types of flaws completely by some activities,
e.g., you can completely eliminate buffer overflows by choosing almost
any programming language other than C or C++... but doing so
would not eliminate all flaws!
In particular,
it is impractical to prove anything about real software by testing alone.
After all,
exhaustively testing a program that just adds three numbers
would take 2.5 billion years (assuming each number was 32 bits,
you could run the program a billion times per second, and you
used 1,000 computers for testing).
Real programs are much more complicated than this,
which is why testing by itself can’t reach
the highest levels of assurance.
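That estimate is easy to verify; this short calculation (assuming the same rates as above) reproduces it:

```python
# Reproduce the exhaustive-testing estimate from the text: three 32-bit
# inputs, a billion tests per second, 1,000 computers testing in parallel.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

cases = 2 ** (3 * 32)            # every combination of three 32-bit numbers
rate = 1_000_000_000 * 1_000     # tests per second across all machines

years = cases / rate / SECONDS_PER_YEAR
print(f"{years:.1e} years")      # about 2.5e+09 -- 2.5 billion years
```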

The differences between medium and high assurance (as I mean the
terms in this paper) seem to confuse people, so let me contrast them directly.
When developing a high assurance program,
the program is presumed to be wrong (guilty) until
a preponderance of evidence proves that it’s correct.
When a medium assurance program is being developed,
it is spot-checked in various ways throughout its development
to try to detect and remove some of its worst defects.
Medium assurance software development normally
leaves some defects in the program afterward.
Few like the presence of latent defects, but
few people are willing to pay for (or invest the time)
for high assurance development techniques today in most software.

It’s reasonable to think that as technology improves,
high assurance programs will become more common.
But even today there are some situations where medium assurance is not enough.
Typically this is where people’s lives,
or the security of a nation, is at stake.
In such cases, some of today’s customers need
serious evidence that there are no critical defects of any kind.
They need something different: High assurance.

High assurance challenges and standards

Ideally, all software would be high assurance, but ideally we’d all
live in mansions.
It’s very difficult to create truly high assurance software.
The configuration management and testing requirements are usually
more severe (and time-consuming) than those for other
kinds of software.
Applying formal methods requires significant mathematical training
that most software developers don’t have, and can be very
time-consuming.

Because of these challenges,
high assurance software is usually only developed for
critical security or safety components.
When creating critical security or safety components,
a number of regulations are often imposed.

High assurance software for security is the point of the
Common Criteria for IT Security Evaluation (ISO standard 15408) when
you select EAL 6 or higher -- and EAL 6 is really a compromise!
For purposes of this paper, medium assurance software is in
the EAL 4 to 5 range of the Common Criteria,
so Red Hat Linux and Microsoft Windows would both be considered
medium assurance products.
I consider EAL 2 (or less) to be low assurance; EAL 3 is a compromise,
but it’s basically low assurance.

Here are some other standards that are often mentioned in the security
or safety world, which often impact this kind of development:

Security:
Besides the Common Criteria,
the older (superseded)
Orange Book defined a set of requirements; level B3 had a number of
requirements aimed at high assurance,
and level A1 extended them even further.
(Again, A1 was clearly high assurance, with the next highest level B3
being a kind of compromise.)
While not officially required, the FAA and DoD have developed
Safety and Security Extensions for Integrated Capability Maturity Models.
In the U.S. national intelligence world, another software security standard is
DCID 6/3; its confidentiality protection level 5 imposes many requirements
(see the DCID 6/3 manual
if you want to know what each level means).

Organization of paper

The rest of this paper looks at FLOSS tools that can be used
to create high assurance components (there are many), and
FLOSS components that are high assurance themselves (they are rare).
It then contrasts this situation with medium assurance -- there are
many medium assurance FLOSS tools, and FLOSS
components with impressive results.
The paper then speculates why this is the case, and then concludes.

It turns out that there are a lot of FLOSS tools that can be
used to help develop high assurance software.
To prove that, I’ve identified a few important tool categories,
and for each category I identify several FLOSS tools.
The tool categories I discuss below are
configuration management tools,
testing tools,
formal methods (specification and proof) tools,
analysis tools, implementation tools, and
code generation tools.

There are many other categories of tools, and
many other specific FLOSS tools, that are not listed below.
But the discussion below should prove my point
that there are many FLOSS tools that can be used to help develop
high assurance components.

CVS is an old and still very
widely-used SCM tool. I suspect that most software worldwide, both
proprietary and FLOSS, is still managed by CVS as of 2006.
Subversion (SVN) is an
SCM tool written as a replacement for CVS, and it’s
widely used, too.
But the list of FLOSS SCM tools is amazingly long, including
GNU Arch, git/Cogito, Bazaar, Bazaar-NG, Monotone, Mercurial, and darcs
(see my paper for a longer list).
Clearly, there’s no problem finding a FLOSS SCM tool.

All developers test their software, but high assurance software
requires much more testing to gain confidence in it.
But again, there’s a massive number of FLOSS tools that support testing.
In fact, there are so many FLOSS tools for testing
that there’s a website
(opensourcetesting.org)
dedicated to tracking them;
as of April 2006 they list 275 tools!
These range from bug-tracking tools like
Bugzilla,
to frameworks for test scripts like
DejaGnu.

Many high assurance projects are required to meet specific
measurable requirements on their tests.
One common measure of testing is “statement coverage”
(aka “line coverage”), the percentage
of program statements that are exercised by at least one test.
One problem with the statement coverage measure is that statements
with decisions, such as the “if” statement, create different
paths, and statement coverage alone doesn’t ensure each path is exercised.
Thus, another common measure of testing is “branch coverage”,
the percentage of “branches” from decision points that are covered.
Branch coverage has its weaknesses too, so there are
many other test measures as well -- but statement and branch coverage
are the two most commonly-used measures, so we’ll start with them.

Some experts believe that unit testing (low-level tests)
should achieve 100% statement coverage and 100% branch coverage,
with the simple argument that if you’re not even
covering each statement and each branch, your testing is poor.
Most others argue, however, that 80%-90% in each is adequate --
because the effort to create tests to meet the last percent is very
large and less likely to find problems than by spending the effort
in other ways.
No matter what, in my opinion you should create your tests first
and then measure coverage -- don’t
write your tests specifically to get good coverage values.
That way, you’ll often gain insight into what portions of the
code are hard to test or don’t work the way you thought they would.
That insight will help you create much better additional tests to
bring the values up to whatever your project requires.

(Oh, and why measure both statement and branch coverage?
It turns out it's possible to meet one without the other.
For example, an "if" statement with a "then" clause but
no "else" clause might have all its tests yield true... in which case
all the statements are covered, but not all the branches are covered
(the "false" branch is not covered).
Normally, when you cover all branches you cover all statements, but there
are special cases where that is not true.
For example, if your program (or program fragment)
doesn't contain any branches, or if there is an exception handler
without any branches in its body, you can have all branches covered
but not all statements covered.
Exception handlers might be considered
a branch, but that interpretation is not universal.)
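As a tiny illustration (a hypothetical function, written here in Python), this "if" with no "else" lets one test reach 100% statement coverage while achieving only 50% branch coverage:

```python
# Hypothetical function: an "if" with a "then" clause but no "else".
def clamp(x, limit):
    if x > limit:     # when the condition is false, control falls through
        x = limit
    return x

# This single test executes every statement (the "if" body and the
# "return"), so statement coverage is 100%...
assert clamp(10, 5) == 5
# ...but the "false" branch was never taken, so branch coverage is
# only 50%.  A second test exercises the missing branch:
assert clamp(3, 5) == 3
```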

There have been several recent developments in testing that
improve test efficiency:

QuickCheck (BSD license) is a combinator library written in Haskell,
designed to assist in software testing by generating test cases
for test suites. As noted in Wikipedia,
"The author of the program being tested makes certain assertions
about logical properties that a function should fulfill; these
tests are specifically generated to test and attempt to falsify
these assertions."
The assertions are also useful for documenting the program.
Although the original was created for Haskell,
re-implementations exist for Scheme, Common Lisp, Python, Ruby, Standard ML,
and many other languages.
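To give a flavor of the approach, here is a minimal sketch in Python of property-based random testing. It is only an imitation of the QuickCheck idea, not the real tool, and the property and generator below are made up for illustration:

```python
import random

# QuickCheck-style sketch (not the real tool): assert a logical
# property, then try to falsify it with randomly generated inputs.
def check_property(prop, gen, trials=1000):
    for _ in range(trials):
        args = gen()
        if not prop(*args):
            return args      # counter-example: the property is falsified
    return None              # no counter-example found in `trials` attempts

# Made-up example property: reversing a list twice yields the original.
double_reverse_ok = lambda xs: list(reversed(list(reversed(xs)))) == xs
random_list = lambda: ([random.randint(-100, 100)
                        for _ in range(random.randint(0, 20))],)

assert check_property(double_reverse_ok, random_list) is None
```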

One of the most important recent developments in testing has been
developed by NIST, and is to be released as a FLOSS tool.
NIST's
Automated Combinatorial Testing for Software
work has developed new algorithms to efficiently create a minimum number
of tests that nevertheless cover various levels of combinatorials.
This means that you can efficiently create test suites that really
do a good job of testing all combinations.
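To show what 2-way ("pairwise") combinatorial coverage means, here is a naive greedy test generator sketched in Python. This is not NIST's algorithm -- merely an illustration of the underlying idea that a small suite can still cover every pair of parameter values:

```python
from itertools import combinations, product

# Illustrative greedy pairwise ("2-way") combinatorial test generation.
def pairwise_tests(params):
    """params: one list of possible values per parameter."""
    n = len(params)

    def pairs_of(test):
        # All (parameter, value) pairs exercised together by one test.
        return {((i, test[i]), (j, test[j]))
                for i, j in combinations(range(n), 2)}

    uncovered = {((i, a), (j, b))
                 for i, j in combinations(range(n), 2)
                 for a in params[i] for b in params[j]}
    tests = []
    while uncovered:
        # Greedily pick the full combination covering the most uncovered pairs.
        best = max(product(*params),
                   key=lambda t: len(pairs_of(t) & uncovered))
        tests.append(best)
        uncovered -= pairs_of(best)
    return tests

# Three boolean parameters: 8 exhaustive combinations, but every pair
# of parameter values is covered by far fewer tests.
suite = pairwise_tests([[0, 1], [0, 1], [0, 1]])
print(len(suite), "tests instead of 8")
```

Even this naive greedy approach beats exhaustive testing; NIST's algorithms construct such covering arrays far more efficiently at realistic scales.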

Even in the case of test case measurement,
there are FLOSS tools that can meet this need.
There are several FLOSS “test coverage” tools, such as gcov,
that can report which statements or which branches
were not exercised by your test suite.

Many software developers have no idea what “formal methods” are.
Yet my definition of high assurance
implies that we’ll usually need to use
“formal methods” to create high assurance software.
This section explains what formal methods
are, shows that there are lots of FLOSS tools even in this area, and
then discusses some of the implications.

Formal methods, broadly, are the application
of rigorous mathematical techniques to software development
(see An International Survey of Industrial Applications of Formal Methods for a lengthier definition and discussion).
Ideally, we’d like a rigorous mathematical specification
stating exactly what we want the program to do and not do,
and then prove all the way down to the machine code that the software meets
the specification.
This is normally hard to do, so various compromises are often made.
Many have identified three different broad levels of the use of
formal methods, in order of increasing cost and time:

Level 0: A formal specification is created
(a specification using mathematics),
and the program is then developed from this informally.
This has been called “formal methods lite”.
Creating formal specifications is not easy, because you’re
trying to take ambiguous, poorly-defined ideas and
turn them into a rigorously defined specification.
Still, creating a formal specification often doesn’t
take too much time (for someone trained in how to do it),
and they do tend to help clarify what the real issues are.
This is the cheapest way to use formal methods, and
many argue it’s the most cost-effective way to use formal methods.

Level 1: The mathematical approaches are used further, beyond
a specification but not all the way into code.
Two common ways are to
(a) refine the specification down deeper
to a mathematically-defined design or a more detailed model, and/or
(b) prove important properties of the specification and/or model
(either by hand or with automated help using a theorem prover or
model checker).

Level 2: Theorem provers and/or model checkers
may be used to fully prove that the
actual code matches the design specification.
This is usually very expensive, and when done at all
this is often done with only the most critical portions (where
no other way can give enough confidence).

The “levels” are a little misleading, because you can actually do
things partially (perhaps only a part of the software
is formally specified), and level 1 is somewhat ambiguous.
But these levels give the basic flavor;
there is a trade-off between rigor and effort.
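As a loose analogy (not a real formal notation like Z, VDM, or B), the flavor of a specification can be shown even in ordinary code by writing it down as explicit pre- and postconditions. This hypothetical Python sketch illustrates the idea:

```python
# "Formal methods lite" analogy (hypothetical example): the specification
# is written down as explicit, checkable pre- and postconditions.  A real
# formal notation (Z, VDM, B) would state these mathematically, and a
# proof would show they hold for ALL inputs, not just the tested ones.
def integer_sqrt(n):
    assert isinstance(n, int) and n >= 0          # precondition
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    assert r * r <= n < (r + 1) * (r + 1)         # postcondition
    return r

assert integer_sqrt(10) == 3    # 3*3 = 9 <= 10 < 16 = 4*4
```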

Now we come to the decision of where to draw the line,
and this isn’t an easy decision.
For purposes of this paper, to count as “high assurance”
there needs to be some carefully-reasoned explanation as to why
the running code meets its key requirements.
How much effort is needed for this justification
depends on the risk you’re willing to take,
and where you perceive the risks to be.
Thus, while level 0 may be less costly, that is often not enough, so
high assurance development often moves
into level 1 and uses a focused application of level 2 on the parts
that cannot be shown correct otherwise.
Almost no one tries to prove all code down to the machine code;
typically developers with such concerns
will check the machine code by hand to ensure that it
corresponds with the source code.
Some may prove down to the source code, or at least the parts of the
source code that are most worrisome.
Others may use proofs to a detailed software design,
and then use other less rigorous arguments to justify the source code.
You can even back off further, using formal methods
only for the specification (level 0), or not at all.
In all cases, though, there needs to be some careful reasoning that
convinces others that the code actually meets the key requirements,
typically by showing a stepwise refinement from specification through
to the code.
Mantras such as “correct by construction”
come into play in these kinds of systems.

We would love to formally prove that every line of code,
down to the machine code, is correct; doing so has lots of benefits.
Why is it so costly?
Simply put, creating proofs is incredibly hard to do;
often tools and knowledgeable humans must work together to create them.
To prove code correct, you generally must
write the code and proofs simultaneously (so that the code is in
a form that is easier to prove).
Requiring proofs also
creates limits on the size of programs (and thus their functionality),
because our ability to do proofs does not scale that well.
Years ago, the old historical rule of thumb
for the largest amount of
code that can be reasonably proven correct all the way down
to the code level was about 5,000 lines of code.
Cleanly-separated components can be verified separately
(e.g., a computer’s boot and initialization programs might be separable
from an operating system kernel), and that helps.
This rule of thumb is (I believe) historical;
the tools for verifying code have improved, and
good tools (including languages designed for provability)
can help today’s developers go significantly beyond this scale.
SPARK Ada’s developers in particular claim they can go way beyond this.
But it’s not clear
where the upper bounds really are, and it’s clear that formally
proving code gets harder as the software gets larger.
Typical operating systems have millions of
lines of code and are growing fast, so no matter what the upper bound is,
there is a real gap between typical
commercial demands for functionality and the ability of today’s
formal methods tools to verify it.
Don’t expect Windows, MacOS, the Linux kernel, or *BSD kernels
to be formally proved down to their code level.
Proving only general models of code (instead of the system itself)
eliminates this problem,
but as I noted above,
this doesn’t show that the code itself is highly assured.

Note that all formal methods have a basic weakness: They must make
assumptions, because you have to start somewhere.
In any such system, humans have to check the assumptions very, very carefully.
If you start with a false assumption, a "proof" could produce an
invalid conclusion.
This problem -- that your assumptions may be invalid --
is a key reason that testing and other activities are still needed
for high assurance software, even if you use formal methods extensively.

Another trade-off in formal methods
is between expressiveness and analyzability.
Fundamentally, any formal method has some sort of language,
a set of axioms, and inference rules (the rules that let you determine
if something else is true).
A language that is extremely flexible (expressive) typically tends to
be harder to analyze.
As a result, there are many different languages, each better at
different things.

Tigris
is an open source community focused on building better
software engineering tools (for collaborative software development).
They don't specifically focus on formal methods, but
they have interesting tools like
Delta
(BSD license)
which minimizes "interesting" files subject to a test
of their interestingness
(e.g., to isolate a small failure-inducing substring
of a large input that causes your program to exhibit a bug).
Many of their tools can usefully work with the tools listed below.

Note - don’t treat “formal methods” as a checklist item
for high assurance
(oh look, some math, we’re done!).
The point in high assurance is to identify the risk areas, and
then use tools (like formal methods) to convincingly
show that there isn’t a problem.
There is more than a little overlap between those developing
high assurance software and the research community;
applying these techniques can be difficult for some domains,
if you need to get really high levels of confidence for
complex systems.

There are many different kinds of formal methods tools, which
I will group into these categories:

Specification tools: These help you write and check specifications written
using a formal notation (such as Z, VDM, B, etc.).
These tend to be designed for people who are working at level 0 or level 1;
they are often connected with other tools to go to level 1 or 2.

Theorem provers/proof checkers: Theorem provers
take a set of assumptions and rules, and try to
prove claims about them using traditional mathematical proof
techniques (generally they need human help).
They vary on many factors, such as
what information they can use (from specifications or
general theorems down to program code or annotations).
Proof checkers check a proof created elsewhere.

Model checkers: These try to prove claims, but unlike theorem
provers, model checkers do this
by trying to find all possible circumstances (states) and showing that
they meet the criteria.

Other: Other tools exist which don’t easily fit into these categories.

Note that these are very rough and imprecise categories.
All formal methods tools must support some kind of specification notation,
tools often have multiple capabilities, and there is a general
trend of combining these tools into larger interoperable capabilities.
A general discussion about issues in integrating tools is in the paper
“PVS: Combining Specification, Proof Checking, and Model Checking”.
Thus, any categorization is imperfect, but hopefully this division will help.
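The core idea of a model checker -- exhaustively enumerating reachable states and checking each one -- can be sketched in a few lines of Python. Real model checkers like Spin add a modeling language, temporal logic properties, and aggressive state-space reduction; the model below is a made-up toy:

```python
from collections import deque

# Toy explicit-state "model checker": enumerate every reachable state
# (breadth-first) and check that an invariant holds in each one.
def check_invariant(initial, successors, invariant):
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state              # counter-example state found
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None                       # invariant holds in every reachable state

# Made-up model: a counter that steps by 2 modulo 8.
# Claimed invariant: the counter never reaches 5.
bad = check_invariant(0, lambda s: [(s + 2) % 8], lambda s: s != 5)
assert bad is None    # only 0, 2, 4, 6 are reachable, so 5 never occurs
```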

All formal tools have some sort of specification language,
but some languages are often focused on higher-level
specifications -- helping users enter, syntactically check, and
cleanly display the specifications with a minimum of effort.
These are often used for level 0 and 1 type of work
(though they can be used for more -- often by devising connections
to other tools).
Here is a partial list of specification languages,
and FLOSS tools that support them:

Z.
Z is pronounced “zed” even in the U.S., and
in 2002 was standardized as ISO/IEC standard 13568:2002(E). There are several
toolsuites that support Z.
The
Community Z tools (CZT) project
is developing and coordinating FLOSS projects that develop Z support tools.
fuzz (MIT license)
is a type-checker for Z.
ZETA (GPL + public domain)
is an environment for developing specification documents based on Z.
“It provides an integration framework for tools to edit, analyse
and animate Z specifications and formalisms which are mapped to Z.”
It supports type-setting, type checking, and “execution” of pure Z.
ProofPower (GPL license, except for the Ada plug-in)
is a suite of tools supporting specification and proof in
Higher Order Logic (HOL) and in the Z notation.
Jaza (GPL)
is an "Animator" for the Z formal specification language,
developed at the University of Waikato (primarily by Mark Utting).
More information about Z is available at
Z User Group, including the
Z User Group virtual library.

Alloy.
Alloy is a tool that's hard to categorize.
Alloy implements a specification language that's intentionally similar to Z,
but makes it very easy to analyze and find counter-examples for.
Its analysis capabilities are far beyond what a "pretty printer" or
"type checker" can do, but it can't prove arbitrary properties; see
its description for more information.
You give up some capabilities, but receive a massive ease-of-use bonus
in return.

CASL. The
"Common Framework Initiative for algebraic specification and development"
(CoFI) is a voluntary organization for an open collaborative effort
to produce a Common Framework for Algebraic Specification and Development.
In particular, they have produced the
Common Algebraic Specification Language (CASL),
a specification language that is designed to be a careful selection of
known constructs, intended to be expressive, simple, and pragmatic.
Their goal was to create a language suitable for
specifying requirements and design for conventional software packages;
it has restrictions to various sublanguages, and
extensions to higher-order, state-based, concurrent, and other languages.
Hets (LGPL-like license),
the successor of the CATS tool, supports CASL,
several extensions of CASL, and Haskell.
Hets is a parsing, static analysis and proof management tool
for combining various tools for different specification languages;
its "single" language
is the heterogeneous specification language HetCASL.
Hets includes parsing, static analysis, and proof support.

VDM-SL (Vienna Development Methodology - Specification Language).
Overture is a set of
FLOSS tools (both current and under development) to support the VDM++
specification language (an enhanced version of VDM).
VDM-SL is standardized by ISO/IEC as ISO/IEC 13817-1: 1996.
VDM is really a whole method, of which VDM-SL is the specification language
piece.
VDM seems to be less active than Z to me, but
that is simply an impression and may not be true.

Unified Modeling Language (UML) Object Constraint Language (OCL).
UML is defined by the Object Management Group (OMG); UML version 2.0
added OCL.
KeY (GPL license)
supports formal specification and verification of programs
in conjunction with UML.
UML OCL is part of the UML standard; KeY can then analyze the constraints.
The target language of KeY based development is Java CARD,
a proper subset of Java for smart card applications and embedded systems.
KeY currently requires the use of a proprietary UML tool, but this
does not seem fundamental to KeY; it should be possible to integrate
KeY into a FLOSS UML tool as well.

ProMela.
ProMela is a language for specifying distributed software systems;
it was originally developed for the model-checking tool
Spin, and the
DiVinE tool
supports it too.
See the text below for more about these tools.

Here are FLOSS theorem provers and checkers
(increasingly they are combined with model checkers, in which case
I list them here and not under model checkers):

ACL2
(GPL license)
is an industrial-strength theorem prover,
part of the Boyer-Moore family of provers
(winner of the 2005 ACM Software System Award).
It takes expressions using LISP notation and tries to automatically
prove the expression.
ACL2 is one of the more commonly-used such tools for industrial-strength
proving of real world programs, though it’s certainly not the only one.
I’ve talked to users of ACL2, who claim that ACL2 strikes a nice balance
between trying to do everything automatically (which sadly isn’t
practical yet) and forcing users to do everything “by hand”
(which is painful) -- it tries to do much automatically, while still making
it easy to control.
ACL2 is the intellectual successor of the Nqthm theorem-prover.
This family has been used to prove correctness for
many processor designs, microcodes, and machine object codes,
including AMD microprocessors and pieces of the Berkeley string library.
See Boyer and Yu’s work and
Boyer and Moore’s “Mechanized Formal Reasoning...”
for verification at the machine code level, where they found 3 defects
(the same idea works for bytecode, too).
I should note that this family of tools (ACL2/Nqthm)
is rather different from other tools.
ACL2’s developers claim that
it only takes several months to become an effective ACL2 user
for someone who has
“a bachelor’s degree in computer science or mathematics,
has some experience with formal methods,
has had some exposure to Lisp programming and is comfortable
with the Lisp notation,
is familiar with and has unlimited access to a Common Lisp [implementation],
is willing to read and study the ACL2 documentation, and
is given the opportunity to start with “toy” projects”.

ACL2 is powerful enough to be very useful, and has been
used for many important commercial projects.
ACL2 directly supports mathematical induction, meaning that ACL2 can directly
handle computer programs with loops or recursion (something many other
theorem-provers cannot handle as directly and thus must handle in other ways).
ACL2’s defchoose and defun-sk add the ability to handle
“there-exists” and
“for-all” statements, which the ACL2 developers say
provides “all the abilities of full first order logic”
(but see below: ACL2’s support for these quantifiers is limited).
ACL2 supports encapsulation (the “encapsulate” form lets you
describe general properties of a function,
instead of having to define all functions), so you do not have
to define an executable function to use ACL2.
ACL2 supports some capabilities of second-order logic (higher-order functions,
though not variables).

ACL2’s support for executability is among the strongest of any
theorem prover;
you can interactively enter runnable LISP functions, and begin proving
properties about them. That is really powerful!
There are good LISP compilers, so the execution can be really fast
(especially if you use the usual LISP speedup tricks, such as
tail recursion, arrays, and declaring numeric types).
It also supports “single threaded objects”,
which you can use to make models with state run much faster.
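To give a feel for what such a theorem looks like, here is a Python analogue (an illustration only: ACL2 states this kind of claim over LISP functions and actually proves it by induction, whereas this sketch merely tests it on small inputs):

```python
from itertools import product

def app(x, y):
    """Recursive list append, analogous to ACL2's (append x y)."""
    return y if not x else [x[0]] + app(x[1:], y)

def claim(x, y):
    """The theorem ACL2 would prove by induction on x:
    the length of an append is the sum of the lengths."""
    return len(app(x, y)) == len(x) + len(y)

# Spot-check the claim on small inputs (a test, not a proof).
cases = [list(range(n)) for n in range(4)]
print(all(claim(x, y) for x, y in product(cases, cases)))  # True
```

In ACL2 the analogous defthm would be proved for *all* lists, not just the tested ones; that is the point of its induction support.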

ACL2 has weaknesses (and perceived weaknesses)
as well, though there is ongoing work to address many of them:

It is not well integrated with other tools.
For one, there’s no strong connection with
higher-level specification languages like Z or B.
There’s little support to call out to other tools like other
proof checkers or model checkers (e.g., Otter/Mace/Prover9) -- such
integration would make it possible to use their capabilities to
automatically prove theorems when ACL2 cannot find the proof without help.
The latter would be very useful because although
many find ACL2’s theorem prover relatively
easy to guide, ACL2 needs to be guided
in cases where other theorem provers could automatically find the proof.
For more information see the work on Ivy and Mu-Calculus in
“Computer-Aided Reasoning: ACL2 Case Studies”, which could perhaps
be the basis for connecting ACL2 to other theorem provers
(like Otter/Prover9) and model checkers.

ACL2 doesn’t have a lot of support for reasoning about quantifiers
(for-all and there-exists).
Certifying Compositional Model Checking Algorithms in ACL2
(by Ray, Matthews, and Tuttle)
identifies several ACL2 weaknesses:
its logic “has little support for modeling or reasoning about infinite
sequences” and does not “permit recursive function definitions with
quantifiers in the body”.
They also note how these weaknesses could be eliminated;
note that this work would also help integrating ACL2 with other tools.

If you are thinking about using ACL2, I highly recommend getting two books
by the tool’s creators. These are
Computer-Aided Reasoning: An Approach, which describes how to use
the tool, and Computer-Aided Reasoning: ACL2 Case Studies, which
gives worked examples on various problems.
They are absurdly pricey in hardback (around $215-$224 each!), so
buy softcovers from the authors instead.

PVS Specification and Verification System
(GPL License, as of the 4.0 release of December 2006) is one of the
other major theorem provers/verifiers, and it's also FLOSS.
As they say,
"PVS is a verification system: that is, a specification language
integrated with support tools and a theorem prover. It is intended
to capture the state-of-the-art in mechanized formal methods and to be
sufficiently rugged that it can be used for significant applications."

Twelf (2-clause BSD license)
is a programming system and language
"used to specify, implement, and prove properties
of deductive systems such as programming languages and logics."

Symbolic Analysis Laboratory (SAL)
(GPL license) is
a framework for combining different tools to calculate
properties of concurrent systems.
At its heart is a language devised by SRI, Stanford, and Berkeley,
for specifying concurrent systems in a compositional way. Here's
how they describe it: 'It is supported
by a tool suite that includes state of the art symbolic (BDD-based)
and bounded (SAT-based) model checkers, an experimental "Witness"
model checker, and a unique "infinite" bounded model checker based on
SMT solving. Auxiliary tools include a simulator, deadlock checker and
an automated test generator.'

mCRL2
(Boost license)
stands for micro Common Representation Language 2.
"It is a specification language that can be used to specify and analyse
the behaviour of distributed systems and protocols and is the successor
to μCRL.
It is a formal specification language with an associated toolset.
Using its accompanying toolset systems can be analysed
and verified automatically.
The toolset can be used for modelling, validation and verification of
concurrent systems and protocols.
The toolset supports a collection of tools for linearisation, simulation,
state-space exploration and generation and tools to optimise and analyse
specifications. Moreover, state spaces can be manipulated, visualised
and analysed.

"mCRL2 is based on the Algebra of Communicating Processes (ACP)
which is extended to include data and time. Like in every process algebra, a fundamental concept in mCRL2 is the process. Processes can perform actions and can be composed to form new processes using algebraic operators. A system usually consists of several processes (or components) in parallel."
This uses the Boost license, which is OSI-approved.

Otter/MACE
(public domain),
developed at the Argonne National Laboratory,
is the “first widely used high-performance theorem prover”
(according to Wikipedia).
It includes a built-in model checker (MACE2).
This is a very powerful theorem prover, and has proved theorems
unsolved by mathematicians. A sister project even proved a 60-year-old
conjecture, the Robbins problem, and made the New York Times;
many mathematicians failed to find the proof,
yet Otter handles it easily.
That makes Otter noteworthy, but Otter has since been superseded by
Prover9/Mace4, noted next.

Prover9/Mace4
(GPL license)
is a combination of two programs:
Prover9 is an automated theorem prover for first-order and equational logic,
(based on resolution/paramodulation), while
Mace4 searches for finite models and counterexamples.
Prover9 is a successor of the Otter Prover, with a tagline
"the future of theorem proving".
You can check the proofs produced by Prover9 using
Ivy,
a preprocessor and proof checker that was itself proved correct using ACL2.
I've used this one personally - if you have a problem that's easily
expressed in its language, this is a very good tool.
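The resolution rule at the heart of Otter and Prover9 is easy to sketch at the propositional level (the real tools work in first-order logic with unification and paramodulation; this toy Python version only shows the refutation loop):

```python
def resolve(c1, c2):
    """All resolvents of two clauses (frozensets of int literals; -n means "not xn")."""
    return [frozenset((c1 - {lit}) | (c2 - {-lit}))
            for lit in c1 if -lit in c2]

def refute(clauses):
    """Saturate by binary resolution; True iff the empty clause is derived,
    i.e. the clause set is unsatisfiable."""
    clauses = set(frozenset(c) for c in clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolve(c1, c2):
                    if not r:
                        return True        # empty clause: contradiction found
                    new.add(r)
        if new <= clauses:
            return False                   # saturated, no contradiction
        clauses |= new

print(refute([{1, 2}, {-1, 2}, {-2}]))     # True: unsatisfiable
print(refute([{1, 2}]))                    # False: trivially satisfiable
```

A prover "refutes" the negated goal together with the axioms; deriving the empty clause means the original goal follows from the axioms.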

SInE (Sumo Inference Engine) (GPLv3)
is "a metaprover targeted on large theories, especially on SUMO".
This is actually a support program designed to make other
first-order theorem-provers (like Prover9 and E) much more effective
on large problems.
Programs like Prover9 and E take the assumptions (axioms) and negated goal
and try to derive everything that can be derived from them, looking for
a contradiction.
That's fine, but if there are a vast number of irrelevant assumptions,
they get overwhelmed, and that's where SInE comes in.
SInE selects only "relevant" axioms that "define" the meaning of symbols
and then runs an underlying theorem prover.
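The general idea of relevance filtering can be sketched in Python (a crude illustration only; the real SInE algorithm uses symbol "trigger" ratios over frequencies, not plain symbol sharing):

```python
def select_relevant(axioms, goal_symbols, rounds=2):
    """axioms: {name: set of symbols it mentions}.  Keep axioms that share a
    symbol with the goal, then expand transitively for a few rounds."""
    chosen, reached = [], set(goal_symbols)
    for _ in range(rounds):
        for name, syms in axioms.items():
            if name not in chosen and reached & syms:
                chosen.append(name)
                reached |= syms
    return chosen

axioms = {'ax_add':  {'plus', 'zero'},
          'ax_mul':  {'times', 'plus'},
          'ax_bird': {'penguin', 'flies'}}
print(select_relevant(axioms, {'plus'}))  # ['ax_add', 'ax_mul']: irrelevant axiom dropped
```

The payoff is that the underlying prover's search space shrinks dramatically on large theories, where most axioms are irrelevant to any given goal.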

SPASS (GPLv2)
is an automated theorem prover for first-order logic with equality.
It can be used for the "formal analysis of software, systems, protocols,
formal approaches to AI planning, decision procedures,
and modal logic theorem proving."
SPASS+T (GPLv2)
is an extension of SPASS that "enlarges the reasoning capabilities of
SPASS using some built-in arithmetic simplification rules and
an arbitrary SMT procedure for arithmetic and free function symbols
as a black-box."
Unfortunately, SPASS+T requires Yices (proprietary) or
CVC Lite (license currently unacceptable to distributors due to a
dangerous legal clause), so it cannot be included in the main repository
of a typical Linux distribution.

SRI's New Automated Reasoning Kit (SNARK) (MPL)
is an "automated theorem-proving program being developed in Common Lisp.
Its principal inference rules are resolution and paramodulation.
SNARK's style of theorem proving is similar to Otter's [and Prover9's].
Some distinctive features of SNARK are its support for special unification
algorithms, sorts, nonclausal formulas, answer construction for program
synthesis, procedural attachment, and extensibility by Lisp code.
SNARK has been used as the reasoning component of SRI's High Performance
Knowledge Base (HPKB) system, which deduces answers to questions
based on large repositories of information, and as the deductive core
of NASA's Amphion system, which composes software from components to
meet users' specifications, e.g., to perform computations in planetary
astronomy. SNARK has also been connected to Kestrel's SPECWARE environment
for software development."
Note that it directly supports numbers (Prover9 does not).

LEO-II (BSD)
is a "standalone, resolution-based higher-order theorem prover designed for fruitful cooperation with specialist provers for natural fragments of higher-order logic. At present LEO-II can cooperate with the first-order automated theorem provers E, SPASS, and Vampire.
LEO-II is implemented in Objective CAML and its problem representation language is TPTP THF."

csisat (Apache 2.0 license)
is a tool for LA+EUF interpolation.
That is, it is
"an interpolating decision procedure for the quantifier-free theory of rational linear arithmetic and equality with uninterpreted function symbols. Our implementation combines the efficiency of linear programming for solving the arithmetic part with the efficiency of a SAT solver to reason about the boolean structure."

Zenon (new BSD)
is an "automated theorem prover for
first order classical logic (with equality), based on the tableau method.
Zenon is
intended to be the dedicated prover of the Focal environment, an
object-oriented algebraic specification and proof system, which is able to
produce OCaml code for execution and Coq code for certification. Zenon can
directly generate Coq proofs (proof scripts or proof terms), which can be
reinserted in the Coq specifications produced by Focal. Zenon can also be
extended, which makes specific (and possibly local) automation possible
in Focal."
Note in particular that Zenon generates proofs in a Coq-checkable format.
It doesn't seem to be maintained as of 2009, and that's a problem.

Muscadet3 (new BSD)
is a knowledge-based theorem prover written in Prolog
(it's known to work with SWI-Prolog).
It is able to work with first-order and second-order predicate calculus.
It is based on natural deduction and uses methods which resemble those
used by humans (vs. the resolution principle).
It is composed of an inference engine, which interprets and executes rules,
and of one or several bases of facts.
It accepts TPTP Problem library syntax.
It supports the usual infix connectives: & (and), | (or), ~ (not),
=>, and <=>.
It supports prefix quantifiers using TPTP's syntax:
! (for-all) and ? (there-exists).
Note that Muscadet3's syntax is TPTP's, whereas Muscadet2 used
a slightly different syntax.
TPTP doesn't have a syntax for second-order expressions, and
Prolog cannot handle P(A,B) where P is a variable predicate.
In Muscadet (as described in its manual, section 11),
predicate variables are expressed using ".."; e.g.,
P(A,B) where P is a variable is written as ..[P,A,B].

Otter-λ (Otter-lambda) (MIT-style license)
is "a theorem-proving program. It accepts as input a list of axioms
and a theorem to try to prove, and if successful, it outputs a
proof of that theorem from those axioms...
Otter-λ is a first-order theorem prover (Otter) augmented by
lambda calculus and an algorithm for untyped lambda unification."

KeYmaera (Hybrid Theorem Prover for Hybrid Systems) (GPL) is
"a verification tool for hybrid systems
and built as a hybrid theorem prover for hybrid systems.
KeYmaera separates the overall verification workflow into two phases.
In the first phase you specify the hybrid system that you would
like to verify along with its correctness properties.
In the second phase, you can use KeYmaera and its automatic proof strategies
to verify the specified property of the hybrid system."
Originally it depended on the proprietary tool Mathematica.
However, on 2009-03-26 Andre Platzer reported to me that there has
been a lot of work on it: it no longer
requires Mathematica, and it supports "a much more
flexible structure and even a nice out-of-the-box webstart to run".

JAPE (GPL)
is a configurable, graphical proof assistant.
It allows users to define a logic, decide how to view proofs, and so on.
It works with variants of the sequent calculus and natural deduction.

LoTREC
(CeCILL License) is
"a generic tableau theorem prover for modal logic. It is a suitable educational tool for students and researchers for creating, testing and analysing tableau method implementations."

Metis (GPLv2)
is "an automatic theorem prover for first order logic with equality".
Its website reports these features:
"Coded in Standard ML (SML), with an emphasis on keeping the code as
simple as possible; Compiled using MLton to give respectable performance
on standard benchmarks; Reads in problems in the standard .tptp file
format of the TPTP problem set; Outputs detailed proofs in TSTP format,
where each proof step is one of 6 simple rules; Outputs saturated clause
sets when input problems are discovered to be unprovable."
MLton is an "open-source, whole-program, optimizing Standard ML compiler"
(released under a BSD-style license).

MaLARea (GPLv2+ except for snow)
is a metasystem for "automated theorem proving in large theories
where symbol and formula names are used consistently.
It uses several deductive systems (now E,SPASS,Paradox,Mace),
as well as complementary AI techniques like machine learning
(the SNoW system) based on symbol-based similarity, model-based
similarity, term-based similarity, and obviously previous
successful proofs...
The basic strategy is to run ATPs on problems, then use the machine learner
to learn axiom relevance for conjectures from solutions, and use
the most relevant axioms for next ATP attempts. This is iterated,
using different timelimits and axiom limits. Various features
are used for learning, and the learning is complemented by other criteria
like model-based reasoning, symbol and term-based similarity, etc."

Coq (LGPL 2.1 license)
is a formal proof management system: a proof done with Coq
is mechanically checked by the machine.
(Coq does not create proofs for the most part;
it checks and manages them.)
Coq was used by Trusted Logic to evaluate the Java Card (TM) system
at Common Criteria EAL 7 (see Why and Krakatoa, which are FLOSS tools
for verifying Java programs and can use Coq).
Coq supports defining functions or predicates,
stating mathematical theorems and software specifications,
interactively developing formal proofs of these theorems, and
checking these proofs by a small certification “kernel”.
Coq is based on a logical framework called
“Calculus of Inductive Constructions”.
If you want to learn more about Coq, consider the book
"Interactive Theorem Proving and Program Development Coq'Art:
The Calculus of Inductive Constructions".
There are many tools that run on top of Coq, too.
Coq has been used by Xavier Leroy (main developer of OCaml) to write a
certified compiler
(compcert) that guarantees that the semantics of a C
source program is preserved down to PowerPC assembly.
The
specification of the compiler back-end is available as GPL software
(though unfortunately not the Coq proofs).
Although the compcert work is not entirely FLOSS,
the fact that it exists shows that complete formal methods
can be applied to a nontrivial software project.

Agda is in an interesting transition.
Agda 1
(MIT license)
is "an interactive proof editor, or proof assistant, developed in Chalmers University of Technology, in the tradition of succession of such proof assistants (ALF, Cayenne, Alfa). Its input language, called Agda language (or simply Agda), is based on a constructive type theory á la Martin-Löf, extended with dependent record types, inductive definitions, module structures and a class hierarchy mechanism."
Its successor, the
Agda2 language and its interactive proof editor (MIT license,
with a few GPL pieces), is under active research development.
Agda2 is "a dependently typed programming language with good support for programming with inductively defined families of types."

Matita (GPL, in Debian)
is an
"experimental, interactive theorem prover under development at the
Computer Science Department of the University of Bologna.
Matita is based on the Calculus of (Co)Inductive Constructions,
and is compatible, to some extent, with Coq. It is a reasonably small
and simple application, whose architectural and software complexity is
meant to be mastered by students, providing a tool particularly suited
for testing innovative ideas and solutions. Matita adopts a tactic based
editing mode; (XML-encoded) proof objects are produced for storage and
exchange. The graphical interface has been inspired by CtCoq and Proof
General. It supports high quality bidimensional rendering of proofs and
formulae transformed on-the-fly to MathML markup."

PTTP (BSD-style)
(Prolog Technology Theorem Prover) is a theorem prover based on
model elimination.
The term “Prolog” here is a little misleading; PTTP extends Prolog to the
full first-order predicate calculus.
There are two implementations;
the Lisp version is faster (and is intended here).
PTTP is extremely fast and has low memory requirements,
at a cost of being unable to solve difficult theorems
(the author recommends using Otter for difficult problems that are
intractable for PTTP).

Isabelle
(BSD-like license) is a
generic theorem proving environment developed at Cambridge University
and TU Munich, building on Standard ML.
It’s an “LCF-style theorem prover” --
that means its ideas are descended from the old
“Logic for Computable Functions” (LCF) theorem prover, via
another system called HOL (see below).
In these kinds of theorem provers (including Isabelle, HOL 4, and
HOL Light),
you “drive” (control) how it tries to prove things using
commands written in the programming language Standard ML
(it does not automatically find a proof for you; instead, you issue
commands and it performs the manipulations).

HOL Light
(BSD-like license) is similar to HOL 4 (which is derived from HOL Light), but
is an unusually light theorem-proving system
running on OCaml (Objective Caml).
You still need to drive the program to make a proof; however, HOL Light
includes a MESON command, an automated proof search method
called “model elimination” -- this automated search sometimes succeeds,
saving you from guiding the proof by hand.

MetaPRL (GPL license)
is (1) “a general logical framework where multiple logics can
be defined and related”, and (2) “a system implementation with
support for interactive proof and automated reasoning”.
It has a “semantic connection to programming languages,
that allows the system to be used as a logical programming environment,
where programs are constructed as a mixture of specifications,
implementations, and verifications.”
An extract from their website should explain its purpose best:
“The MetaPRL system was implemented with the purpose of
supporting relations between logics. There is a huge investment
in formal work in systems like PVS, HOL, Coq, ELF,
Nuprl, and others. These systems use different logics
and different methodologies, but they have common goals and their
results share fundamental mathematical underpinnings.
Mathematical developments are expensive; our first goal
is to expose the logical foundations that the systems share,
to allow the results to be shared between systems...
Work is underway to relate the PVS, HOL, Isabelle,
and Nuprl mathematical foundations.”
MetaPRL is part of the Cornell PRL Automated Reasoning Project, and is
thus related to NuPrl.
MetaPRL is built using OCaml.

Interactive Mathematical Proof
System (IMPS) (special license, MIT-like plus requirement to
identify changes) is “intended to provide organizational and
computational support for the traditional techniques of mathematical
reasoning. In particular, the logic of IMPS allows functions to be
partial and terms to be undefined. The system consists of a database
of mathematics (represented as a network of axiomatic theories linked
by theory interpretations) and a collection of tools for exploring,
applying, extending, and communicating the mathematics in the database.”
It was developed by MITRE.

LeanCoP (GPL license) is
a compact theorem prover written in Prolog for classical
first-order logic, based on the connection calculus.
It's actually only a few lines long! It's certainly not as powerful
as some of the other provers listed here (although it does perform more
strongly than you might expect), but its short length might make
it a good starting point for special purposes, or for learning
a little about how these tools work.
(Originally there was no license statement, but on 2006-05-31
Jens Otten sent me an email saying he intended to license under the GPL
shortly; on 2008-05-21 I confirmed that there's a license statement.)

Gandalf (GPL license)
is an automated theorem proving (ATP) system.
It has won several times in the CASC contest.

I have not included some tools in this list because I can't confirm
that they have a FLOSS license.
MAYA (originally part of Inka, something that
supports graphs and connects to various other useful components) has no
license that I can find; its "mathweb" component is clearly GPL'ed,
but it's unclear whether the whole is GPL'ed, and it depends on the
proprietary Allegro Common LISP.
RRL has no license I can find, and I can't download it.
The lesson here is that if you develop a tool, you need to clearly
identify its license so that others can use it.

Here are model-checking tools that at least say they are FLOSS:

Spin
(Spin license, which is an issue)
is a model-checking tool for
formal verification of distributed software systems
(using Promela, its modeling language).
Spin has been used in a variety of applications, e.g.,
to verify the control algorithms
of a new flood control barrier in the Netherlands, and to
verify selected algorithms for a number of space missions
(including Deep Space 1, Cassini, the Mars Exploration Rovers, and
Deep Impact).
The big problem in model checkers is “state explosion”; Spin
counters this problem using a technique called
“partial order reduction”.
Spin won the ACM’s prestigious
Software System Award
in April 2002.
Here's an article about how
to use Spin and Promela to verify parallel algorithms.

However, although the front page of the Spin project
says it has an open source license, and I believe that was their intent,
there are significant concerns that suggest it may not
be a FLOSS license at all.
The Spin developers created their own unique license, an unwise practice
that is broadly discouraged (because it's so easy to get it wrong).
When people create their own licenses but are serious about making them FLOSS,
they generally submit it to opensource.org or the Free Software Foundation,
but neither the
Opensource.org license list
nor the
FSF license list
identify the Spin license as a FLOSS license.
That's rather suspicious.
What's worse, the
Debian-legal team noted some very serious problems with the Spin license,
suggesting that it's not a FLOSS license at all.
Thankfully, there are other tools available now which do not have a cloud
of licensing problems hanging over them.

DiVinE tool
(libraries GPL; tools appear to be as well)
is a model-checking tool for verifying concurrent systems
(and is thus similar to Spin).
DiVinE can itself run on a parallel distributed system,
making it possible to handle larger systems than Spin can.
It has its own native DiVinE modeling language.
Perhaps more interestingly, it can process C and C++!

NuSMV 2 (LGPL license) is a model checker that is a
re-implementation of SMV (so that a FLOSS version is available).
NuSMV, like SMV, counters the “state explosion” problem using
data structures called BDDs (binary decision diagrams).
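The key idea behind BDDs can be sketched in Python: with a fixed variable order and shared ("hash-consed") nodes, every boolean function gets one canonical graph, so equivalent formulas become literally the same node (an illustration only; this naive build enumerates all 2^n paths, while real BDD packages build diagrams by combining smaller ones):

```python
class BDD:
    """Minimal reduced ordered BDD, built by Shannon expansion with
    hash-consing, to illustrate the sharing that tames state explosion."""
    def __init__(self, order):
        self.order = order        # fixed variable order
        self.nodes = {}           # unique table: (var, low, high) -> node

    def mk(self, var, low, high):
        if low == high:           # redundant test: collapse the node
            return low
        key = (var, low, high)
        return self.nodes.setdefault(key, key)

    def build(self, f, env=None, i=0):
        """f maps a {var: bool} dict to bool; returns the (shared) root node."""
        env = env or {}
        if i == len(self.order):
            return f(env)
        v = self.order[i]
        low = self.build(f, {**env, v: False}, i + 1)
        high = self.build(f, {**env, v: True}, i + 1)
        return self.mk(v, low, high)

bdd = BDD(['a', 'b'])
r1 = bdd.build(lambda e: e['a'] or e['b'])
r2 = bdd.build(lambda e: not (not e['a'] and not e['b']))  # same function
print(r1 == r2)   # True: equivalent formulas share one canonical node
```

Canonicity is what makes BDD-based model checking work: checking whether two sets of states are equal becomes a constant-time node comparison.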

Murphi (BSD-new license + must rename changed version)
uses a language
based on a collection of guarded commands (condition/action rules),
which are executed repeatedly in an infinite loop
(similar to Misra and Chandy’s Unity model).
The language includes common data types
(subranges, enumerated types, arrays, and records), as well as
“Multiset” (for describing a bounded set of values whose order
is irrelevant to the behavior) and “Scalarset” (for
describing a subrange whose elements can be freely permuted).
Murphi has been used to verify many hardware components
and protocols.
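The guarded-command execution model can be sketched as a tiny explicit-state checker in Python (illustrative only, not Murphi syntax): each rule pairs a guard with an action, all enabled rules are fired from every reachable state, and an invariant is checked everywhere.

```python
from collections import deque

def check(start, rules, invariant):
    """Explore all states reachable via enabled rules (BFS); return the first
    state violating the invariant, or None if it holds everywhere."""
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s                              # counterexample state
        for guard, action in rules:
            if guard(s):
                t = action(s)
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return None

# Two processes, each able to increment its own counter modulo 4.
rules = [(lambda s: True, lambda s: ((s[0] + 1) % 4, s[1])),
         (lambda s: True, lambda s: (s[0], (s[1] + 1) % 4))]
print(check((0, 0), rules, lambda s: s[0] + s[1] < 7))   # None: invariant holds
```

Murphi adds what this sketch lacks: rich data types, symmetry reduction via Scalarsets, and engineering to cope with huge state spaces.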

BLAST
(Berkeley Lazy Abstraction Software Verification Tool)
(BSD license)
is a software model checker for C programs. BLAST checks
that software satisfies behavioral properties of the interfaces it uses.
Their description:
"BLAST is a software model checker for C programs. The goal of BLAST is to be able to check that software satisfies behavioral properties of the interfaces it uses. BLAST uses counterexample-driven automatic abstraction refinement to construct an abstract model which is model checked for safety properties. The abstraction is constructed on-the-fly, and only to the required precision."
A key limitation: It has only been tested with non-recursive programs
(recursive programs require use of an untested option).
It also has a licensing issue; it requires a solver, and the only ones
it is written to use are Vampyre, Simplify, and Cvc.
(Vampyre and Simplify aren't FLOSS; CVC was intended to be, so perhaps
it will have a license change.)

Java PathFinder
(NASA Open Source Agreement) is a model checker for Java bytecode.

The
Boolean satisfiability (SAT) problem
is
the problem of determining if the variables of a given Boolean formula
(where all variables can only be true or false)
can be assigned in such a way as to make the formula evaluate to TRUE;
alternatively, it's to determine
if no such assignments exist (i.e., if it's unsatisfiable).
SAT programs are low-level programs/algorithms that
many other formal methods tools (like theorem provers) build on.
In recent years there have been many improvements in
SAT solvers, which improve everything built on them.
SAT is a big area;
SAT live tracks SAT goings-on.
Here are some SAT surveys.
There are a number of competitions, including the
International SAT competition.
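The core of a basic (pre-CDCL) SAT solver fits in a few lines of Python; this DPLL sketch does unit propagation and then branches (modern solvers like MiniSat add conflict-driven clause learning, watched literals, and much more):

```python
def dpll(clauses, assignment=()):
    """Tiny DPLL SAT solver.  Clauses: list of lists of ints; -n means "not xn".
    Returns a tuple of true literals satisfying every clause, or None."""
    clauses = [list(c) for c in clauses]
    while True:                                   # unit propagation
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            break
        lit = units[0]
        assignment += (lit,)
        new = []
        for c in clauses:
            if lit in c:
                continue                          # clause satisfied: drop it
            reduced = [l for l in c if l != -lit]
            if not reduced:
                return None                       # empty clause: conflict
            new.append(reduced)
        clauses = new
    if not clauses:
        return assignment                         # everything satisfied
    lit = clauses[0][0]                           # branch on some literal
    for choice in (lit, -lit):
        result = dpll(clauses + [[choice]], assignment)
        if result is not None:
            return result
    return None

print(dpll([[1, 2], [-1, 2], [-2, 3]]))   # (1, 2, 3): a satisfying assignment
print(dpll([[1], [-1]]))                  # None: unsatisfiable
```

Everything layered on SAT (SMT solvers, bounded model checkers) ultimately reduces questions to calls like these, which is why SAT solver improvements ripple upward.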

MiniSat (MIT license).
In the
SAT 2005 competition,
MiniSAT all by itself won Silver in the industrial categories
SAT+UNSAT and SAT.
MiniSAT is a "conflict driven solver", one of the main (modern) styles of
SAT solvers.
SatELiteGTI is the combination of
SatELite (used as a preprocessor) with MiniSat (the “GTI” component).
SatELiteGTI won Gold in all three industrial categories:
SAT+UNSAT, SAT, and UNSAT.
I cannot find the license for SatELite, but the developers are making
SatELite obsolete anyway by incorporating its capabilities into their
updated version of MiniSAT.

MarchDL (GPLv2+)
is a SAT solver based on the "look-ahead" approach (one of the other
main modern styles of SAT solvers).
It won a prize at the 2007 SAT competition.

PicoSAT (MIT-style)
is a recent and strong SAT solver.
It did very well in the
SAT'07 SAT Solver competition;
Version 535 won the category of "satisfiable industrial instances"
and came second on all industrial instances (satisfiable and
unsatisfiable).

Vallst
(Reciprocal Public License, a GPL-like but stricter
OSI-certified
license)
is another SAT solver.
It won two golds and one bronze in the SAT 2005 “world championships”.

The
Satisfiability Modulo Theories (SMT) problem
is an extension of the SAT problem (above).
Basically, given an expression with boolean variables and/or
predicates (functions that take potentially non-boolean values yet
return boolean values), determine whether some assignment makes it
true (or conversely, show that no such assignment exists).
An SMT solver adds one or more "theories" for various predicates, e.g.,
it might add real numbers (adding predicates like
less-than and equal-to), integers, lists, and so on.
SMT solvers are sometimes implemented on top of SAT solvers.
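To convey just the problem statement (not how real SMT solvers work; they combine a SAT engine with theory-specific decision procedures rather than brute force), here is a toy bounded check in Python over integer-valued atoms:

```python
from itertools import product

def bounded_smt(formula, varnames, lo=-5, hi=5):
    """Search small integer values for a model of 'formula' (a function from a
    {name: int} dict to bool).  Returns a model, or None if none in range."""
    for vals in product(range(lo, hi + 1), repeat=len(varnames)):
        env = dict(zip(varnames, vals))
        if formula(env):
            return env
    return None

# Same boolean structure, but the theory of integer order decides the outcome:
print(bounded_smt(lambda e: e['x'] < e['y'] and e['y'] < e['x'], ['x', 'y']))
# None: x < y and y < x contradict each other in the theory
print(bounded_smt(lambda e: e['x'] < e['y'] and e['y'] < e['z'], ['x', 'y', 'z']))
# a model, e.g. {'x': -5, 'y': -4, 'z': -3}
```

A real SMT solver decides such questions without bounds, by asking a theory solver (here, linear integer arithmetic) whether each boolean-level candidate is consistent.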

Ergo (Alt-Ergo) (CeCILL-C license) is
an automatic theorem prover focused on program verification.
It supports equational theory (=) and linear arithmetic, and it's relatively
small.
One significant problem is that this is licensed under the extremely rare
CeCILL-C license, not the CeCILL license.
I can't find a major FLOSS organization that has ruled that the
CeCILL-C license is FLOSS
(including the FSF, OSI, Debian, or Fedora).
This license is intended to be FLOSS, but that is as yet untested.

CVC3 (BSD)
is "an automatic theorem prover for Satisfiability Modulo Theories (SMT) problems. It can be used to prove the validity (or, dually, the satisfiability) of first-order formulas in a large number of built-in logical theories and their combination."
CVC3 is the successor to CVC Lite.
The license of earlier versions of CVC3 included some highly-controversial
non-standard clauses, one of which was an "indemnification" clause
that to some appeared highly dangerous to any user or distributor.
Fedora eventually ruled that the license was "non-free" and thus unacceptable.
I contacted the developer, and although it took a long time, I'm delighted
to report that as of October 2009, the CVC3 license was changed to a
simple, normal BSD license, resolving the issue.

Gappa
(CeCILL or GPL, libraries LGPL)
is a tool "intended to help verifying and formally proving properties
on numerical programs dealing with floating-point or fixed-point
arithmetic. It has been used to write robust floating-point filters for
CGAL and it is used to certify elementary functions in CRlibm."
It requires the Coq support library 0.8.
("Why" can invoke Gappa.)

Arithmetic and Boolean solver (ABSolver) (Common Public License 1.0)
is a framework for combining other tools to solve mixed
arithmetic and Boolean problems, and is designed to make it easy to
add new solvers.
ABSolver is remarkable in its ability to solve non-linear problems.
However,
"Efficient Solving of Large Non-linear Arithmetic Constraint
Systems with Complex Boolean Structure"
(Journal on Satisfiability, Boolean Modeling and Computation 1 (2007) 209–236)
warns that ABSolver's "currently reported implementation
uses the numerical optimization tool
IPOPT (https://projects.coin-or.org/Ipopt) for
solving the non-linear constraints.
Consequently, it may produce incorrect results due to
the local nature of the solver, and due to rounding errors."

Argo-lib
(GPLv2) is an SMT-LIB solver.
It is "a C++ library which provides a generic support for using decision
procedures in automated reasoning systems and also support for several
schemes for combining and augmenting decision procedures. This platform
follows the SMT-lib initiative which aims at establishing a library of
benchmarks for satisfiability modulo theories. ARGO-lib platform can
be easily integrated into other systems, but it should also enable
comparison and unifying of different approaches, evaluation of new
techniques and hopefully help advancing the field. ARGO-lib follows a
range of techniques and different systems. The latest version of ARGO-lib
provides support for DPLL(T) scheme and for producing object-level proofs."

OpenSMT (GPLv3)
is a "compact and open-source SMT-solver written in C++,
with the main goal of making SMT-Solvers easy to understand.
OpenSMT is built on top of MiniSAT (http://minisat.se)...
Currently OpenSMT supports only the theory of
Equality with Uninterpreted Functions [QF_UF]...
In the future we plan to extend OpenSMT with other theories."

"haRVey-FOL integrates a First-Order Logic theorem prover (hence
its name), i.e. the E-prover. It uses the superposition calculus as
implemented by the E-prover, to determine the satisfiability of Boolean
combinations of atoms with functions interpreted in a first-order theory
with equality."
haRVey-FOL includes a pre-processor (by Augusto Antonio Viana da Silva)
that removes axioms that are not relevant for the proof of the current goal,
which should make it more capable than provers without one.
haRVey-FOL (aka "Harvey") depends in turn on SPASS (for some utilities) and E.
Unfortunately, haRVey-FOL also depends on zchaff, which is definitely
not FLOSS (and thus can't be pre-packaged into various distributions'
main repositories); my hope is that a future version will be able to use
miniSAT2 or some other FLOSS SAT solver.

"haRVey-SAT is based on congruence closure, the Nelson-Oppen framework,
and rudimentary instantiation techniques to decide the satisfiability of
a set of atoms written with uninterpreted symbols, linear arithmetics,
some lambda-expressions, and some quantifiers. The Boolean engine is a
SAT solver (zChaff or MiniSAT), hence its name."
Although haRVey-SAT (called "rv-sat" in its documentation) has promise,
as of August 2008 it's not appropriate for use with anything
related to high assurance, because its documentation says:
"rv-sat is in early development stage. In particular, it is not
complete for (linear) arithmetics. However, rv-sat gives only two
answers: "sat" or "unsat". In the case the formula belongs to a fragment
for which rv-sat is incomplete, "sat" should be understood as "rv-sat
has not been able to prove unsatisfiability of the input formula". In
short: "sat" should only be trusted if QF_UF.
These incompleteness issues will be solved in future versions of the software."

"Current developments aim at merging both branches, and provide one
uniform tool. The main issues are
the logics are different (haRVey-SAT is multi-sorted, haRVey-FOL is not)
[and]
there is some technical and theoretical difficulties to combine first-order provers within a Nelson-Oppen scheme.
So, haRVey is still in development stage...".
haRVey downloads
are available (but watch out, some links are broken, so it
can be hard to find).

STP (MIT license) is
a Decision Procedure for Bitvectors and Arrays.
"STP is a constraint solver (also referred to as a decision procedure
or automated prover) aimed at solving constraints generated by program
analysis tools, theorem provers, automated bug finders, intelligent
fuzzers and model checkers. STP has been used in many research projects
at Stanford, Berkeley, MIT, CMU and other universities. It is also
being used at many companies such as NVIDIA, some startup companies,
and by certain government agencies.
The input to STP are formulas over the theory of bit-vectors and arrays
(This theory captures most expressions from languages like C/C++/Java
and Verilog), and the output of STP is a single bit of information that
indicates whether the formula is satisfiable or not. If the input is
satisfiable, then it also generates a variable assignment to satisfy
the input formula.
We are currently adding the theory of finite sets and the theory of
uninterpreted functions to STP."
There is a
SourceForge home page for STP.
It uses MINISAT.
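Solvers like STP and the SAT engines beneath them (MiniSAT, zChaff) ultimately answer one question: is this formula satisfiable, and if so, under what assignment? To make that concrete, here is a toy DPLL SAT solver in Python; the DIMACS-style encoding and all names are mine, and real solvers add clause learning and far better heuristics:

```python
def dpll(clauses):
    """clauses: list of lists of nonzero ints, -n meaning 'not n'
    (DIMACS style). Returns a satisfying assignment as a set of true
    literals, or None if the formula is unsatisfiable."""
    if not clauses:
        return set()               # every clause satisfied
    if any(len(c) == 0 for c in clauses):
        return None                # an empty clause means a conflict
    # Unit propagation: a one-literal clause forces that literal.
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    lit = unit if unit is not None else clauses[0][0]
    for choice in ([lit] if unit is not None else [lit, -lit]):
        # Drop satisfied clauses; remove the falsified literal elsewhere.
        reduced = [[x for x in c if x != -choice]
                   for c in clauses if choice not in c]
        model = dpll(reduced)
        if model is not None:
            return model | {choice}
    return None

# (a or b) and (not a or b) and (not b or c): satisfiable
model = dpll([[1, 2], [-1, 2], [-2, 3]])
assert model is not None and 2 in model and 3 in model
# a and (not a): unsatisfiable
assert dpll([[1], [-1]]) is None
```

SMT solvers such as STP layer theory reasoning (bit-vectors, arrays) on top of exactly this kind of Boolean core.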

Here are FLOSS tools that don’t easily fit into the above categories:

Alloy (GPL)
implements a simple structural modeling language based on first-order logic.
This is a really interesting project; its language is similar
to Z, VDM, or UML constraints, but it can analyze the
results completely automatically (no theorem-proving or other
complexities) and display results graphically,
making it unusually easy to use.
The tool can generate instances of invariants,
simulate the execution of operations (even those defined implicitly),
and check user-specified properties of a model.
“The motivation for the Alloy project was to bring to Z-style
specification the kind of automation offered by model checkers.
The Alloy Analyzer is designed for analyzing state machines with
operations over complex states...”

Alloy
includes the Alloy Analyzer, “which is a model finder (not a model checker):
given a logical formula, it finds a model of the formula.
When an assertion is found to be false,
the Alloy Analyzer generates a counterexample...
Alloy Analyzer is essentially a compiler. It translates the problem
to be analyzed into a (usually huge) boolean formula. This formula is
handed to a SAT solver, and the solution is translated back by the Alloy
Analyzer into the language of the model. All problems are solved within
a user-specified scope that bounds the size of the domains, and thus
makes the problem finite.”
Because of its different approach, Alloy supports many higher-level
structures (such as sets, relations, tables, and trees);
“most model checking languages provide only relatively
low-level data types (such as arrays and records)”.
The tool is written in Java, and includes a GUI interface.

This tool looks like it’d be very useful for writing specifications
in some medium assurance environments, and I think it would be useful
for high assurance at level 0 (with a little more strength than usual
at level 0).
The notation is fairly clear, and the notation is specifically designed
so that assertions can be analyzed in a completely automated way
(unlike today’s theorem-proving).
Those are big advantages, and thus this is a good example of a
“formal methods light” tool.

However, note that it cannot prove that certain things can never
happen;
instead it can prove something like "X cannot happen within Y steps"
(you can choose Y to be as large as you like).
The phrase they use to describe it is a "model finder" approach;
basically, it tries to create a model that falsifies the claims.
Thus, while it can certainly give some confidence that the specification
is right, it often cannot “prove” things to the strength
that you usually want at level 1 or 2 for high assurance.
Notationally, Alloy is very different from the tools called "model checking"
tools; model checkers are typically designed to analyze compositions of
state machines running in parallel, and usually only support arrays and
records inside the state machines.
In contrast, Alloy supports more abstract notations such
as sets and relations.
I can easily imagine this tool being combined with other tools
(a theorem-prover or model checker)... this approach supports
quick tests for sanity, and then you could
prove in more depth if you needed to.
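The "model finder within a user-specified scope" idea can be sketched by brute force. This Python toy is my own illustration (the real Alloy Analyzer translates to SAT rather than enumerating models): it searches every binary relation up to a scope for a counterexample to the false assertion "every symmetric relation is transitive":

```python
from itertools import product

def find_counterexample(scope):
    """Brute-force model finding within a scope: search every binary
    relation over {0..scope-1} for one that is symmetric but NOT
    transitive. Cost is 2**(scope**2), so keep the scope tiny."""
    atoms = range(scope)
    pairs = list(product(atoms, repeat=2))
    for bits in product([False, True], repeat=len(pairs)):
        rel = {p for p, b in zip(pairs, bits) if b}
        symmetric = all((b, a) in rel for (a, b) in rel)
        transitive = all((a, c) in rel
                         for (a, b) in rel
                         for (b2, c) in rel if b2 == b)
        if symmetric and not transitive:
            return rel          # a counterexample within the scope
    return None                 # none found *within this scope*

cex = find_counterexample(2)    # e.g. {(0, 1), (1, 0)}
assert cex is not None
```

Note the caveat in the return value: "no counterexample within this scope" is exactly the weaker-than-proof guarantee discussed above.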

SPARK
is a subset/superset of Ada.
SPARK builds in the ability to define preconditions and postconditions;
its tools can then determine if the postconditions are met.
The SPARK language is designed so that SPARK code can be passed into
an Ada compiler (unchanged) for code generation.
Tokeneer is a serious example of a system implemented using SPARK.
SPARK doesn't support dynamic constructs, so it's not a good fit
for some applications.
On the other hand, there are many applications (say, control systems)
where this omission is considered a good thing, and if high
reliability is critically important,
SPARK should definitely be considered.
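The precondition/postcondition idea can be illustrated in Python (this is my own sketch, not SPARK syntax, and it checks the contracts at run time, whereas SPARK's tools prove them statically before the program ever runs):

```python
import functools

def contract(pre, post):
    """Attach a precondition and postcondition to a function, checked
    here at run time. (SPARK instead proves, ahead of time, that the
    postcondition always follows from the precondition.)"""
    def decorate(f):
        @functools.wraps(f)
        def wrapper(*args):
            assert pre(*args), "precondition violated"
            result = f(*args)
            assert post(result, *args), "postcondition violated"
            return result
        return wrapper
    return decorate

@contract(pre=lambda x: x >= 0,
          post=lambda r, x: r * r <= x < (r + 1) * (r + 1))
def isqrt(x):
    """Integer square root by linear search."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r

assert isqrt(10) == 3
```

The difference in assurance is the point: a run-time check only catches violations on the inputs you happen to run, while static proof covers all inputs.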

Why /
Caduceus /
Krakatoa
These are tools for verifying implementations (code, with emphasis
currently on C and Java); all are released under the GPL.

"Why" is a software verification tool; it is a general-purpose
verification conditions generator (VCG) for other
verification tools (including Coq and PVS), which it can call on.
It can be used as a front-end for many tools, including
calling out to many automated tools (so it can actually combine the results
of many different tools in a useful way).
On top of "Why" are two very interesting tools:

Caduceus
is a verification tool for C programs, built on top of Why.
It can even handle C programs with pointers (C pointers are notoriously
hard to handle, but tools that can't handle C pointers are useless for C).
This is obsolete; instead, use the "Jessie" tool included
in Why (Jessie requires Frama-C).

Krakatoa is a verification
tool for Java programs, also built on top of Why.
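To illustrate what a verification conditions generator does, here is a toy weakest-precondition calculator over a three-statement language. This is my own sketch (nothing like Why's actual machinery), and its string substitution is deliberately naive:

```python
def wp(stmt, post):
    """Weakest precondition over a toy language where formulas are plain
    strings. Statements:
      ('assign', var, expr), ('seq', s1, s2), ('if', cond, s1, s2)."""
    kind = stmt[0]
    if kind == 'assign':
        _, var, expr = stmt
        # Toy substitution: replaces every occurrence of the variable
        # name, which is unsound for real identifiers; fine for a sketch.
        return post.replace(var, f"({expr})")
    if kind == 'seq':
        _, s1, s2 = stmt
        return wp(s1, wp(s2, post))
    if kind == 'if':
        _, cond, s1, s2 = stmt
        return (f"(({cond}) -> {wp(s1, post)}) and "
                f"((not ({cond})) -> {wp(s2, post)})")
    raise ValueError(kind)

# To verify {?} x := x + 1; x := x * 2 {x > 0}, generate the condition:
vc = wp(('seq', ('assign', 'x', 'x + 1'), ('assign', 'x', 'x * 2')),
        'x > 0')
assert vc == '((x + 1) * 2) > 0'
```

The generated condition is then what gets handed to a prover (Coq, PVS, or an automated tool), which is exactly the hand-off "Why" automates.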

Saturn (BSD-like license)
is a program to statically and automatically
verify properties of large (meaning multi-million line) software systems.

JACK: Java Applet Correctness Kit
(CeCILL-C license)
"The Jack tool provides an environment for verification of Java and Java Card programs with JML annotations. It implements a fully automated weakest precondition calculus that generates proof obligations from annotated Java sources. Those proof obligations can be discharged using different theorem provers.
An important design goal of Jack is that it is easy to use for normal Java developers, who use it to validate their own code. To allow developers to work in a familiar environment, Jack is integrated as a plugin in the eclipse IDE. Care has been taken to hide the mathematical complexity of the underlying concepts. Therefore Jack provides a dedicated proof obligation viewer, that presents the proof obligations connected to execution paths within the program. For each proof obligation, the relevant source code is highlighted. Moreover goals and hypothesis are displayed in a Java/JML like notation.
Our goal is to allow formal method experts to prove the correctness of Java applets, and moreover, to allow Java programmers to obtain a high confidence in the correctness of their application.
Currently proof obligations can be generated for
the Simplify theorem prover (notably used by ESC/Java) and
the Coq proof assistant.
The Jack proof manager sends the proof obligations to the different provers, and keeps track of proven and unproven proof obligations."

Forge / JForge (GPLv3).
"Forge is a program analysis framework that allows a procedure in a conventional object oriented language to be automatically checked against a rich interface specification. The framework uses a bounded verification technique, in which all executions of a procedure are examined up to a user-provided bound on the heap and number of loop unrollings. If a counterexample exists within the bound, Forge will find and report the complete program trace, but defects outside the bound may be missed. To facilitate modular analysis, specifications can be embedded as statements in code, an idea borrowed from the refinement calculus.

The core Forge library... operates on programs constructed in the Forge Intermediate Representation (FIR), a simple, relational programming language. To analyze a program written in a conventional programming language, like Java or C, that program and its specification must first be encoded in FIR. We have built a command-line tool called JForge that analyzes Java code against specifications written in the Java Modeling Language (JML) by translating them both to FIR, and we have made this tool available for download as well. Others are working on a translation from C to FIR, and we welcome and encourage you to encode your own favorite language in FIR."

Splint, formerly named LCLint
(GPL license)
does static analysis of C programs, and is usually used in a medium assurance
mode that requires very little specification work from a developer to
help find some security flaws.
But splint is actually based on a long trail of formal methods research
(specifically on “Larch”), and it supports far stronger annotation and
proof methods, moving toward high assurance, if developers choose to
use them.

Daikon
(MIT-style; includes some components with other OSS licenses).
"Daikon is an implementation of dynamic detection of likely invariants; that is, the Daikon invariant detector reports likely program invariants. An invariant is a property that holds at a certain point or points in a program; these are often seen in assert statements, documentation, and formal specifications. Invariants can be useful in program understanding and a host of other applications...
Dynamic invariant detection runs a program, observes the values that the program computes, and then reports properties that were true over the observed executions. Daikon can detect properties in C, C++, Java, Perl, and IOA programs; in spreadsheet files; and in other data sources. (Dynamic invariant detection is a machine learning technique that can be applied to arbitrary data.) It is easy to extend Daikon to other applications; as one example, an interface exists to the Java PathFinder model checker."
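The core loop of dynamic invariant detection fits in a few lines of Python. This is a toy of my own (real Daikon has a far richer grammar of candidate invariants plus statistical filtering):

```python
def likely_invariants(trace):
    """Dynamic invariant detection in miniature: given (x, y) pairs
    observed at a program point, report every candidate invariant
    that held on ALL observed executions."""
    candidates = {
        'x >= 0':     lambda x, y: x >= 0,
        'y >= 0':     lambda x, y: y >= 0,
        'x <= y':     lambda x, y: x <= y,
        'y == x * x': lambda x, y: y == x * x,
    }
    return [name for name, check in candidates.items()
            if all(check(x, y) for x, y in trace)]

# Observations of a squaring routine at its exit point:
invs = likely_invariants([(0, 0), (2, 4), (3, 9)])
assert 'y == x * x' in invs and 'x <= y' in invs
```

As the name "likely" warns, these are only properties that held on the observed runs; they are leads for documentation or proof, not guarantees.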

Kodkod (MIT license) is a constraint solver for relational logic.
It is "an efficient SAT-based analysis engine for first order logic with relations, transitive closure, and partial instances. The current prototype, which includes a finite model finder and a minimal unsatisfiable core extractor, is being used as a backend to the Karun, Forge, and Miniatur code checkers, a course scheduler, the Alloy Analyzer 4.0, a network configuration tool, etc.
Unlike traditional model finders (e.g. Alloy Analyzer 3, Paradox, and MACE), Kodkod is designed to take advantage of partial instance information..."

SATABS (BSD-old style license, but with odd notification requirement
that may be non-FLOSS) is a verification tool
for ANSI-C programs.
It allows verifying array bounds (buffer overflows), pointer safety,
exceptions and user-specified assertions.

CCured (BSD-new license)
is a “source-to-source translator for C. It analyzes the C program to determine the smallest number of run-time checks that must be inserted in the program to prevent all memory safety violations. The resulting program is memory safe, meaning that it will stop rather than overrun a buffer or scribble over memory that it shouldn’t touch.”
I am skeptical that this would be used in a high assurance setting, but
I can’t help but mention it.

Jakstab (GPLv2)
is an "Abstract Interpretation-based, integrated disassembly and static analysis framework for designing analyses on executables and recovering reliable control flow graphs. It is designed to be adaptable to multiple hardware platforms using customized instruction decoding and processor specifications similar to the Boomerang decompiler. It is written in Java, and in its current state supports x86 processors and 32-bit Windows PE or Linux ELF executables.
Jakstab translates machine code to a low level intermediate language on the fly as it performs data flow analysis on the growing control flow graph. Data flow information is used to resolve branch targets and discover new code locations."

Deputy (revised BSD license)
is "a C compiler that is capable of preventing common C programming errors, including out-of-bounds memory accesses as well as many other common type-safety errors. It is designed to work on real-world code, up to and including the Linux kernel itself.
Deputy allows C programmers to provide simple type annotations that describe pointer bounds and other important program invariants. Deputy verifies that your program adheres to these invariants through a combination of compile-time and run-time checking.
Unlike other tools for checking C code, Deputy provides a flexible annotation language that allows you to describe many common programming idioms without changing your data structures. As a result, using Deputy requires less programmer effort than other tools. In fact, code compiled with Deputy can be linked directly with code compiled by other C compilers, so you can choose exactly when and where to use Deputy within your C project."

"Unlike many other safe C variants such as Cyclone and CCured,
Deputy is incremental and thread safe. That is, programmers are free to add annotations and modify code function-by-function. This is possible because Deputy does not change the representation of the data visible across function boundaries, which allows “deputized” modules to interoperate with standard modules. While the initial version of the file may contain several blocks of trusted code, subsequent versions will gradually eliminate this trusted code in favor of fully annotated and checked code."
[Beyond Bug-finding].

ManTa (GPL + public domain)
is a programming/specification language, and also the name of
the supporting development environment.
It is fundamentally based on letting users write algebraic specifications
of ADTs (abstract data types). “Its theoretical bases ensure that every program written has “mathematical meaning” (i.e. a model)”
It then lets you “Evaluate expressions by using a Rewriting Motor,
“Demonstrate ADT properties by using an inductive theorem prover..., [and]
Generate correct code which implements an ADT in ANSI C or Ocaml.”
This looks interesting, but as of May 2006 it seems to have stalled since 2001.
Thankfully, any FLOSS project can get restarted by anyone else, so if
there is interest, that’s all that is needed.

FoCaL / FoCaLize
(BSD-style).
The FoCaL overview says
"The Focal project attempts to provide a programming environment in
which certified programs can be developed. This environment is based on a
language including functional and object-oriented features. Moreover, this
language provides means for the programmers to write formal specifications
and proofs of their code, and to have them verified by a proof checker.
Thanks to inheritance and refinement mechanisms, Focal allows to make
several refinements of a specification until providing an efficient
executable code (obtained via a translation to OCaml).
Focal provides a library which implements mathematical structures up
to multivariate polynomial rings and includes complex algorithms with
performances comparable to the best CAS in existence."

Another FoCaL overview adds some detail:
"Focal, a joint effort with LIP6 (U. Paris 6) and Cedric (CNAM), is
a programming language and a set of tools for software-proof codesign. The
most important feature of the language is an object-oriented module system
that supports multiple inheritance, late binding, and parameterisation
with respect to data and objects. Within each module, the programmer
writes specifications, code, and proofs, which are all treated uniformly
by the module system.
Focal proofs are done in a hierarchical language invented by Leslie
Lamport. Each leaf of the proof tree is a lemma that must be proved
before the proof is detailed enough for verification by Coq. The Focal
compiler translates this proof tree into an incomplete proof script. This
proof script is then completed by Zenon, the automatic prover provided
by Focal."

As of May 2008 it looks more like early research work, but
it will probably mature over time.
They seem to have focused primarily on implementing
computer-aided algebra (CAS) systems so far.

Banshee
(mostly BSD License, some GPL) is “a toolkit that simplifies the task of
building constraint-based program analyses.
Program analyses are widely used in compilers and
software engineering tools for discovering or verifying
specific properties of software systems...
the analysis designer provides a short specification file
describing the kinds of constraints used in the analysis.
From this specification, BANSHEE builds a customized constraint
resolution engine which solves those constraints very efficiently.”
Banshee is the successor of
BANE
(MIT license).
"The pointer analysis application, which includes a C parser derived
from GCC, is also included and is released under the GNU General Public
License."

iProver (GPLv3).
iProver is a general-purpose automated theorem prover, using
"a modular combination of first-order reasoning with ground reasoning. In particular, iProver currently integrates MiniSat for reasoning with ground abstractions of first-order clauses... [it]
can solve around 4843 problems out of 8984 in the TPTP-v3.2.0 library (with the default options)."

Darwin (GPL).
"Darwin is an automated theorem prover for first order clausal logic. It accepts problems formulated in tptp or tme format, non-clausal tptp problems are clausified using the eprover. Equality is not built into the currently implemented version of the calculus, it is instead automatically axiomatized for a given problem. Darwin is a decision procedure for function-free clause sets, and is in general faster and scales better on such problems than propositional approaches."

Paradox (GPL license)
is a tool that processes first-order logic problems and
tries to find finite-domain models for them.
Paradox won the SAT/Models class (generated most models)
in the CASC 2003 competition for first-order logic tools.
Paradox can read problems in both TPTP and Otter syntax.
It is written in Haskell, and depends on MiniSAT.
Paradox is co-developed with Equinox, a first-order theorem prover.

Proved ML (CeCILL-B license)
is a variant of the ML language
"which focuses both on being able to prove programs and be really usable".
As of May 2008 it was not ready for serious use.

Murphi
(CMC license, BSD-like) is a
finite-state concurrent system verifier.
It was developed by David L. Dill,
who's done a lot of work related to
verified voting.

Proof General (GPL)
is a generic front-end for interactive theorem provers
(aka "proof assistants") based on the customizable text editor Emacs.
It works with Isabelle, Coq, PhoX, and LEGO, and has experimental support
for other tools like HOL and ACL2.

JMLEclipse
is "an Eclipse plugin that allows the integration of JML into
Eclipse's Java Development Tools (JDT)...
The whole idea behind JMLEclipse is to have an open framework that can be used as frontend for different JML tools."

Computer Algebra System (CAS).
Historically, CASs have been separate programs.
There are a large number of FLOSS CAS programs to choose from.

It's worth noting the integrating program
SAGE
(GPL and GPL-compatible licenses), a Python-based
program that integrates several CAS and other mathematical programs with
the goal of "Creating a viable free open source alternative
to Magma, Maple, Mathematica, and Matlab."

Examples of specific CAS programs include:

Maxima (GPL), with a GUI
provided by
wxMaxima (GPL).
This is a large system with a very long history; it is very mature.

Axiom (BSD-style license).
This is a large system, also with a long history.
One interesting aspect is that all values are "mathematically typed", that is,
it includes a type system.
"In its current state it represents about 30 years and 300 man-years of research work."

Yacas (GPL).
This is implemented in C++ and is designed to require relatively few
resources or dependencies.

Kayali (GPL)
is a GUI front-end of a CAS system.
It's a Qt-based GUI front-end to (a subset of) Maxima and gnuplot.
It is implemented in Python.
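At their heart, CAS programs manipulate expressions symbolically rather than numerically. A minimal illustration in Python (my own toy, unrelated to any of the systems above): symbolic differentiation with the product rule and trivial simplification.

```python
def diff(expr, var):
    """Symbolic differentiation over tuple expressions: a number, a
    variable name (string), ('+', a, b), or ('*', a, b)."""
    if isinstance(expr, (int, float)):
        return 0
    if isinstance(expr, str):
        return 1 if expr == var else 0
    op, a, b = expr
    if op == '+':
        return simplify(('+', diff(a, var), diff(b, var)))
    if op == '*':   # product rule: (ab)' = a'b + ab'
        return simplify(('+', ('*', diff(a, var), b),
                              ('*', a, diff(b, var))))
    raise ValueError(op)

def simplify(expr):
    """Fold the trivial identities 0+e, e+0, 0*e, e*0, 1*e, e*1."""
    if not isinstance(expr, tuple):
        return expr
    op, a, b = expr[0], simplify(expr[1]), simplify(expr[2])
    if op == '+':
        if a == 0: return b
        if b == 0: return a
    if op == '*':
        if a == 0 or b == 0: return 0
        if a == 1: return b
        if b == 1: return a
    return (op, a, b)

# d/dx (x*x + 3) = x + x
assert diff(('+', ('*', 'x', 'x'), 3), 'x') == ('+', 'x', 'x')
```

Mature systems like Maxima and Axiom are this idea scaled up by decades of work: vastly larger expression languages, stronger simplifiers, and (in Axiom's case) a mathematical type system.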

General-purpose upper ontologies.
For purposes of this paper, an ontology is something that provides
(1) identification of basic categories of objects (real or abstract),
(2) a way of determining what kinds of entities fall into those categories, and
(3) a way of determining the relationships between and among the categories.
An ontology is extremely useful when handling unrestricted
natural language; ontologies help you infer
some of the information that is implied but unstated in ordinary text.
They also help you structure problems if they have a rich set of object types.
See
Wikipedia's article on upper ontology and
formalontology.it
for more about ontologies;
you should probably also know about
W3C's
OWL Web Ontology Language.
A Comparison of Upper Ontologies
(Technical Report DISI-TR-06-21) gives a brief comparison of ontologies.
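The three ingredients above (categories, membership, relationships) can be made concrete with a miniature hand-built "is-a" hierarchy. All data and names here are invented for illustration; real upper ontologies are vastly larger and richer:

```python
# A hypothetical miniature is-a ontology: each name maps to its
# parent category (invented data, for illustration only).
IS_A = {
    'dog': 'mammal', 'cat': 'mammal', 'mammal': 'animal',
    'sparrow': 'bird', 'bird': 'animal', 'animal': 'entity',
}

def categories_of(thing):
    """All categories a thing falls into, following is-a links upward."""
    result = []
    while thing in IS_A:
        thing = IS_A[thing]
        result.append(thing)
    return result

def common_category(a, b):
    """The most specific category shared by two things, if any."""
    for cat in categories_of(a):
        if cat in categories_of(b):
            return cat
    return None

assert categories_of('dog') == ['mammal', 'animal', 'entity']
assert common_category('dog', 'sparrow') == 'animal'
```

Even this toy shows the inference value: "a dog is an animal" is never stated directly, yet follows from the category structure, which is exactly the kind of unstated-but-implied information ontologies recover from text.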
Some general-purpose upper ontologies are available as FLOSS:

Wordnet (BSD-style)
is itself an ontology of words along with other information (parts of speech
and definitions).
Thus, Wordnet is useful as a dictionary, and it tends to be used for
a lot of linguistic work.
Wordnet doesn't have much about interrelationships between concepts
(other than synonyms/antonyms) in the way SUMO and other specs do, though,
so for some purposes Wordnet is paired with other information.

OpenCyc
(Apache License Version 2) derives from the Cyc project.
This is one of the oldest serious computer-processable ontologies.
Note that the data provided by OpenCyc is FLOSS, and
some programs that manipulate it are also FLOSS
(those tend to be Apache-licensed), but other Cyc-related
programs are not FLOSS.

These tools appear to be no longer maintained, but they may still be
useful as a basis for new work:

OBJ3 (BSD-style).
The OBJ languages are broad spectrum algebraic programming
and specification languages, based on order sorted equational logic.
OBJ3 is a particular instance of this family.
Maude descends from OBJ3 (so see the entry for Maude).

GETFOL (BSD-style license).
This is old and doesn't appear recently maintained, but it
has a long history of maintenance so it may be of interest.

Not all formal methods tools are free-libre / open source software (FLOSS).
I thought I should briefly note a few, so that people can save time instead
of trying to track down their licenses.

"Proprietary" simply means that the program is not FLOSS: its users
lack some of the rights to use the software for any purpose,
view the source code, modify it, and redistribute it (modified or not).
Many tools were developed at public expense at universities, but are
nevertheless proprietary.
Even if the software's development was completely paid for using public funds,
do not assume that software will be released as FLOSS to the public.
In some cases there are "demo" binaries of the tool available to the public,
but the source code is
not distributed and/or there are significant limitations on the tool's use.

The following are proprietary tools: Barcelogic,
Boolector, HySAT, Spear, Yices, and Microsoft's Z3.
ESC/Java and Simplify are not FLOSS; they were once widely distributed,
but since they were never FLOSS, when their company's direction changed
they became abandoned, legally risky to use, and impractical to maintain.
Similarly, Z/Eves was once widely distributed, but it was never FLOSS, so
when its distributor stopped distributing it, users had no legal recourse.

MathSAT is not FLOSS,
even though it has the text of the GPL and LGPL licenses in its release.
Alberto Griggio explained the real situation to me in 2008:
"It is linked with the GNU multiprecision library GMP,
which is covered by the LGPL. So, the sourcecode doesn't
have to be available, and in fact at the moment it
is not, sorry. The tarball includes a copy of the GPL as well as of
the LGPL, because, as far as we know, you must ship a copy of the LGPL
if you link against an LGPL'd library, and the LGPL itself requires
that you also ship a copy of the GPL. Sorry if this was unclear."

Epigram
has no clear license statement.
I've sent an email requesting that this be clarified.

CBMC is a model-checker
with an almost free license, so it's probably not FLOSS.
It allows change, but requires notification before installation;
that certainly fails Debian's requirements (e.g., it fails the
"Desert Island" test).
Once again, it's frustrating that people keep creating new licenses;
please stop!

Bogor is non-commercial-use only, and thus not FLOSS.

HCMC is
a model-checking tool, but I haven't found a clear license statement
for it.

Many of these FLOSS tools are considered very strong, innovative,
and/or have been used for serious applications.
Spin and the Boyer-Moore Theorem Prover (the basis of ACL2) each
received the ACM’s prestigious
Software System Award,
which recognizes a
“software system that has had a lasting influence”.
Both have been used for extremely important applications,
from checking space probe algorithms through microprocessor design.
PVS is also widely considered to be one of the better tools of its kind.
Otter has been used to find new proofs previously unknown to mathematics.
HOL 4 and Isabelle are widely used among these kinds of tools.
MiniSat, Paradox, and Vallst have won awards in recent competitions
against other similar tools.
Alloy is a new tool, but I think it’s pretty innovative.

One type of tool I’m not including are
probabilistic / statistical / Monte Carlo model checkers, such as
PRISM (GPL license).
They appear to be valuable for medium assurance, but
I am skeptical that they are appropriate for high assurance.
One tool in this space I should note is
GMC
(GPL license, likely), which is a highly experimental
Monte Carlo-based software model checker for the gcc compiler suite.
Open-Source Model Checking explains more about GMC.
GMC is very experimental, and does not appear suitable for
development use at this time; I note it here
because it embodies some very interesting ideas, for those who
are interested in the up-and-coming research.
Monte Carlo model checkers have great promise for medium assurance,
but because they only cover statistical likelihoods
(not all possible situations), I would be nervous about using
any of them for high assurance.
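To show why statistical model checking yields likelihoods rather than proofs, here is a toy sketch of my own (not how PRISM or GMC actually work): it estimates, by random simulation, the probability of reaching a bad state. Crucially, even an estimate of zero would not prove the bad state unreachable.

```python
import random

def estimate_failure(step, is_bad, start, runs=10000, horizon=50, seed=1):
    """Statistical 'model checking' in miniature: estimate the
    probability that random execution reaches a bad state within
    `horizon` steps. This is only a likelihood, never a proof."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    failures = 0
    for _ in range(runs):
        state = start
        for _ in range(horizon):
            state = step(state, rng)
            if is_bad(state):
                failures += 1
                break
    return failures / runs

# Toy system: a counter randomly moving +1 or -1; "bad" means hitting 5.
p = estimate_failure(step=lambda s, rng: s + rng.choice([-1, 1]),
                     is_bad=lambda s: s >= 5, start=0)
assert 0.0 < p < 1.0   # an estimate strictly between certainty and safety
```

Contrast this with an exhaustive model checker, which would enumerate all reachable states and could prove the bad state unreachable outright; that gap is why I'd hesitate to rely on statistical methods for high assurance.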

Sadly, many tools have completely disappeared from the world because
they were not released under FLOSS licenses.
What’s particularly galling is that
many governments pay for academics to develop code,
yet fail to require releases of that code (that they paid for!)
under FLOSS licenses.
In my mind, this is shameful; if my taxes paid for the tool,
then I should have the right to see, use, and improve it,
unless there is strong and specific evidence
that an alternative license would be better for that particular circumstance.
FLOSS licenses allow others to study, use, improve, and release
those original or improved versions, and are thus a much better
vehicle for making continued research possible.
Instead, often a tool is released (if it’s released at all)
only as a proprietary binary file
(which is unmodifiable and will eventually be unrunnable) or possibly
as restricted source code;
in either case its license often has many anti-FLOSS restrictions
(such as forbidding modification or forbidding commercial use).
The tool often cannot be used for commercial purposes except through
special licensing deals;
that might sound fine, but in practice this generally squelches
research through trial commercial use, and it also
prevents distributors (such as Linux distributors)
from making the tool widely available.
Then either (1) the academic loses interest, or
(2) a proprietary company builds on the tool and tries to build a business
based on it.
If the original academic loses interest, then no one else (even
other academics!) has the rights to build on that code.
The second option (proprietary commercialization)
sounds good, but remember,
we've had decades of almost uniform failure in trying to sell
formal methods tools.
(Most new restaurants fail, never mind niches like this
where there are few users and the tools take time to learn.)
In the end, the project usually fails.
In either case, the software ends up being unavailable to all --
even if government funds were used to develop it.
This is incredibly wasteful, and in my opinion this is one of the
primary reasons people don’t use formal methods as often: most of the
research work is locked up in software with proprietary licenses
that is eventually thrown away.
A company can do what it likes with its own money, of course,
but if it will not sell it as a proprietary product, I think they should
at least release it as FLOSS so others can build on it and improve
the field.
Governments have no such excuse; it is their citizens’ money
they are squandering.
I believe that, with a little creativity, governments could ensure
that such projects can continue and grow.

Obviously, it is possible to have proprietary tools with long support,
but investigate any such vendor very carefully.
If you use such tools, depending on proprietary versions can be a big risk;
if the company goes out of business (which is historically very likely),
your largest cost suddenly depends on an unsupportable tool.
In large markets,
open standards help --
by having multiple competing implementations which you can switch between,
competition lowers costs and incentivizes improvement.
There are a few standards in this area (e.g., those for Z and VDM).
But depending on competition often fails
when there is a thin market, as in the case of formal methods tools.
You might consider requiring the vendor to escrow their code as FLOSS if they
decide to stop selling or supporting their product, so at least
you can have a support option if the vendor leaves the business
(as most do).
This isn’t an empty concern;
here are some examples where a proprietary tool license has caused problems
for its users:

Z/Eves and Eves were once
distributed by ORA Canada with a non-FLOSS license.
They distributed the software at no charge for non-commercial use
until June 3, 2005, but then decided to cease distribution.
As of May 2006, I know of no legal way to acquire
these tools (which were used in many places).
Even if they eventually become available elsewhere, it still illustrates
the problem.

The commercial product
“VDM through Pictures” (VtP) by IDE was once prominent, but appears
to have disappeared.
VtP was once prominently displayed in comparisons like
this one.
But Aonix was formed by merging
IDE and another company in 1996, and in May 2006 I cannot find the
product at all.
(If anyone can help me find this -- perhaps its name changed! --
I would really appreciate it.)

KindSoftware had this problem with ESC/Java2 -- Compaq/HP did not
make a FLOSS release of ESC/Java, and now has decided to abandon the work.
Suddenly, a tool that many people depended on is on
doubtful (and constraining) legal ground.
At first some researchers tried to make improvements anyway...
but as a result, they don’t have clear legal rights
to release their own work!
KindSoftware is now going back and redeveloping from scratch something
like the original ESC/Java software, so that
improvements can be legally released using a standard OSS license
(they currently plan to release ESC/Java3 as part of Mobius).
They’ve learned their lesson, and plan to release the results under the
GPL (which is a mainstream FLOSS license, and thus
does permit improvements by all).

In contrast, some of the tools that have been released as FLOSS
have resulted in incredible benefits to the world, and lower
risks for their users.
ACL2, Isabelle, HOL4, and splint (for example) come from very
long lines of research, and their continued use today demonstrates that
releasing software developed during academic research under FLOSS
licenses can have tremendous, long-lasting benefits.
(The computer algebra system
Maxima has demonstrated
the same thing; it’s been around since the late 1960s and is still
actively maintained.)
The
NuSMV project specifically
re-implemented the SMV tool, so they could get the benefits of being
a FLOSS project (permitting extensions like
TSMV, an extension of NuSMV to deal with timed versions of circuits).
Any company doing research would be wise to consider releasing its code
as FLOSS -- if it’s research, they can often receive far more than they
release.
I think it would be much wiser to require that
government-funded software development in academia
be released under a FLOSS license under usual circumstances.
That way, anyone can start with what was developed through
government funds and build on it, instead of starting over.

Indeed, in some ways, FLOSS is an ideal way to commercialize
formal methods tools.
Formal methods tools require people to learn and apply new skills,
so for bigger projects you generally need someone to help you
understand how to apply the tool.
Thus, the FLOSS commercial model of “give away the code and
sell support services” is especially easy to apply in this area.
And if the commercial company flops, the work is still available
for future research or for combining with other components.

I do not think that within a few years suddenly everyone will be
using formal methods, for a variety of reasons.
But I do think that over the next many years we will see a very
gradual increase in use of these tools in very critical areas.

Of course, one challenge is that assurance tools are often not
assured themselves.
Assurance tools could even be maliciously undermined;
see the discussion under compilers for more about “trusting trust”
types of attacks.
Here are a few items related to assuring the assurance tools:

In the proof area, one approach that really helps
is to separate creating proofs
from checking proofs -- it’s often incredibly hard for computers
to find a proof, even with lots of human direction, because doing so
in reasonable time requires lots of clever heuristics.
But checking a proof afterward is fairly easy, so it can be done by
a relatively tiny program designed for just that purpose, which
can then be examined carefully.
Two systems written using ACL2 are worth noting;
Ivy (noted above) is a proof checker for theorems created
by Otter or MACE;
Representing Nuprl Proof Objects in ACL2: towards a proof checker for Nuprl
talks about writing a proof-checker for Nuprl
(an LCF-style prover, a family that also includes HOL 4 and Isabelle).
One of many papers about this approach is
A Trustworthy Proof Checker.

This idea of using proof checkers is particularly exploited in
proof-carrying code, where
code producers include with the code a formal safety proof (that
they must create);
the code consumer uses a simple proof validator to check, before executing
the code, that the proof is valid.
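As a rough illustration of why checking is so much easier than finding proofs, here is a hypothetical Python sketch of a tiny proof checker (the formula representation, rule set, and function names are invented for this sketch, not taken from any tool above): a proof is just a list of steps, and each step must be a premise or follow from earlier steps by modus ponens.

```python
# Hypothetical sketch: a tiny proof checker for propositional logic.
# Finding proofs is hard; checking a given proof is easy: each line must
# be a premise or follow from earlier lines by modus ponens.
# Formulas are tuples: ("->", A, B) is an implication; atoms are strings.

def check_proof(premises, steps, goal):
    """Return True iff `steps` is a valid proof of `goal` from `premises`."""
    proved = list(premises)
    for formula in steps:
        if formula in premises:
            proved.append(formula)
            continue
        # Modus ponens: from A and A -> B, conclude B.
        ok = any(
            imp[0] == "->" and imp[2] == formula and imp[1] in proved
            for imp in proved
            if isinstance(imp, tuple) and len(imp) == 3
        )
        if not ok:
            return False
        proved.append(formula)
    return goal in proved

# From P and P -> Q, a one-step proof of Q is accepted...
premises = ["P", ("->", "P", "Q")]
assert check_proof(premises, ["Q"], "Q")
# ...but an unjustified step is rejected.
assert not check_proof(premises, ["R"], "R")
```

Note how small the checker is compared to any proof-search engine; that smallness is exactly what makes careful examination of the checker feasible.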

Another approach (which could be combined with the first one)
is followed by HOL, Isabelle, and other descendents of the
LCF theorem prover.
LCF-style provers separate the problem of creating proofs into two parts:
a lower-level engine which implements the logical rules,
and a higher-level part that decides which rule to use.
For the latter, they
generally use a general-purpose programming language
(such as ML) where the theorem proving “tactics” are written.
The theorems themselves, however, can only be modified by
the lower-level engine using inference rules known to be valid.
That way, the higher-level driver may choose poor strategies and
fail to prove something that is true, but it won’t
accidentally apply an invalid rule.
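The LCF idea can be sketched in a few lines of hypothetical Python (the class and function names here are illustrative only, not from HOL or Isabelle): the theorem type can only be constructed by the small trusted kernel, so untrusted tactics may fail, but cannot manufacture an unproved theorem.

```python
# Hypothetical sketch of the LCF architecture: theorems can only be made
# by a small trusted kernel; untrusted "tactics" merely call the kernel.

class Theorem:
    """A proved formula. Only the kernel functions below construct it."""
    def __init__(self, formula, _token=None):
        if _token is not _KERNEL:      # crude access control for the sketch
            raise ValueError("theorems may only be created by the kernel")
        self.formula = formula

_KERNEL = object()

def axiom(formula):
    # In a real system only a fixed list of axiom schemes would be allowed.
    return Theorem(formula, _token=_KERNEL)

def modus_ponens(th_imp, th_ant):
    op, a, b = th_imp.formula          # expects ("->", A, B)
    assert op == "->" and a == th_ant.formula
    return Theorem(b, _token=_KERNEL)

# An untrusted "tactic": however buggy its strategy, any Theorem it
# returns was necessarily built by valid kernel inferences.
def chain_tactic(imp_theorems, start):
    th = start
    for imp in imp_theorems:
        th = modus_ponens(imp, th)
    return th

p = axiom("P")
pq = axiom(("->", "P", "Q"))
qr = axiom(("->", "Q", "R"))
print(chain_tactic([pq, qr], p).formula)   # prints R
```

Real LCF-style systems enforce this separation with the type system of ML rather than a runtime token, but the division of labor is the same.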

ACL2 is written mostly in itself, which is essentially a subset
of LISP. It does prove some properties of itself.

Tools can’t do everything for you; humans have to help create proofs,
and they often have to try many different paths to find the proof.
This haiku by Larry Paulson expresses some of
the challenges of that process:

Hah! A proof of False.
Your axioms are bogus.
Go back to square one.

(Yes, many tools are designed to counter bogus axioms, but
the basic point is still true.
Namely, anyone trying to prove properties of real systems often
struggles and has to restart several times,
going through many different approaches, before succeeding.)

A different issue is how to run the various formal methods tools
listed above.
A few of them are implemented in widely-available
languages also used for other purposes (like C, C++, or Java).
Obviously, they’ll need an operating system,
and usually need other common tools like text editors.
(Warning: Most analysis tools run on Linux/Unix and are either
not available for Windows, or only work on Windows with an
emulation tool like Cygwin --
making the tool slower.)
But there are already well-known FLOSS implementations
for these, so we don’t need to discuss them in more detail.

However, specification and proof systems are often built
“on top” of other (less common) programming languages.
These other languages are often specialized themselves, and
in some cases using the specification or proof tools also involves
interacting directly with the underlying implementation tools as well
(e.g., to control/“drive” the proof system).

Programming languages which are functional programming languages,
or have a functional programming subset, are very common for these purposes.
A functional programming language is simply a language where assignment is not
(normally) used, and thus there are no “side-effects” -- instead,
functions accept values and return values (like a spreadsheet does).
There are many arguments for the advantages of such systems, but
one reason is simply that such systems make it possible to use
much more of the arsenal of mathematics.
Functional languages usually have built-in support
for lists and other constructs useful for the purpose.
J Strother Moore’s position is that all highly-assured software should
be written in a functional programming language,
because it is much easier to prove properties about programs
written in them.
(Most widely-used languages are “imperative” languages, including
C, C++, Ada, Java, C#, Perl, Python, PHP, and Ruby.
Techniques for proving programs in imperative languages are known;
C. A. R. Hoare’s 1969
paper on Floyd-Hoare logic did so, as did
Edsger Dijkstra’s weakest precondition work that was part of his
1975 work on predicate transformers.
Moore argues that their complexities are not
worth it, and that using a functional approach makes proofs much easier.)
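A tiny hypothetical Python example suggests why proofs are easier in the functional style: a pure (side-effect-free) function is defined by equations, so its properties can be stated and checked, or proved by induction, directly from its definition.

```python
# Hypothetical sketch: a pure function is easy to reason about
# equationally, like a mathematical definition.
import random

def total(xs):
    # total([])       = 0
    # total([x] + ys) = x + total(ys)
    return 0 if not xs else xs[0] + total(xs[1:])

# Because there are no side effects, properties follow from the defining
# equations (and could be proved by induction on the list), e.g.
# total(a + b) == total(a) + total(b):
for _ in range(100):
    a = [random.randint(-9, 9) for _ in range(random.randint(0, 5))]
    b = [random.randint(-9, 9) for _ in range(random.randint(0, 5))]
    assert total(a + b) == total(a) + total(b)
```

In an imperative version using assignment and loops, the same property would instead require reasoning about intermediate states, which is exactly the extra complexity Moore argues against.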

In some cases it’s hard to figure out where to place a given language.
In particular I’ve placed Maude here, but Maude could arguably
be considered an analysis tool (and be placed above).
In any case, there are many useful FLOSS implementations of these tools too:

Lisp.
Lisp is one of the oldest programming languages in the world and
the Lisp family is still widely used for these kinds of applications.
Today there are three major variants in wide use: Common Lisp,
Scheme, and Emacs Lisp (the latter is built into the Emacs text editor).
There are also a host of other variants.
There are many FLOSS implementations of the Lisp family.
Emacs Lisp is implemented, unsurprisingly, by Emacs (GPL).
ACL2, noted above, builds on Common Lisp.

A good FLOSS Common Lisp implementation is
GNU Common LISP (GCL)
(LGPL license); this implementation isn’t quite ANSI compliant,
but since it compiles to efficient machine code it’s often used
for proof-checking work (because this task is very compute-intensive).
Another high-performance Common Lisp implementation is
CMUCL (public domain).
GNU CLISP (GPL license)
also implements Common Lisp; it has a
bytecode implementation which makes it a little slower, but it's
very capable.

There are many FLOSS implementations of the Scheme variant of Lisp as well.
These include
GNU guile (LGPL;
small and quick to start up,
used in many GNU programs as an extension language),
Bigloo
(compiler and tools GNU GPL, library is GNU LGPL;
this has a higher-performance compiler and other components
focused on bigger development efforts, including a compile-time
type checking system),
umb-scheme (public domain),
and
Gauche
(command-line "gosh")
(BSD license).
Practical Scheme
is a useful place to start looking for libraries.
Teach yourself Scheme in Fixnum days
is a reasonable intro to Scheme.

I should note that although Scheme and Common Lisp have a lot of
shared history, as languages
Scheme and Common Lisp
are basically incompatible with each other.
Common Lisp has multiple namespaces (versus Scheme's one).
Scheme uses the special values #t and #f for true and false, with distinct
values for #f, NIL, and the empty list '();
Common Lisp uses the older Lisp convention of using the symbols T and NIL,
with NIL also representing the empty list, and non-NIL being considered true.
In practice, Scheme programs use recursion where a Common Lisp program
would not (because Scheme guarantees tail-call optimization while
Common Lisp does not).
Most of the built-in functions have different names (because of different
naming conventions), and many have subtly different semantics (because of their
fundamentally different notions about lists and boolean values).
The program
scmxlate by Dorai Sitaram translates Scheme
into Common Lisp; the license in the package itself
isn't FLOSS, but Sitaram has separately
released scmxlate and other
packages under the LGPL, so it is FLOSS even if at first
it does not appear to be.

Both Common Lisp and Scheme have formal standards, so as long as you
stay with the standards, you can generally port from one to another.
Although Lisp's usual programming notation is different from most other
languages (strictly prefix s-expressions), and it has a history of being an "AI"
language, studies have found that both development time and performance
can be quite good.
Ron Garret (formerly Erann Gat) did a study in writing a test program in
Lisp; the resulting Lisp programs ran faster on average than C, C++ or
Java programs (although the fastest Lisp program was not
as fast as the fastest C program), and the Lisp programs took
less development time than the other languages.
Norvig's "Lisp as an Alternative to Java" adds some commentary about this.

A new and interesting variant of Lisp is implemented in
Qi
(GPL license).
This is another Lisp variant implemented on top of Common Lisp; it supports
pattern-matching, optional static type checking, and is
“lambda-calculus consistent” (supporting, for example,
partial applications).
The authors claim that the type system of Qi is more powerful and flexible than
the very powerful capabilities of ML or Haskell, because it is based on
sequent notation.
Their manual is interesting in its own right; they even discuss
implementing simple automated reasoning systems.
One problem with Lisps is that their usual programming notation
(s-expressions) is painful for most people to read; see my
Readable s-expression work
(including sweet-expressions) for techniques to overcome that.

Haskell.
Haskell is a purely functional programming language.
Unlike most other functional programming languages, Haskell is
lazy (non-strict): it doesn’t compute something unless it needs to,
and typical Haskell programs define infinitely-long constructs
(from which it’ll only compute the parts it needs).
Haskell is also purely functional;
most functional programming languages make
exceptions for I/O and other parts, but Haskell supports special constructs
(particularly monads) so that even I/O is handled completely functionally.
There is a public specification for Haskell.
Many consider the canonical implementations of Haskell to be
GHC (for speed)
and
Hugs
(for nice interactivity and smaller size).
Both of them are FLOSS, and GHC works hard to be very efficient.
There’s at least one kernel implemented in Haskell.
I’m not sure if
Haskell will be used to implement many high-assurance programs,
because it’s hard to reason about the execution time
of a Haskell program.
Others will disagree with me on this point!
But there’s no doubt that Haskell is used in many places
for reasoning about programs.
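For readers unfamiliar with laziness, here is a rough analogy in Python using generators (this only approximates Haskell's lazy lists, but it conveys the idea): an "infinite" structure is defined once, and only the demanded prefix is ever computed.

```python
# Hypothetical sketch: lazy evaluation in the style of Haskell, modeled
# with Python generators. Nothing is computed until it is demanded.
from itertools import count, islice

def fibs():
    a, b = 0, 1
    while True:          # conceptually infinite, like a Haskell list
        yield a
        a, b = b, a + b

evens = (n for n in count(0) if n % 2 == 0)   # another infinite structure

print(list(islice(fibs(), 8)))   # [0, 1, 1, 2, 3, 5, 8, 13]
print(list(islice(evens, 5)))    # [0, 2, 4, 6, 8]
```

In Haskell this style is pervasive and built into the language semantics, rather than opt-in as with Python generators.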

Clearly, you need to have a way to execute the highly-assured
source code... which requires either a compiler or interpreter.

If you write code in C or C++, it is quite common to use the
gcc compiler suite (GPL license), which is FLOSS.
Developers of high-assurance software who choose to use
C or C++ often use gcc as well, so clearly there are FLOSS
tools covering these languages.

However, programming languages often used for
other assurance levels don’t work as well for high assurance.
Here are comments on some programming languages if you are
interested in high assurance:

C and C++ are widely used for a variety of good reasons, and in
many applications their weaknesses are not a big problem or are
easily surmountable.
But C and C++ are not well-suited for high assurance, because
for high assurance their weaknesses are a serious problem.
One problem is that it’s notoriously easy to make a mistake
in C and C++ that will be missed both by tools and by other reviewers.
Studies have shown that, on average, C/C++ programs tend to
have more mistakes
than those in many other languages; if any mistake
is a disaster, that’s a bad place to start.
Also, C/C++’s designs make it incredibly hard to
prove anything about code written in them.
In particular, most formal method proof systems cannot handle pointers well,
yet both languages are fundamentally designed around pointers.
It is possible to use C/C++ for high assurance programs;
some organizations go ahead and do it, and compensate using
draconian style guides, massive extra reviews, and so on.
But the “savings” from using common C or C++ tools
is often completely overwhelmed by the vast amount of extra
time and money to compensate -- and
that’s if it’s possible at all.
When creating high assurance components
it’s hard to justify using a language so poorly designed for the task.
If you do choose to use C, I would suggest looking at
Les Hatton's EC set -- a set of rules that slightly subsets C,
based on measurements of failures (from Safer C and many other
studies) -- that way, you're less
likely to make the same mistakes as your predecessors.

Java and C# are much easier to verify than C or C++, and
are specifically designed so that the common mistakes people make
in C and C++ are automatically detected by the compiler.
One challenge in using these languages is finding suitable implementations.
The most common Java implementation (Sun’s proprietary implementation)
expressly forbids the use of Java in safety-critical applications.
The only C# implementations I’m aware of, both FLOSS and proprietary,
depend on lots of other lower-assurance components (medium assurance
operating systems and so on).
There are FLOSS implementations
(gcj implements Java and Mono implements C#),
but as of May 2006 these are less mature FLOSS programs and
it’s not clear anyone should use them
for high assurance programs (I am sure they will mature over time,
just as gcc’s support for C, Ada, and C++ has matured).

There is one additional problem:
many high assurance programs must also be hard real-time systems
(with guaranteed execution times), and/or must prevent
“covert channels” of communication.
In practice, this often means that
runtime memory heap allocation can’t be used, including
automated garbage collection.
Without automated garbage collection, many languages -- including
Java and C# -- are impractical; their designs fundamentally require it.

Python and Smalltalk are wonderfully malleable,
making it easy to create complex things.
But the malleability becomes a problem when you’re trying to prove
what a program does.
In these languages it’s easy to redefine fundamental constructs
(say, a library function) in the language, while the program is running,
far away from their use or definition
(aka “action at a distance”).
The result is that it’s incredibly hard to be certain what
some code in these languages does -- again,
making them poorly suited for high assurance.
They also lack strong static typing, so many would be
nervous about using them to implement high assurance software -- static typing
can detect many problems during compilation.
(There is an extension for Python that supports static typing, though
it is not often used.)
These languages also require automatic garbage collection; as noted in the
text about Java and C#, sometimes that is not allowed.
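A small hypothetical Python example shows the "action at a distance" problem concretely: rebinding a library function at runtime silently changes the meaning of every call site, so reading a call site tells you little for certain.

```python
# Hypothetical sketch of "action at a distance": in a malleable dynamic
# language, even library behavior can be redefined at runtime, far from
# where it is defined or used.
import math

def hypotenuse(a, b):
    return math.sqrt(a * a + b * b)

print(hypotenuse(3, 4))        # 5.0

# Elsewhere in the program (or in a plugin loaded later)...
math.sqrt = lambda x: 0        # silently rebinds the "library function"

print(hypotenuse(3, 4))        # now 0 -- yet the call site never changed
```

Proving what `hypotenuse` does requires proving that no such rebinding happens anywhere in the whole program, which is exactly the kind of global analysis that malleability makes hard.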

Lisp is a useful language, and is comparatively easy to analyze.
However, it also lacks strong support for static typing
(the basics of static type declarations exist but are optional in Common Lisp,
and type checking tends to happen at run time, not compile time).
It requires automatic garbage collection, again, a problem for some
applications.
Many people find Lisp programming syntax hard to read, which is
a severe disadvantage if you want people to review your code!

BitC is a very promising
new language that is being developed.
It is currently in the research stage (as of May 2006),
so I would not recommend it for general use at the moment,
but it is worth watching.
The BitC
specification says this:
“BitC is a systems programming language that combines the
‘low level’ nature of C with the semantic rigor of Scheme or ML.
BitC was designed by careful selection and exclusion of language features
in order to support proving properties (up to and including
total correctness) of critical systems programs...

BitC is conceptually derived in various measure from Standard ML, Scheme,
and C. Like Standard ML, BitC has a formal semantics,
static typing, a type inference mechanism, and type variables.
Like Scheme, BitC uses a surface syntax that is readily
represented as BitC data. Like C, BitC provides full control over
data structure representation, which is necessary for high-performance
systems programming. The BitC language is a direct expression of
the typed lambda calculus with side effects,
extended to be able to reflect the semantics of explicit representation.”

“In contrast to ML, BitC syntax is designed to discourage currying
[because it] requires dynamic storage allocation... Since there
are applications of BitC in which dynamic allocation is prohibited,
currying is an inappropriate idiom for this language.
In contrast to both Scheme and ML, BitC does not provide or require
full tail recursion [but restricts it in a way that] preserves all of
the useful cases of tail recursion that we know about,
while still permitting a high-performance translation of BitC code to C code.
Building on the features of ACL2, BitC incorporates explicit support
for stating theorems and invariants about the program as part of
the program’s text.
As a consequence of these modifications, BitC is suitable for the
expression of verifiable, low-level ‘systems’ programs.
There exists a well-defined, statically enforceable
subset language that is directly translatable to a low-level language
such as C.”

An implementation is being developed (BitCC) which generates C code
(which is then compiled), as part of the Coyotos project.
Currently BitC uses a Scheme (LISP)-like syntax, though a C-like syntax
may be built eventually.
BitC development work is being done as part of the Coyotos project.

Ada is widely used in the high assurance world, even though it’s
uncommon at lower levels of assurance.
This shouldn’t be surprising; Ada was specially designed for
high-assurance applications.
Ada has all sorts of type-checking and other kinds of built-in static
checks to detect defects before you can finish compiling,
which is a good thing if a single software defect could kill people.
Various studies (such as
German’s) find that Ada programs tend to have fewer defects than C/C++.
Ada can be used quite easily without pointers or heap allocations
(though it has both if you need them).
Ada is usually compiled directly to machine code, yielding fast results and
predictable performance.
Ada has abilities such as built-in commands so you can compare the
source lines and generated object code (this is part of annex H,
“Safety and Security”).
You can easily add restrictions (via pragma Restrictions) to forbid certain
constructs (e.g., if your chosen formal method system
can’t easily handle them, or if never using them aids optimization).
For example, the
Ravenscar profile of Ada is a predefined set of restrictions
useful for many real-time programs.
ISO/IEC has produced a guide for using Ada in high integrity systems.
If you’re curious about Ada, feel free to visit my
Ada95 Lovelace tutorial,
including my discussion about
safety and Ada.
The paper
Refinement of Z specifications using reusable software components in Ada
by Hayward and Bale shows a simple formal Z specification and an
implementation in Ada, concluding that
“Ada has proved an ideal language to implement Z [formal] specifications”.
Not all high assurance programs are in Ada, but it’s
not an unusual choice.

But choosing Ada doesn’t prevent the use of FLOSS.
The FLOSS
GNAT Ada compiler (GPL license)
is part of the gcc toolsuite,
and commercial support is available for GNAT from AdaCore.
GNAT is one of the very best Ada compilers around; it is widely used.
In any case,
you don’t have to use Ada for high assurance software development,
but if you choose it, you can use a very good FLOSS implementation.
Also, check out SPARK (discussed separately).

Sometimes the best approach is to create a domain-specific
language, use that to define at least part of your system,
and then create a code generator for your language.
This makes it much easier to use languages you might not
be able to use otherwise.
This can be very effective, but you must still decide how to
implement the code generator (and the rest of the code, if applicable),
and you must somehow verify that the generated program does what you want.
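As a toy illustration (hypothetical Python; the DSL syntax and all names are invented for this sketch), here is a miniature domain-specific language for state machines plus a code generator. The generated source is ordinary code that can itself be reviewed or verified, which is one way to address the verification burden just mentioned.

```python
# Hypothetical sketch: a tiny DSL for state machines, compiled by a
# code generator into ordinary Python source.
DSL = """
idle -> running : start
running -> idle : stop
"""

def generate(dsl):
    """Translate DSL rules 'src -> dst : event' into a step() function."""
    lines = ["def step(state, event):"]
    for rule in dsl.strip().splitlines():
        lhs, event = rule.split(":")
        src, dst = (part.strip() for part in lhs.split("->"))
        lines.append(f"    if state == {src!r} and event == {event.strip()!r}:")
        lines.append(f"        return {dst!r}")
    lines.append("    return state")      # unmatched events leave state alone
    return "\n".join(lines)

code = generate(DSL)       # the generated program can itself be reviewed
namespace = {}
exec(code, namespace)      # or written to a file and compiled separately
step = namespace["step"]

print(step("idle", "start"))     # running
print(step("running", "tick"))   # running (no matching rule)
```

The point is that the DSL is small enough to analyze exhaustively, while the generator is a separate, reviewable artifact; both are far easier to check than a hand-written equivalent scattered through general-purpose code.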

In all cases, you want to use the tools in a way that tries
to catch as many mistakes as possible before the code gets out to the user.
For example, you would normally turn on essentially all warning flags,
and you would normally set up guidelines on how to use the language
(to avoid things that are known to be problematic).

Compilers are hard to verify too.
There’s been some research progress on formally verifying that
a compiler correctly implements its language, but it’s not
ready for typical use with real compilers yet;
current work focuses on tiny toy languages, and even these are difficult.
The article
Qualification of Software Development Tools From the DO-178B
Certification Perspective notes some of the challenges in
qualifying tools directly (from the DO-178B perspective).
An approach more commonly used today is to have the compiler generate code,
and then do hand-checking to make sure that the generated code matches the
source code.
This is obviously not ideal, but it easily beats writing the
assembly code by hand, and if the compiler directly supports it
(the GNAT Ada compiler does) this is not too hard to do.
Using a widely-used compiler, and avoiding “stressing” the compiler,
reduces the risk of accidentally inserting an error into the final
machine code to a very small amount, and hand-checking can detect
such errors.

Intentional malicious subversion of compilers turns
out to be much more difficult to counter.
An Air Force evaluation of Multics,
and Ken Thompson’s famous Turing award lecture
“Reflections on Trusting Trust” showed that compilers can
be subverted to insert malicious Trojan horses into critical software,
including themselves, and it’s shockingly hard to counter the attack.
Thompson even performed the attack under controlled
experimental conditions, and the victim never discovered it.
For decades it’s been assumed that this was
the “uncounterable attack”.
Thankfully, my own academic work on
Countering Trusting Trust through Diverse Double-Compiling
shows how to counter subversive compilers, so even this
nasty problem is now solvable.

There are very few high assurance FLOSS components available.
A later section will discuss this, but for now,
here are the closest to high assurance FLOSS
components that I’ve been able to identify
(some are quite a stretch):

The
GNAT Pro High-Integrity Edition (HIE)
for the GNAT Ada compiler is focused on supporting
requirements such as RTCA/DO-178B level A and B, EUROCAE ED-12B
and DEF Stan 00-55, according to the vendor AdaCore.
It features a configurable run-time library (so you only need
to include what you need, simplifying certification).

Various L4-microkernel-related projects.
L4 is a family of microkernels (with a common interface),
and some recent work has focused on
making high-assurance L4-related microkernels.
Most of them are open source software.
Parts of the L4::Pistachio kernel have been formally proved, and
Pistachio is definitely FLOSS.
There's an
ongoing effort to create a new L4 implementation that is formally verified
(seL4), though at this time I don't know if it will be FLOSS.
L4HQ is a useful general source of L4 information;
L4Ka discusses this to some extent.
Open Kernel Labs maintains
OKL4. This is released under essentially a meta-OSS-license, requiring
that any redistribution must include information on how to get the source
code, and that it be licensed under an OSI-approved license.
They work with the NICTA
L4,
L4.verified, and
seL4
efforts, among others.
This is a large set of interrelated efforts; to understand them,
I suggest reading
Towards Trustworthy Computing Systems: Taking Microkernels to the Next Level and
"Secure Embedded Systems Need Microkernels" by Gernot Heiser.
In many ways this is ongoing work; L4 kernels have been around a long time,
but the efforts to really create highly assured versions of L4 are much younger
(and influenced by work such as Shapiro's EROS work).

EROS (GPL)
is an operating system based on capabilities, spearheaded by the very smart
Jonathan S. Shapiro.
There are many
documents available about EROS,
including proofs of correctness for its
fundamental security mechanism (confinement).
In a sense the project named “EROS” has completed, and it now has
two successors.
One is the CapROS project (GPL),
which is a commercial derivative spearheaded by Charlie Landau
(one of the architects of KeyKOS, which was the EROS predecessor).
CapROS builds directly on the EROS work as a small, secure, operating system.
Meanwhile, Shapiro is pursuing the
Coyotos project (GPL),
taking the ideas of EROS but making some significant architectural
changes.
As of April 2006 the Coyotos system is not yet complete, but
the current code can be obtained from the web site. Coyotos has research
objectives in the direction of formal verification, but is also expected
to deliver commercially some time this year. The GNU Hurd project (GPL)
has decided to build their next-generation system using the Coyotos
system as the core OS.

Halfs (LGPL)
is a filesystem implemented in the programming language Haskell.
Halfs can be mounted on Linux (via the FUSE interface)
and used like any other Linux filesystem,
or used as a library.
The background is that,
"In the course of developing a web server for an embedded operating system,
Galois Connections had need of a filesystem which was small enough to
alter to our needs and written in a high-level language so that we
could show certain high assurance properties about its behavior.
Since we had already ported the Haskell runtime to this operating system,
Haskell was the obvious language of choice.
Halfs is a port of our filesystem to Linux.
High assurance development is a methodology for creating software systems
that meet rigorously-defined specifications with a high degree of confidence...".

To be fair, though, Halfs isn't really high assurance as far as I'm
concerned.
Yes, they implemented a filesystem in Haskell, but you can implement junk
in any language (including Haskell).
Haskell does provide some type guarantees, but that's not enough.
It's true that transformations to machine code can be checked thoroughly,
but on the other hand, it's difficult to analyze resulting Haskell machine
code (in traditional procedural languages the correspondence
is clearer, and thus easier to check).
What's potentially interesting from a high assurance viewpoint
is that Haskell can be deeply analyzed
in a way many other languages can't, and so you could verify many
interesting properties of Halfs.
Unfortunately, I don't see much evidence that this
has actually occurred with Halfs.

More generally,
Galois's whole business model rests on developing
high-assurance software (including heavy use of formal methods).
They do most coding in Haskell, and they do contribute to many Haskell FLOSS
efforts (e.g., the Glasgow Haskell Compiler, which is under a BSD3
license), although often the core software they develop is proprietary.
John Launchbury's talk at CUFP may be of interest.

The
tiling window manager xmonad is
FLOSS and it uses some techniques to gain assurance
(e.g., the property checker QuickCheck and the
partiality checker catch).
It is implemented in Haskell.
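The idea behind QuickCheck-style property checking is easy to sketch: state a property the program should satisfy, then test it against many randomly generated inputs. Here is a toy Python version of that idea (the function names are illustrative, not part of QuickCheck or any real tool):

```python
import random

def check_property(prop, gen, trials=200):
    """Run `prop` against many randomly generated inputs, QuickCheck-style.

    Returns the first failing input (a counterexample), or None if all pass.
    """
    for _ in range(trials):
        x = gen()
        if not prop(x):
            return x  # counterexample found
    return None

# Example generator: random lists of small integers.
def random_list():
    return [random.randint(-100, 100) for _ in range(random.randint(0, 20))]

# Example property: reversing a list twice yields the original list.
prop_reverse_involutive = lambda xs: list(reversed(list(reversed(xs)))) == xs

assert check_property(prop_reverse_involutive, random_list) is None
```

A real property checker like QuickCheck adds input shrinking (minimizing counterexamples) and type-driven generators, but the assurance gain comes from the same mechanism: every property is exercised against a large random sample of the input space.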

Minix was released under the BSD license (a FLOSS license) in
April 2000.
It turns out that in the late 1980s some work had been done to
demonstrate how to evaluate operating system security, using Minix as a model.
In particular a mathematically formal model was developed and proved
for Minix
(“Minix Security Policy Model” by
J. Eric Roskos and Terry Mayfield, IDA Paper P-2112, May 1988, IDA,
4850 Mark Center Drive, Alexandria, VA).
This formally defined the overall security policy
for a simple Unix-like system which supported the usual user-group-other
permissions on files, with file owners making the permission decisions
(without any “mandatory” access controls
that users can’t override), and proved certain things
about the policy (it was “level 1” in the parlance above).
However, this was only a proof of a very high-level policy of the system.
Again, there’s little doubt that creating simple mathematical models of
systems and then proving their properties can be a very valuable use of
formal methods... but it’s not enough for high assurance, because
no attempt was made to give strong evidence that the system exactly
met the policy.
Nor did the authors claim otherwise;
this work was only to demonstrate what to do to meet
the old “Orange Book” level C2,
a “medium assurance” level.
Today’s operating systems typically support far more functionality
than Minix of the late 1980s, so even this is very limited.

The
Trusted Computing Exemplar (TCX) project (license unknown,
probably not FLOSS, no code released)
is working to “provide a working example that shows how
trusted computing systems and components can be constructed.
The project will develop a high assurance, embedded micro-kernel
and a trusted application built on top of the micro-kernel
as a reference implementation exemplar for trusted computing.”
Some very smart people, such as
Cynthia
Irvine and others at the
Center for Information Systems Security Studies and Research,
are involved in this project.
They are focusing on meeting Common Criteria EAL7 and beyond
(the highest defined assurance level).
In particular, they intend to
truly prove every line of code all the way down
(which is beyond what people do today, even in high assurance).
The outputs of the TCX project are intended to be publicly documented,
but it is not yet clear if its source code and specifications will meet
FLOSS license requirements;
the 2002 TCX white paper
only says that their “framework will support the dissemination of project
deliverables using a philosophy similar to the open source approaches”,
which makes no particular commitment.

Unfortunately, the TCX project doesn’t seem
to be working to ensure that the project can live on
or be supported as a FLOSS project.
Many FLOSS developers will only work on projects
if they can be supported with FLOSS tools
(see The
Java Trap for a discussion).
Yet the project chose to use
the proprietary CM tool Perforce, and was seriously considering the use of a
tool (PVS) that was proprietary at the time instead of using a FLOSS tool
(such as ACL2).
PVS is now FLOSS, but that is not the point.
The point is that
the project failed to even consider licensing as a selection
factor when it performed its
survey
of tools.
If the TCX fundamentally depends on non-FLOSS tools, it
will probably be mostly ignored by FLOSS developers, because it
will essentially be unusable to them (after the Java Trap and the
Linux kernel’s Bitkeeper crisis, few will be interested in depending on
non-FLOSS tools).
There’s also no evidence of outreach to the FLOSS community;
if FLOSS were being considered as a future maintenance strategy,
that’s a curious omission.
Also, since TCX is focusing on being a demonstration, it is unclear
if the results will be usable as an implementation in real projects.
And besides all this,
there is no plan for it to undergo a formal Common Criteria evaluation or
any other independent evaluation,
even though that is the specification they are using as their baseline.
And finally, I suspect that the project has been halted;
I did not get any responses from email queries, and I’ve heard
separately that the project stopped.

The
Distributed Trusted Operating System (DTOS) project
(no source code release known, but the FLOSS SELinux followed on)
was a joint effort by the National Security Agency (NSA)
and Secure Computing Corporation (SCC) to encourage strong,
flexible security controls in next generation operating systems.
DTOS was a successor of the Distributed Trusted Mach (DTMach) project.
DTOS is no longer active, but the
Flask and Security-Enhanced Linux (SELinux) projects have carried on as its
successors.
SELinux in particular is FLOSS, and since some of the lessons from
DTOS were used in developing SELinux, it may be that some of the DTOS
information would be relevant.
The
DTOS lessons learned paper
(27 June 1997) is especially interesting.

There’s much more available if you are simply looking for a
real-time operating system (RTOS) kernel, and aren’t really
looking for high assurance.
eCos is a FLOSS RTOS,
for example,
and there are several projects that create “real-time” versions of the
Linux kernel.
Also, if you want a hypervisor, Xen is very popular and is FLOSS.
But none of these are high assurance.

There are a lot of FLOSS tools to help achieve medium assurance, and
there are also many FLOSS programs that achieve medium assurance.
Let’s look at them, so we can contrast the situation of
FLOSS tools and components in medium assurance against those in high assurance.

FLOSS tools for medium assurance

There are a number of FLOSS tools to help find defects in programs.
These include splint and flawfinder; my
Flawfinder home page
identifies many of these program analysis tools, such as
PMD for Java.
The Linux kernel developers even developed their own static analysis
tool to examine the kernel (kernels are very different from application
programs, and their common failure modes
differ from applications’).
But note that these tools are generally designed
to look for specific common defects and defects that are easy for
tools to find.
These could have some value in high assurance applications
as well, since these tools could quickly filter out problems before
bringing out the “big guns”.
But these tools aren’t enough;
a program can pass every such tool and still have serious flaws.
It’s also worth noting that these tools often have many false positives
(“bug” reports that aren’t really defects at all).
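Tools like flawfinder work largely by flagging occurrences of historically risky constructs, which is also why they are fast, why they miss deeper flaws, and why they report false positives. A toy sketch of the approach (the rule list and the scanned C snippet are invented for illustration):

```python
import re

# A few historically risky C calls, with the reason each is flagged.
# Real tools like flawfinder have large, carefully ranked rule databases.
RISKY_PATTERNS = {
    r"\bgets\s*\(":   "gets() cannot limit input length (buffer overflow)",
    r"\bstrcpy\s*\(": "strcpy() does no bounds checking",
    r"\bsystem\s*\(": "system() may allow command injection",
}

def scan(source):
    """Return (line_number, message) for each risky pattern found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, message in RISKY_PATTERNS.items():
            if re.search(pattern, line):
                hits.append((lineno, message))
    return hits

c_code = """\
char buf[64];
strcpy(buf, argv[1]);           /* real defect: unbounded copy */
strcpy(buf, "ok");              /* false positive: short literal source */
"""
print(scan(c_code))
```

Both strcpy() lines are flagged, even though only the first is exploitable; that is exactly the false-positive behavior described above, and why such tools filter problems rather than prove their absence.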

I should note that it’s been widely and repeatedly
proven that the most efficient method for finding and repairing defects,
in terms of found and repaired defects per person-hour spent, is
peer review -- groups of people actually looking at the code.
There are various processes for this, going under names such
as “software inspections” (Fagan and Gilb have different processes
sharing that name).
Peer reviews are low-tech, and often not exciting to do...
but they’re so effective that
anyone leading a software project should be interested in them.
Andy German’s 2003 paper “Software Static Code Analysis Lessons Learned”
reports that “independent code walkthroughs are
the most effective [static code analysis] technique
for software anomaly removal [, finding]
up to 60 percent of the errors present in the code.”
I co-edited and co-authored an IEEE Computer Society Press book,
“Software Inspection: An Industry Best Practice”, which, if you can find a copy
(it’s now out of print), presents much more information about the practice.
The Formal Technical
Review Archive lists many other texts.
For high assurance, 60% is not 100%, but for medium assurance it’s fantastic.
The known effectiveness of peer review explains why the
“many eyeballs” idea of FLOSS really can work.

FLOSS components at medium assurance

Several FLOSS implementations have undergone traditional
medium assurance evaluations.
There are at least two Linux distributions that have undergone
Common Criteria evaluations (Red Hat and Novell/SuSE) at what I
would term medium assurance levels.
Similarly, the cryptographic library OpenSSL has undergone FIPS evaluation.
To be fair, a program that has not undergone a Common Criteria
evaluation process might be more secure than one that has, and
a program that underwent a lower EAL Common Criteria evaluation might
be more secure than one at a higher level.
The Common Criteria is a standardized set of processes
for evaluating the security of software, and higher EAL values add
more evaluation processes.
A product that has not gone through the higher levels might still
be very secure -- all we know for sure is that no one has paid for the
higher-level evaluation to be performed.

Unsurprisingly, many other popular FLOSS programs and systems have
undergone many tool-based evaluations searching for flaws as well:

The paper
Model Checking An Entire Linux Distribution for Security Violations
describes the use of model-checking (with MOPS) to find defects in
a Linux distribution;
it helped find defects in 60 million lines of code.
One clarifying point:
Model-checking is a general technique that can also be used to prove
properties for high assurance, but in this case the technique only
proved the presence or absence of specific types of errors in
certain patterns. It did not determine if a program
had no errors.
(That being said,
MOPS is still
an interesting tool.)
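MOPS works by encoding a temporal safety rule (such as “drop privileges before calling exec”) as a finite-state automaton and checking program paths against it. A much-simplified sketch of that automaton idea, checking a single event trace rather than all paths through a program (the rule and event names here are invented for illustration):

```python
# Safety automaton for an invented rule: a program starts privileged
# ("priv"); calling exec while still privileged is an error, while
# seteuid moves the program to the safe "dropped" state first.
TRANSITIONS = {
    ("priv", "seteuid"): "dropped",
    ("priv", "exec"):    "ERROR",
    ("dropped", "exec"): "dropped",
}

def check_trace(events, state="priv"):
    """Run the safety automaton over a sequence of program events.

    Returns True if the trace is safe, False if it reaches ERROR.
    Events with no matching transition leave the state unchanged.
    """
    for event in events:
        state = TRANSITIONS.get((state, event), state)
        if state == "ERROR":
            return False
    return True

assert check_trace(["seteuid", "exec"]) is True   # privileges dropped first
assert check_trace(["exec"]) is False             # exec while privileged
```

This also makes the limitation above concrete: the automaton proves only the presence or absence of violations of this one pattern; a trace that passes can still contain any number of other errors.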

Many tool vendors intentionally apply their tools to FLOSS programs,
since they make great test cases and reporting their results can help
both the tool vendor and the FLOSS project.
Examples
of tool vendors examining FLOSS programs include
the “Fuzz” studies, Reasoning’s analysis of the Linux kernel and MySQL,
and a study by Coverity.

The
Open Source Quality Project
investigates techniques and tools for assuring software quality,
and focuses on “designing and building tools to improve
the quality of Open Source software.”
Their rationale is that FLOSS is “attractive as a research vehicle
in software quality because of the critical role it plays in
the nation’s economy and precisely because it has the unique
feature that it is a real-world system that is
completely open and available for study.
Because of the Open Source tradition of incorporating useful
new techniques and tools into the Open Source environment,
there is also an opportunity for direct and widespread impact.”
There are many other similar academic projects, for the same reasons.

Looking more broadly, it is clear that there are many FLOSS
projects which take significant steps to search for and remove defects.

General issues: FLOSS and medium assurance

Many experts have concluded that FLOSS has a potential advantage over
proprietary software when it comes to security or reliability,
though not all FLOSS
programs are more secure than their competitors.
This is borne out by many studies of actual FLOSS programs, which show
that FLOSS does very well in terms of security
and reliability (often much better than their proprietary competition).

FLOSS programs always have the potential for massive (worldwide) peer
review, both during initial development
(e.g., by a community of developers) and afterward.
Thus, many defects are detected before customers use the results.
For many of the larger and more important components, such as the
Linux kernel and Apache web server, there is ample evidence
that in fact this happens both during initial development and
afterward.

FLOSS developers tend to work to develop better code,
because they know their name will be associated with
the code that is reviewed worldwide.

FLOSS programs tend to have well-defined interfaces and careful
designs, since their developers must usually work worldwide and
cannot depend on walking over to someone else to understand a design.

FLOSS programs are typically less pressured to deliver an
inadequate product; if the product isn’t ready, the world knows it.

The “Fuzz” studies in particular studied several sets of
proprietary and FLOSS programs, and found that the FLOSS
programs had markedly superior reliability... this is one
of the few studies that compared FLOSS and proprietary programs
as a set, instead of comparing one instance to another instance.
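The fuzz technique itself is simple: feed a program streams of random input and watch for crashes. A minimal sketch of that measurement, using an invented stand-in for the program under test:

```python
import random

def fragile_parse(data: bytes):
    """A stand-in for a program under test; crashes on some inputs."""
    if data[0] < 16:                      # simulated input-handling bug
        raise ValueError("unexpected input")
    return len(data)

def fuzz(target, trials=1000, seed=1):
    """Feed random byte strings to `target`; collect inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 16)))
        try:
            target(data)
        except Exception:
            crashes.append(data)
    return crashes

print(len(fuzz(fragile_parse)), "crashing inputs found in 1000 trials")
```

The Fuzz studies ran essentially this loop against real utilities, and the FLOSS versions crashed markedly less often than the proprietary ones.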

A detailed study of two large programs
(the Linux kernel and the Mozilla web browser) found evidence that
FLOSS development processes tend to produce more modular designs.
See Harvard Business School’s “Exploring the Structure of
Complex Software Designs: An Empirical Study of
Open Source and Proprietary Code” by Alan MacCormack, John Rusnak,
and Carliss Baldwin (Working Paper Number 05-016) for the details.
It’s generally accepted that there are important benefits to
greater modularity, in particular, a more
modular system tends to be more reliable, easier to change over time, and
more secure.

In addition, many FLOSS programs now have built-in countermeasures
against security problems in other components.
"Security Enhancements in Red Hat Enterprise Linux (beside SELinux)"
by Ulrich Drepper describes many mechanisms in Red Hat Enterprise Linux
that counter or limit damage even if another component has a vulnerability.
I'm a strong advocate of this belt-and-suspenders approach to security;
by all means, eliminate vulnerabilities in components, but also deploy
defensive measures so that unfound vulnerabilities are less serious.

In short, there is every reason to believe FLOSS components
can be at least as reliable and secure as proprietary components,
if not more so.

So now we come to an interesting question --
why are high assurance FLOSS components so rare?
There are very few real high assurance components available, period,
so we would expect there to be few FLOSS components.
Yet FLOSS still seems under-represented.

Jeffrey S. Norris and Poul-Henning Kamp’s
“Mission-Critical Development with Open Source Software: Lessons Learned”
(IEEE Software) clearly shows that people do develop mission-critical
systems at NASA using FLOSS components;
the authors relied heavily on FLOSS, reporting that it
kept the project within budget and produced a more robust and
flexible tool.
The text suggests that these were medium assurance applications,
but the lessons are worth considering at any assurance level.

Key vendor-differentiating software is usually not FLOSS.
If a company or government
uses a particular piece of software as part of their
competitive advantage, they should usually not release it
at all (as either proprietary or FLOSS).
But most software is actually not company or government differentiating,
and even if the software as a whole is, some of its pieces are
still typically commodity components.
A company that sells software licenses to others
will often not choose to
release that program as FLOSS -- but if they have customers
that use the software, sometimes those users find that
using a FLOSS program instead has its advantages.
Components that are commodities shared among many devices,
such as a separation kernel or real-time operating system kernel,
would make sense as FLOSS.

There’s fragmentation in
the high assurance market (e.g., different detailed requirements
in different standards and different circumstances) which
probably doesn’t help.
But this harms other approaches for developing software too,
and increasingly this problem is being recognized and addressed.

Clearly high assurance components are
normally required to go through expensive independent evaluations
(due to various regulations).
But again, that is a hurdle that has been overcome before,
many times.

FLOSS is widely represented in other areas.
There are many medium assurance FLOSS programs that have better security
or reliability records than competing proprietary programs.
FLOSS is certainly well-represented by tools to create medium and
high assurance components as well.
A vast number of FLOSS tools are available, in fact, for creating
high assurance components
(and many have already been used for the purpose,
including ACL2, SPIN, and GNAT).

It’s even more bizarre when you compare software proofs with
the way normal mathematical proofs are made.
“Normal” mathematicians publish their proofs, and then depend on
worldwide peer review to find the errors and weaknesses in their proofs.
And for good reason; it turns out that many formally published
math articles (which went through expert peer review before publication)
have had flaws discovered later, and had to be corrected later or withdrawn.
Only through lengthy, public worldwide review have these problems surfaced.
If those who dedicate their lives to mathematics often make mistakes,
it’s only reasonable to suspect that software developers who hide their
code and proofs from others are far more likely to get it wrong.
Joyner and Stein's
"Open Source Mathematical Software" (Notices of the AMS, Nov. 2007)
notes how well FLOSS matches the methods in mathematics
for producing high-quality results, and includes some interesting quotes.
Andrei Okounkov (2006 Fields medalist) notes that
"I think we need a symbolic standard to make
computer manipulations easier to document
and verify... An open source
project could, perhaps, find better answers
to the obvious problems such as availability,
bugs, backward compatibility, platform independence, standard libraries, etc...
I do hope that funding agencies are looking into this."
Neubüser notes that with proprietary software
"two of the most basic rules of conduct in mathematics are violated:
In mathematics information is passed on free of charge
and everything is laid open for checking.”

The nonsense that FLOSS is more vulnerable to insertion of malicious code
doesn't wash, either. Any FLOSS or proprietary program can be modified --
just get a hex editor.
The trick is to get the maliciously-modified program into a customer's
supply chain, and that's much harder: you have to get the malicious code
into the FLOSS component's trusted repository without it being noticed
(or it will be removed).
"Application Security: Is the Backdoor Threat the Next Big
Threat to Applications?" by Scott Berinato
(in CSO Online)
interviewed security researcher Chris Wysopal of Veracode.
"As detection and scanning technology gets better at finding the
accidental coding errors like buffer overflows, Wysopal believes the
malicious will turn more and more to using backdoors--holes in programs
usually intentionally programmed in to allow access to an application."
Wysopal goes on to note that
"The lifetime of a backdoor in open source is very short. It’s measured
in weeks. The lifetime of a backdoor in closed source is measured
in years. The many eyes concept of open source is working to detect
backdoors. We found that in most open source cases, the malicious or
accidental opening was detected in a matter of days, sometimes a few
weeks. But every backdoor in the binary of proprietary software was
there for years or an indeterminate length of time. It never happened
that closed source backdoors were discovered in months. With an old one,
Borland Interbase, we saw seven years worth of versions where a backdoor
was there."
The interviewer was surprised,
and appears to have been unaware of FLOSS
trusted repositories and how they work.
He asked,
"with so many people manipulating open source code, the number of
backdoors to detect must be exponentially higher than proprietary systems,
and the potential virulence, of spreading backdoors, must be much higher
with open source?"
Rather than explain this, Wysopal went immediately to
the measure that mattered:
"when we looked at special credential backdoors,
the four biggest were all closed source products."
The notion that FLOSS always makes you more vulnerable
to backdoors is contrary to real world experience.

Yet the source code and proofs for high assurance programs
are almost never published publicly at all (never mind
being released as FLOSS).
This means that many "high assurance" programs fail to exploit the methods
used by mathematics to strengthen claims.
Thus, there’s a good case to be made that high assurance FLOSS programs
would tend to be much higher assurance than proprietary programs,
because there could be a worldwide review of the proofs.
At least for safety-critical work making FLOSS (or at least world-readable)
code and proofs would make sense; why should we
accept safety software that cannot undergo worldwide review?
Are mathematical proofs really more important than software that
protects people’s lives?

Several potential explanations come to mind, and I suspect only the
last two are really true:

Is high assurance too expensive to do using FLOSS?
Certainly, developing high assurance components requires a great deal
of personnel time, but this is true for other assurance levels too
(though they place their effort in different areas).
As I demonstrated in my paper
“More than a Gigabuck”,
FLOSS development approaches are quite capable of employing massive
amounts of development effort.
In 2001, the old Red Hat Linux 7.1
represented more than one billion U.S. dollars of development effort,
and that is a small fraction of today’s FLOSS development effort.
It’s true that independent evaluations are normally required at high
assurance levels, and they require cash payments that may be harder to get.
Yet the fact that other FLOSS programs have undergone independent evaluations
(such as Common Criteria evaluations for Red Hat Linux and Novell/SuSE,
and FIPS evaluation for OpenSSL) suggests that this is probably not a real
barrier either.
At least, it’s not a barrier if there’s an
economic model to support it, which brings us to
our next point.

Is there no possibility of a rational economic model for FLOSS high assurance?
FLOSS is a licensing model that is typically
associated with certain development and distribution models.
FLOSS is not a single business model;
instead, there are many different business and non-profit economic models
that can be employed by FLOSS projects.
Business/economic issues in FLOSS are discussed in many places, including
Bruce Perens’
“The Emerging Economics of Open Source Software”
and
Eric Raymond’s “The Magic Cauldron”.
Indeed, there’s a micro-industry of
economists who analyze FLOSS business
models, because they think FLOSS “shouldn’t” work
(due to failure to understand it) but it does anyway.
There are many different business models for FLOSS, but for our
purposes they all involve either making money and/or reducing expenditures.
The “make money” models typically do this by
charging for warranties, indemnification, support, or related services
(the latter is sometimes an example of “commoditizing your complements”).
The “reducing expenditures” models typically involve
cost sharing or
cost avoidance, and often look like consortia with an unusually efficient
legal framework (Apache and the X consortium are examples).
Linux distributors typically do both; they sell support and services
(to make money), while exploiting the far lower expenditures to
compete against established proprietary organizations.
Lots of organizations do this in many other areas, so unless there’s
something especially unique to high assurance, it’s hard to believe
the approach can’t work for FLOSS.

Commercial organizations often support FLOSS projects when it’s
in their financial interest to do so.
Essentially all work on major FLOSS projects like the Linux kernel and
the Apache web server is done by software developers paid to do so.
Even in high assurance tools there are examples;
ACL2 is financially supported by many corporations;
AMD, Rockwell Collins, and Sun Microsystems are three of them, and they’ve
supported ACL2 because they’ve used it in microprocessor development.
Not all FLOSS projects are financially supported like this, of course,
but today’s corporations are willing to do so when
it’s in their own interests.

Certainly, there is a real problem: a very high “initial fee”
to create and evaluate an initial product. But that’s true
for other FLOSS products too.
For years it was assumed that modern compilers and operating system
kernels were “too hard” for FLOSS;
that myth has been completely blown.
And besides, these costs also militate against proprietary suppliers;
proprietary suppliers have difficulty raising enough money in these
cases too.
These initial fees could be addressed by having
several large integrators or users (who use
lots of high assurance components) pool their funds to create
such a component, with the idea that they’ll reduce their overall costs.
Such organizations manage to create consortia in other areas; there’s
little reason they can’t handle this too.

Of course, such organizations could claim that they’re worried
about “free riders” who use the component without paying for initial
development, but that’s missing the economic point.
Nearly all costs for software are not in development --
they are in maintenance.
One reason FLOSS projects do so well is that many people copy the software
(starting out as “free riders”), but a small fraction eventually
contribute valuable information (bug reports, patches, etc.)... because
they must do so to use the product cost-effectively.
As a result,
releasing a component as FLOSS allows the maintenance costs -- the majority
of all costs -- to be shared, even with those who did not pay
for initial development:
a proportion of those who start as “free riders” eventually help carry
the maintenance burden, reducing maintenance costs for all.
Since maintenance is the primary cost, there is a reasonable economic
rationale for releasing software this way (in some cases).
The initial investors have an additional reason to invest:
they will decide what will be implemented first, and how, and
many have decided that this additional control
is worth the initial investment.
That doesn’t mean that FLOSS is always the best economic approach, but
it does mean that it’s an approach worth exploring.
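The arithmetic behind this argument is easy to illustrate. Assuming, as is commonly reported, that maintenance dominates total software cost (all of the specific figures below are invented for illustration):

```python
# Illustrative (invented) cost figures, in arbitrary cost units.
development = 10.0      # initial development and evaluation
maintenance = 30.0      # lifetime maintenance typically dwarfs development

# Proprietary model: one organization bears every cost itself.
solo_total = development + maintenance

# FLOSS consortium model: founders split development, and maintenance is
# shared with the former "free riders" who eventually contribute back.
founders = 4
contributors = 10       # founders plus free riders turned contributors
floss_total_per_founder = development / founders + maintenance / contributors

print(solo_total, floss_total_per_founder)
```

Under these invented numbers each founder pays 5.5 units instead of 40, even though non-payers also benefit; the free riders who later contribute are what drive the maintenance term down for everyone.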

The only obvious true difference I see are liability issues, but this could
help FLOSS implementors.
Some people may think that the liability costs of high assurance software
will make it “impossible” to use FLOSS -- yet this misses the point.
Liability is something customers are willing to pay for, and even better for
a FLOSS business, customers would typically have to
pay for liability protection and support.
AdaCore, who support the GNAT
Ada compiler, depend on this and seem to be doing well.
So this makes high assurance unusually tempting for a FLOSS business --
customers can try out the product (making it more likely to be
considered), but will be required to pay the supplier through laws and
contracts requiring liability protection.

Many different economic models could be devised; let’s imagine just one.
A consortium could be established to create such high assurance components,
and give a liability price break to its founders -- with others paying
a higher (but reasonable) price to gain liability protection and support
(perhaps with the price dropping once a latecomer has paid double what
the initial founders paid).
Such a consortium could encourage code and monetary
contributions through a variety of means (e.g., by using the LGPL license,
or using a dual-license approach with the GPL license being one of them and
a for-pay license being the other).
Organizations like major government integrators might be interested in
such a consortium, because it would be a way to reduce their expenditures...
and they would not want to have to compete against those in the consortium,
if they were outside it, since those inside
might have lower future expenditures.
Many variations of these ideas are possible, of course.
There seems to be little evidence that
there’s no economic model for FLOSS high assurance components.

Is the expertise too specialized and hard to acquire?
Developing high assurance components truly is a specialized art.
More importantly, there are few published examples of how to actually
do it, so it’s very difficult to learn how to do it.
In fact, there are reasons to fear that some of this knowledge
at the highest end is disappearing and will need to
be re-discovered when its practitioners die.
The lack of published proven programs is a real problem!

Halloran and Scherlis’ paper
“High Quality and Open Source Software”
notes that in the medium assurance realm,
the “quality and dependability of today’s
open source software is roughly on par with commercial [proprietary]
and government developed software”, and then asks,
“what attributes must be possessed by quality-related interventions
for them to be feasibly adoptable in open source practice?”
They then note several attributes, which would presumably apply to
high assurance as well.
They note that FLOSS projects generally bootstrap on
top of other FLOSS tools, in part due to ideology, but also
because it lowers barriers for new participants and enables
developers to fluidly shift their attention from tool use to
tool development/repair.
As noted above, there are FLOSS tools available.
But in their conclusions they note these criteria:
“(1) an incremental model for quality investment
and payoff (e.g., incrementally adding analysis support, test
cases, measurement, or other kinds of evidence collection), (2) incremental
adoptability of methods and tools both within the server
wall and in the baseline client-side tool set, (3) a trusted server-side
implementation that can accept untrusted client-side input, and (4)
a tool interaction style that is adoptable by practicing open source
programmers (i.e., that does not require mastery of a large number
of unfamiliar concepts).”
Based on this list, they conclude that
“With the exception of testing technology and some code analysis
technology, these requirements suggest that
some adaptation
will be required before adoption is possible for tools that embody,
say, lightweight formal methods approaches or advanced program
analysis approaches. Clearly, any technique or tool is not feasibly
adoptable if it requires a major (client-visible) overhaul of a project
web portal, collaboration tools, development tools, or source code
base. Discernible increments of benefit from increments of participant
effort is key to adoptability.”
In short, formal methods require a lot more training
before the benefits can accrue -- and because they aren’t
incremental (and are rare), it’s harder to do.

I think the difficulty of acquiring the
necessary skills before being able to do any work
is a real and valid issue.
This is a real problem for using these techniques
to develop proprietary software, too!
But there is such experience, so while valid
I don’t think that’s the primary issue.

Have few considered the possibility?
It may be that many potential customers/users have simply
failed to consider a FLOSS option at all.
In fact, I think this is the most likely reason of all.

People tend to do what they know how to do,
and repeat approaches they've used before.
In many cases, the people who are interested in developing a small
high-assurance component have done this before and used
a proprietary approach... and since that is what they did before,
they do not think to consider an alternative.

For example, I went to a detailed economic review in 2006 of
one particular high assurance effort, where the big contrast was between
“government-owned” and “commercial proprietary software vendors”.
The possibility of a FLOSS approach (e.g., establishing a
consortium to create a FLOSS implementation) simply never entered any
decision-maker’s mind!
When an alternative is never considered, it’s not
surprising when it isn’t chosen.
The developers of high-assurance software
are quite familiar with FLOSS, and in fact are developers of it
themselves (just look at all the FLOSS tools for high assurance!).
But the decision-makers about high-assurance software are not the
same people, and in fact tend to be very conservative people because
they’re worrying about security and safety.
Conservatism about technology makes sense for these stakes, but
that doesn’t mean that valid economic alternatives should be ignored.

The TCX research project is even more bizarre, because it
specifically notes that it hopes to aid both
the proprietary commercial and the “open-source” sectors.
The TCX project even mentions planning to release its results
“using a philosophy similar to the open source approaches”.
But the information published so far does not suggest that
the materials will ever be released under a FLOSS license, nor
have I found evidence that this option was considered.
Licensing was not even a criterion in its tool survey,
yet depending on a language only implemented by a proprietary tool
would be certain to drive away many FLOSS developers.
The TCX published works so far include a mathematical model, which
could be used as a basis for further work, but it is only released as a
file in (uneditable) PDF format, with no rights to make improvements.
There is no FLOSS license granting
rights to improve any of the works released as of April 2006,
even though the TCX text seems to imply that the
MIT and BSD-new licenses might be appropriate for them.
There’s also no published rationale for why the
works will not be released under a FLOSS license;
given the many pages discussing other project decisions, the
failure to discuss the licensing of the results
(where dissemination is the whole point of the project) is jarring.
The reader is left with the impression that the ideas of
using FLOSS licensing terms to release their work,
and working with the larger FLOSS community,
never occurred to the project leaders.

In short, I think this is the primary explanation: the
FLOSS option does not seem to be even considered as a potential strategy.
I don’t think every high assurance component of the future will be FLOSS;
in fact, I expect there will continue to be many proprietary
high assurance products.
But there also seems to be no special reason that the FLOSS option
should be ignored.
I suspect that the number of FLOSS high-assurance
products will grow as decision-makers start to
consider FLOSS options as well.
In the long term, I expect a mixture of proprietary and FLOSS components,
just as this is already true for compilers,
operating system kernels, web browsers, web servers, and so on.

There are a large number of FLOSS tools available for creating
high assurance components, including many tools supporting
configuration management, testing, formal methods, and code generation.
This paper focused on formal methods, since this is distinctive for
high assurance, and showed that there are many tools in this space
(both research and industrial-strength).
Some of these tools are ACL2, HOL 4, Isabelle, Otter, ProofPower, ZETA, CZT,
Spin, NuSMV, BLAST, and Alloy.
(I find ACL2, Otter, Spin, BitC, and Alloy especially interesting,
though for different reasons.)
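To give a feel for the interactive theorem-proving style that tools
like ACL2, HOL 4, and Isabelle support, here is a minimal sketch of a
machine-checked proof. It is written in Lean syntax (Lean is a
comparable proof assistant, not one of the tools listed above); the
lemma itself -- that the length of an appended list is the sum of the
lengths -- is the kind of small fact such provers discharge routinely.

```lean
-- Define list length by structural recursion.
def len : List α → Nat
  | []      => 0
  | _ :: xs => 1 + len xs

-- The machine checks every step of this proof; an unsound
-- argument would be rejected rather than silently accepted.
theorem len_append (xs ys : List α) :
    len (xs ++ ys) = len xs + len ys := by
  induction xs with
  | nil => simp [len]
  | cons x xs ih => simp [len, ih, Nat.add_assoc]
```

High assurance developments apply exactly this discipline, but to much
larger claims -- for example, that a separation kernel never allows
information flow between partitions.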

In contrast, few high assurance components are FLOSS.
After looking at the options, the most likely reason for this appears to
be that decision-makers are not even considering the possibility of
FLOSS-based approaches.
Decision-makers should consider FLOSS-based approaches
as future high assurance components are needed,
including the possibility of creating consortia,
so that a FLOSS-based strategy can be chosen where appropriate.

Governments should require that government-funded research projects
normally release all software they develop under a FLOSS license,
unless the government is convinced that in a particular case there is
a better alternative.
The need for this is amply demonstrated by the many discarded and underused
proprietary projects in these fields; the waste and
lost opportunity alone are enough to justify it.
More fundamentally,
governments use money from their people to do research; it is only fair to
ensure that all the people (who pay for the research) can reap the benefits,
unless they will be better served some other way.
If software research results are FLOSS, then
anyone can start with what was developed and build on it,
instead of starting over.
The Open Informatics Petition text goes farther
and suggests that governments simply mandate this no matter what.
I will not go that far; I think there are good reasons for cases where
government-funded research should not be released as FLOSS, but
the petition does explain in more detail the advantages of this approach.
Most detractors are concerned about this being required of all research.
If FLOSS is instead simply made the wiser
“default”, I think a much better balance is struck than under
the current system in many countries.
The Open Informatics
web site has more information.
Requiring FLOSS release does not prevent commercialization; there are many
FLOSS-based businesses, and many FLOSS licenses permit adding extensions and
making the result proprietary.
The many success stories from FLOSS-based approaches (e.g., ACL2,
Security-Enhanced Linux, etc.) suggest that releasing software under
FLOSS licenses is a very effective way to improve tech transition
and establish sustainable research.
Since the GPL is the most common FLOSS license
(in formal methods tools and in general),
whatever FLOSS license is used in this case should at least be
GPL-compatible -- that way, research efforts can be combined
into larger works as needed.
For the same reason, I would recommend that
one of the “classic” FLOSS licenses be used in most cases
(i.e., MIT, BSD-new, LGPL, or GPL) -- they are well-understood,
and can be combined as necessary.

Finally, developers who want to start new FLOSS projects should
consider developing or improving high-assurance components or tools
(including tools that combine other tools).
Improving the user interfaces, capabilities, or integration of tools
would be very valuable.
Sample assured components, especially ones that are useful
(like separation kernels or RTOSs), would be of value too, both for
potential users and for others developing such programs
(because there are few publicly-available examples that people can
experiment with and learn from).
These are technically interesting, and given the increasing attacks and
dependence on computer systems, having more high assurance
programs available will be vital to everyone’s future.