Visions

Therefore all progress depends on the unreasonable man.
-- George Bernard Shaw

Large software systems (think millions of lines of code, multiple
languages) have surprising troubles. Semantic Designs
believes that automated analysis and transformation tools such
as DMS can
tackle problems not previously solved. Such problems require
significant engineering but offer a big potential payoff in
cost, development time, reliability, and the ability to achieve new
capabilities for the client owning the large code base.

We suggest some ideas here. If you find these ideas interesting, or have a vision
of your own, we invite you to contact
us to discuss what might be possible.

Automated Extraction of Regression Tests

Writing software is hard. Writing tests to validate the software is also hard,
and consumes about 40% of the overall effort in a well-run project.
Cutting the cost of writing tests could therefore have a major
impact on the cost of building and maintaining software. Unfortunately,
no tool can know what the functionality of a system
should be given nothing but its source code,
so one cannot automate the generation of tests against the intended functionality.

But a running system represents working functionality (modulo its
known bugs). If one had tests verifying that the running system
operates as it should, those tests could be used to verify that
changes to the system, which occur continuously, do not damage
the parts that were not changed.
A solution that extracted tests from running software would be an
enormous benefit to organizations with legacy software.

Semantic Designs believes that it is possible to extract such tests
from existing code. The essential idea is to instrument the running
application to capture test case data based on data from its daily operation,
and use that to generate unit tests on program elements. For those
tests to be effective, the running context of the program must be
re-established during testing. A puppetization process would
install controls in the application so that some parts operate
exactly as the original does, while other parts are forced down paths
to the point where particular unit tests would be applied. SD has the
technology to instrument applications
in many languages (as examples, consider our test coverage and profiling tools)
and capture data. It has the technology to puppetize code.
What remains is to put the pieces together into a working system.
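To make the idea concrete, here is a minimal sketch (in Python, with invented names; this is not SD's actual tooling) of capturing live (input, output) pairs from a running function and emitting them as regression assertions:

```python
import functools

captured = []  # records of (function name, args, kwargs, result)

def capture(func):
    """Instrument a function to log the data flowing through it in production."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        captured.append((func.__name__, args, kwargs, result))
        return result
    return wrapper

@capture
def tax(amount, rate=0.08):
    return round(amount * rate, 2)

# "Daily operation" exercises the code and populates the capture log.
tax(100.0)
tax(250.0, rate=0.05)

def emit_tests(records):
    """Turn captured (input, output) pairs into unit-test assertions."""
    lines = []
    for name, args, kwargs, result in records:
        arglist = ", ".join([repr(a) for a in args] +
                            [f"{k}={v!r}" for k, v in kwargs.items()])
        lines.append(f"assert {name}({arglist}) == {result!r}")
    return "\n".join(lines)

print(emit_tests(captured))
```

The real difficulty, as noted above, is re-establishing the program's running context, not the capture itself; this sketch only covers pure functions.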

Unifying Forked Source Code Bases into a Product Line

Many organizations find themselves with a very large application that has
been forked into multiple versions, and are now making updates
to each version at a correspondingly high price.
An ideal solution would combine the multiple versions into
a single golden code base with configuration parameters that
could be used to generate the individual versions.
Maintenance and updates would then happen on the golden code
base, which is delivered to multiple sites according to their
corresponding configurations.

To do this, one must discover what the versions have in common,
and where they differ. The common part can be extracted,
and the differences added to the common section conditionally
controlled by configuration parameters.

Semantic Designs has tools for discovering common code
and differences (e.g., our Clone Detection and Smart Differencer
tools) across many languages (see our supported languages list).
We have the ability to transform code to insert configuration
conditionals of many kinds (preprocessor conditional, procedural
or macro abstraction, objects with inheritance, generics, whole-file-replacement).
The result is a product line, which can be used to generate
the instance variants as needed. Development on a shared code
base makes common updates easily shared, and changes in
configured code clearly specialized to the variant.
You can read a bit more about this in a Dagstuhl Research Report, in the section on Refactoring to Product Lines.
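A toy sketch of the merging step, assuming two forked variants of a file and preprocessor-style conditionals as the chosen configuration mechanism (the variant names and the VARIANT parameter are invented for illustration):

```python
import difflib

def unify(a_lines, b_lines, a_name="A", b_name="B"):
    """Merge two forked variants: shared lines become golden code,
    differing regions are guarded by configuration conditionals."""
    merged = []
    sm = difflib.SequenceMatcher(None, a_lines, b_lines)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            merged.extend(a_lines[i1:i2])          # common golden code
        else:
            merged.append(f"#if VARIANT == {a_name}")
            merged.extend(a_lines[i1:i2])
            merged.append(f"#elif VARIANT == {b_name}")
            merged.extend(b_lines[j1:j2])
            merged.append("#endif")
    return merged

fork_a = ["init()", "rate = 0.08", "run()"]
fork_b = ["init()", "rate = 0.05", "run()"]
print("\n".join(unify(fork_a, fork_b)))
```

A real product-line merge works on syntax trees rather than text lines, which is why clone detection and language-aware differencing matter; this sketch shows only the shape of the result.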

Automated Code Update from Data Schema Changes

Every application is driven by a model of the world,
realized as an instance of a data schema that can hold
the necessary details. The schema may be implicit (e.g.,
as in hierarchical databases or flat files) or explicit
(as in relational data models or XML schemas). No matter
how the schema is defined, the program contains code
that implicitly knows what the schema is. The obvious
value is that the program knows how to manipulate data in
that schema. The problem is that the organization's needs,
and the world, both evolve, requiring the data schema
to change, and the program to change in response.

Some of that change is in terms of new functionality
that harnesses the new types and relations of data in
the new schema. But much of the change merely
accommodates changes in the schema. As an example,
almost every new data field requires something to create,
read, update, and delete new data field instances ("CRUD").
Knowledge of how the program uses the data schema,
and changes being made to a data schema, could be used
to automate the mundane part of updating the program,
allowing software engineers to work on the interesting
functionality.

Semantic Designs tools can process data schema descriptions
(SQL, XML Schemas, ...) and source code. Changes made
to a data schema can be detected by SD's Smart Differencers.
Such changes could be used to automate much of the mundane
part of code base changes.
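As a hedged illustration of that mundane step (table and column names invented), one might detect the columns added by a schema change and generate the corresponding CRUD statements:

```python
def added_columns(old_cols, new_cols):
    """The schema diff: columns present in the new schema but not the old."""
    return [c for c in new_cols if c not in old_cols]

def crud_for(table, column):
    """Generate the boilerplate CRUD statements a new field requires."""
    return {
        "create": f"INSERT INTO {table} ({column}) VALUES (?)",
        "read":   f"SELECT {column} FROM {table} WHERE id = ?",
        "update": f"UPDATE {table} SET {column} = ? WHERE id = ?",
        "delete": f"UPDATE {table} SET {column} = NULL WHERE id = ?",
    }

old = ["id", "name"]
new = ["id", "name", "email"]
for col in added_columns(old, new):
    for verb, sql in crud_for("customer", col).items():
        print(f"{verb}: {sql}")
```

The hard part in practice is updating the application code that touches the schema, not emitting the SQL; that is where language-aware program transformation comes in.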

Integration of Two Applications by Data Model Unification

Application integration allows a company to provide more sophisticated
responses often with less effort, even to the point of driving
corporate mergers. But too often, integration fails because
the data models of the two applications are not aligned, and because
one cannot easily propagate into one application the changes
induced by aligning its data model with the other's.
Thus the synergies of integration are not achieved, or are long
delayed. Being able to unify two data models, and push the resulting changes
into both applications, is key to application integration.
One needs to be able to align data model elements.

We suggested above how changes in one model could be partly
automated using tools. What is different here is aligning two data
models first. The changes required to align the models can be used to
drive model changes into each application. Semantic Designs thinks
that semantic description technology (e.g., OWL, description logics,
specification algebras) can be used to provide precise semantics to
data elements and their relations, and new relations computed from
old. Thus an algebraic means for unifying the schemas is suggested,
which might both guide the unification process, and provide additional
semantics for the programs.

Semantic Designs tools can process "semantic description"
languages, and thus be used to enable modifications to schemas,
and to check that the resulting schemas are aligned (to some degree;
semantic reasoning in general is Turing-hard, and most schemas
are incomplete in the semantics of the modelled facts).
But any help here is enormously valuable, because the cost of making the
changes incorrectly is very high.
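A minimal sketch of the alignment idea, with invented field names: a declared mapping between the two applications' vocabularies, used to translate records from one model into the other's terms (real alignment would also reconcile types, units, and relations, not just names):

```python
# Declared alignment: app A field -> app B field.
ALIGNMENT = {
    "cust_id": "customer_number",
    "addr": "mailing_address",
}

def translate(record_a):
    """Rewrite an app-A record into app B's schema where an alignment exists;
    unaligned fields pass through unchanged for later review."""
    return {ALIGNMENT.get(field, field): value
            for field, value in record_a.items()}

print(translate({"cust_id": 42, "addr": "1 Main St", "balance": 10.0}))
```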

Basel II Compliance: What's the source of that datum?

Banks and other large financial institutions are becoming
increasingly regulated in terms of delivered results and processes
required. One set of standards to be met by such institutions is the
Basel II agreements. Any Basel II solution considered to be "best
practice" should be transparent and auditable. It should provide
complete traceability of computed numbers down to the source data, with
an appropriate audit trail. How is one to achieve this,
in face of large scale information systems in enormous organizations?

As financial processes become increasingly automated, this information
flows through computer programs owned by the financial institution.
One way to solve the tracking problem is then literally to trace the data
going into reports back to the databases that produce it, and from
there into other financial processes, repeating until one arrives
at source data acquired from some outside agent. (Even then
one might wish to dive further, but that depends on the outside
agent cooperating on a massive scale.)

Semantic Designs builds
data flow analyzers
that compute this data for individual programs; we've handled
individual systems of 26 million lines of code. One can imagine scaling this up to trace
data across processes and databases owned by the institution, to
provide a documented trace of information sources. One would
need tools to enable financial engineers to explore this trace.
But questions about sources of information would then be answerable.
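A toy model of such a trace (all datum names invented): represent lineage as a graph from each computed value to its inputs, and walk it back to the source data:

```python
# Hypothetical lineage graph: each datum maps to the data it is computed from.
lineage = {
    "report.total_exposure": ["db.positions.value", "db.fx.rate"],
    "db.positions.value": ["feed.bloomberg.price", "db.trades.quantity"],
    "db.fx.rate": ["feed.reuters.fx"],
    "db.trades.quantity": [],
    "feed.bloomberg.price": [],
    "feed.reuters.fx": [],
}

def sources(datum, graph):
    """Walk the lineage graph back to source data (nodes with no inputs)."""
    parents = graph.get(datum, [])
    if not parents:
        return {datum}
    out = set()
    for p in parents:
        out |= sources(p, graph)
    return out

print(sorted(sources("report.total_exposure", lineage)))
```

The engineering challenge is building this graph automatically, across programs and databases, from dataflow analysis of the institution's actual code; the traversal itself is the easy part.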

A likely side effect of this process is data
cleaning. Consider the notion of "profit". It ought to be
the case that the profits of a company are the sum of the profits
of its divisions. However, if those profits are measured
in different ways (annual, cost-adjusted, ...), adding them
may in fact produce nonsense. A full dataflow analysis
would find where such profits are added. Adding type checking
would verify that the composition was valid. One might
not get a valid composition in all parts of an organization,
but the organization should at least know where data is
combined inappropriately.
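A small sketch of the type-checking idea: tag each profit figure with its measurement basis, and refuse to add figures whose bases differ (the class and basis names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Profit:
    amount: float
    basis: str  # e.g. "annual" or "cost-adjusted"

    def __add__(self, other):
        # The type check: composing profits is only valid on a shared basis.
        if self.basis != other.basis:
            raise TypeError(f"cannot add {self.basis} to {other.basis} profit")
        return Profit(self.amount + other.amount, self.basis)

a = Profit(1.2e6, "annual")
b = Profit(0.8e6, "annual")
print((a + b).amount)                    # valid composition: 2000000.0
try:
    a + Profit(0.5e6, "cost-adjusted")
except TypeError as e:
    print(e)                             # flags the invalid combination
```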

Design Recovery

Most large applications exist only as source code (sometimes not even that).
Any actual design knowledge may be hidden in some engineer's brain,
or more usually is completely lost. A consequence is that the continuous changes
to code demanded of working systems always require rediscovery of
the concepts and code organization of the software. Thus programmers
spend some 50% of their time just staring at code, trying to understand
what it does. They are hampered by having only the low-level source
code, perhaps some hints in the comments, and, rarely, software
entities that are well named with respect to purpose.
Tools that can rediscover common concepts for the application,
and where those concepts are implemented, could shorten the
understanding time and therefore delivery considerably, and could
raise the quality of changes that are made.

Code concepts are realized by data structures and idioms
for manipulating those data structures in ways that achieve
the application purpose. Once the data structures are defined,
the idioms used to achieve a purpose tend to be similar, because they
must process that data as defined. Semantic Designs
has the technology to find data values flowing through code
(e.g., data structure instances) and match idioms that
manipulate such data structures. One can "tile" the code base
with recognized concepts, and make those tiles visible to
new programmers who have not seen the code before, enabling
them to understand the code and decide what to do more efficiently.
(We are presently doing a version of this for Dow Chemical).
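A deliberately tiny sketch of tiling (the idiom patterns here are invented regexes over text; real concept recognition would match data flow and structure, not surface syntax):

```python
import re

# Each known concept is paired with a pattern that recognizes its idiom.
IDIOMS = [
    ("linked-list traversal", re.compile(r"\bnode\s*=\s*node\.next\b")),
    ("accumulation",          re.compile(r"\btotal\s*\+=")),
    ("null guard",            re.compile(r"\bif\s+\w+\s+is\s+None\b")),
]

def tile(lines):
    """Label each line with the first recognized concept, leaving the
    rest marked unrecognized for a human to inspect."""
    tiles = []
    for line in lines:
        concept = next((name for name, pat in IDIOMS if pat.search(line)),
                       "unrecognized")
        tiles.append((concept, line))
    return tiles

code = [
    "if head is None:",
    "    return 0",
    "while node:",
    "    total += node.value",
    "    node = node.next",
]
for concept, line in tile(code):
    print(f"{concept:22} | {line}")
```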

Design Traceability from Specifications to Code

The holy grail of program development is not recovering design information
that has been lost. Rather, it is not losing that design information
as it is generated, thus avoiding the expensive and error-prone
process of trying to rediscover it. To do this right, one needs to record the abstract
concepts, the program purpose, the implementation choices, and the final
code.

Semantic Designs' flagship product, DMS, was designed from this perspective.
SD has a vision of how such design information might be captured
and incrementally updated as changes are made. This would be especially
valuable for capturing the structure of complex, expensive artifacts
such as chip designs, software with safety requirements, or simply large applications.
You can read a technical
paper on formalizing and mechanizing this.

Intrigued?

Bring us your poor, your tired, your huddled fantasies of
massive software engineering using automated tools, and let
us set them free.

Semantic Designs: Our Goal

To enable our customers to produce and maintain timely, robust and economical software by providing world-class Software Engineering tools using deep language and problem knowledge with high degrees of automation.