Cω Overview

This section discusses the motivation behind Cω and provides a fairly detailed, yet high-level, account of the main features of the language.

Important: Cω is an experimental research
language. There are no plans to turn it into a commercial language
supported by Microsoft. It is not supported by either the C# or
the Visual Studio teams. There are no plans to integrate it into any product.

Introduction

In the last decade, strongly-typed, garbage collected object-oriented
languages have left the laboratory and become mainstream industrial
tools. This has led to improvements in programmer productivity and
software reliability, but some common tasks are still harder than they
should be. Two of the most critical for the development of the next
generation of loosely-coupled, networked applications and web services
are concurrent programming and the processing of relational and
semi-structured data.

It is a truth universally acknowledged that concurrent programming is
significantly more difficult than sequential programming. It used to
be the case that only highly skilled developers, working on such
projects as operating system kernels or the core engines of databases,
did non-trivial concurrent programming. Over time, however, more
developers have had to deal with more concurrency. This began to
happen with the arrival of multi-tasking GUIs and multi-user
server-side applications, and the initial approach was to encapsulate
the concurrency somewhere where the typical application programmer
wouldn't have to deal with it. GUI frameworks are typically
single-threaded despite the natural concurrency of graphical
interfaces, whilst a common approach to concurrency control in server
applications is to delegate it to the transactional mechanisms of a
database. However, we believe the arrival of network-centric computing
has exposed these palliatives as inadequate.

Even the simplest application now has to simultaneously manage both a
GUI and network communications, and locking up a single threaded user
interface whilst performing a lengthy remote operation is
unacceptable. As applications perform more remote communications with
more remote services, we are increasingly forced to replace
synchronous RPC with asynchronous (one-way) messaging. But
asynchronous programming is also difficult -- incoming messages arrive
at unpredictable times and in unpredictable orders, and one naturally
ends up needing multiple threads to handle them.

Unfortunately, support for concurrent programming, especially
asynchrony, in current mainstream programming languages is
weak. Shared memory concurrency on a single machine, even in modern
languages like C# and Java, is handled by 1970s threads and locks
model which is implemented entirely in terms of library
routines. Distributed concurrency and asynchronous messaging are
handled by different library routines with their own model; the .NET
libraries, for example, include a fairly complex delegate-based API for
asynchronous messaging that still offers little support for the hard problem
of handling incoming messages.

The current situation with regard to external data is just as
bad. Most web-based (and many non-web-based) applications are
basically thin interface-generating layers over relational databases
(the `3-tier' model). Yet dealing with relational data from within
current languages is messy, error-prone and dangerous. APIs for
database access typically construct SQL queries as strings and return
results as untyped collection objects which are deconstructed
imperatively. This makes code lengthy and unreadable, negates much of
the benefit of working in a language with strong static safety
guarantees and loses much of the advantage of the underlying
declarative query language. Worse still, it is insecure: when query
strings are derived from user input, preventing script-injection
attacks requires very careful coding.

The other form of external data with which application increasingly
have to deal is tree-shaped semistructured data, such as XML
documents. Here again, support for processing such data in current
languages is weak. One either uses an untyped DOM-like representation,
giving up static guarantees about conformance to a particular DTD or
Schema, or uses an external tool to produce a more strongly typed
mapping into the language's type system, with an accompanying loss of
fidelity, efficiency and genericity in queries. Even in the DOM case,
in which one can more easily support a reusable library of query
methods, constructing queries is still significantly more complex than
it is in a more domain-specific language such as XQuery. On the other
hand, interfaces from general purpose languages to external XML
processing languages are often string-based, with all the same
impedance mismatches and security holes as in the SQL case.

Methodology

The design of Cω is based on three principles:

Asynchronous concurrency and processing of relational and
semistructured data are sufficiently important that they should be
directly supported in a modern general purpose programming
language. The advantages of direct linguistic support include:

Stronger compile-time guarantees.

Intentions and invariants are more apparent in the code. They
become part of the interface rather than being buried in the dynamic
flow of control into mysterious library routines.

The compiler has more information and so has the freedom to
choose different implementation strategies (e.g. performing query
optimizations).

More natural syntax.

Better support from other tools such as editors and debuggers.

We should extend an already-popular language, rather than design a
new one from scratch.

The extensions should be principled. The
aim is to take models and lessons learnt from the design of more
academic, special purpose languages and try to incorporate them within
the mainstream object-oriented framework. In the case of concurrency,
we took ideas from a theoretical model called the join calculus and
a join-based concurrent functional language called JoCaml. In the case
of our data extensions, many of the underlying ideas come from
functional programming.

Cω Data access

From a data perspective, our design goal was to evolve C# to
provide an integration of the object, relational and semi-structured
data models. (A similar process is happening with database systems;
nearly all now offer some form of integration of the relational and
XML data model, and of SQL and XQuery.) Diagrammatically we have aimed
for the following.

More technically, we have
sought to integrate these models by generalization, rather than
by ad-hoc specializations (e.g. by adding Table<R> or
XML<S> classes where R is some relational type and
S is some schema).

Cω offers an integration of the data models by extending with
C# type system with some new types, and by offering some new
expressions -- primarily by generalizing the familiar member access
operator. We shall consider these new types in turn, and for each
consider the appropriate new expressions.

Streams

The first structural type we add is a stream type; streams represent
ordered homogeneous collections of zero or more values. For example,
int* is the type for homogeneous sequences of
integers. Streams in Cω are closely related to
IEnumerable<>, the generic iterator interface that will appear
in the next release of C#.
Cω streams are typically generated using iterators, which are
blocks that contain yield statements. For example, the method
produces a single element stream.

virtual string* Foo(){
yield return "Hello world!";
}

Importantly,
it should be noted that, just as for C#, invoking such a method
body does not
immediately execute the iterator code, but rather
immediately returns a closure. (Thus Cω streams are essentially lazy
lists, in the Haskell style.) This closure is consumed by the foreach
statement, For example, given a stream zones of type
int*, the following statement prints each
element in that stream.

A vital aspect of Cω streams is that they are always
flattened; there are no nested streams of streams. Cω streams
are thus aligned with XPath/XQuery sequence which are also
flattened.

A key programming feature of Cω is generalized member access: the familiar 'dot' operator is now much more
powerful. Thus if the receiver is a stream, then the member access is
mapped over the elements, e.g. zones.ToString() implicitly
maps the method call over the elements of the stream zones and
returns a value of type string*. This feature significantly
reduces the burden on the programmer. Moreover, member access has been
generalized so it behaves like a path expression. For example,
zones.ToString().PadLeft(10) converts all the elements of the stream
zone to a string, and then pads each string, returning a stream of
these padded strings. As we shall see, the combination of generalized
member access and new datatypes, enables the Cω programmer to write
XPath-like queries directly.

Sometimes one wishes to map more than a simple member access over the
elements of a stream. Cω offers a convenient shorthand called an
apply-to-all expression, written e.{s;...s;}, which applies the
block {s;...;s;} to each element in the stream
e. The block may contain the variable it which plays
a similar
role as the implicit receiver argument this in method bodies and is
bound to each successive element of the iterated stream. For instance,
the apply-to-all expression below will convert the stream of natural
numbers from 0 into the stream of even numbers, converts each of
these into a string, and then prints them all.

From(0).{return 2*it;}.ToString().{Console.WriteLine(it);};

Cω also provides a lightweight way to define finite streams
without the need of defining a method, called block
expressions. For instance, the following block expression
generates a stream of two strings.

string* greeting = {yield return "Hello"; yield return "World!";};

Anonymous structs ("tuple types")

The second family of structural types we add are anonymous structs, which
describe ordered collections of heterogeneous values. An anonymous
struct is like a tuple in ML or Haskell and is written, for example
struct{int i; Button;}. A value of this type contains a
labelled member i
of type int and an unlabelled member of type Button. We can
construct a value of this type with the following expression:

new(i=42,new Button())

To access components of anonymous structs we (again) generalize the
notion of member access. Thus assuming a value x of the previous
type, we write x.i to access the integer value. Unlabelled members
are accessed by their (constant) position; for example x[1] returns the
Button member. As for streams, member access is lifted over
unlabelled members of anonymous structs. To access the BackColor
property of the Button component in variable x we can just write
x.BackColor, which is equivalent to x[1].BackColor.

At this point we can reveal even more of the power of Cω's
generalized member access. Given a stream friends of type
struct{string name;int age;}*, the expression friends.age
returns a stream of integers. The member access has been lifted over
both structural types. To print the names of one's friends is the
query-like statement:

friends.name.{ ConsoleWriteLine(it);};

Interestingly, Cω also allows repeated occurrences of the same member
name within an anonymous struct type, even at different types. For
example, assume the following declaration:
struct{int i; Button; float i;} z;
Then z.i projects the two i members of z into a new anonymous
struct that is equivalent to
new(z[0],z[2]) and of type struct{int;float;}.

Cω provides a limited form of covariance for anonymous structs, just
as for streams. Thus struct{int;Button;} is a subtype of
struct{int; Control;} but is not a subtype of
struct{object; Control;}.
Cω does not support width subtyping for anonymous structs.

Choice types

The third structural type we add is a particular form of discriminated
union type, which we call an choice type. This is written, for
example, choice{int; bool;}. As the name suggests, a value of this
type is either an integer or a boolean, and may hold either at any one
time. Unlike unions in
C/C++ and variant records in Pascal where users have to keep track of
which type is present, values of an discriminated unions in Cω are
implicitly tagged with the static type of the chosen alternative, much
like unions in Algol68. In other words, discriminated union values are
essentially a pair of a value and its static type.

There is no syntax for creating choice values; this is handled by the
compiler.

choice{int;Button;} x = 3;
choice{int;Button;} y = new Button();

Cω provides a test, e is T, on choice values to test
the value's static type. Thus x is int would return
true, whereas y is int would return false.
This is a simple extension of C#'s is type test.

Assuming that an expression e is of type
choice{T1;...Tn;},
the expression e is T is true for exactly oneT in
T1;...Tn;. This invariant is maintained by the type system. The
only slight complication arises from subtyping, e.g.

choice{Control; object;} z = new Button();

As Button is a subtype of both Control and object, which type
tag is generated by the compiler? The answer should be obvious to the
experienced C# programmer: an choice type can be thought of as
providing a number of overloaded constructor methods, one for each
component type. Just as for standard object creation, the best
constructor method is chosen. In the example above, clearly
Control is better than object.
Thus z is Control returns true.
The notion of 'best' for Cω is the obvious extension of that for
C#.

As the reader may have guessed, member access has also been generalized
over discriminated unions (Cω's choice types). Here the behaviour of member access is less
obvious, and has been designed to be aligned with XPath. Consider a
value w of type choice{int; Button;}. The member access w.GetType()
clearly succeeds irrespective of whether the value is an integer or a
Button object. In this case the type of the expression w.GetType()
is Type.

However clearly the member may not be supported by all the possible
component types, e.g. w.BackColor. Classic treatments of union types
would probably consider this to be type incorrect. However, Cω's
choice types
follow the semantics of XPath where, for example, the query foo/bar
returns the bar nodes under the foo node if any exist, and
the empty sequence if not. Thus in Cω, the expression
w.BackColor is well-typed, and will return a value of type
Color?, which is either a Color object or
null. This new type, Color?, can be thought of a
singleton stream containing the Color value (when w
contains a Button), or the empty stream (which is equated
with null). Again, we emphasize that this behaviour precisely
matches that of XPath.

Cω follows the design of C# in allowing all values to
be boxed and hence all value types are a subtype of the supertype
object. Thus both anonymous structs and choice types are considered
to be subtypes of the class object.

Content classes

To allow close integration with XSD and other XML schema languages, we
have included the notion of a content class in Cω.
A content class is a normal class that has a single
unlabelled type that describes the content of that class, as
opposed to the more familiar (named) fields. The following is a simple
example of a content class.

class friend{
struct{string name;
int age;
};
void incAge(){...}
}

Again we have generalized member access over content classes. Thus the
expression Bill.age returns an integer, where Bill
is a value of type friend.

From an XSD perspective, classes correspond to global element declarations,
while the content type of classes correspond to complex types.
Some further comparisons with the XML data model are immediately
below, but a more comprehensive study can be found
elsewhere.

Further expressions and examples

XPath-like features

C# provides the ability to query members of an object using the "."
operator and members of an array using array indexers, "[i]". The
intention of Cω's generalization of these operators is to leverage
in XQuery/XPath-like programming in an intuitive way. It's certainly
the case that XQuery/XPath code can be almost trivially rewritten in
Cω.

As an example consider the following XQuery code, that is taken
from the W3C's Use
cases document. It operates on a
bibliogrpahy, that contains a number of books. For each book in the
bibliography, it lists the title and authors, grouped inside a
'result' element.

Filters. Generalizing array indexers, Cω
allows a filter to be passed over a stream. For example, the
expression friends[ it.age < 18] therefore
filters the stream friends of type
friend*, returning those objects who denote
minors.

Descendants. While Cω does not support all the
XPath axes (sibling, parent, namespaces, and so on), Cω
does support descendant queries using the syntax
....

Queries by type. Cω can also query by type,
this is the only XPath style axis supported in Cω, even
though XPath doesn't have a type axis. The query
friends.string::* returns all direct members of
type string, regardless of their member names.

XML literal expressions

Cω provides a convenient way to construct object graphs using
an XML literal expression. The following is from the XSVG sample
provided with the compiler installation. It builds a an SVG graphic
object using an XML literal expression:

This is analogous to the XML element construction that is possible
in XSLT and XQuery. However, Cω views this as just convenient
shorthand for the appropriate data constructors. Notice that,
just as in XSLT and XQuery, the XML elements can contain embedded code so
that dynamic object graphics can be constructed easily. XML literal
expressions are just another way of constructing Cω values and
can be used anywhere that an expression is allowed in the
language.

This is a considerable improvement on the current support for XML
programming in C#. The following shows a function GetTotalByZip
written in C# that takes an XML document as input, then proceeds to
process selected nodes in qualified orders. The output of the program
is the total for all items that were ordered in a particular zip
code:

Here you can see how XPath-like query expressions lead to succinct
Cω programs.

SQL-like expressions

Just as Cω includes XPath-like queries to process
semi-structured data, we should like to offer similar functionality
for processing relational data. Thus Cω includes a simple
fragment of SQL, including selection with projection, filtering,
ordering, grouping, and joins. The advantage of including these
features in the language mean that we can synactically check the
queries, and also use compile-time type checking.

Below is a fragment of a Cω program that uses a
select expression to return all the unitprice/quantity pairs
from some orders with matching zip codes. We highlight various
parts of the select expression and discuss them below in further detail.

Some points about this expression are as follows:

The projection list results in the select expression
returning a "tuple type", namely

struct{string ProductID; decimal Total; }

The from clause specifies the data source being queried. This can be either an object in a SQL database, or any in-memory object.

Cω select statement supports various forms of joins, which are essential to any real database application.

Cω also supports order by and group by clauses.

The where clause can specify any valid Cω Boolean predicate for filtering the selection.

Cω Concurrency

The basic idea

In Cω, methods can be defined as either synchronous or
asynchronous. When a synchronous method is called, the caller
is blocked until the method returns, as is normal in C#. However,
when an asynchronous method is called, there is no result and the
caller proceeds immediately without being blocked. Thus from the
caller's point of view, an asynchronous method is like a
void one, but
with the useful extra guarantee of returning immediately. We often
refer to asynchronous methods as messages, as they are a
one-way communication from caller to receiver (think of posting a
letter rather as opposed to asking a question and waiting for an
answer during a face-to-face conversation).

By themselves, asynchronous method declarations are not particularly
novel. Indeed, .NET already has a widely-used set of library classes
which allow any method to be invoked asynchronously (though note that
in this standard pattern it is the caller who decides to invoke a
method asynchronously, whereas in Cω it is the callee
(defining) side which declares a particular method to be
asynchronous). The significant innovation in Cω is the way
in which method bodies are defined.

In most languages, including C#, methods in the signature of a
class are in bijective correspondence with the code of their
implementations -- for each method which is declared, there is a
single, distinct definition of what happens when that method is
called. In Cω, however, a body may be associated with a set
of (synchronous and/or asynchronous) methods. We call such a
definition a chord, and a particular method may appear in the header
of several chords. The body of a chord can only execute once all the
methods in its header have been called. Thus, when a method
is called there may be zero, one, or more chords which are enabled:

If no chord is enabled then the method invocation is queued
up. If the method is asynchronous, then this simply involves adding
the arguments (the contents of the message) to a queue. If the method
is synchronous, then the calling thread is blocked.

If there is a single enabled chord, then the arguments of the
calls involved in the match are de-queued, any blocked thread involved
in the match is awakened, and the body runs.

When a chord which involves only asynchronous methods runs, then
it does so in a new thread.

If there are several chords which are enabled then an
unspecified one of them is chosen to run.

Similarly, if there are multiple calls to a particular method
queued up, we do not specify which call will be de-queued when there
is a match.

Examples

A simple buffer

Here is the simplest interesting example of a Cω class that
uses chords:

This class contains two methods: a synchronous one, Get(), which takes no arguments and returns a
string, and an asynchronous one, Put(), which
takes a string argument and (like all asynchronous methods) returns no
result. The class definition contains two things: a declaration (with
no body) for the asynchronous method, and a chord. The chord declares
the synchronous method and defines a body (the return statement) which can run when both the
Get() and Put()
methods have been called.

Now assume that producer and consumer threads wish to communicate
via an instance b of the class Buffer. Producers make calls to b.Put(), which, since the method is asynchronous,
never block. Consumers make calls to b.Get(),
which, since the method is synchronous, will block until or unless
there is a matching call to Put(). Once b has received both a Put()
and a Get(), the body runs and the argument to
the Put() is returned as the result of the
call to Get(). Multiple calls to Get() may be pending before a Put() is received to reawaken one of them, and
multiple calls to Put() may be made before
their arguments are consumed by subsequent Get()s. Note that:

The body of the chord runs in the (reawakened) thread
corresponding to the matched call to Get(). Hence no new threads are spawned in
this example.

The code which is generated by the class definition above
is completely thread safe. The compiler automatically
generates the locking necessary to ensure that, for example,
the argument to a particular call of Put() cannot be delivered to two distinct
calls to Get(). Furthermore (though it
makes little difference in this small example), the locking is
fine-grained and brief - polyphonic methods to not lock the
whole object and are not executed with 'monitor
semantics'.

The reader may wonder how we know which of the methods
involved in a chord gets the returned value. The answer is
that it is always the synchronous one, and there can be at
most one synchronous method involved in a chord.

In general, the definition of a synchronous method in Cω
consists of more than one chord, each of which defines a body which
can run when the method has been called and a particular set of
asynchronous messages are present. For example, we could modify the
example above to allow Get() to synchronize
with calls to either of two different Put()
methods:

Now we have two asynchronous methods (which happen to be
distinguished by type rather than name) and a synchronous method which
can synchronize with either one, with a different body in each
case.

A one-place buffer

The previous example showed how to define a buffer of unbounded size:
any number of calls to Put()
could be queued up before matching a Get(). We
now define a variant in which only a single data value may held in the buffer
at any one time.

The implementation of OnePlaceBuffer makes use of two
private asynchronous messages: empty() and
contains(). These are used to carry the state of the buffer and
illustrate a very common programming pattern in Cω: note that we have made
no use of fields. The way in which this works can be best understood by reading
the constructor and the two chords in a simple declarative manner:

When a new buffer is created, it is initially empty().

If you call Put(s) on an empty()
buffer then it subsequently contains(s) and the call to
Put()
returns.

If you call Get() on a buffer which contains(s)
then the buffer is subsequently empty() and
s is returned to the caller of Get().

Implicitly: In all other cases, calls to Put() and
Get() block.

It is easy to see that the constructor establishes, and both the chords
maintain, the invariant that there is always exactly one empty()
or contains(s) message pending on the buffer. Thus the
two chords can be read as a fairly direct specification of a finite state
machine.

A reader-writer lock

Controlling concurrent access to a shared, mutable resource is a classic
problem. Clients request, and later release, either read-only or read-write
access to the resource. To preserve consistency but minimize waiting, one
usually wishes to allow either any number of readers or a single writer, but
not both, to have access at any one time. The code below implements a
reader-writer lock in Cω. Writers call Exclusive(),
do some writing and then call ReleaseExclusive().
Readers call Shared(), do some reading and then call
ReleaseShared():

Again, we use private messages, idle() and s(n),
to carry the state. And once more, there is a simple declarative reading of the
constructor and chords which allows one to understand the implementation:

When the lock is created there are no readers or writers, so it is
idle().

If a writer requests Exclusive() access and the lock is
idle() then he may proceed (but the lock is no longer idle()).

If a writer indicates it is finished by calling ReleaseExclusive()
then the lock is idle()
again.

If a reader requests Shared() access and the lock is
idle()then the lock moves to the state s(1),
meaning there is now one reader, and he may proceed.

If a reader requests Shared() access and there are
currently n readers, then there are now n+1
readers, and the new one may proceed.

If a reader indicates he is finished by calling ReleaseShared()
and there were previously n readers, then if n was 1 then lock is
idle() again, otherwise there are now n-1 shared readers. In
either case, the reader who has just relinquished access may proceed with
whatever else he has to do.

Assuming that all the clients obey the request-release protocol, the invariant
is that there is at most one pending private async message representing the
state:

none <—> idle() <—>
s(1) <—> s(2) <—> s(3)
<—> …

Operationally, it may help to think about what does not appear in the
code. There is, for example, no chord that is applicable if a client calls
Exclusive() when there is a pending s(n) message.
In such a case, the client will block until all the readers have released their
access and an idle() message has been sent (There is a
question about fairness here - see the Polyphonic C# paper for a modification
to this example which enforces a simple form of fairness between readers and
writers.)

Conclusion

Cω is an experimental programming language intended to make it
easier to write the data-intensive distributed applications being
written today. To do so we need to provide application writers with
better support for data and control. Cω contains elegant
primitives for asynchronous communication, and offers a strongly-typed
integration of the object, relational and semi-structured data
models.