Summary
I've been thinking about objects recently. One of the questions I've been thinking about is whether or not one can make sense of objects in a language-independent way. Herein are some reflections, and what may be the start of a series of postings on the subject...

I've been doing some thinking about objects recently, perhaps brought on
by my agreeing to do a talk at the 8th Jini Community
Meeting on this subject. Of course, it does one good to
think hard about things that you have taken for granted for some time;
this is one of the reasons that I went back to Sun Labs some time ago, and
why I read The Mythical Man Month every couple of years, and
why I read Socrates' Apology at least as often. I, at least, need
to be reminded about the basics every now and then, and think about what I
do without thinking about it.

I imagine that most of the readers of this blog would claim to believe
in objects (this may be true by vacuous quantification). But what does
it mean to believe in objects? This has led to a lot of reflection over
the past couple of months, but in this post I want to be a little more
focused. In particular, I want to ask the question of whether objects
are something that we can believe in without believing in the objects of
a particular kind expressed in a particular language, or if there is
something more abstract about objects.

In some sense, objects are simply a way of abstracting information by
joining the underlying state of the information with the code that
manipulates the information. I've often said that objects are a
combination of data and code, and in some abstract way that is true. But
getting more detailed than this generally starts tying one to the
language in which we are expressing objects.

For example, it is very rare that an object really is a combination of
state and code; far more common is that an object is a combination of
state and a class descriptor, with the class descriptor (often with some
other form of information) pointing off to some code. There are
non-class-based object-oriented languages (like Self) that don't have a notion of sharing code among a group of objects, but they are in the minority.
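
To make the "state plus class descriptor" picture concrete, here is a minimal sketch in Python, chosen only for brevity. The Counter class and every name in it are made up for illustration; other languages arrange the same pieces differently, which is rather the point.

```python
class Counter:
    """A tiny class; the method below is stored on the class, not on instances."""

    def __init__(self, start):
        self.count = start  # per-object state

    def increment(self):
        self.count += 1
        return self.count


a = Counter(0)
b = Counter(10)

# Each object carries only its own state...
state_a = vars(a)
# ...plus a reference to its class descriptor.
descriptor = type(a)

# The code is reached through the class and is shared by every instance.
shared = a.increment.__func__ is b.increment.__func__
code_on_instance = "increment" in vars(a)
code_on_class = "increment" in vars(Counter)
```

Here the instance dictionary holds nothing but the count, and the increment code is found only by following the object's reference to its class.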

Once the notion of a class has been introduced, there are notions of the
relationship between classes, often stated in terms of inheritance and
often (but not always) associated with a type system. Different
languages allow different kinds of associations to be built between
classes and types. All these different ways of associating classes have
an impact on the way in which the state and the code that we think of as
making up the object are actually related. And then there are things
like the Java language notion of a classloader, which adds another level
of indirection between the state of an object and the actual code that
is associated with the object.

This difference between our (or, at least, my) mental model of an object
and the actual expression of an object in a computing system has all
sorts of interesting effects. The only one I plan on talking about here
is that an object in one language is generally very different from an
object in a different language, because of all of the various ways in
which different decisions made in the language design are reflected in
the way the objects in that language are put together.

For example, consider an object in the Java environment and an object in
C++. There are lots of surface similarities. But C++ allows multiple
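class inheritance, while Java does not. C++ has operator overloading;
the Java language (whew!) does not. C++ objects have destructors,
Java objects have finalizers; while these appear similar on the surface
they are really very different (a destructor runs at a known point when
the object's lifetime ends, while a finalizer runs only if and when the
garbage collector gets around to it). The list could go on.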

Now, in what sense are Java objects the same as C++ objects? There are
conceptual similarities (both are abstractions of state, allowing
polymorphic typing, etc.) but they are very hard to translate from one
idiom to the other. This is something that becomes most clear when one
gets asked (generally by a manager) to "translate" a program written in
C++ into the Java language. Even though these two languages are both
concerned with objects, there is no real sense in which one can
translate a program written in one into a program written in the other;
at best one can write a new program that does the same things in the new
language. But this isn't translation, this is writing a new program.

In some ways, this feels similar to the situation with basic data types
some time ago. When I started programming, a C int was not
the same as a Pascal INTEGER and neither was anything like a
COBOL PIC 9(5). In fact, at that time a C int
on one machine and compiler might not be the same as a C
int on a different machine or compiler. It took a lot of
work to get the notion of common data types to work across programs.
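
As a rough illustration of how much hides behind the word "integer", the sketch below uses Python's standard struct module to lay out one value under a few different conventions. The value 70000 and the helper fits_in_16_bits are my own choices for the example, not anything from a particular compiler or machine.

```python
import struct

value = 70000

# The same number as a 32-bit integer in two byte orders.
little_32 = struct.pack("<i", value)   # 32-bit, little-endian
big_32 = struct.pack(">i", value)      # 32-bit, big-endian


def fits_in_16_bits(n):
    """Return True if n can be packed as a signed 16-bit integer."""
    try:
        struct.pack("<h", n)
        return True
    except struct.error:
        return False


# A program that assumed 16-bit ints could not even represent this value.
too_big_for_16 = not fits_in_16_bits(value)

# The size of the platform's native C int, which depends on machine and compiler.
native_int_size = struct.calcsize("i")
```

The two 32-bit encodings contain the same bytes in opposite order, and the 16-bit convention cannot hold the value at all; agreeing on common data types meant settling questions like these.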

Similar work has not been done in the area of objects. There have been
claims that some environment is both object-oriented and
language-independent, but to my knowledge all such environments are
object-oriented only in the entities that you can send messages to
(either directly or across the network). The arguments and return values
of those objects need to be restricted to a small number of basic data
types. This is hardly allowing first-class objects in a
language-independent way.

So the question that has been nagging me for the last while is simply
this: is there a notion of object which is independent of the language
in which one is programming? If so, I haven't come across it yet. If
not, then do we have to make a choice in our work between objects and
language-independence? I'm beginning to think that the second of these
is the more viable approach, but it flies in the face of much of what we
have been taught in software development. But perhaps that should be left
as a topic for another day.

Yeah, as I read along, I was thinking that having a universal "assembly language" type of common currency would allow for passing objects (state and code) between languages. Oh yeah, that's IL isn't it?

It's hard to have it both ways. We can say that it's a shame that there is no view of OO that unites design across languages, but then when a common object model comes around that actually does that, it's called a limited subset.

The price of uniting OO across various languages is creating that subset. I don't see how it could be any other way.

This notion of some sort of grand unification of languages comes up pretty regularly. For example, check out the archives of the comp.compilers newsgroup.

Fundamentally, there are always phase transitions between the abstract, mental models that we create and the manifestation of those models in the languages that we create and the implementation of those languages.

For a sideways example, look at the inability to tie so-called real-world identity to the abstract, online identities (for use in things like (secure, anonymous, verifiable, etc.) e-voting).

I think this all just points to the fact that the state of the art of computer science is still very primitive. Instead of telling the computer what we want to do we have to give it very meticulous and precisely formatted instructions. And then it only ends up doing what we wanted a small percentage of the time.

I think the properties of the JVM are a lot like the CLR, but for some reason Sun de-emphasized the ability to support any language that produces byte code (heck, it's called the "Java Virtual Machine" instead of the "Byte Code Virtual Machine"). They've also put little emphasis on the byte code itself, as a language, whereas Microsoft has made a big deal of MSIL.

It may just be a matter of marketing, but it does influence people. You could argue that Java's success was a matter of marketing along the same lines, because Java wasn't the first language to compile to an intermediate format (remember P-Code?).

Even in a cross-language environment with different assumptions and models, the fundamental aspect of putting what would normally be thought of as a stack frame on the heap, where it can be referenced from unrelated lexical scopes, is pretty powerful.
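
One way to read the "stack frame on the heap" remark is through closures. The following is only a sketch in Python with made-up names (make_account, deposit, current): the local variable of a finished call stays alive because functions that escaped the call still refer to it.

```python
def make_account(balance):
    # 'balance' is a local of this call; ordinarily it would vanish on return.

    def deposit(amount):
        nonlocal balance
        balance += amount
        return balance

    def current():
        return balance

    # Returning the inner functions keeps 'balance' alive after the call ends.
    return deposit, current


deposit, current = make_account(100)
deposit(50)
total = current()
```

The two returned functions share one hidden piece of state and are used from a scope that has nothing to do with where that state was created, which is most of what a small object does.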

The crucial thing is whether (per Jim's original question) it's independent of the language in which one is programming. It can certainly be a different language: I could, for instance, define the semantics using UML. (Sorry, I used the "s" word. I'll explain myself in a later comment.) However, independence is tougher. If one could demonstrate a mechanical transformation between the (surface) languages which preserved the semantics, are they independent?

First, I'd like to get rid of all of the chimeras about basic data types and so forth. For me, objects are purely procedural entities; if you really want to think of them as glorified structs, introduce the necessary collection of getters and setters. This abstracts away any details of implementation, and leaves us with the "semantics" of the methods on the object. It's OK to continue to think about the "state" of the object, but we should do so in the same way that, say, Dennett talks about the beliefs of an individual from the intentional stance. The state is real, but we don't need to know if or how it's implemented. (It's simply introduced to preserve object-level coherence for the pre- and post-conditions of all the methods.)

(Dennett uses the analogy of an object's "centre of gravity". It's real, in that you can use it to make meaningful predictions about the way in which the object interacts with other objects, but there's nothing that implements it.)

But is this what Jim's after? What counts as "the same"? What kind of identity or equivalence are we talking about? There's a long tradition in computer science, probably inherited from mathematics and logic, of focussing on functional equivalence. How about temporal issues, or resource issues? Suppose I have two objects with the same functional signature, one of which executes a method in N**2 time and the other in log(N). Are they the same?
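
A small sketch of the question, in Python and with hypothetical class names (ScanSet, SortedSet). The costs here are linear versus logarithmic rather than the N**2 and log(N) mentioned above, purely to keep the example short; the point is the same.

```python
from bisect import bisect_left


class ScanSet:
    """Membership by linear scan: O(N) per lookup."""

    def __init__(self, items):
        self._items = list(items)

    def contains(self, x):
        return any(item == x for item in self._items)


class SortedSet:
    """Membership by binary search over a sorted list: O(log N) per lookup."""

    def __init__(self, items):
        self._items = sorted(items)

    def contains(self, x):
        i = bisect_left(self._items, x)
        return i < len(self._items) and self._items[i] == x


data = [7, 3, 9, 1]
probes = [1, 2, 3, 10]
answers_scan = [ScanSet(data).contains(p) for p in probes]
answers_sorted = [SortedSet(data).contains(p) for p in probes]
```

Both objects give identical answers to every probe, so they are functionally equivalent; whether they count as "the same" once time is part of the contract is exactly what is being asked.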

To return to Jim's original thoughts about what it means to believe in objects: for me, a key part of it is adopting a stance in which I have no visibility of or dependency on the implementation of an object, including its "state" if any. The challenge is that our formal description tools are woefully inadequate. I've already mentioned a couple of issues (time and resources). How about two more favourites: thread safety and intra-object synchronization and deadlock? Can I really be as indifferent to the implementation as I'd like to be.....?

> To return to Jim's original thoughts about what it means to believe in objects: for me, a key part of it is adopting a stance in which I have no visibility of or dependency on the implementation of an object, including its "state" if any. The challenge is that our formal description tools are woefully inadequate. I've already mentioned a couple of issues (time and resources). How about two more favourites: thread safety and intra-object synchronization and deadlock? Can I really be as indifferent to the implementation as I'd like to be.....?

There are some important concepts in this comment. There are two basic camps from my perspective. The Python, Perl, your-dynamic-language-of-choice crowd like to repeatedly demonstrate how much knowledge can be inferred by the language system. These languages also rely on runtime safeguards to limit the effects of improper contract agreements.

The static typing, strict semantics crowd (which I am in) believes that it is safest to never make implicit assumptions on behalf of the programmer. Thus, we (the semantics crowd) see nothing but huge quantities of looming details when we talk about interworking.

There's an interesting balance that is occurring... The dynamic language group tries more and more things that the static crowd considers dangerous to program correctness. And we learn from that some new boundaries (read about how Sun is trying to figure out potential JVM changes to allow dynamic languages to use the JVM more effectively, Jython as an example: http://www.tbray.org/ongoing/When/200x/2004/12/08/DynamicJava). The static typing group expresses more and more syntactically or semantically in language design that makes it much more interesting for a dynamic language to provide the same capabilities (read Guido's latest post on Artima about adding static typing to Python).

Some might argue that SOAP has become the language independent object interaction language. But that is a completely different topic in my mind.

Doug Lea recently posted a great concurrency checklist that might be interesting to use to create Java annotations for a more formal threading analysis tool. See this list at http://gee.cs.oswego.edu/dl/cpj/prop.html, and join the concurrency list if you want (concurrency-interest at altair.cs.oswego.edu).

My personal opinion is that there is no way to unify all of these concepts right now. We really should stop inventing new languages that just do the same things, but with different syntax or semantic twists.

We've recreated library after library in 10 different languages, and we've recreated huge runtime environments (Wine on Linux is a great example) just because of portability. It really is time to unify behind a single runtime environment that is binary compatible everywhere, and move on. Then, we'd have object compatibility and we'd be able to solve real problems that add value to our businesses. We are solving the same interworking problems over and over and over again, feeling good about solving the problem, but really adding no value to the computing environment...