Look, I get a lot of people auditioning all the time. What makes you think that you’d be good enough for porno?

Sancho:

I am Sancho.

Maxxx Orbison:

Great… but what do you do?

Sancho:

What do I do? I am Sancho.

Maxxx Orbison:

And…?

Sancho:

And there are many Jeffs in the world, and many Toms as well. But I… am Sancho.

Maxxx Orbison:

And…?

Sancho:

Are you Sancho? No you are not. Neither is Scott Baio Sancho. Frank Gifford is not Sancho. But I…

Maxxx Orbison:

You… are Sancho!

Sancho:

That’s right.

Maxxx Orbison:

Okay, you’re hired.

In a nutshell, Sancho is stating something more abstract than just his name. I can say “I am Tom Barta”, but typically I just mean “My name is Tom Barta.” This guy isn’t just named Sancho, he is Sancho, and there is no other.

Think about it: what do you really mean if you go up to a stranger and ask, “Who are you?” The answer could be “John”, “a businessman”, “your neighbor”, or “a proud Republican”. Similarly, if you approach someone and say “Hi, I am John,” I can imagine it would take a tiny bit more processing than saying “Hi, my name is John.” If your name is rare (like Moon Unit or Apple), it’s even more important to remove any ambiguity. “I am Apple” sounds like nonsense or pidgin.

There’s a Nerd-Tangent Hidden in Every Real-Life Thought

There is a parallel between this and programming. The advent of object-oriented programming has popularized the notion of “object identity” versus “object equality”. I can have two objects sitting in memory with identical data. Sometimes, that’s just a programming error, and the same object has been copied unnecessarily (this happens frequently with caching or persistent systems). Sometimes, the two objects genuinely are different. How can I tell if one is just an alias, or if they are logically distinct?

It depends, of course. If I am looking at Value Objects, there’s generally not a reason to distinguish them by identity. The color Red is always Red, even if I have two Reds. However, with Entities, identity is of utmost importance. Two John Smiths in a customer database represent different people. Another way to think about it is in the context of the Flyweight pattern. The two Reds could be replaced with a flyweight without affecting the program. However, the John Smiths couldn’t.
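To make the Value Object / Entity split concrete, here is a minimal Java sketch. Every name in it (Color, Customer, ValueVersusEntity) is invented for illustration, and the choice to leave equals() alone on the entity is just one way of modelling "identity matters", not the only one.

```java
import java.util.Objects;

// Hypothetical value object: two instances with the same data are interchangeable.
final class Color {
    // A shared flyweight instance; callers may reuse it instead of allocating new Reds.
    static final Color RED = new Color(255, 0, 0);

    private final int r, g, b;

    Color(int r, int g, int b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Color)) return false;
        Color c = (Color) other;
        return r == c.r && g == c.g && b == c.b;
    }

    @Override
    public int hashCode() {
        return Objects.hash(r, g, b);
    }
}

// Hypothetical entity: equals() is deliberately NOT overridden, so each
// instance is only equal to itself, whatever its data says.
final class Customer {
    final String name;

    Customer(String name) {
        this.name = name;
    }
}

final class ValueVersusEntity {
    static boolean redsAreInterchangeable() {
        Color another = new Color(255, 0, 0);
        // Different objects in memory, same value.
        return another != Color.RED && another.equals(Color.RED);
    }

    static boolean johnsAreDistinct() {
        Customer first = new Customer("John Smith");
        Customer second = new Customer("John Smith");
        // Same data, but two different people.
        return !first.equals(second) && first.name.equals(second.name);
    }

    public static void main(String[] args) {
        System.out.println(redsAreInterchangeable()); // true
        System.out.println(johnsAreDistinct());       // true
    }
}
```

Swapping every freshly built red for Color.RED would not change what the program does, which is the flyweight argument; doing the same with the two customers would silently merge two people.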

Enter Programming Languages

Of course, programming languages that use objects must have some way of distinguishing object identity from object equality.

Language | Identity | Equality
C, C++ | &a == &b | a == b
Java | Anyone care to fill this one in for me? I’m unaware of the semantics of == and equals().
PHP | nothing! | a == b (coerce types to match) or a === b (check types)
Python | a is b | a == b
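As far as the Java row goes: on object references, == compares identity, while equals() is the overridable equality check (and falls back to identity when a class leaves it alone). A small sketch, with SanchoIdentity and its two helper methods made up purely for illustration:

```java
final class SanchoIdentity {
    static boolean sameObject(Object a, Object b) {
        return a == b;       // identity: do both references point at one object?
    }

    static boolean sameValue(Object a, Object b) {
        return a.equals(b);  // equality: whatever the class defines it to mean
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with the same contents.
        String a = new String("Sancho");
        String b = new String("Sancho");
        String alias = a;

        System.out.println(sameObject(a, b));     // false
        System.out.println(sameValue(a, b));      // true
        System.out.println(sameObject(a, alias)); // true
    }
}
```

So in table terms, identity would be a == b and equality a.equals(b), keeping in mind that for primitives like int, == is plain value comparison.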

Whoops! Looks like PHP doesn’t even have object identity! I’d like for someone to be able to refute this, but I haven’t been able to figure it out. PHP documentation claims === means identical and == means equal, but that certainly doesn’t match the notion of object identity I just explained. Sadly, this essentially means that object identity will never truly work in PHP. Instead, we are left with “equal” and “more equal”.

Does it Matter?

In the big picture, I don’t think it hurts PHP programmers to lose object identity. Most PHP applications are business-logic interfaces sitting on top of relational databases. What’s special about the RDBMS in this context? Well, object identity doesn’t exist. I know Postgres has oid and there are probably others, but using them for general applications seems to be unfavorable. In a database table, objects (tuples/records/rows) are identified by a primary key that disallows duplicates and frequently uses auto-incrementing integers. It’s a trivial solution, really, to just assign a number to everyone who walks in the door (until you run out of numbers, of course).

Since the database enforces this uniqueness, I know that two customers both named John Smith will at least have different customer IDs. Social Security, credit cards, university student IDs, and phone numbers all revolve around this notion of unique numeric identity. Consequently, almost any PHP application using an RDBMS can simply piggyback upon the database’s IDs and trivially state that === is now identical to ==.
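The piggybacking idea can be sketched in a few lines. This is an illustration under assumptions: CustomerRow, its fields, and PrimaryKeyIdentity are hypothetical names, and the ids are hard-coded where a real application would get them from the database.

```java
// Hypothetical row wrapper: identity is delegated entirely to the primary key.
final class CustomerRow {
    final long id;      // primary key, assumed to be assigned by the database
    final String name;

    CustomerRow(long id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        // Two in-memory copies of the same row count as the same customer.
        return other instanceof CustomerRow && ((CustomerRow) other).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

final class PrimaryKeyIdentity {
    static boolean sameCustomer(CustomerRow a, CustomerRow b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        CustomerRow john = new CustomerRow(1, "John Smith");
        CustomerRow otherJohn = new CustomerRow(2, "John Smith");
        CustomerRow johnReloaded = new CustomerRow(1, "John Smith");

        System.out.println(sameCustomer(john, otherJohn));    // false
        System.out.println(sameCustomer(john, johnReloaded)); // true
    }
}
```

Once the key carries the identity, it no longer matters whether two variables point at the same object in memory, which is roughly why the missing language feature is rarely felt.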

I bash Java. A lot. So now, I’m going to take a bit of time to mention something positive:

Java generics combined with the enhanced for loop. It makes me feel like I’m working with the C++ Standard Template Library, and that’s a happy feeling.

Some people don’t like generics. Ken Arnold considers them “harmful” (seriously, that phrase should not be used anymore). I suppose Java was originally designed for blubs like Ken. The C++ community has been working with the same notion for significantly longer (where do you think the syntax came from?), and I don’t quite see anyone complaining about the STL’s complexity. Rather, it’s a very simple and elegant way of abstracting basic algorithms.

Some concrete reasons in favor of using generics:

Type checking is pushed to compile time. The more checks that can be done at compile-time, the less risky a piece of software is. Generics allow a function to be provably type-safe by removing all casts.

They provide executable documentation, which is the best kind. I just converted some of my own code to use generics, and I found that I was already describing the data structures in comments anyways: Map matrix; // map(int -> map(int -> int)). Converting to Map<Integer, Map<Integer, Integer>> matrix means that the type documentation will never go out of date, as it is required for compilation. If I changed to a matrix of floats or longs, my comment would have probably fallen by the wayside, and wrong documentation is arguably worse than no documentation.

They reduce visual complexity. There is a bit of duplication when declaring a new object: Map<Integer, Map<Integer, Integer>> matrix = new HashMap<Integer, Map<Integer, Integer>>(); That occurs much less frequently than data structure traversal, which (with generics) now has the advantage of avoiding typecasting. It’s not bad to see a bit of verbosity at the top of a function where my mind’s not invested in something, but I really hate trying to wade through a long algorithm and seeing language boilerplate clutter up my screen. Generics also go hand-in-hand with easy-reading iterables:

for (Integer row : matrix.keySet()) {
    for (Integer col : matrix.get(row).keySet()) {
        ...
    }
}

This is on par with iterators from other imperative languages (C++, Python, and PHP, though C++ is the only one to standardize on ranges for iteration).
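For a version of the matrix traversal that actually compiles and runs, here is a small sketch. SparseMatrixDemo and its build, put, and sum helpers are hypothetical, and the three stored values are arbitrary; the point is only that the nested loops need no casts.

```java
import java.util.HashMap;
import java.util.Map;

final class SparseMatrixDemo {
    // Builds a small sparse matrix: row -> (column -> value).
    static Map<Integer, Map<Integer, Integer>> build() {
        Map<Integer, Map<Integer, Integer>> matrix =
                new HashMap<Integer, Map<Integer, Integer>>();
        put(matrix, 0, 0, 5);
        put(matrix, 0, 3, 7);
        put(matrix, 2, 1, 11);
        return matrix;
    }

    // Stores one cell, creating the row map on first use.
    static void put(Map<Integer, Map<Integer, Integer>> matrix,
                    int row, int col, int value) {
        Map<Integer, Integer> cols = matrix.get(row);
        if (cols == null) {
            cols = new HashMap<Integer, Integer>();
            matrix.put(row, cols);
        }
        cols.put(col, value);
    }

    // Sums every stored cell; there is not a single cast in the traversal.
    static int sum(Map<Integer, Map<Integer, Integer>> matrix) {
        int total = 0;
        for (Integer row : matrix.keySet()) {
            for (Integer col : matrix.get(row).keySet()) {
                total += matrix.get(row).get(col);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(build())); // prints 23
    }
}
```

Trying to call put with a String value, or assigning matrix.get(row) to a Map<String, Integer>, fails at compile time, which is the type-checking benefit from the first point.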

…is that it takes so long to type things! I’ve always thought Hungarian Notation was a foolish thing for classes and interfaces (e.g. CRotationMatrix and IAffineTransformation), but now I’ve decided why.

Anyone who uses a code-completing IDE should be able to type in any class or interface name and get at least a hint about its type. This is particularly relevant to Java, which appears to be a “tools-required language” by design with its verbose but easy-to-parse style.

Anyone who uses a “man’s man” editor like vim/emacs/editplus should at least have access to some basic text-searching utilities. Something like `grep 'class TheClassName' path/to/repo -R` or (in PHP-land) surfing to http://php.net/functionname is typically sufficient.

Most code I’ve come across in dynamically-typed languages like PHP or Python doesn’t require a distinction between “implementing an interface” and “extending an implementation”. I know PHP allows the abstract class keyword, and I can’t really say that I like it.

So who really wants to see the C or I in front of their class/interface name? I dunno, but it’s not me. I don’t buy the argument of “it makes code more self-evident/documenting”, because jumping to a definition and seeing

interface AffineTransformation ...
class RotationMatrix ...

is just as readable (more so, I think, because of the conciseness) as

interface IAffineTransformation ...
class CRotationMatrix ...

I think I would shoot myself in the face if I had to program with CRotationMatrixAffineTransformation.

Module Names

Module names are another gotcha that I don’t like. Now, I’m not a regular Java programmer, so tell me if there’s some awesome reason that I’m overlooking.

I’m currently working on some Lucene and Nutch code. They are (respectively) defined in modules org.apache.lucene and org.apache.nutch, and when I play in an 80-character-wide ssh account, the pathnames tend to wrap around my screen unless I start symlinking everything.

It’s all nice and great to know that my particular version of Lucene came from org.apache and not com.ibm, but I think I’d much rather just use the module lucene without organizational domains getting involved.

If a codebase is transferred between organizations, they either have to deal with the headache of renaming everything and breaking backwards-compatibility, or keep the names as-is and just know that the “current controlling organization” is wrong. What if I downloaded something from Apache’s website that still referenced com.ibm modules (maybe originally developed by IBM and then donated)? The license is in a big JavaDoc at the top of every page; I don’t need to see it referenced every time I need to import a module.

It adds unnecessary levels. Am I really worried about multiple groups creating lucene modules and having to deal with both of them? Not really, because of the concept of trademarks. If I create a software product called Vista, you can bet Microsoft will have something to say about it. It doesn’t matter if I’m com.wordpress.tombarta.vista (which is bad anyways, because I could change hosts anytime) or just vista. If I use the same name as some other product, people will be confused, and organizational namespaces won’t do a hoot about that.

Ok, I’ve bashed Java enough tonight. I really should find more languages to dislike, or more good things about the Java language. I don’t necessarily want Java to change… I just want other things to replace it. Like some ultra-sweet combination of Python and C++.

Edit: I’ve tagged this in Usability because I certainly consider a programming language to be a user interface the same way I consider a software developer to be a digital content creator.

I was originally going to write my thoughts about exceptions vs. error codes in the context of PHP development, but I got completely sidetracked. So instead, I’m going to talk a little bit about Java. Please remember that I have written a grand total of about 30 lines of Java code in the last 5 years, and I could very well be talking out of my ass. But I like to think I’m not an idiot, even if I’m not fluent in the language.

There’s a particular API wart I’ve seen with Java file I/O. I can’t find any references right now (sorry!), but I’ve seen it mentioned in the context of Java, RAII (which Java doesn’t have, yet some people do argue to that effect), checked exceptions, and general error-recovery strategies.

This (pseudocode) is the core of what you have to do when you open a single file in Java:
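A rough sketch of the sort of boilerplate in question, written in the classic try/finally style. It is not the original pseudocode: FirstLineReader and firstLine are names made up here, and reading only the first line is an arbitrary choice to keep it short.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

final class FirstLineReader {
    // Reads the first line of a file, closing the reader on every path.
    static String firstLine(String path) throws IOException {
        BufferedReader reader = null;
        try {
            reader = new BufferedReader(new FileReader(path)); // may throw
            return reader.readLine();                          // may throw
        } finally {
            if (reader != null) {
                try {
                    reader.close();                            // may also throw
                } catch (IOException ignored) {
                    // Swallowed so that a failure while closing does not
                    // mask whatever exception is already propagating.
                }
            }
        }
    }
}
```

Three separate places can throw, and the cleanup needs its own null check and its own nested try, all for one file.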

One thing I definitely haven’t seen is any mention of RAII in PHP. Nasty ugly weak PHP that’s fallen out of favor with the hip. I can’t possibly be the only person to have thought about this, because one of the bugfixes in PHP 5.2 is properly ordered destructors. However, since most people aren’t developing for PHP 5.2, I suppose it’s not a surprise that the subject isn’t googleable. Anyhow, prior to PHP 5.2, objects in the same scope would be destructed in arbitrary order (usually the order in which they were declared, which sounds backwards), so multiple resources acquired in one function couldn’t be interdependent without invoking undefined behavior. That’s bad, but it’s not a show-stopper.

I was looking at Doug Ross’s blog the other day; in particular his beef with exception-based programming. I definitely have more C++ experience than Java, so whenever I think about exceptions I think in the context of exception safety, transactional semantics, and RAII. Here’s my take on what he said:

Exceptions tacitly encourage abdication of cleanup responsibility… But the problem is, very few people are disciplined enough to set try, catch and finally clauses for every single method.

I would never expect every single method to have a try/catch/finally block associated with it. In my opinion, that’s a waste of the exception model, and severely limits its advantages. When I layer code, I want to consider sources (in the dark hearts of libraries) and sinks (in the application interface or controller) of exceptions. I try to minimize the exception handling (try/catch/finally) performed anywhere in between.

Wait! How can less exception handling produce better results? In C++ at least, the answer is “stack management”. Local objects (anything not allocated with new) are created on the stack. If an exception occurs, the stack is unwound, calling destructors for all instantiated local objects. If all of my setup code occurs in C++ object constructors, and all of my teardown code occurs in the object destructors, then the compiler will essentially write my try/catch/finally itself. I don’t have to worry about setting flags for what teardown needs to complete; that information is implied by the objects on the stack.

If readInto raises an exception, the destructor of openFile will close the file properly, and I don’t need to worry about it whenever I try to use files. There’s definitely a benefit to be able to avoid this boilerplate checking. The associated cost, then, is creating objects like openFile that act as sentries on my stack.
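Java has no destructors, so the C++ version cannot be reproduced literally, but try-with-resources (Java 7 and later) gives a comparable sentry effect. The sketch below is an analogy and not the C++ code discussed above; Sentry and SentryDemo are hypothetical names, and a log list stands in for real resources so the ordering is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sentry: acquires in the constructor, releases in close().
final class Sentry implements AutoCloseable {
    private final String name;
    private final List<String> log;

    Sentry(String name, List<String> log) {
        this.name = name;
        this.log = log;
        log.add("open " + name);
    }

    @Override
    public void close() {
        log.add("close " + name);
    }
}

final class SentryDemo {
    static List<String> run() {
        List<String> log = new ArrayList<String>();
        try (Sentry a = new Sentry("a", log);
             Sentry b = new Sentry("b", log)) {
            // Simulate a failure in the middle of the work.
            throw new IllegalStateException("read failed");
        } catch (IllegalStateException e) {
            // Both sentries have already been closed by the time we get here.
            log.add("caught " + e.getMessage());
        }
        return log;
    }

    public static void main(String[] args) {
        // prints [open a, open b, close b, close a, caught read failed]
        System.out.println(run());
    }
}
```

The resources are released in reverse order of acquisition and before the catch block runs, with no flags or nested finally blocks in the calling code.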

Ok, so I should specify right off the bat my opinions about Java. I don’t like it, and many of its shortcomings are more pronounced due to my experience with Python. I don’t like forcing a coupling between object and file hierarchies. I don’t like silly typing rules casting everything under the sun to an Object. I don’t understand the hype around JDBC: I expect my language to be bindable to my RDBMS, and I know that any advanced database programming will require sacrificing portability. I don’t like hearing advocates brag about the “write once, run anywhere” mantra when I know that not all JVMs (or JNI modules) are equal, or possible. I know Python’s not perfect either, but for the sake of my argument let me use it simply as a foil: