Well, IcedTea now has ppc and ppc64 support out of the box. aph pointed out that, since I committed it the very instant it built, what is there is essentially a record of the bare minimum needed to bring up an interpreter-only OpenJDK on a new platform, and he suggested I write an overview of what I did to serve as a guide to future porters. So here goes…

Gary’s guide to porting IcedTea

The first thing you need to do is check out a copy of IcedTea and patch the build system so it knows about your platform and will build without JITs on it. These changes are grouped into two patches, icedtea-ports.patch for the former and icedtea-core-build.patch for the latter. What you need to do is remake these patches to include your platform as well as ppc. I suggest you submit them for inclusion at this stage, so you don’t have to repeat this step every time IcedTea is updated with a new OpenJDK build.

Once you’ve done this you need to populate the ports directory with stubs so you can get to the point where libjvm.so compiles and links. If your platform is some sort of Linux then you’re in for a treat, because when I started on the ppc port I had the idea that I would do ppc, s390 and ia64 all at once, from the same codebase, and I wrote a templater to manage it for me. This never happened, but I coded in the templater right until the IcedTea import so you can use it to generate a pretty decent set of stubs.

The templater lives in contrib/templater. There are some notes on it here, but you shouldn’t need them; basically, if you’re porting to a platform other than s390/s390x or ia64 then add it to the tables at the top of generate.py, then run:

python contrib/templater/generate.py your_cpu

The templated files will give you a head start, but you’ll have to fix them up a bit. Partly this is because IcedTea will have moved on since my initial import, and partly this is because as I progressed with ppc it became more and more obvious that doing ppc, s390 and ia64 all at once was a pipe dream and I became less and less concerned with getting every #ifdef PPC perfect. There will be some PPC-specific code, and there will be some missing methods. Every time a build fails, stick in an Unimplemented(); and try again.

Eventually you will be at a point where libjvm.so compiles and links, and the ecj-bootstrap part of IcedTea will complete. Looking at the logs this took me two months — 300 man-hours give or take — but with the templater you could be there within a week.

At this point you may get a segfault. Your first Unimplemented() has been hit, which caused another Unimplemented() in the error reporting system. Temporarily “simplify” VMError::report_and_die() as described here and you will hopefully get your very first real live Unimplemented() message. Start implementing…

The first big bit you’ll hit is ICacheStubGenerator::generate_icache_flush. If you’ve avoided writing assembler thus far there is no getting around the fact that you need to write some now. At this point I implemented enough of an assembler for an unimplemented macro that called report_unimplemented from assembled code in exactly the same way as Unimplemented() does from C. Whenever I hit an Unimplemented() in a code-generating function I simply replaced it with __ unimplemented and continued, and I suggest you do this too.

Surprise! The very next Unimplemented() you hit will be the one you just wrote: the bit that generates the icache flush stub immediately calls it on itself. You really have to write it this time.

After that the next big thing is StubGenerator::generate_call_stub. The code this generates — the call stub — is used whenever C code calls Java. Within the interpreter certain conventions are employed when a method is called: a pointer to the method is in this register, and the stack frame is arranged like so, with the parameters at the end, and a pointer to the parameters is in that register. And so on. The details of this are your interpreter calling convention. The call stub’s job is to take a pointer to a method and an array of arguments and translate them into your interpreter calling convention. It creates what looks like an interpreter stack frame, fills in the relevant registers and jumps into the interpreter.
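To make the call stub’s job a bit more concrete, here is a toy model in Python. It is only a sketch of the idea: the names (Method, Registers, call_stub, add_entry) and the two-"register" convention are made up for illustration and are not HotSpot’s real structures or any real port’s calling convention.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Method:
    """Hypothetical stand-in for a method: a name plus its entry point."""
    name: str
    entry: Callable


@dataclass
class Registers:
    """Hypothetical calling convention: two 'registers' the callee relies on."""
    method: Optional[Method] = None   # which method is being called
    params: Optional[int] = None      # stack index of the first parameter


def call_stub(method, args, stack):
    """Toy call stub: arrange the arguments the way the interpreter expects,
    fill in the 'registers', then jump to the method entry."""
    regs = Registers()
    regs.params = len(stack)   # parameters start at the current top of stack
    stack.extend(args)         # copy the caller's argument array into place
    regs.method = method
    return method.entry(regs, stack)


def add_entry(regs, stack):
    """Pretend interpreter entry that finds its parameters via the convention."""
    return stack[regs.params] + stack[regs.params + 1]


result = call_stub(Method("add", add_entry), [2, 3], stack=[])
```

The point is that the C side only knows "a method and an array of arguments"; everything about where those end up is decided by the convention, and the stub is the one place that has to know both views.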

Before you can write your call stub you need to design yourself an interpreter calling convention. I described mine (more or less) here. The exact detail of this is up to you, but the state-monitors-stack order within the frames is important. Methods need to be able to allocate more monitors and extend the expression stack as necessary. You don’t want to move the interpreter state every time, so you put that at the bottom. You can’t move monitors without a safepoint, so you put those next. And you can move the expression stack whenever you like, so you put that last.
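The reasoning behind the state-monitors-stack order can be sketched with a toy frame in Python. This is an illustration only: the Frame class and its slot list are hypothetical, "bottom" is simply the low end of a list here, and a real frame is laid out in memory with whatever growth direction your platform uses.

```python
class Frame:
    """Toy frame laid out as [state | monitors | expression stack]."""

    def __init__(self, state):
        self.slots = list(state)      # interpreter state sits at the bottom
        self.state_size = len(state)
        self.monitor_count = 0

    def stack_base(self):
        # The expression stack starts just past the last monitor.
        return self.state_size + self.monitor_count

    def push(self, value):
        # Growing the expression stack touches nothing below it.
        self.slots.append(value)

    def add_monitor(self, monitor):
        # The new monitor goes after the existing ones, so only the
        # expression stack shifts; state and old monitors stay put.
        self.slots.insert(self.stack_base(), monitor)
        self.monitor_count += 1

    def state(self):
        return self.slots[:self.state_size]

    def monitors(self):
        return self.slots[self.state_size:self.stack_base()]

    def stack(self):
        return self.slots[self.stack_base():]


frame = Frame(state=["method", "bcp"])
frame.push(1)
frame.add_monitor("lock A")
frame.push(2)
frame.add_monitor("lock B")
```

After all of that, the state is still in the first two slots, "lock A" has never moved, and only the expression stack values have been shuffled along.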

Once you’ve designed your calling convention and written your call stub you will be in the interpreter. For me this was another six weeks’ work, but it took much longer than it could have because the C++ interpreter (the code that the call stub was calling) was not released until b20. I had to try and design the calling convention blind, and a lot of stuff simply didn’t make sense.

So, you’re in the interpreter, the C++ one not the template one. Every method in Hotspot is defined by a methodOop, and each methodOop has a method entry which is the address of the code that will execute the method. Your call stub just jumped to the method entry of your first method, java.lang.Object.<clinit>. It’s an interpreted method, so you ended up in the interpreter’s normal (as opposed to native) entry, as generated by InterpreterGenerator::generate_normal_entry. To implement this you need to understand how the C++ interpreter works.

The normal entry in the C++ interpreter goes by the name of the frame manager. The guts of the C++ interpreter is the method BytecodeInterpreter::run, an enormous switch statement that takes care of pretty much everything. What it can’t take care of is all the stack frame stuff, which is where the frame manager comes in. The frame manager does work for BytecodeInterpreter::run but the relationship between the two is kind of reversed: rather than calling the frame manager to do work, BytecodeInterpreter::run returns to the frame manager with a message to do some work. The frame manager then does the work and calls BytecodeInterpreter::run again with a message that it did what it was asked to. Interpreting then resumes.
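That reversed relationship is easier to see in code than in words, so here is a toy version in Python. It is a sketch under loud assumptions: the three-opcode "bytecode", the state dictionary and the message strings are all invented for the example, and the real frame manager is generated assembler rather than a recursive function.

```python
def bytecode_interpreter_run(state):
    """Toy stand-in for BytecodeInterpreter::run: executes bytecodes until it
    needs stack frame work done, then returns a message asking for it."""
    code = state["code"]
    while state["pc"] < len(code):
        op, arg = code[state["pc"]]
        state["pc"] += 1
        if op == "push":
            state["stack"].append(arg)
        elif op == "add":
            b = state["stack"].pop()
            a = state["stack"].pop()
            state["stack"].append(a + b)
        elif op == "call":
            # It cannot build a frame itself, so it returns to its caller.
            return ("call_method", arg)
    return ("return_from_method", state["stack"].pop())


def frame_manager(methods, name):
    """Toy frame manager: sets up a frame, calls run, does whatever run asks
    for, then calls run again so interpreting resumes."""
    state = {"code": methods[name], "pc": 0, "stack": []}
    while True:
        message, payload = bytecode_interpreter_run(state)
        if message == "call_method":
            # Build a frame for the callee and leave its result on the
            # caller's expression stack before resuming the caller.
            state["stack"].append(frame_manager(methods, payload))
        elif message == "return_from_method":
            return payload


methods = {
    "main": [("push", 1), ("call", "two"), ("add", None)],
    "two": [("push", 2)],
}
result = frame_manager(methods, "main")
```

Notice that bytecode_interpreter_run never calls frame_manager; it only ever returns to it, which is the whole trick.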

So you need to implement a frame manager. I recommend judicious use of __ unimplemented here: you don’t need to implement everything at once, and rather than writing a bunch of code that won’t get executed until later you may as well write just what you need. That way stuff gets tested as soon as it’s written.

The first instruction in java.lang.Object.<clinit> invokes a native method, so the next thing you need to write is InterpreterGenerator::generate_native_entry and its associated signature handlers, result converters and result handlers. Again, don’t try and write them all at once: stub out what you don’t need with __ unimplemented and continue.

Some time around now java -XX:+TraceBytecodes will become your best friend.

The next thing you’ll probably have problems with is System.currentTimeMillis(). This is the first native method that actually returns something, and getting that something into the right place in the expression stack is fiddly.

At some point after that you’ll find native methods that are passed objects and that return them. These are pointers. Pay attention: the things you are passing to and from native code are not the pointers themselves but pointers to those pointers, except when the pointer is NULL, in which case you pass NULL and not a pointer to that NULL. This tripped me up every single time.
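Since this rule is so easy to get wrong, here it is as a tiny Python model. It is purely illustrative: Handle, to_native and from_native are hypothetical names, a Handle object plays the part of the "pointer to a pointer", and None plays the part of NULL.

```python
class Handle:
    """Toy 'pointer to a pointer': a slot that holds an object reference."""

    def __init__(self, obj):
        self.obj = obj


def to_native(obj):
    """Wrap a reference before handing it to native code.
    NULL is passed as NULL, never as a handle containing NULL."""
    return None if obj is None else Handle(obj)


def from_native(handle):
    """Unwrap what native code handed back, with the same NULL exception."""
    return None if handle is None else handle.obj


wrapped = to_native("some object")
roundtrip = from_native(wrapped)
null_case = to_native(None)
```

The special case has to be handled in both directions: forget it on the way in and native code sees a non-NULL argument that points at nothing, forget it on the way out and you dereference NULL.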

Somewhere around 1400 bytecodes everything will go multithreaded. This is where you’ll find out your object locking doesn’t work.

Hello World is a little over 300,000 bytecodes. The great thing about the C++ interpreter is that there are points where implementing one little thing will suddenly have you interpreting orders of magnitude more bytecodes. Once you have one bytecode executing you’ll have dozens. You’ll add the stuff to return from native methods and have hundreds, then you’ll add the stuff to do object locking and have hundreds of thousands. From Hello World to javac and Ant is a pretty small step. And then you’ll be pretty much where PPC IcedTea is today.

I should thank Steve Goldman for tirelessly explaining all this to me and answering all my stupid questions.

Posted by gbenson on Friday, November 16th, 2007, at 11:42, and filed under Uncategorized.


Comments

Would it be possible for you to make binaries (FC8) of your ppc port available somewhere?

I’m currently looking for a solution to have Java 1.6 on my PowerBook G4, but I don’t want to reinstall a new OS before testing it first… I’ve tried to compile it myself using the FC8 Live DVD but it was way too slow…