The Life and Times of Jeff Squyres

General Archives

July 12, 2000

First entry

So here we are.

I'm copying an idea from several undergrads (Pete, Arun, Perk, Brian -- in no particular order; I have no idea who had the idea first). All besides Evil Brian have their own custom journal scripts (Evil Brian has his hosted on a .com). Similar to batch queueing systems, there's a complicated hierarchy of who is derived from whom, and who stole what features from who else (if the grammar is wrong, deal).

When I decided to get into "that crazy journal thing that all the wacky kids are doing these days", Pete gave me a copy of his journal code, and I thought to myself, "Wait, this can't be right. It's under 100 lines or so. Nope, can't be right." So what did I do? I wrote my own.

One week and 1,887 lines of new C++ code and 875 new lines of PHP code later, I have my own journal system. It's chock full of features; I think it will even write simple Pascal programs for you. But lest we be accused of plagiarism, let's give full credit to the other code bases that I stole from to make the jeffjournal package:

Shell client: 1,887 lines of C++ code

Back end web support: 857 lines of PHP code

GNU readline library: 21,222 lines of C code. While readline actually set me back about a week (moral of the story: be very careful about including configure-generated C header files in C++ code), it is truly cool and extremely useful.

minime libraries: 11,585 lines of code. This is my dissertation project. I pirated the use of the socket and console (i.e., readline) interfaces out of it.

So this is actually... well, it's a lot of code (can't do simple math anymore and am too lazy to fire up bc). Ok, this was just over the top. But what else are you gonna do with a DSL connection?
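For anyone who does want the number without firing up bc, the tally can be sketched in a few lines of Python (the language is chosen only for illustration; it is not part of the journal system). One assumption is flagged in the code: the PHP count appears as both 875 and 857 above, and this sketch uses the itemized figure, 857.

```python
# Line counts as itemized in the list above.
line_counts = {
    "shell client (C++)": 1887,
    "back-end web support (PHP)": 857,  # assumption: the itemized figure, not the 875 quoted earlier
    "GNU readline (C)": 21222,
    "minime libraries": 11585,
}


def total_lines(counts):
    """Sum every code base that went into the package."""
    return sum(counts.values())


def new_lines(counts):
    """Only the code written that week: the C++ client plus the PHP back end."""
    return counts["shell client (C++)"] + counts["back-end web support (PHP)"]


if __name__ == "__main__":
    print("total lines:", total_lines(line_counts))
    print("newly written:", new_lines(line_counts))
```

Under that assumption the total comes to 35,551 lines, of which 2,744 are new.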

I plan on having some semblance of a journal out here for the world to see. Readers can expect to see gritty coding nuances, general musings on [un]reality, and lots of other boring things. Probably mainly boring things (I'm a geek, what do you want?).

Readers should not expect to get too many journal entries next week, and should expect to get none the week after that (I'm getting married next weekend; it's been declared verboten for me to touch computers on our honeymoon -- what's a geek to do? Oh yeah... :-).

Jeff's Journal

Eric Roman mailed out an interesting project that he heard about recently: rexec. Seems to be a new project under the old name for transparent and secure remote execution from the CS folks at Berkeley. Printed out the paper; it should make a good read.

Spent the afternoon cleaning up the minime code -- I made bunches of changes to the socket and console routines to be able to write the shell client for the journal system (jjc).

Hmm.. just found an annoying bug in jjc: C-h C-h (i.e., hitting backspace twice) brings up the emacs -nw apropos list, but hitting C-g to abort the apropos list somehow makes jjc think that the emacs child has finished, and therefore jumps back to the prompt, but then seg faults and dies. Ugh! Gonna have to fix that one. :-)

Some guy mailed me today about parallel bladeenc. Apparently, his company (www.scyld.com) is releasing their own MPI soon. He suggested that I add a two-line fix to parallel bladeenc that allows MPI_INIT to fail and then lets the program proceed in a serial fashion. This is a truly cool idea, actually. He was motivated by the fact that they support a "serial" MPI dynamic library that allows mpirun-less invocations of MPI programs. It contains stubs of all the MPI functions and simply fails (i.e., returns != MPI_SUCCESS) if you invoke any of them (e.g., MPI_INIT). Hence, if your code is smart, it takes the failure of MPI_INIT to mean that it should run in serial. So I made the quick change to parallel bladeenc; it'll go out in the next release (whenever that is).
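The control flow of that trick can be sketched roughly as follows. This is only an illustration in Python, not the actual two-line bladeenc change and not real MPI calls: `stub_init`, `real_init`, `pick_mode`, and `encode` are hypothetical names, and `MPI_SUCCESS` here is a stand-in constant for the sketch.

```python
MPI_SUCCESS = 0  # stand-in constant for this sketch; a real MPI library defines its own


def stub_init():
    """Hypothetical 'serial' stub library: initialization always reports failure."""
    return 1  # anything != MPI_SUCCESS


def real_init():
    """Hypothetical stand-in for an MPI library that initialized properly."""
    return MPI_SUCCESS


def pick_mode(init):
    """Call init once and decide how to run from its return code."""
    return "parallel" if init() == MPI_SUCCESS else "serial"


def encode(chunks, init):
    """Placeholder 'encoder' that only records which path was taken."""
    mode = pick_mode(init)
    if mode == "serial":
        # Init failed: a single process handles every chunk itself.
        return mode, ["encoded:" + c for c in chunks]
    # Init succeeded: real code would split the chunks among ranks here.
    # This sketch just marks the branch and does the same work in one process.
    return mode, ["encoded:" + c for c in chunks]


if __name__ == "__main__":
    print(encode(["a", "b"], stub_init)[0])
    print(encode(["a", "b"], real_init)[0])
```

The point of the design is that the same binary works with or without mpirun: the failure of initialization is treated as information ("run serially") rather than as a fatal error.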

Speaking of parallel bladeenc, I mailed Tord about a week or two ago asking questions about the MP3 format itself -- Jeremy Faller and I spent about half a day trying to make parallel bladeenc generate diffable output to serial bladeenc. We didn't succeed, and actually came up with many more questions than answers/solutions, but we understood why parallel bladeenc's output is different than serial bladeenc's. The parallel output is actually probably lower quality -- something we'd like to fix. But we can't do that until we understand the output format of MP3 more... Still waiting for an answer from Tord. :-(

July 17, 2000

Jimmy has fancy plans, and pants to match

More wedding stuff today. Spent all day waiting for a friggen' package from UPS that never arrived (they tried to deliver it Friday, left a note saying that they'd deliver it Monday). Ugh. Got lots of other wedding planning stuff done, though.

Helped Don Peterson with some C++ stuff today. I sent him a bunch of code (that I actually tested), and then discussed mods to this the rest of the day (well, actually discussed my typos in the mails mostly -- 'cause I was sending him mods that I hadn't tested -- ugh!).

Talked to the DoD investigator that covers this area again today. An undergrad who graduated from CSE a year or two ago and is now in the Air Force (did the ROTC thing here at ND) is being assigned to a "sensitive" job in the Air Force. I've talked to this investigator several times over the past several years about various other students who I knew who went on to various DoD/DoE jobs. Pretty standard stuff, actually -- not as impressive as it sounds. :-)

He's a nice guy. I've talked to him about his daughter (she's a Signal Corps LT, like me) and various other military stuff (he's ex military himself -- a warrant officer). We talked about the person he came to talk about, and then we chatted for a while before he left.

July 19, 2000

Donkey, donkey, donkey, donkey, donkey

Whoo hoo!!

Here we go into the home stretch... Journal readers should not expect another journal entry for about 1.5 weeks or so. It's Wednesday before my wedding, and I likely will only be in sporadic contact with internet-enabled computers (a new innovation, so I hear) for a while. There's much to do, and little time to do it!

vacation has been enabled, and I've proverbially passed the buck to others for the next 1.5 weeks.

My wedding day comes
Friends and family to South Bend
Screw the rest of you!

I wasn't an English major for nuttin'. Did I mention that I'm moving to Looieville?

July 31, 2000

To the moon!

Back to reality.

What a week. This'll be a pretty long journal entry, as I have abbreviated entries for the entire past week in this one entry, as I have had little to no computer access the entire time (and I wanna know who bet that I would check my e-mail while on vacation -- they lost!). Some notes are kinda sketchy 'cause I didn't start taking journal notes until Friday or so. You'll deal.

Friday, 21 July, 2000

T-1 day. I spent the morning in the office hurriedly trying to finish the wedding program. My Big Thing was that the music had to be in the program (i.e., not just the words). Tracy's church in Looieville only puts the words in the Sunday programs, and it really annoys me because I don't know all their songs, and it makes it really hard to sing them. Since we have a lot of non-Domer folks coming to the ceremony, I wanted to put the music in the program.

So here's another problem: I decided to do the program in MS Word on the assumption that Tracy would be able to edit it as well. i.e., I could do some work, e-mail it to Tracy, have her make some edits, send it back to me, and repeat as necessary. Bad assumption on my part -- Tracy's MS Word couldn't read my file (i.e., it came out as garbage), even though they were the same version of Word.

Know what I like about Microsoft products? Nothing at all.

Also particularly annoying is the scrolling behavior when in two-column landscape mode (that I used 'cause the programs were folded in half). If you go to the bottom of the left column and hit the down arrow, one would expect to go to the top of the right column -- i.e., go down with the text. Nope -- you go to the top of the left column on the next page. There's other non-intuitive (IMHO) scrolling like that as well. Needless to say, I was strongly wishing that I had just done the whole thing in LaTeX by the end of the ordeal.

I ended up scanning in the music and placing it in the document. It all turned out ok in the end, but I think that Word really made it take longer than it should have. Ugh!!!

Renzo (the best man) and Lynn (his wife) picked me up and we ran to Kinko's to run off the programs (I had some nice paper that I wanted to use). Kinko's could do it by 9pm at the earliest, but we needed them at the rehearsal at 5pm, so that was no good. This was kind of frightening, because Kinko's has never failed me before.

So we went to Copy Max (of Office Max). They were able to do it just fine. Dr. Romi was working, so I said hi to her as well. While they were doing it, Renzo and I went to pick up our tuxes at Bernardo's. Both of us needed slight alterations to our tuxes (which they do on the premises). While we were waiting, my dad called and was surprised when I reminded him to pick up his tux (<sigh> -- good help is so hard to find these days!). So I told him I would pick him up shortly and get his tux with him. John Shipman (another groomsman) also called during this time, so I told him I'd pick him up as well.

Renzo and I finished, swung by the Marriott and picked up my Dad and John and promptly went back to Bernardo's. We ran into Mark Payne (Tracy's brother, another groomsman) and her father getting fitted for their tuxes as well. After getting all of that straightened out, we ran by Copy Max and picked up the programs. John's response to the text that I wrote about him in the program was, "Jeff, I have two words for you: rat bastard." BTW, be sure to ask him what "wizard fries" are. :-)

I got dropped off at my apartment so that I could change and go meet Fr. Hesburgh (Fr. Ted wanted to meet with Tracy and me for about an hour before the ceremony and have a chat). Tracy met me at his office on the 13th floor of the Hesburgh Library right at 4pm. While we were waiting, I looked around his waiting room and noticed a corner of it completely filled with military stuff. I saw a big picture of an SR-71. Apparently it's the same SR-71 that he flew in and broke Mach 3.3 in. This guy has had an amazing life, and is still a really down-to-earth guy.

Tracy had never met him before; I'd met him a handful of times. We had a nice chat, and Fr. Hesburgh gave us his collected wisdom of marriage from his life (he was a marriage counselor for many years, and has probably married thousands of couples in his time). I'm really glad that we were able to have him preside over our ceremony in the Basilica at Notre Dame -- it was way cool. If you've never met Fr. Hesburgh, I highly recommend making an appointment and just going to have a chat with him. He loves to meet with people (particularly current students) to just shoot the breeze. He's got some amazing stories and is probably the most famous person you or I will ever meet.

After our chat, Tracy and I went over to the Basilica for the rehearsal. The Basilica staff is very Draconian about schedules -- you have 45 minutes for your rehearsal, and that's it (which is completely understandable -- 4 couples get married there every Saturday; it takes a finely tuned machine to keep it running smoothly). We ran over a bit, but they were not able to interrupt Fr. Hesburgh (it's his church, after all!), which, I have to admit, we were kinda counting on. :-)

The rehearsal dinner was at Tippecanoe Place, and went very well. My dad gave a really nice speech at the end, and gave me his self-winding chronometer (a highly tuned watch, for all you laymen) that he got from Luzern, Switzerland (which, coincidentally, is where Dr. Lumsdaine's family is from, and is the name of 8 machines in the LSC) when he was a teenager. He gave a good speech which included the following statistic:

There are approximately 90,000 living ND graduates. Jeff has been at ND for the graduations of about 25% of them.

Wow -- if that doesn't date me, I don't know what will!

John, Renzo, and Darrell came over to my apartment for a cigar and a beer or two to calmly round out the evening. We hung out by the smoking table for perhaps the last time. There was a party going on in the apartment above mine, which was very amusing. Jeremy Faller and Kevin Barker and their respective weekend significant-others showed up after a while, too. So we were all hanging out by the smoking table, which was fun.

After everyone left, it was just Kevin, Danielle, and me left at Chuck's old place. I packed for the cruise, and laid out my clothes for the wedding tomorrow.

Saturday, 22 July, 2000

Ms. Tracy Payne and I were married in the Basilica of the Sacred Heart on the campus of the University of Notre Dame on 22 July, 2000. Renzo and Lynn came and got me around 7:30am. Did a bunch of pictures before the ceremony (my parents were late... <sigh>). The wedding ceremony went well (aside from a little confusion about my name... :-). Pictures were good, too, but very numerous (a little rushed in the church, 'cause Hesburgh's homily went a bit long, but hey -- it's his house, he can do whatever he wants! Plus, it was a pretty nice homily :-). Oodles of pictures down in the grotto and whatnot, and then a limo with Renzo and V to the reception (Marriott, downtown South Bend).

The reception was a blast. It was way cool to see so many friends and family all in one place (thanks, everyone, for coming!). Started with a typical receiving line followed by dinner (ok, it was really lunch, but you have to s/lunch/dinner/g for a reception -- it's a protocol thing). Gotta love being at the head table -- you get served first! There was an open bar, etc., etc. Renzo gave a good best man toast. Cutting the cake went really well, too -- Tracy and I did an impromptu (and very minor) cake-on-the-nose deal that apparently went over pretty well (many "aww..."'s and "that's cute"'s, etc., etc.). When I was eating my piece of cake, however, Jeremy Faller had the verve to say right in my ear, "Hey Jeff... seafood!"

As a Pavlovian response (no, really!), I turned around to face the crowd, and did seafood with my wedding cake. Tongue out, cake/icing everywhere -- the whole 9.7 yards. True class all the way (Tracy was so proud. No, really!). Many flashbulbs went off, so I had better get a few copies of those pictures.

Sidenote: the only thing that I knew about my wedding for the past several years was that there was going to be free alcohol available during the whole schameel (Irish Catholic and all that). We had an open bar before dinner, freely flowing wine during dinner (reference: Jesus/"that Cana wedding"), and open bar again after dinner. I mention this only because I was particularly proud to see the whole ND crowd cheer and stampede for the bar as soon as it opened again after dinner. I salute you, my fine feathered friends -- you inspire us all (reference: Bill McNeal/News Radio).

Many people danced, which was cool. The DJ did really well -- played all the typical ND songs which kept everyone dancing (except for the Madonna song, which cleared the floor -- and I again blame Faller [guilt by association]). I'll spare the details here, but I danced a good deal of the time, and still managed to greet most of the guests at least briefly.

After the reception broke up, we had a pizza-n-beer party (again in the Marriott) a few hours later in which a good number of people showed up (more than we anticipated, actually -- we ran the Marriott out of pizza, so we switched to hot wings). More way coolness, 'cause the setting was much more informal than the reception.

Sunday, 23 July, 2000

After all that, Tracy and I had to get up at 3:45am to catch our 5:15am flight to Miami (V drove us to the airport). Aside from being early, the flight went well, and we boarded the Royal Caribbean (RCCL) cruise ship Voyager of the Seas. It's an amazing ship. It's the largest cruise ship in the world (although not the largest ship in the world -- there's still a few oil tankers that have that prestigious honor). Here's some impressive stats about the ship:

It has more crew space than RCCL had on their entire first cruise ship.

I think there were 3200+ passengers on this trip; 108 honeymoon couples.

Voyager is several times larger than a US nuclear aircraft carrier.

It's so big that it has 2 wake-reduction generators under the ship to limit the size of its wake while in port.

It has no rudders -- it has three propellers, two of which can rotate 360 degrees to steer the ship.

Voyager is a most excellent example of Engineering with Extreme Prejudice. Tracy and I actually borrowed my friend Darrell's 3-tape video series about the design and building of the ship. My deep admiration and respect goes out to all of the designers, architects, and builders.

So anyway, we arrived in Miami with no problems (although we were dead tired), and got to the boat via a shuttle bus. Did I mention that it's a big fricken' boat (henceforth referred to as BFB)? There was a monstrously long line for check in, but it actually went pretty quickly, and we got on the boat in fairly short order.

After wandering aimlessly for a little while, we found our cabin (#7572). It had a little couch, mini table, dresk (i.e., combo dresser/desk), several large dressing mirrors, a mini safe, a closet with several shelves, a bathroom, a queen-sized bed (or possibly king-sized -- we never did figure that out), 2 nightstands, a phone, and a balcony. The balcony had two chairs and a mini table. The amount of furniture makes the whole arrangement sound larger than it really was; it was actually fairly... cozy (we're convinced that the cabin was actually built around some of the larger pieces of furniture [reference: Engineering with Extreme Prejudice]). But it was ours for the week, so it was perfect.

We wandered around for a bit (did I mention that this was a BFB?) and had lunch in the Windjammer Cafe.

Sidenote: It seems that they use the same names for things on all RCCL boats. Tracy and I took a cruise on Grandeur of the Seas a few years ago, and it also had a Windjammer Cafe. Indeed, many of the other cafes, bars, pools, etc., etc., had the same names on Voyager as they did on Grandeur. Coincidentally, the Cruise Director (i.e., the main PR face) was the same guy from our previous cruise on Grandeur. This must have been a promotion for him -- Voyager has been at sea for less than a year (launched in November of 1999), and apparently RCCL took the brightest and best from its other cruise ships to staff it.

Sidenote: Food on a cruise ship is amazing. There's no end to the supply of it and it's all free. Drinks are just about the only food that you pay for. Sodas and regular stuff like that come free when you're having a meal, but you have to pay for them when you get one from a bar, for example. Alcoholic beverages always cost money. But you pay for everything with a cruise charge card (which also serves as a room key); no cash is used on the boat. Pretty handy, actually. And it works out well for RCCL, because you have no concept of how much money you're spending. Anyway, cruise food is never ending; there is really good food available just about 24 hours a day. It's a truly amazing feat of logistics, actually -- providing chef-level food (i.e., with all the little garnish decorations, ice sculptures, people in tall white hats, etc.) for so many people in various locations around the BFB around the clock. Let's call it Cooking with Extreme Prejudice.

We had a mandatory muster drill before the ship sailed. This is apparently required by maritime law in an attempt to prevent the need for movies like Titanic from ever being filmed again. All passengers meet on the muster deck underneath their life boat and stand in rank and file for an attendance check (kinda like the Army). Our muster captain's name was Regina. Even though it was 4:30 in the afternoon, it was hot in the Miami port. The passengers were somewhat restless, but we got through it.

There was a lot of activity in the port while we were sitting there, waiting to sail; powerboats, jet skis, and even a water-based airplane were going hither and thither. Some powerboat even sped by the entire Voyager and mooned the entire BFB during the muster drill. Needless to say, this involved having his ass in the breeze for probably a full minute or so as his boat sped down the length of the BFB. True class!

We got a package with our cruise that entitled us to a bottle of Champagne in our cabin upon sailing, so Tracy and I enjoyed it on our balcony while sailing out of Miami Port. It was amazing to see how many powerboats, jet skis, and people on shore stopped to wave as we sailed. Indeed, a large number of cars pulled over on the highway to watch us go, too. Since there are a non-trivial number of cruise ships that have Miami as their home port, you'd think that Miamians would be jaded to seeing the cruise ships set sail. Apparently not. But this does raise the question: why is the fundamental human response to seeing a cruise ship sail by to wave? Without fail during the entire week, whenever we sailed by some group of people, one or more of them would wave. Is this a Pavlovian response? Have all of us, in some prior life, been conditioned to wave at cruise ships as they go by in order to receive a food pellet? Maybe it's just Waving with Extreme Prejudice.

We also discovered that our room's TV actually functioned as an interactive system that provided not only tons of information about our scheduled island stops, but allowed us to order room service, check our cruise charges, order excursion tickets, etc., etc. Pretty neat, actually.

The main dining room serves dinner in two shifts: main seating and second seating. Tracy and I opted for second seating. It is typical for cruise ships to ask a few demographic questions about you when you buy the ticket for the purposes of (among other reasons) finding compatible people to seat you with during dinner. However, there was some kind of mix up with our table. The matrid'D (whatever) took us to our table, but it was filled to capacity with 80 year old ladies. So they had to move us to a different table (which wasn't a bad thing -- while I personally have nothing against 80 year old ladies, we were glad to sit with people closer to our own age). Amazingly enough, they did this with big paper maps of the entire dining room rather than on a computer. We got moved to table 476 with the following people (whose names we did not remember at all on the first night):

Randall and his 8 year old son Blake from Texas. Blake (who appeared to be both highly intelligent for an 8 year old as well as highly annoying), only showed up to dinner once that week, though, and Randall only showed up twice. Indeed, you can get food just about anywhere on the boat -- the main dining room is not the only place to get dinner. I guess they didn't like us. Bah.

Marty and her 18 year old son John. Friendly folk from the San Francisco area.

Tina and her 14 year old son Peter. Also friendly folk, from New York City.

Mercedes and her ?15? year old daughter Daniella (not sure I spelled those right) from Florida. Nice people, but kinda quiet. They also usually sat directly on the other end of the table, so Tracy and I didn't get to talk to them much.

All in all, a pretty likeable crowd. Not exactly our age bracket, but much closer than the little old ladies at our real table. Tracy and I thought it highly ironic that we, the honeymooners, were at a table of divorcees with their children (indeed, we were pegged as honeymooners on the first night), but it actually worked out really well. As you'll see below, we got along quite well with everyone and had a great time all week. Indeed, we were frequently among the last to leave after dinner every night.

Monday, 24 July, 2000

This was a day at sea en route to our first destination: Labadee, Haiti (see Tuesday). Tracy and I did nothing, and did it all day. I mainly read Cryptonomicon while Tracy sunned on deck (while I did come back with a little bit of color, I'm not much of a sun worshiper. I sometimes come out of Cushing at ND at night and am surprised to see that entire weather systems have moved in and out during my day at work, completely unbeknownst to me).

The ship was moving at 17 knots which meant that it was really windy on deck. Some things that I have noticed so far:

Many families are using walkie talkies to communicate with each other on the boat. I wonder how well they work -- i.e., if you're in the depths of the BFB, do they really work well enough to talk to your mother on the upper pool deck?

The staff on the ship use 2-way phone/walkie talkie things to communicate with each other. And they even work when we're out at sea, miles from any possible commercial cell coverage. So do they have their own cell on the boat itself? Hmm. Interesting.

The rank insignia of the officers on the boat vary widely: the lowest seems to be indicated with shoulder boards that have a narrow white stripe on a wide yellow stripe. But the shoulder board stripe combinations are widely different after that -- different widths of yellow and white stripes, sometimes white on yellow, sometimes just plain yellow, etc., etc. I'll try to figure this out over the course of the week.

All several hundred cash registers on board the BFB (the various shops, the bars, etc.) use flat screen touch-sensitive monitors. No keyboards. This must have cost a large chunk of change! But it seems to work well for them -- very little footprint and no additional keyboard, and you can do all data entry with an index finger. Didn't really get a chance to look at them (they're inevitably always facing the other way), so I don't know what OS they were running, but it's probably either some flavor of Windoze or a custom OS/application. Probably 'doze.

Had lunch at an on board Johnny Rockets (reference: cruise food, above). Apparently, Johnny Rockets is a chain of 50s-style burger joints, complete with the staff in white aprons, paper hats, 50's music blaring out of jukeboxes, etc., but I'd never heard of them before. Had a good burger and shake (but it was not a $5 shake, mind you). I think the most surreal point of my Johnny Rockets experience was when the whole staff got up to do the Hand Jive when it started playing over the jukebox. Let me clarify exactly why this was surreal: the entire staff was multi-ethnic -- not a single soon-to-be-DWM (i.e., no Caucasians) among them. This is not intended to be a racist statement -- it just struck me as odd to see the Hand Jive, in which you picture John Travolta and a bunch of other decidedly white 50's males with greased back hair and leather jackets, performed by people from other countries (literally; every staff member's nametag also identified the country that they were from -- Voyager's crew was from something like 50+ different countries). Their English was markedly better during the song, too; is that how America is known and identified? By show tunes from Grease? If I ever get mistaken for a foreign spy and am interrogated by the CIA, am I going to have to (in addition to knowing all the World Series and Super Bowl winners from the past 100 years) be able to sing any Grease show tune upon command?

We also attended a wine tasting in the afternoon. We got to sample nine different wines, which was pretty cool. Most of them were good, but I didn't like two of them. The people at our table (don't remember any of their names) immediately pegged us as honeymooners as well.

We went to the show before dinner -- an "intro" show, which had several acts, all punctuated/MC'ed by the Cruise Director.

Dinner attire was "smart casual" -- I wore my new suit. John showed us a game called "spoons". It's one of those "try and figure out the rules" kinds of games, so I won't go into detail here. I happened to figure out the rules first, which was irritating to the others at the table (reference: cocky, flippant, arrogant). I then introduced everyone to "Big Black Frying Pan" which, although different, is along the same lines. Tina was about ready to murder someone by the end of dinner because these games can be quite frustrating when you can't figure them out, but much fun was had by all.

Tuesday, 25 July, 2000

We arrived at RCCL's private area on Haiti: Labadee. In the words of a stand up comedian that we saw on the boat, "Labadee is apparently the Haitian word for 'damn hot'." Labadee is a little peninsula with nice beaches and all the usual water sports. Tracy and I rented a jet ski and took a tour several miles down the Haitian coast with it.

Neither of us had ridden a jet ski before, and it was BIG fun. We had to watch a Yamaha safety video before skiing off, which featured a perky US Coast Guard officer giving all kinds of rules and safety tips. I found this pretty ironic, since we were in Haiti.

I drove down the coast, and Tracy drove back. Did I mention that jet skis are way fun? (reference: Top Gun movie, "I feel the need... the need for speed!", reference: Fr. Hesburgh's SR-71 flight) Our guide pointed out some nifty things about the island, all of which I promptly forgot. For safety reasons, they had us drive in a single file line, [supposedly] 100 yards behind each other. We got stuck behind Slow Redhaired Lady twice, which was kind of a drag (pun intended), but other than that, the speed was great.

Jet skis are not hard to drive: just squeeze the trigger/throttle, steer with the handlebars, and go. The only trick to get is that the steering is waterjet-powered, and can be delayed by a fraction of a second or so -- something you have to get used to and compensate for.

The driver wears this harness thing that has two hand grips on the side for the passenger to hold on to. Since I drove down first, I had the harness on first. When we switched half way through the trip, we were somewhat rushed (since no one else switched drivers), and Tracy didn't adjust the harness at all, and it fit very loosely on her (there's just more of me to love, that's all!). Hence, the hand grips were pretty useless to me, and Tracy almost bounced me off the jet ski a few times. Much, much fun. I highly recommend it.

After the jet ski tour in the morning, we went back to the ship, got lunch on board (although most of the food service had been temporarily moved to the island), and went back and lounged on the beach for the rest of the day (i.e., I sat in the shade and continued the Cryptonomicon).

There was a "repeat cruiser"'s reception where they were passing out Champagne like water, so Tracy and I naturally attended. Got a closer look at the Captain's rank: 4 medium-wide yellow stripes with a big yellow diamond at the top. I think there are a small number of other ranks that have yellow diamonds as well.

The dress at dinner was "formal". I had rented a tux from the ship to wear that night (they tell you ahead of time that two dinners will be "formal dress"). This was Blake's one and only appearance at dinner, and he annoyed everyone by figuring out the spoons game within minutes (I told you he was smart!).

We went to the show after dinner, which was a stand up comedian. He was ok -- somewhat repetitive, but we laughed.

Sidenote: friends of mine mentioned that they didn't want to go on Voyager because it's just too many people -- the tendency to wait in line for things would be just too much. However, I've noticed that we rarely wait in lines very long. They seem to have the crowd/traffic control issues worked out pretty darn well (reference: Engineering with Extreme Prejudice). Yes, there are billions of people around, but once you get past that, it doesn't really impact much. There are, however, a noticeably larger number of children on this cruise than there were on our last cruise (many other people have remarked on this as well).

When we returned to our room, we found a manta ray made of towels on our bed. Very amusing and rather cute -- it was made by the cabin steward when he made up our room. I think our cabin guy from our last cruise did something similar as well. A friend of mine told me that when she went on a cruise, their cabin steward would make crash-test dummies from their clothes. For example, when they came back from dinner one night, there was a pair of legs and feet sticking out from one side of the bed and a body, arms, and head sticking out of the other (all made with their clothes), making it look like the bed had fallen on the crash-test dummy. Funny stuff.

Wednesday, 26 July, 2000

Arrival at Ocho Rios, Jamaica.

We slept in and got room service breakfast (reference: cruise food). We lounged around our balcony and continued to explore the ship before our afternoon excursion into Jamaica.

We signed up for a yacht tour that left right from the same dock as Voyager. The first stop was the Dunn's River Falls. The falls were actually impressive enough -- a gently sloping 900 feet in the vertical direction, quite beautiful, and you actually can climb the falls (the main attraction). However, the climb was actually somewhat frustrating, because you are limited by really slow people in front of you, so you can take about 3 steps and then have to wait. So we both walked away from there with a less than "that was awesome" feeling.

The yacht tour continued on to some waters off the coast of Jamaica for snorkeling. We were further annoyed that they didn't have enough snorkel masks for everyone on the boat, and Tracy and I had to wait quite a while for someone to finish before we could go snorkeling. And then the water was really choppy, and Tracy got a little queasy. So all in all, the yacht tour was kind of a bust.

The BFB set sail again around 5pm, heading for Cozumel, Mexico. We went to a honeymooners reception that night, where, again, Champagne was poured freely (who can ignore free alcohol?).

When we returned to our room, there was a towel elephant waiting for us.

Thursday, 27 July, 2000

Another day at sea, this time en route to Cozumel, Mexico. We basically did nothing all day again; I continued reading Cryptonomicon and Tracy sunned on the deck.

We went to the Bingo game in the afternoon. They play all week and have a rolling jackpot (more below). We didn't win at all (they play 5 games in one session), but it was fun anyway (must be deep-seated Irish/Catholic roots in me that enjoys a good rowdy, full-contact game of Bingo -- Bingo with Extreme Prejudice).

Dinner attire was formal, so I wore my tux again. I had a blue paisley vest this time, though, instead of the standard black cummerbund that I wore last time. We had a formal portrait taken too (same package as the champagne in our room when we first sailed). But we didn't go to the main dining room -- we went to the quaint Italian restaurant that you have to get reservations for (although everything is still free -- reference: cruise food). The food was excellent, and we got a nice bottle of wine with dinner.

Went to the show after dinner, entitled "Dreamscape" where we met up with Tina, Mercedes, and Marty. The theater is really quite excellent, and I haven't really talked about it much yet, so I'll describe it now. It's a 2-floor theater (main floor seating and a balcony), very nicely decorated such that you can easily imagine that you're in a mid-sized playhouse in London. The stage setup is very high-tech -- they can do many different kinds of effects and have tons of props, curtains, booms, etc. They even have an orchestra pit and movable sections in the stage (i.e., in the vertical direction, which was handy during various portions of the shows). The sound booth was in the back on the first floor, and the lighting booth was in the back of the balcony (why do the lighting cronies always get shafted?). Full bar service on both floors with waiters/waitresses, which was nice.

"Dreamscape" was a bit trippy, but parts of it were good. My favorite part was several people dressed up in [apparently] velcro suits that would throw themselves up on a wall (Letterman-style) in various shapes and letters and whatnot. Very amusing. There was also a stand up comedian at 12:15am that we wanted to see, but we had to get up early for our tour in Cozumel, so we didn't go.

I accidentally put the "do not disturb/please make up room" card out facing the wrong way -- it said "do not disturb" so we didn't get a towel animal this evening. But we heard that it would have been a little dog.

Friday, 28 July, 2000

Arrival at Cozumel, Mexico.

We signed up for a rather lengthy tour of the Tulum ruins -- a Mayan city. This is actually on the Mexican mainland, not on the Cozumel island. So we took a ferry to the mainland, and a bus to the city itself. Our tour guide took us around the city a bit and told us all about it. Very cool stuff, actually (note to self: gotta investigate the Mayan numeral system -- the Mayans were really into math and calendars in their lifestyles and religion). Only a few buildings were left standing, but you could walk around much of it.

This was apparently the last city that the Mayans built, and they actually enclosed it within a wall (which is evidently unusual for them). They did some amazing things with sunlight -- they made specific holes in walls and buildings so that on the equinox and solstice, the rising sun would appear in specific places in rooms, walls, etc., etc. Truly, the entire city was built with fundamentals and exactness that required Engineering with Mega-Extreme Prejudice. I wonder whether many modern contractors could achieve the level of exactness that the Mayans did (piping sunlight through strategic holes in walls and buildings across the entire city, for example -- amazing).

The city was directly on the coast, too; there were paths down the cliff which the city was built on to walk down to the beach (important for sea trade, apparently). They even had a light house to warn for reefs and whatnot.

After returning from the Tulum tour, Tracy and I ventured out to Cozumel itself for some shopping. I was looking for a good t-shirt, but came up empty (they all appeared cheesy to me. It's amazing how I'll take and wear any freebie computer t-shirt, but when it comes to buying one, I'm extremely picky). Tracy got a silver necklace. We walked around a bit and saw the waterfront of Cozumel, but then had to return to the ship before it sailed.

One surreal experience: on the approximately 3-5 minute cab ride from the BFB to downtown Cozumel, I saw 42 Volkswagen Beetles. Yes, 42 (and that's not even counting the VW busses). Not the new models -- the old-style VW beetles (and many of them were fairly new). Absolutely incredible. If you ever have a desire to get a VW Beetle, go to Cozumel. Apparently they still have a VW Beetle factory in Cozumel, hence, in an amazing show of local support, everyone proudly drives around in their locally-made Beetles yelling whatever it is that proud Beetle owners yell (in Spanish). Either that, or it's just amazingly cheap to buy a Beetle there.

Dinner dress was casual. I introduced Peter to the concept of placing a sugar packet on the handle of a fork (or spoon, but forks give straighter trajectories) and slamming down on the curved end to launch the sugar packet across the room. The heavier sugar packets work better, such as pure sugar cane sugar. It's actually amazingly hard to do right -- it's difficult to get any distance out of the sugar. It's a delicate balance of placing the sugar correctly on the handle of the utensil and hitting the other end just right to get any kind of distance. If you don't perform these steps just right, any/all of the following will happen:

the sugar packet will only go straight up (and therefore straight down)

the sugar packet will veer wildly off-course and end up in the soup of someone at an adjoining table

you'll end up launching your eating utensil across the table/room

What followed was a medley of sugar football, where just about all of us at the table tried to make field goals from as far a distance away as possible. I actually managed to make one down the length of our [fairly long] table into Marty's lap (a perfect 3 pointer, if I do say so myself!). The rest were comical attempts that usually ended up horribly wrong (oops) followed by our whole table pretending that nothing happened ("Jeez, I don't know sir -- we don't have any sugar packets mysteriously ending up in our soup. Must be a problem with your table; you should call technical support."), punctuated by waiters, wine stewards, or any other Person of Responsibility walking by. Great fun was had by all (mothers included!).

When we got back to our room, there was a towel monkey hanging from the ceiling in our room. The best part was that he was wearing Tracy's sunglasses. It was so funny that we had to take some pictures with it.

Saturday, 29 July, 2000

Another day at sea, this time en route back to Miami.

Yet another day of doing nothing (one of the important reasons we took this cruise -- to relax!). Much more reading of Cryptonomicon and jotting notes for this journal down.

We went to the afternoon session of Bingo -- the rolling jackpot was over $10k. It works like this: the last game of the session is always "cover all", meaning that you have to get every number on your board before you can call Bingo. They start the week with a coverall bingo jackpot of some value X (which is some complicated formula that has to do with how many people play, the number of letters in the Roman numeral representation of the number of seconds since midnight on January 1, 1970, and the number of revolutions the engines have made since sailing away from Miami). You win the jackpot if you cover your board within the first 50 balls called. If no one wins, the jackpot rolls over to the next session (where a new and entirely different formula is applied to calculate the new value of X to add in).

So anyway, it's not unusual for the jackpot to be huge by the end of the week. During the last session of the week, the jackpot goes to whoever is the first to cover their board regardless of how many balls it takes. Hence, everyone and their brother (and their dog, cat, and platypus) shows up for the last session. Tracy and I got to within 2 numbers on one of our boards, but didn't win. The jackpot was actually split between two winners -- lucky sods.

Nothing else memorable that day -- just lots of relaxing. There were some interesting lightning storms off the port side of the boat within the clouds and whatnot; very beautiful. Some rain actually came over the boat, too; Tracy and I were sitting in one of the covered hot tubs at the time and just watched the sheets of rain plummeting down onto the deck, with various thunder claps and lightning flashes. Cool.

There was a "goodbye" show before dinner which had several kinds of acts: magic, comedy, music, dancing, etc. Not a bad show.

We played more sugar football at dinner (casual dress). John wasn't there last night, so he was introduced to it this evening. Two of Peter's friends joined us during dessert (their parents had already finished dinner and left), so we introduced them to sugar football as well. I repeated my record-setting distance, but also flipped my fork all the way down the table as well, knocking over a glass and scaring the bejesus out of the new kids (no pain, no gain). Again, more fun was had by all. An elderly woman at an adjoining table was glaring heavily at us. Marty pointed her out to us, and as a unit, everyone at our table turned and looked at her (reference: cocky, flippant, arrogant). Most amusing.

The string quartet came by our table this evening and asked for requests. John, being a smartass, asked for "Stairway to Heaven". And wouldn't you know it -- they knew it. I've never heard Stairway rendered on an acoustic guitar, two violins, and a huge bass before. Most interesting. They did a pretty good job, I have to admit! But it was still surreal.

Tracy and I had a final stroll around the ship after dinner, and then went back to our cabin to pack (you have to put your luggage out before midnight so that they can collect it for debarkation in the morning by order of your flight time). No towel animal this evening; bummer.

Sunday, 30 July, 2000

We ran into Marty, John, Tina, and Peter in the morning right before debarkation. Said goodbyes and the like.

Flight from Miami to O'Hare was no problem (although the mysterious ecosystem that we call "airline travel" [hereafter referred to as the Nemesis] somehow changed our flight number and moved back our departure time by about 15 minutes. While this was slightly alarming (since the Nemesis had previously not informed us of this fact), it was actually no big deal because our layover in Chicago was supposed to be over 2 hours). However, upon arrival in Chicago, we discovered that our flight to South Bend had been canceled. Doh!!!

What followed was several hours of standing in line, attempting to communicate with lower echelon Nemesis peons (LENPs), and generally trying to discover a) where our luggage was, and b) how to finish our journey to South Bend. These are seemingly simple tasks; however, they proved to be difficult to find answers for.

The location of our luggage is still a mystery -- it is currently lost within the vortex of the Nemesis. We hope to find it tomorrow (Monday); multiple LENPs assured me that it would find its own way to South Bend, and magically be delivered to my door. I attribute this proposed luggage self-exploratory behavior to the non-Euclidean properties found within the Nemesis (reference: price/distance ratios found on such sites as BizTravel, Travelocity, etc.); indeed, to my knowledge, my luggage has never moved itself before, but it is relatively new luggage (just got it this past Christmas), so it may have habits that I am unaware of. We ended up getting a rental car voucher from American and driving back to South Bend (which turned out to be uneventful).

Since we got a point-to-point rental (i.e., ORD to SBN), mileage and time don't matter -- the car just has to be at the SBN Avis terminal within 24 hours -- we decided to spite the Nemesis and drive straight to Macri's and celebrate being home with some Big Beers. Most excellent.

We're back in Turtle Creek now. Spoke briefly with Dog on the phone about news from the past week and checked my e-mail; only had 10MB of new mail, or 360 new messages (much, much lower than I thought, but I did unsubscribe from most lists and remove myself from most aliases before I left last week). Read some of the most important-looking messages; I'll check the rest tomorrow. Found several messages for Jeremy Faller on my answering machine (which I find rather amusing -- most were from a woman from his moving services who adopted an increasingly annoyed tone that Jeremy was not answering her messages). Also found that the ceiling in my bathroom is leaking from the apartment above me again -- the floor was rather wet and smelly. Gonna have to talk to Turtle Creek management about this tomorrow.

Monday, 31 July, 2000

Well, this journal entry has taken a good amount of time to write, so we get Monday as well. :-)

The LENPs have located our luggage, and indeed, it has mysteriously made its way to South Bend by itself. We picked it up when we returned the rental car. Since then, it hasn't moved by itself (at least when I was looking); it must be tired from the trek to South Bend from Chicago.

Tracy and I spent the rest of the day packing her car with more junk from my apartment. There's now very, very little left. Mainly my TV, VCR, the server, an ND flag, some clothes, and all the junk in my office. Gotta take my stereo receiver in to Best Buy to get serviced, though -- I think 2 of the 3 video channels have been fried over the years (it's under warranty, so the service should be free. Woo hoo!).

Gonna go head in to work now, see if I can catch Lummy before he heads back to Cali, and say hello to everyone in the lab.

August 1, 2000

Platypus face

Finished Cryptonomicon this morning. A mostly good book, but I have to admit that I was a bit disappointed by the ending. It was too vague, and tried to imply a lot of answers but really left a bunch of things unanswered definitively (but not in a "wait for the sequel" kind of way). Plus, some of the things that were tied up in the last few pages of the book were (IMHO) plainly obvious by that point, and it was just a relief to get to the point where they actually stated what you had been assuming for the last 100-200 pages.

It's a monster of a book -- over 900 pages long. There's a bunch of good WWII storyline in there, as well as a somewhat-weak storyline about setting up a data haven in the modern world with a bunch of cool crypto stuff tying the two together. So my review: the first many-hundred pages were ok (indeed, the style of the book takes a few shifts a few hundred pages in), but the ending was decidedly weak. Still, I'd recommend it to others.

I spent most of yesterday reading and answering e-mail, but spent a few hours with Jeremiah discussing what he wants to do for a master's project. He was initially leaning towards doing STL in OOMPI (to which end he's been cleaning up OOMPI and gearing it up for 1.0.3 release -- a nontrivial task!). It's been good, I think -- it was an excellent introduction to "real world" computing, and how hard it really is to write Quality Software.

In the past few weeks, he has been running regression compiles and tests on all kinds of combinations of platforms, operating systems, and compilers. He hacked up a bunch of shell scripts to do this, and has generally learned a lot about it (try it yourself -- it's a lot harder than you would think). But this has inspired him to move away from STL/OOMPI and to tackle a long-standing issue for the LSC: a rock-solid regression compiling and testing agent that can be used to perform compiles and runs on all manner of combinations of setups such that it can be used to test software before it is released. We talked about this for an hour or two last night and brought up all kinds of issues. He seems pretty interested in it, and it could be a great project for the lab as well as a good master's project.
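The heart of an agent like that is walking every combination of setups. A minimal sketch of the idea, assuming only two axes (OS and compiler); `BuildConfig` and `build_matrix` are made-up names for illustration and not anything from the LSC scripts:

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: one entry per combination to compile and test.
struct BuildConfig {
  std::string os;
  std::string compiler;
};

// Cross every OS with every compiler. A real agent would also need to
// skip combinations that do not exist and add more axes (platform,
// compiler flags, and so on), which nest the same way.
std::vector<BuildConfig> build_matrix(
    const std::vector<std::string>& oses,
    const std::vector<std::string>& compilers) {
  std::vector<BuildConfig> matrix;
  for (const std::string& os : oses) {
    for (const std::string& compiler : compilers) {
      matrix.push_back(BuildConfig{os, compiler});
    }
  }
  return matrix;
}
```

Three operating systems and two compilers already give six builds, which is why doing this by hand with shell scripts gets painful fast.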

Had to fix up some weirdness on wedding.squyres.com today -- it seems that the Apaches were spinning endlessly and creating a huge load. Dunno exactly what caused it, but Ed and Don have been working on their fantasy football pages, so they may have tickled some PHP bug or something. Restarting apache seemed to fix the problem. Gotta set up virtual hosting for their hostname, though. Will do that tonight.

Heading down to Looieville soon -- taking the latest Mandrake CD with me, and will bring my SBN router with me. The SBN router will become the router down in Looieville (hence, the web server, router, mail server, and soon, the DNS server). The current Kentucky router will become my desktop workstation and just sit behind the firewall. Might do other services from that machine (i.e., DHCP, NFS for home dirs, etc.). I plan to set up bind in a week or two, too -- Darrell and I will be secondaries for each other. Hence, my router machine will likely become squyres.com as well.

I'll probably keep the mail services on pennyhost, though. Who knows -- I might take that over as well, but I'd want to find some web-enabled email management software first (i.e., a good webmail client, ability to change forwarding/storage, etc.). A project for a future day.

Just found out that the OIT Solution Center sells W98 CD's, but only the first edition -- not OSR2 (hasn't OSR2 been out for 1-2 years now?). How much do they suck?

Do you know what I like about the OIT Solution Center? Nothing at all.

Answered some IMPI mail apparently from the guy at HP who is working on their IMPI implementation. Looks like we may have left a sentence or two out of the IMPI standard -- he raised a valid clarification issue. Oops. I've pinged Judy and Bill at NIST to see what they want to do about this (i.e., how to fix the doc).

Tons of LAM and other MPI messages remain in my inbox -- will have to start getting to them tonight...

August 3, 2000

Chocolate moose musings II

Take II on this entry (note to self: write some kind of primitive HTML tag checker to ensure that tags are closed properly in journal entries).

Spent all of yesterday rearranging the computer room in Tracy's (er... our) apartment. Reconfigured the network to incorporate my router box properly -- now I have a desktop machine (albeit with a flaky 3Com card... #$!@$!@$!!!!) that is not responsible for the router, web server, etc., etc. Still not finished yet, but we're closer.

The new router is the latest Mandrake (but without the latest Kernel -- couldn't get that to work with ReiserFS properly. Screw it). It's currently running apache/php/mysql and sendmail. Future plans include mailman and bind. I kinda need DNS running soon, 'cause mail is currently kludged to look like it came from lsc.nd.edu (shh!!); need a proper squyres.com name other than wedding.squyres.com. :-) It doesn't appear to be perfect yet (Don's still having X forwarding issues via OpenSSH), but I've already removed the monitor and hidden it under the desk.

The desktop is a compaq desktop with serious I/O suckage. I just backed up all the data on it [temporarily] to AFS, leaving the way clear to upgrade it to the latest Mandrake when I return to KY on Saturday. I'm also a bit wary of upgrading that machine because it has some special SCSI drivers in it that took Dog and I *several* days to get right the last time we installed Linux on here. Let's hope that these SCSI drivers are mainstream enough to be in the main distros these days!

Also spent a bit of time yesterday helping Don and Ed configure their fantasy football league on www.fhffl.com (which is really wedding.squyres.com -- gotta love DSL!) -- it's part of a long-standing deal which is now probably defunct because Lummy is likely moving to IU, but what the hey. In helping them possibly move to a real database rather than text-file-based data storage, I had to explain a lot of database concepts to them (no DB background at all, but they're smart guys). We're having another infamous "beer-n-computer science meeting" at MBC tonight. Yummy. Will code for beer!

Mmmm... Chemical Brothers... mmm...

While I'm upgrading everything, I just got the latest linux netscape (4.74). Let's see what kind of mess it can create, now!

Went to see a Louisville River Bats minor league baseball game last night with Tracy and a bunch of people from GE (a freebie from the good folks at GE). The stadium is brand new -- only been operating this year. The game was by no means a sellout, but there was a pretty good sized crowd there. Nice stadium, too -- bigger than the Silverhawks stadium -- it even has an upper deck. Their mascot is a purple fuzzy dude who has some flaps hanging off his arms that are supposed to pass as bat wings. He came out during the later innings with a t-shirt gun. Very amusing -- it could launch tightly scrunched t-shirts into the upper deck from where he was standing near the dugout.

Met several of Tracy's coworkers' kids, had some beer, and mmm... ballpark hot dogs. Is there anything in this life as good as a ballpark hot dog and/or brat? Quite yummy. And to top it all off, we won the game. The River Bats had a cool 3-run homer in the first or second, sucked for most of the 2-7th innings, and then had a rally and won something like 10-5.

Now I gotta drive back. Will solve the X forwarding problem later (seems to have something to do with the fact that openssh X auth != regular ssh X auth, and the fact that Goofy's shoes, contrary to popular belief, were at least 2 sizes too small).

August 4, 2000

Jeff's Journal

Tied up some loose ends today:

Checked into the error that Arun reported that he was getting with parallel bladeenc; couldn't reproduce it. Sent him the latest copy to try. Turned out to be an embarrassing use of a variable before it was initialized in LAM's mpirun. Additionally, we accrued command line arguments into a fixed-length string that could be overflowed (oh for STL strings...). Doh!
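For the curious, a minimal sketch of what the STL-string version of that argument accumulation could look like. This is not LAM's actual mpirun code; `join_args` is a hypothetical name used only for illustration:

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from LAM): join command line arguments into
// one space-separated string. std::string grows as needed, so there is
// no fixed-size buffer to overflow, and `joined` starts out as a valid
// empty string before it is ever read.
std::string join_args(const std::vector<std::string>& args) {
  std::string joined;
  for (std::vector<std::string>::size_type i = 0; i < args.size(); ++i) {
    if (i > 0) {
      joined += ' ';
    }
    joined += args[i];
  }
  return joined;
}
```

Both bugs described above go away by construction here: the string is default-initialized, and its length is bounded only by available memory rather than by a compile-time constant.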

Replied to mp3check author dude (see previous entry).

Finally fixed the "delete" button in the MPI listing stuff; I think it was malfunctioning before and deleting all the data in the database. Oops!

Replied to Bill George at NIST about some pending IMPI errata w.r.t. IMPI_H_ACKMARK and IMPI_H_HIWATER -- the IMPI doc doesn't clearly state how these values should be arbitrated. Bill and I are discussing what the mechanism should be. Actually, the mechanism is clean: min(a, b). The discussion is about whether the value should be applied universally to all hosts or on a host-pairwise basis. I'm [currently :-)] in favor of the latter. We'll see how it works out.
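To make the two options concrete, here is a rough sketch. It assumes each host simply advertises one unsigned value for the parameter in question; the function names are made up, and nothing here is taken from the IMPI document itself:

```cpp
#include <algorithm>
#include <vector>

// Illustration only: advertised[i] is the value that host i advertises
// for a parameter such as IMPI_H_ACKMARK.

// Option 1: one value applied universally, the minimum over all hosts.
// Assumes there is at least one host.
unsigned universal_value(const std::vector<unsigned>& advertised) {
  return *std::min_element(advertised.begin(), advertised.end());
}

// Option 2: a separate value for each pair of hosts, min(a, b).
unsigned pairwise_value(const std::vector<unsigned>& advertised,
                        std::vector<unsigned>::size_type host_a,
                        std::vector<unsigned>::size_type host_b) {
  return std::min(advertised[host_a], advertised[host_b]);
}
```

With hosts advertising 10, 4, and 8, the universal scheme forces everyone down to 4, while the pairwise scheme lets the first and third hosts use 8 between themselves. That difference is the whole argument for pairwise.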

Installed GNU mailman 2.0b5 on mail.lsc today. Apparently the previous versions had some security problems. Oops. I tried to setenv CFLAGS to -fast, 'cause there is a small C portion in mailman (most of it is in python), but it still used just "-O". I suspect non-careful use of AC_PROG_CC in its configure.in script (curses, autoconf foiled again!!).

Got minime in a compilable state again. Working on a primitive html tag checker so that I won't leave unterminated tags again. It should bitch if you leave tags unterminated when you finish typing the rant, and automatically closes them if you "submit" without fixing them. Simple stack-based thing (gotta love the STL!). I also added a warning if it removes "LocalWords:" lines when you submit (not when you re-edit).
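A stack-based checker along those lines might look roughly like this. It is a sketch under stated assumptions, not the journal client's real code: `unclosed_tags` and `close_open_tags` are hypothetical names, only `br`, `hr`, and `img` are treated as tags that never need closing, and comments or a `>` inside an attribute value are not handled:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical sketch. Returns the names of tags still open at the end
// of `text`, outermost first. A closing tag that does not match the
// innermost open tag is simply ignored.
std::vector<std::string> unclosed_tags(const std::string& text) {
  std::vector<std::string> open;  // used as a stack: back() is innermost
  std::string::size_type pos = 0;
  while ((pos = text.find('<', pos)) != std::string::npos) {
    std::string::size_type end = text.find('>', pos);
    if (end == std::string::npos) break;  // stray '<' with no '>'
    std::string::size_type i = pos + 1;
    bool closing = (i < end && text[i] == '/');
    if (closing) ++i;
    std::string name;  // lower-cased tag name, attributes skipped
    while (i < end && std::isalnum(static_cast<unsigned char>(text[i]))) {
      name += static_cast<char>(
          std::tolower(static_cast<unsigned char>(text[i])));
      ++i;
    }
    bool is_void = (name == "br" || name == "hr" || name == "img");
    if (!name.empty() && !is_void) {
      if (!closing) {
        open.push_back(name);
      } else if (!open.empty() && open.back() == name) {
        open.pop_back();
      }
    }
    pos = end + 1;
  }
  return open;
}

// Appends closing tags for everything still open, innermost first.
std::string close_open_tags(const std::string& text) {
  std::vector<std::string> open = unclosed_tags(text);
  std::string fixed = text;
  while (!open.empty()) {
    fixed += "</" + open.back() + ">";
    open.pop_back();
  }
  return fixed;
}
```

The "bitch at you" step would call `unclosed_tags` when the rant is finished and print whatever comes back; the "submit anyway" step would call `close_open_tags`. A `std::vector` stands in for `std::stack` here only so that the leftover tags can be returned in order.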

Finally had a meeting with the Grad School people (they're nice and reasonable people once we all get in a room together and talk over the issues -- they even want to take us out to lunch for our troubles. Free food -- woo hoo!!), and we worked out all the "final" kinks in the ndthesis style. Changed a few things in the sample thesis, and we should be good to go!

Tales of an ND grad student

Got back up to ND. Lummy was here in the office -- I thought he was still in CA; pleasant surprise. Looks like we'll be heading to Berkeley in a few weeks for a few days for some design meetings about the BLD. Should be fun and interesting.

OOMPI 1.0.3 is just about ready to roll, but I found a possible problem; may require more testing...

Inilib is getting closer, too -- perhaps in a few weeks. It still rocks, though -- it's heavily used in this journal client, for example (gotta love pre-release access!). :-)

Spent last night talking with Ed and Don about databases and their fantasy football setup. They bought the beers and dinner, so I guess I couldn't complain. I gave them a database on www.fhffl.com, and they'll start playing with the setups that we described last night. They do some nifty things with pulling down info from other web sites (NFL sites and the like) to feed their data pool.

Mmmm... the power of PHP and MySQL... mmm...

Got a reply from the mp3check author dude (it doesn't work on big endian machines). He claimed to have fixed the endian problems, but I found a bunch of compiler issues (I'm assuming that he's using g++ -- wow, does g++ suck!). Even after getting it to compile, it still doesn't work on big endian machines properly. Bonk! I sent him a reply with tons of info to keep him busy.
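The usual cure for this kind of byte-order bug is to build multi-byte values one byte at a time instead of casting a buffer to an integer pointer. A minimal sketch, assuming the value in question is a 4-byte MPEG audio frame header; the helper names are hypothetical and none of this is taken from mp3check:

```cpp
#include <cstdint>

// Hypothetical helpers, not taken from mp3check.
// Build a 32-bit value from four bytes stored most-significant-byte
// first. The result is the same on big and little endian hosts because
// the buffer is never reinterpreted as an integer.
std::uint32_t read_be32(const unsigned char* p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
         static_cast<std::uint32_t>(p[3]);
}

// An MPEG audio frame header starts with 11 set bits (the frame sync).
bool has_frame_sync(std::uint32_t header) {
  return (header & 0xFFE00000u) == 0xFFE00000u;
}
```

Code written this way behaves identically on a SPARC and on a PC, so there is nothing left to get wrong when porting.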

It seems that I have way too many MP3s out in AFS space -- I filled up the lums CCSE volume. Whoops! The irony is -- I literally tried to download them all to wedding.squyres.com earlier that day, but realized that I don't have a hard drive large enough for all of them, so I deferred to this weekend when I'll buy a new hard drive large enough to hold them all (I currently have 8GB of MP3s, and that's perhaps 1/4-1/3 of my CDs). Since the CCSE volume was full, I downloaded a bunch of them and deleted them off AFS to give much more working space.

August 5, 2000

Leaving Las Vegas^H^H^H^H^H^H^H^H^HTurtle Creek

I closed a major chapter in my life today. I left the apartment where I have lived for just over 6 years. Indeed, I have lived in South Bend more-or-less continuously (minus some summers and Army time) for a few days shy of 11 years -- the majority of my adult life.

However, I'm about to start a new chapter, too -- I'm moving to Louisville, KY, to go live with my new wife, Mrs. Tracy Payne Squyres.

At the risk of sounding sentimental, I feel compelled to present a few reflections of my mixed feelings.

I have been gradually moving my stuff down to Louisville over the past two months or so. Still, today was the final day of my lease, and (by design) I loaded the last of my stuff in my car this morning, cleaned the apartment thoroughly for the last (first?) time, locked the door, and left.

It was surprisingly hard. I'm not an overly-sentimental kind of guy; indeed, I'm from the MTV generation and have the attention span and short-term memory of a skittish cat. The apartment itself is pretty crappy; it's small, didn't have too much sound protection from other apartments in the building, had very hard water, crappy cabinets, etc., etc. But it was home. I have lived in that location for quite a long time -- it had become a part of me. I've had many good times, many bad times, and some just downright weird times in that apartment. The good times always come to mind first, which is one of the reasons that it was hard.

This morning, as I was cleaning and packing, I was musing on the history of my time in that apartment. This is the end of a 7 year streak -- I initially moved in with Mr. Huy Phan (EE grad student) back in the summer of 1994 (he had some other roommate for the previous academic year; I never knew who it was). Huy eventually moved out and went back to France. Mr. Brian McCandless (CS grad student) then moved in with me. Brian graduated a few years later, and Chuck (EE grad student) moved in. Chuck was only around for a semester and a half; Kevin Barker (CS grad student) moved in before Chuck even left. So that apartment has seen a continuous stretch of a single lease since 1993 -- 6 people. And I got to clean the apartment today. Did I get the short end of the stick, or what?

I found all manner of interesting things in the apartment today:

A grand total of 41 pens, pencils, markers, and various other sundry writing utensils. And all of my commonly-used pens are already down in Louisville -- where did these come from? Why did we have them? We certainly didn't write that much. A mystery.

I found -- still in shrink wrap -- a mini gas grill. Who the heck did that belong to?

I also found a boom box. I have no idea whose it is, nor how it got into my apartment. The left channel doesn't work, but I'll bet that it could be fixed fairly easily. I gave it to Pete and Brian.

The couch that Tracy bought (used) in her freshman year and gave to me when I moved in the apartment in 1994 has now been passed on to Pete and Brian. May it continue to give them good service.

The Christmas lights that have hung in the apartment for years (literally), and have been on continuously since April or so (it's all about uptime, baby) have also been bequeathed to Pete and Brian as a symbol of Bachelorhood.

All in all, I was surprisingly happy -- albeit sentimental -- about moving out today. This is surprising because I absolutely detest moving; after loading each carful of stuff over the past two months, I always found myself emotionally drained because a little piece of me was leaving. But today was different. I realized that I actually do have closure with this place -- I'm ready to move on and become a husband and start the next chapter of my life. This move has been planned for quite some time now, and I guess that I've been subconsciously preparing for it all along.

Flashback to last night. I went out with Pete (just graduated CS from ND in May, and is just starting as a CS grad here this semester) and Brian (CS undergrad, starting his senior year here at ND) -- the same guys who inherited most of my stuff. For those of you who don't know, Brian has been one of my students for a year or two now; Pete worked for me for about a year as well. We went to BW-3s, had some wings and beer, and played trivia. It was much fun. We went back to Turtle Creek, had a few more beers and pizza, and used the Smoking Table one last time. More fun. In short, it was a perfect evening; we just hung out, were generally stupid, and got a little philosophical at times. These guys will become the next set of urban legends in the College of Engineering at Notre Dame; I am leaving ND in capable hands.

Back to today.

I said goodbye to Troy (one of the maintenance guys at Turtle Creek), and asked him to be nice to me when he does the final inspection of the apartment. He always liked us, and took pretty good care of us (when things broke, he always came pretty quickly and fixed them). I said goodbye to my apartment (it's a thing that I have -- I always have to say goodbye to places that I've lived), and got in my car and drove away.

Metallica's "No Leaf Clover" was playing on the radio as I drove away.

Chapter 29

August 7, 2000

Tales of a Fourth Grade Nothing

Spent much of yesterday opening wedding presents. Yummy! Got lots of free stuff. Got lots of stuff that we didn't ask for, but hey -- don't look a gift horse in the mouth (who came up with that expression, anyway?). We cataloged everything in our handy-dandy wedding software database (don't laugh -- there are a good 10-15 wedding software packages out there these days; it's big business! And it was truly helpful in organizing stuff). Now comes the hard part -- gotta write all those thank you letters.

Got a 40GB hard drive from CompUSA for all my MP3s. I'll install that RSN. Got some new books at B&N, too. There's a new Cussler book out, but it's still in the Big Paperback size (which is just about as expensive as the hard cover). The final Reality Dysfunction book is still not out in paperback (bonk!). And the latest Area 51 book is still not out yet. (Ok, I just admitted it to the world -- I'm into cheesy sci-fi and action books for recreational reading. You'll deal.)

Got some replies from Tord about parallel bladeenc. I read them, and I think I understand what he's saying. Unfortunately, action on these items gets pushed on the stack until other things finish up. :-(

Set up GNU mailman on wedding.squyres.com for Don and Ed. Might move my journal mailing list here, too, but probably not before I get bind running to give this machine a decent name. Still haven't quite decided what to do with squyres.com mail yet, because several other members of the family use it, and I don't feel like hosting it. Hmm. Will require some thought.

Went over to Laura and Paul's later because they had tons of extra food from a wedding that they went to on Saturday. Saw Melinda and Reuben as well. Good fun. Came back and crashed afterwards. Mmm... sleep...

Oops -- the GNU mailman that I set up for Don/Ed isn't quite functional. Had to fix a few things (I only briefly tested the web interface yesterday before we left for Laura/Paul's). Mailman's woes seem to be related to some sendmail issues, too.

I've now spent a good chunk of this morning fighting with sendmail w.r.t. my firewall and whatnot, and getting it to do what I want (it still doesn't). I remember the days when sendmail setup was simple and easy to understand. Wait... no I don't.

August 8, 2000

I drank what?

Spent too much time on ndthesis yesterday. Hopefully, we're 100% done with it.

Went out to switch my cell phone down here to Louisville yesterday (SBN's Alltel just got bought by Verizon, which is everywhere -- quite handy for me!), and found out that I only had something like 20 days left on my contract. So I ended up upgrading to one of those whacky digital phones that has voice mail, call waiting, no roaming (important because I'll be traveling a bunch), etc., etc. I think it even writes optimized high performance scientific code.

Talked to Faller yesterday; he sounds like he's doing well in Bahston. He had some ideas regarding parallel bladeenc and Tord's replies to us; he's still convinced that we can generate output from parallel bladeenc that is diffable to the serial bladeenc. The crux of the issue is that the parallel and serial outputs are the same up until the last frame of the first slave's output. And even that frame is the same... until a point. This is the point where slave 0 runs out of input data, and therefore -1 pads the rest of the frame (it took us a while to understand that this is what was happening). The next slave's output is completely different from the serial output -- it's not like the serial output is then just shifted down into the next frame (which would be easy to fix). I think it has something to do with what Tord mentioned: that MP3 is only differential within each frame, but does depend on a small number of bytes from the previous frame (which is somehow not strictly classified as differential across the frames -- I think it has to do with framing setup and the like, although it does affect the output data).

Anyway, Jeremy is convinced that we can have the master re-frame the output data from the slaves and thereby create diffable output. He's gonna spend a few days reading the MP3 file formats and papers; we'll talk again when he's done.
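To make the padding half of the problem concrete, here is a small sketch of the arithmetic. It is not bladeenc code: `Chunk`, `split_naive`, and `split_aligned` are made-up names, and the figure of 1152 samples per frame is an assumption (the usual MPEG-1 Layer III frame size). A naive even split leaves each worker with a partial last frame to pad, while a frame-aligned split pushes all the padding to the very end of the stream, where the serial encoder would pad too. It does nothing about the dependency on the previous frame that Tord mentioned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 1152 PCM samples per channel per frame (MPEG-1 Layer III).
constexpr std::size_t kSamplesPerFrame = 1152;

struct Chunk {
    std::size_t first_sample;  // index of the first input sample in this chunk
    std::size_t num_samples;   // real input samples handed to the worker
    std::size_t padding;       // samples the worker must pad to fill its last frame
};

// Naive split: divide the input as evenly as possible, ignoring frame boundaries.
std::vector<Chunk> split_naive(std::size_t total, std::size_t workers) {
    assert(workers > 0);
    std::vector<Chunk> chunks;
    const std::size_t base = total / workers;
    const std::size_t extra = total % workers;
    std::size_t start = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t n = base + (w < extra ? 1 : 0);
        const std::size_t rem = n % kSamplesPerFrame;
        chunks.push_back({start, n, rem == 0 ? 0 : kSamplesPerFrame - rem});
        start += n;
    }
    return chunks;
}

// Frame-aligned split: hand out whole frames, so only the chunk that reaches
// the end of the input can need padding.
std::vector<Chunk> split_aligned(std::size_t total, std::size_t workers) {
    assert(workers > 0);
    std::vector<Chunk> chunks;
    const std::size_t frames = (total + kSamplesPerFrame - 1) / kSamplesPerFrame;
    const std::size_t base = frames / workers;
    const std::size_t extra = frames % workers;
    std::size_t start = 0;
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t want = (base + (w < extra ? 1 : 0)) * kSamplesPerFrame;
        const std::size_t n = std::min(want, total - start);
        chunks.push_back({start, n, want - n});
        start += n;
    }
    return chunks;
}
```

With 10,000 samples and 2 workers, the naive split makes both workers pad mid-stream, whereas the aligned split gives worker 0 five whole frames and leaves only worker 1 padding at the end.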

I was hit by two inspirations a few minutes ago, which I promptly mailed off to Arun (who is giving 1.5 LAM talks today):

LAM: The Code to Glory

PVM: The Code Less Traveled

Don't get me wrong -- while I'm certainly not a PVM guy, nor would I ever write any new code in PVM, let us not downplay the importance of PVM in the Grand Scheme of Things. It was the first widespread "standardized", portable parallel code tool ("standardized" is in quotes because it was really only a research project -- it wasn't a real standard). Hence, it was the first time that you could write a parallel code on one kind of machine and run it on others (rather than have to re-develop it for every new kind of parallel computer that you tried to run on). Plus, it worked on clusters -- a prime candidate for development of parallel codes (especially considering that running on the Big Iron costs $$$).

So my statement really reflects the Way It Is Now -- most new parallel users use MPI, not PVM. Indeed, many parallel hardware vendors don't actively develop PVM anymore; they only develop their MPI. However, there are probably uncountable millions of lines of legacy code out there. PVM is like Fortran -- it will never really go away.

And this is not to say that MPI won't some day be replaced by something More Useful. I'm quite convinced that MPI is not The Answer; it's just the best that we have right now.

Spent this morning answering some backlogged LAM mail. Will spend the rest of today finishing off all the current backlog of LAM mail, continuing setup of queeg (my Linux desktop -- was having some problems getting SSL/pine to compile), wedding gift reconciliation (one of our registries screwed up and allowed people to buy 3-4 of an item that we only asked for 1 of), and minime hacking.

In the words of the Ancient Masters, "After 3 days without programming, life becomes meaningless."

August 9, 2000

Smashing the stack

Spent time yesterday and today going over the complexities of Health Insurance. I have become convinced that Health Insurance is a scam run by a bunch of expatriate armadillos down in Arizona. Only they could dream up such convoluted and bizarre rules, regulations, and policies. Or perhaps it was just a committee.

I say, deliver me from expatriate Arizonian armadillos!

Engineering: overthrowing armadillos.

In other news, I finished the next round of enhancements for my journal client:

it warns you about unclosed html tags, and will [admittedly stupidly] close them if you submit without fixing them

it removes some tags automatically, like <html> and the like

it warns you and automatically removes "LocalWords:" lines so that you can run ispell on your entry and not have to worry about remembering to delete those lines before you submit
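For the curious, the checks listed above can be sketched in a few lines of C++. This is a minimal illustration and not the actual journal client code: the function names (`strip_local_words`, `unclosed_tags`, `closing_markup`) are invented for the example, attributes are ignored, and only `<br>`, `<hr>`, and `<img>` are assumed to never need closing.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Remove every line that begins with "LocalWords:"; all other lines are kept,
// and each is written back with a trailing newline.
std::string strip_local_words(const std::string& entry) {
    std::istringstream in(entry);
    std::string line, out;
    while (std::getline(in, line)) {
        if (line.compare(0, 11, "LocalWords:") == 0) continue;
        out += line;
        out += '\n';
    }
    return out;
}

// Return the names of tags that were opened but never closed, in the order
// they were opened.
std::vector<std::string> unclosed_tags(const std::string& html) {
    std::vector<std::string> open;
    std::size_t pos = 0;
    while ((pos = html.find('<', pos)) != std::string::npos) {
        const std::size_t end = html.find('>', pos);
        if (end == std::string::npos) break;  // dangling "<strong": a real client would flag it
        assert(end > pos);                    // so the substr length below cannot underflow
        const std::string tag = html.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        const bool closing = !tag.empty() && tag[0] == '/';
        std::string name;
        for (std::size_t i = closing ? 1 : 0; i < tag.size(); ++i) {
            const unsigned char c = static_cast<unsigned char>(tag[i]);
            if (!std::isalnum(c)) break;  // stop at attributes or a trailing '/'
            name += static_cast<char>(std::tolower(c));
        }
        if (name.empty() || name == "br" || name == "hr" || name == "img") continue;
        if (!closing) {
            open.push_back(name);
            continue;
        }
        // A closing tag cancels the most recent matching open tag, if any.
        for (std::size_t i = open.size(); i-- > 0;) {
            if (open[i] == name) {
                open.erase(open.begin() + static_cast<std::ptrdiff_t>(i));
                break;
            }
        }
    }
    return open;
}

// Build the closing markup for the leftover tags, innermost tag first.
std::string closing_markup(const std::vector<std::string>& open) {
    std::string out;
    for (std::size_t i = open.size(); i-- > 0;) out += "</" + open[i] + ">";
    return out;
}
```

Appending `closing_markup(unclosed_tags(entry))` to an entry is the "admittedly stupid" fix: it closes everything at the very end rather than where the author meant to.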

Perk pointed out HTML Tidy, which does more or less what is outlined above, but doesn't do the disallowed-tags thing. But it is much smarter about closing tags, replacing incorrect tags with real tags, etc. It also [unfortunately] automatically adds a <TITLE>, which I don't want it to do.

Who knows -- might replace my functionality with HTML Tidy someday. But this works for today, and prevents one unclosed <strong> from messing up all journal entries.

All for the glory of LAM.

Sadly, my telephone headset is falling apart. I need new ear muff thingies (the current ones are flaking off one little black flake at a time), and some wire in the cord is loose -- it cuts in and out randomly. And you know what they say about hardware problems... Actually who the hell cares what they say? Just go buy another one; hardware isn't interesting.

Finally, I got to spend a little quality time with minime today (woo hoo!). Continued to work on the encryption and authentication schemes for the sockets; not quite right yet, but see an older journal entry that describes the scheme.

August 10, 2000

Cleveland rocks

Got minime to compile on Linux again. A while ago, I did some ugly things with signals in a solaris/sysv-specific way that disallowed compilation on Linux for a while. Finally got around to fixing it today; this marks the first journal entry in quite a while that has been submitted from a Linux box instead of ssh-ing to ND to use the Solaris journal client (which is ironic, actually, since the journal server is sitting right here next to me -- ssh-ing up to ND made the data go much farther to get to its ultimate destination). Whooo hoooo!!

At Brian's advice, I went and got Mozilla M17 (source). It's still compiling.

I love inilib. It does such nice things for me. :-)

Motivation for saying that: Perk and I have been having a conversation about using the "HTML Tidy" program to clean up journal entries before they are submitted vs. using an internal parser (that I have already written). Turns out that "HTML Tidy" is 95% better -- it's much smarter about closing tags, but it does a few icky things. Best way to resolve it? Have a user-definable option! Let them choose between the internal parser and HTML Tidy. And inilib just takes care of storing that for me. Make today an Inilib day.

Favorite phrase of today: "beaten on the head by a Mozilla stick."

Faller asked for a copy of the LSC Coding Standards today. Must be spreading that to the good folks at Analog Devices. LSC: The World Domination Tour.

Did some LAM work today; added auto-generation of man pages from structured comments in source code. It's something a) I foolishly promised on the LAM list, and b) oh yeah, users indicated that they wanted on the LAM user survey. Kinda neat, actually. Had to re-create man pages for MPI_Comm_spawn and MPI_Comm_spawn_multiple -- I had made all the MPI-1 man page comments back on Dec 31/Jan 1 while I was waiting for the world to implode. Had to do some icky things to fool automake into a) putting them in the distribution, and b) installing them when "make install" is invoked. Yuk.

Speaking of LAM, finally resolved Mr. Pascal's issues with LAM/MPI. Turns out that you have to use a special option to the Free Pascal compiler to tell it to link to libc; if you manually link with "-lc", it won't work (for lack of a longer explanation). I asked about such a thing a month or two ago in the initial set of e-mails with Mr. Pascal, but he didn't know about it then. Yesterday, we initiated contact with the Free Pascal developers, and they immediately mentioned this special switch. Oh well, live and learn (but try to avoid Cobol whenever possible).

We're still getting bounced messages from the LAM list from <ptavares@dsg.dei.uc.pt>. Dog claims they're nowhere to be found on mpi.nd.edu's sendmail queue, but the bounces keep coming back. We'll probably get them for another 2-3 days. <sigh> I'll be very happy when we can switch to GNU Mailman (gotta wait for IU vs. ND decision first).

Continued to rip my CDs. It's going nice and slow, but now I have plenty of disk space.

August 12, 2000

Entry of a 1000 URLs

I'm up in South Bend, and yes, my cell phone works. It's not in digital mode, though (bummer!). I found that yet another company has been sucked into the Verizon Wireless void -- Air Touch Paging. So I tried to send myself a test text message from their web page, but it didn't work. Perhaps it will when I return to digital areas on Monday...

In other news, I had a good chat with Loomsdale yesterday (sorry, that's Dr. Loomsdale to you, Gentle Reader). We haven't really connected recently, especially with this whole wedding thing of mine that happened shortly ago. 'Twas good to catch up. Got a few more details on the whole IU vs. ND thing (sorry, not at liberty to put them in my journal, so get off my back already, ok?!?!) -- we'll see how that plays out.

This led to more chats with Jeremiah, Rich, Dog, and Brian, which led to nothing productive getting done yesterday. Dog and I finally gave up, got some food at Wendy's, and went to see Scary Movie. If you accept it as a totally stoopid movie, it's actually quite amusing.

I introduced Dog to some of the wonders of PHP and MySQL last night, too.

Stayed at Ed and Suzanne's last night. Saw them this morning and we chatted for a while. Came in to work and did various LAM/MPI things:

squashed a bug relating to -laio not propagating down to hcc and hf77 properly when compiling with ROMIO

squished another w.r.t. profiling and using both MPI_Init and PMPI_Init in the same program

played with CVSweb and ViewCVS, mainly to see if it would be worthwhile to put the LAM/MPI CVS repository out on the web for read-only access (a thought that has been nagging me for a while, especially since 6.3.3 has taken so long to release!). Decided that I liked ViewCVS better than CVSweb. I mailed the LAM/MPI mailing list to see if anyone would be interested.

played some more with the doctext package from the MPICH group to fix some bugs w.r.t. the nroff-generating code. I'm iterating with Bill Gropp about this -- it affects the man pages that get generated for LAM/MPI.

But now, on to more interesting things! Minime calls. Had some interesting minime thoughts yesterday while driving up from Looieville. We'll put those in a separate journal entry.

August 16, 2000

Is one of us supposed to be a dog in this conversation?

It's been a bit since my last journal entry; the lapse is mostly due to travel. Woof! So here we go...

Added a few new features to the journal client: you can now preview your journal entry in lynx and/or netscape before submitting it. I'll probably add one more option to run HTML tidy (either automatically or manually -- haven't decided yet).

Spent this past weekend at Notre Dame. I was supposed to meet some friends, of whom one is entering ND's law school this semester. Signals got crossed (read: I had the wrong time in my palm pilot -- DOH!!!) and I missed them. So I spent the weekend with Suzanne and Ed, and helped them buy a laptop, a second hard drive for their desktop (for Linux, of course), and a new monitor. Spent much of Sunday afternoon/evening installing stuff on the laptop and desktop. The desktop's modem was flaky under linux; it was most frustrating. I think I have a spare to send to them.

I found out that I definitely don't have text paging enabled on my cell phone. I got back to a digital area (why is Verizon/SBN still analog? Grumble) and tried to page myself from their web page. It said that the page was sent, but it never came in on my phone. I guess I could pay more for such a thing, but I really don't think that I need it.

Saw Lummy on Friday and Monday; had some good chats with him. The Big News is that he's going to stay at ND. He accepted ND's offer, and we're just going to reap the benefits from it (read: lots and lots of funding!). Some side effects: guaranteed post doc funding (woo hoo!!), a new computer for me (800mhz soon-to-be linux box). Rock on!!

I noticed today that ND's college of engineering started giving out engineering rings this past graduation. I want one! Luckily, I've got a graduation left at ND, so I'll likely get one. :-) Pretty cool things, those rings.

Started looking at Vorbis as an alternative to MP3. I've had a disappointing show of contributions and whatnot from the bladeenc community -- Jeremy Faller and I still have some unanswered questions about MP3. Ogg/Vorbis appears to have a much cleaner process and an active development community. It is supposedly Much Better than MP3 in terms of quality, documentation, legal issues (i.e., there are none), and encoding speed (the beta encoders are already faster than real time). They even have an XMMS plugin, which means that it's good enough for me!

I started a "has anyone thought about parallelism?" thread on the vorbis-dev list today and got several immediate replies. Talked to one of the dudes who is -- I think -- one of the main contributors, and we came to the conclusion that it should be possible to do a similar thing to the vorbis encoder that I did with parallel bladeenc (although there are still some unanswered questions). So it might be interesting.

Must continue with minime hacking now... must code minime... must code minime... must code minime...

August 18, 2000

Tastykakes

Another day of coding.

Sent off an old modem to dad (mom's modem got fried last week). Sent an old CD rom to John, along with an ISA card and associated cable. Damn, I'm just a nice guy.

I'm re-downloading M17 mozilla (from CVS this time, as if it will make a difference) so that I can get SSL in mozilla (see previous journal entry about how netscape must die). It will probably compile for the next few hours.

Did a reconcile of our wedding registry gifts between what the stores say we got and what we actually got. We got more of some things than the stores listed, which means that people found deals elsewhere. I'm all in favor of people saving money when they buy us stuff.

(netscape just finished downloading itself, and is now running configure)

I really need new foam ear thingies for my telephone headset -- they shed on my ears and it looks like I have a five o'clock shadow on my ears (and my, that's attractive!) after using it. Must remember to go to Radio Shack tomorrow...

Getting closer to LAM release. The RedHat folks are freezing this weekend (did I mention this in a previous entry? Can't remember), so they want a version that is "as close as possible", but we're anticipating putting out a LAM update RPM when it goes stable. Ugh -- and he (the RedHat Guy, whose name has a non-ASCII character, so I can't type it too easily) found an embarrassing bug in the 6.3.3b27 that I put out earlier today. 6.3.3b28, coming right up!

Found a particularly annoying bug in tping today; I thought it would be a simple bug to fix, but it turned out to be hard to find until I realized that some buffers were getting allocated too small, thereby creating overflows. Damn the overflows -- LAMming speed!!

I have one major issue that I want to solve before putting LAM through all the regression tests; a user can't get PTY support to work on SCO Unix -- at least one LAM node bails before MPI_Init. Hmm. I can't tell if it's his setup or if LAM is actually doing something wrong. Hmm.

dell.com says that my new computer is estimated to be shipped on August 22. Yummy!

I ripped all my Yes and Led Zeppelin CDs today. I have many, many Yes CDs. I'm in the middle of Pink Floyd now. I installed the beta vorbis XMMS plugin, and it works like a champ. However, it takes up 100% of the CPU vs. single-digit percentages when playing MP3s. Hmm. Let's hope it gets better (it is still beta, after all).

Forces of Nature

Spent some time with Brian and inilib today; fixed a bunch of things in the docs, but it looks pretty damn good. Grasshopper has learned much in his inilib time.

"When you have learned to snatch the error code from the trap frame, it will be time for you to leave."- The ancient masters

Spent some more time with the SCO LAM user who's been having problems. One of the two problems can definitely be chalked up to UTFS; the other may also be (he's testing now, and has to install a new compiler). In which case, LAM may be in the clear for all the regression tests and eventual release!!

Looks like the trip to Berkeley is going to happen next week. So I'll likely be up in the Bend in the early part of the week, and go to CA from there.

August 21, 2000

Of Palm Pilots and Daisys

A productive weekend. Forgive typos; on a low-bandwidth link (minime doesn't seem to want to compile on Linux again... grr...)

My new computer unexpectedly showed up on Saturday (wasn't expecting it until mid this week or so; most likely after I went to CA). Wooo hoo!!! It's decked out to the gills (I can't resist the opportunity to list all its power features):

Pentium III/800mhz. 32k L1, 256K L2.

256MB ECC/RDRAM.

20GB disk.

12x DVD drive (and windoze DVD software).

8x/4x/32x CDRW drive.

3 button mouse.

Altec Lansing THX speaker setup. This stuff is amazing -- an approximately 2'-per-side cube subwoofer and 4 speakers. We hooked it up to the VCR on Saturday to watch Episode one -- amazing sound!

21" Trinitron monitor (19.8" viewable, .25-.26mm dot pitch).

32MB DDR nVidia GeForce2 GTS 4x AGP video card (I don't know what most of those letters mean, so I assume it can be directly translated from Ancient Hebrew to "fucking cool").

It's fast fast fast. However, I have noticed I/O constraints that are typical on Intel architectures. Oh well -- you can't have everything (where would you put it?). But with the speed of this machine, I'll likely do at least some local development rather than ssh-ing to nd.edu and doing everything from up there.

As practically obligatory, I went out and bought the Matrix DVD to test my DVD drive with. Hopefully, I'll get to test it later today (gotta find some Linux DVD software...).

In other things this weekend, I did some "apartment" errands; got me a bookshelf, a keyboard tray-thing for my desk, and a 4 drawer filing cabinet. Tracy got me a warm fuzzy robe for my birthday (because I really liked the complimentary robe on our cruise); soon enough it will be cool enough to wear it around here. Might as well subscribe to the telecommuting lifestyle, eh?

I should point out that this new 'puter can rip/encode CDs like nobody's business (and that's what I've had it doing...).

August 25, 2000

The first rule of the LSC is...

Extremely interesting quote from the paper on small-world phenomenon:

Thus we see that minimizing the transmission rate of a network is not necessarily the same as minimizing its diameter... in addition to having short paths, a network should contain latent structural cues that can be used to guide a message towards a target.

I finished the paper today (ignoring all the complicated math stuff that went right over my head and into the wall behind me. I hope I don't get fined for the mark that it left).

CorporateTime may be nice, but it certainly has an interface that rivals that of a blind baboon's arrangement of a sock drawer. I can't tell you how many times I made incorrect appointments in ctime last night because it put pm when I was expecting am, or when it put am when I was expecting pm, or, even worse, when it put pm when I really meant pm, but I changed it to am on general principle (or vice versa). I'm guessing that the ctime interface designers were in the Southern hemisphere, where all this makes sense.

I should mention that I went to see Arun's room in Stanford Hall last week. I promised him that I'd put it in my journal, but thought better of it so as not to ruin the surprise for anyone who hasn't seen it yet. So all I'll say is: it's FABULOUS. If you haven't been yet, I strongly urge you to go see it. It's much better than Cats; I'll go see it, again and again.

I'm helping proof a book that Jeremy is writing -- spent much of the day doing that. Hats off to Jeremy for a great use of the word "esoterica". To celebrate, everyone should use the word "esoterica" in a sentence today. Together, we can form a secret personhood of politically-correct dictionaphobics who use big words just for the pure art of it.

Also started Dog doing some LAM development. Yet another reason why LAM will take over the world -- when you have programmers like Dog, who in their right mind will refuse?

August 26, 2000

Colored and mixed paper only

Interesting note that I discovered in pine yesterday and only correctly identified today... I'm on a few ezine lists, and have been for quite a while. Only yesterday did I actually scroll down to the bottom of one of the messages (past all the advertisements, etc.). At the bottom was a note that did not look like it was part of the letter -- indeed, it turns out to be a message from pine itself:

[ Note: This message contains email list management information ]

where the "email list management information" is a menu option. Selecting it brings up a pine screen explaining that the message contains meta information that can automatically unsubscribe you from the list... select here to unsubscribe. Not a difficult thing to implement, but I've just never seen pine be able to do this before, so it must be some kind of standard.

Indeed, it turns out that a line in the message's header triggers it (names changed to protect the guilty):

List-Unsubscribe:

And it seems that pine can handle more than just List-Unsubscribe -- there must be some set of approved tokens after List- that pine knows how to handle. Interesting random note.
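The standard in question is RFC 2369, which defines a small family of `List-` header fields (List-Help, List-Unsubscribe, List-Subscribe, List-Post, List-Owner, List-Archive), each carrying one or more URLs in angle brackets. How pine actually implements this is not something I know, so the following is only a rough sketch of how a mail client might pick those fields out of a message. The function names and the example.com addresses are made up, and folded (multi-line) headers are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Lower-cased copy of a header field name (field names are case-insensitive).
std::string lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Collect the URLs written in angle brackets in a header value,
// e.g. "<mailto:a@example.com>, <http://example.com/unsub>".
std::vector<std::string> bracketed_urls(const std::string& value) {
    std::vector<std::string> urls;
    std::size_t pos = 0;
    while ((pos = value.find('<', pos)) != std::string::npos) {
        const std::size_t end = value.find('>', pos);
        if (end == std::string::npos) break;
        assert(end > pos);  // so the substr length below cannot underflow
        urls.push_back(value.substr(pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return urls;
}

// Map each "List-*" field (lower-cased name) to its URLs. Scanning stops at
// the first blank line, which ends the header block.
std::map<std::string, std::vector<std::string>> list_headers(const std::string& message) {
    std::map<std::string, std::vector<std::string>> found;
    std::istringstream in(message);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) break;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::string name = lower(line.substr(0, colon));
        if (name.compare(0, 5, "list-") != 0) continue;
        found[name] = bracketed_urls(line.substr(colon + 1));
    }
    return found;
}
```

A client could then offer one menu entry per key it finds, which would match the "more than just List-Unsubscribe" behavior described above.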

Lummy and I rented the Fight Club DVD last night so that he could see it ("The first rule of the LSC is that you do not talk about the LSC. The second rule of the LSC is that you do not talk about the LSC!"). The plan was to watch it on his new Vaio (I know... don't even bother mentioning it...). We got back to our hotel (Skanky, Inc.), but the DVD wouldn't play. With a little further investigation, we discovered that the DVD decoding software had not been loaded. Lummy's playing with it now (the Win2K CD was here at the office); we'll give it a whirl later.

I noticed today that Ace of Base's song Wave Wet Sand has some satellite-like noises in the background (not that I've ever actually heard a satellite making noises, but I've seen enough movies to know exactly what they sound like such that I can pick them out of a lineup without any hesitation. "Yes officer, #3 is the same exact sound from the KDP1138 from Enemy of the State"). Coincidence, or plot? Only higher volumes and sleep-induced learning will tell.

August 27, 2000

Possession of a stolen shovel

Just saw Harold and Maude on DVD with Lummy (on the Vaio, but got a bigger screen and real speakers for this). Trippy, yet interesting movie. It's somewhat of a mix of "get everything you can out of life" intermixed with a bunch of really funny suicide scenes (the fire one, I think, was my personal favorite). Definitely a black comedy if there ever was one. Where else can you see a Jaguahearse? Or a mother who wears a different wig every day?

I'd recommend it. It's a funny movie if you're in the right mood. There's a bunch of subtle things in there to keep you thinking, too. Overall thumbs up: I rate it as 5 minutes.

August 29, 2000

xor is not good encryption

It's been a while since I've done a journal entry, mostly because I was traveling all of yesterday. Woof. Let's see what has happened...

Spent most of the day down in the "lower Bay area" at Cleanscape -- the Attol people. Saw overviews of their products (which are pretty cool, IMHO) for testing software. It all started at SC99 when I saw their products/docs at their booth. Pretty cool stuff -- it would represent a fundamental change in the way that we do software in the LSC, but I really think that it would be a positive change, and allow us to write higher quality code.

Saw their presentations all day, met bunches of their people, etc., etc., and had lunch with them. In addition to the Attol line, we also briefly discussed their "qef" tool, which is a "make" replacement. It has a lot of the features in it that we have discussed in the context of the Software Carpentry stuff, but it has the disadvantage of being proprietary, and therefore not useful to us since we want to distribute source code (i.e., users would also need "qef" in order to compile our stuff). At present, it cannot "export" its build process, for example, to work on systems that do not have qef installed. Bummer.

Lummy had another meeting after this, and I went in search of a Fed Ex to send Jeremy the edits that I had made to the GGCL docs. After a good bit of searching (and I didn't even have a map!), I found a Kinko's with a Fed Ex drop, but the last pickup of the day had already happened (it was about 5pm by this point). So then I had to find the real Fed Ex place and then go pick up Lummy.

We chatted a bit more about the Attol stuff. He's somewhat against it, mainly for the reason that the test suites that it generates need the Attol run time systems in order to run. This stuff is proprietary, and distributed in binary form (e.g., libattol.a), and therefore we couldn't distribute it to anyone. Hence our test suites would only run for us, not for anyone else. The Cleanscape people were nebulous about "perhaps we can work out an agreement for distribution of the run time...", but neither of us has faith that that would actually be able to happen in a way suitable for freeware. Additionally, there's a pretty steep price tag. We should be able to afford it, but it's always a concern.

We got the latest/greatest version of the software from them, and will probably install it in nd.edu for Lummy and others to play with (Rich Lee and I played with it several months ago; we both liked it). We may also be able to make a "fake" Attol run time library that would be suitable for distribution -- stub out the necessary functions with little or no content in them. We'll see how it goes.

Needless to say, it's fairly obvious how I'm leaning -- I think these tools would be great for the LSC. It would get us out of the testing framework business, something that has occupied a lot of our time in the past. It also gives us cross-platform testing capability -- any flavor of unix [that is supported by Attol, which is just about all of them], and 'doze. Could be useful.

We got back to the lab around 7-8pm after unsuccessfully trying to find food in the South Bay area. We got High Tek Burritos instead -- I got the world-famous Godzilla High Tek Burrito. I highly recommend it to anyone coming to Berkeley.

Answered all the e-mail that had piled up during the day, and started on some issues with inilib that Brian raised. It got late, we got tired (I had done a lot of driving...), and we left before I finished.

This morning, I set to work on inilib again, and saw an email from Brian with a key insight to solving the current issue (having to do with the compiler complaining about non-const references to temporaries). Running with that, and with the ultra-cool C++ keyword mutable, I was able to fix things the Right Way. inilib is looking good. We have a code review scheduled for this Friday, but I think we're essentially done. Getting very close to release! I'll plug it into the jjc/Minime later today and see how it really shapes up.
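For anyone who hasn't run into this particular C++ wrinkle: a temporary object cannot bind to a non-const reference, so functions that accept temporaries need to take `const T&`, and then any internal bookkeeping that a const member function updates has to be declared `mutable`. The sketch below shows the general pattern only. `Attribute` and `twice` are hypothetical names invented for the example; this is not inilib's real code or API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a stored config value; not inilib's real class.
class Attribute {
public:
    explicit Attribute(std::string text) : text_(std::move(text)) {
        assert(!text_.empty());  // this sketch expects a non-empty value
    }

    // Logically read-only: the caller sees the same value before and after.
    // The cached parse is internal bookkeeping, so those members are mutable
    // and may be updated even though the function is const.
    int as_int() const {
        if (!cached_) {
            cached_int_ = std::stoi(text_);
            cached_ = true;
        }
        return cached_int_;
    }

    bool was_parsed() const { return cached_; }

private:
    std::string text_;
    mutable bool cached_ = false;
    mutable int cached_int_ = 0;
};

// Declaring the parameter as `Attribute&` would make the compiler reject a
// call such as twice(Attribute("21")); a const reference binds to the temporary.
int twice(const Attribute& a) { return 2 * a.as_int(); }
```

The design point is that `mutable` keeps the public interface const-correct while still allowing lazy caching behind it.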

Had several hours of BLD planning with Lummy, Eric Roman, Mike Welcome, and Paul Hargrove. More discussions/arguments/resolutions. Looks like neat stuff. Lummy and I are going to spend some time writing out a list of requirements from what we have figured out in our "round table brainstorming" sessions, and see if the process can move forward more formally after this.

A great word emerged from the planning sessions -- "flamework", which evidently means something like, "a framework that we're all arguing about."

September 1, 2000

All things being equal, LAM rocks

And version 1.0.6 of the MPI 2 C++ bindings has been released with extraordinarily little fanfare. See what's new (it's actually nothing very interesting :-). The test suite still hangs in MPICH, but they say that that's ok, 'cause neither they nor I can figure out why... Seems to be some kind of Heisenbug in MPICH itself (shrudder).

Took the red eye with Lummy last night. Got to Cincinnati at 6am. Got to South Bend around 9am. Came to the lab and have been here ever since.

I gave my talk on the generalized master/slave parallelism stuff at lunch. It seemed to go well, but I wish that I had had a blackboard or whiteboard to use. :-(

Had a code review with Arun w.r.t. LAM/gm. Arun seems to have some kind of medical condition in his thumbs that prevents him from hitting the spacebar -- for this, I forgive him for the enormous lack of white space in his code (making it squished together and hard to read -- but who am I to judge? Oh... wait. I'm his boss). We recompiled LAM and his test program with the Solaris compilers so that he can use bcheck to find some Random Badness (there's at least one write to unallocated in a simple MPI_INIT/MPI_FINALIZE program -- oops).

Spent the rest of the afternoon finishing up the MPI 2 C++ bindings so that it can be released so that Elliott can continue working on what Mike Shepherd started -- finishing the rest of the C++ bindings for the MPI-2 functions. So 1.0.6 has been released and I created a tag in CVS, so now I'll go commit all of Mike Shepherd's stuff. Woo hoo! (also have to re-import the C++ bindings to LAM/MPI... mmm... find stupid CVS manual for 3rd party imports... ggggrrrrrphhhh...)

Gonna go meet Lynzo and some other random bones for dinner after the pep rally. Go Irish, beat Aggies! (I have to admit, I'm not hopeful this year, but good ol' 87 Jabari Holloway is one of the captains -- if anyone can lead that team to victory, it is he).

September 3, 2000

I am serious. And stop calling me Shirley.

Notre Dame won yesterday vs. Texas A&M -- 24 to 10. It was our home opener, it was hotter than two dogs... er... lying around after a big run (it was apparently 116 on the field).

Quote of the Day from Arun when we briefly discussed yesterday's game when I came in today. I made some remark how I got a little sunburn and showed him my ultra-cool watch band tan line (chicks dig it, just like chicks dig MPI). Arun replied, "It must suck to be genetically inferior that way."

Classic.

Saw many old friends this weekend -- had dinner with Schleggue (although he joined us late), Lynzo, Vern, Pam Tyner, and Rachel Canata at Macri's on Friday. We then went to Corby's and then Senior Bar. As is my moral obligation, I got Lynn nicely drunk on vodka tonics. Game day was fun; hung out with Ed and Suzanne a bit and then saw them later about 10-15 rows below us in the student section. We sat with Dog, Jeremy Siek, Katie, Mike Niemier, Brian Bussing and his fiancée Dana Collins. A good time was had by all, and we all drank a lot of water ("it's not just water -- it's Notre Dame Water"). We saw our boy Jabari out on the field, and he looked good -- he made some good plays, had some good catches and blocks; he generally did us proud. After the game, I even saw him take a bit of a leadership role with the guys on the field, further confirming my previous journal entry that if this team is going anywhere this semester, Jabari is going to have a lot to do with it. For those who have never met Jabari, he's a great guy -- really nice, tries to study hard (I can't even imagine trying to get all my work done *and* have a hellish practice and travel schedule; it's hard to be an NCAA athlete at Notre Dame...), etc. Jabari rocks.

We went out to dinner last night at Outback and all of us had too much to eat (Mary and Pete Calizzi joined us, too). A good time was had by all. We saw Ruth Riley there with some of her family/friends/whatever, but we didn't bother her. We went back to Dog's place afterwards and watched the Matrix. Then everyone hit the road (no one was staying close). It was good to see them all again.

I'm here in the lab for an inilib code review (and I'm late, 'cause I'm typing this entry...) with Brian, so with a big shout out to all my homies out there, PEACE, OUT.
(BTW, we're listening to "The Moog Cookbook" here in the lab. Does life get any better than this?)

September 4, 2000

Wedding 2K

Spent the day continuing setting up my new machine; still haven't got X quite right because I can't get KDE working right. I'm working in plain old twm, and it's stifling. Ugh.

Did some more cleanup around the house (it's still a wreck from all the wedding presents), and finally watched our wedding video with Tracy (it actually came last week, but we were both traveling). There are some utter classic moments in there (funny how everyone else's wedding video is cheesy, but yours is fantastic...):

Renzo, while we're standing around before the ceremony: "You just give the signal, and we'll get you right outta here."

Fr. Hesburgh: "Jerry and Tracy..." (actually, I have to provide some context here -- Fr. Hesburgh was fantastic, and he recovered quite well from his little error)

Faller (off camera), "Hey Jeff -- seafood!" (the camera caught this whole scene quite well. Had to back it up and watch it a few times)

Dog: "We couldn't get that bastard Sepeta up here because he's hitting on their dates!" (pointing at Barker and Faller)

There was much Meghan in the video as well. It was funny, too, to notice that Patrick got just about all the face time in the ceremony, and Chris got just about all the face time during the reception.

Some other funny scenes as well; some classic dancing/reception footage. One that Tracy didn't even see right away (it's off to the side of the frame, and it happens very quickly) -- we had to back this up and watch it a few times. After the wedding party dance, I stole Diann away from Darrell, who is left standing on the dance floor, looking forlorn. Shipman notices this, runs over into Darrell's outstretched arms, and they start dancing. The look on Courtney's face and her resulting body language is absolutely classic. Renzo quickly steps in with Courtney, and the camera pans away. The whole thing takes about 3-4 seconds.

September 5, 2000

Miles of code before I sleep

I was updating my xmms RPMs today (for Mandrake), and noticed that they have an ogg vorbis xmms plugin RPM. I installed it and played some .ogg files with it. I was pleased to notice that my previous concern about the vorbis xmms plugin hogging the CPU while playing songs has been fixed (or they just compiled it better than I did); playing a 160+kbps ogg stream has the load hover around 0.05 (i.e., comparable to .mp3). Very nice; perhaps this vorbis stuff has promise!

Spent much of the day working on pending LAM issues:

Finally fixed the SCO user's problem. Turned out to be a race condition in the file descriptor passing code. Interesting that it never showed up on any other operating system; it may be a SCO-specific issue (the sender was sending three file descriptors and then closing the pipe; SCO apparently discards any unreceived messages when the sender closes, even if the receiver still has the pipe open). Who knows. Putting a simple sender-waits-for-an-ACK scheme fixed the problem. It's interesting to note how hard it was to find the problem, and how it was trivial to fix it once the exact problem had been determined. It was really hard to find the problem because my troubleshooting was limited to e-mail only; I do not have a SCO machine to test on, and the user's boss ixnay'ed the possibility of me getting a guest account to test with.

Found a real race condition in the LAM code to launch executables on remote nodes (at lamboot time, not at mpirun time). It is possible for output from remote nodes to be dropped before mpirun has a chance to see it if rsh exits too quickly. It's not immediately clear to me how to fix this problem... It seems to only have become evident with a few LAM users with the advent of faster processors and networks.

Fixed a minor issue with the --with-rsh logic in configure.in that a helpful user pointed out.

Added some much more user-friendly "there is no lamd running" messages (via the lam-helpfile) to all the LAM executables and to MPI_Init.
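The sender-waits-for-an-ACK fix for the SCO problem above can be sketched roughly like this. It is only an illustration in Python, not the actual LAM code: the real fix lives in LAM's C file descriptor passing code, and here plain 4-byte messages stand in for the three descriptors. The function names and the one-byte ACK are assumptions made up for the example.

```python
import socket
import threading

ACK = b"A"  # hypothetical one-byte acknowledgement


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        buf += chunk
    return buf


def receiver(sock, count, received):
    """Receive `count` 4-byte messages, then acknowledge them."""
    for _ in range(count):
        received.append(recv_exact(sock, 4))
    sock.sendall(ACK)
    sock.close()


def send_then_wait_for_ack(sock, messages):
    """Send everything, but do not close until the receiver says it has it all."""
    for msg in messages:
        sock.sendall(msg)
    ack = recv_exact(sock, 1)  # block here instead of closing right away
    sock.close()
    return ack == ACK


def demo():
    a, b = socket.socketpair()
    received = []
    t = threading.Thread(target=receiver, args=(b, 3, received))
    t.start()
    ok = send_then_wait_for_ack(a, [b"fd-0", b"fd-1", b"fd-2"])
    t.join()
    return ok, received
```

The point is simply that the sender's close() can no longer race ahead of the receiver, whatever the operating system does with unread data on close.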

Released 6.3.3b32 with these changes. Pending issues:

The race condition with rsh.

The MPI 2 C++ still seem to be broken under some conditions (e.g., when using --without-fc). @#$%#@$%#@$%#@!!!!!

An IRIX user is complaining about some socket issue at mpirun time. I've pinged him to try the 6.3.3 beta, but I doubt that this will fix his problems. We'll have to see how this one pans out.

My 800mhz machine is fast (provided that it's only doing one thing at a time -- it is still an Intel box, after all...). Times expressed in min:sec:

                                        800mhz machine   Ultra 30 (athos)
Run autoconf and friends for LAM/MPI:   0:07             0:23
Run configure for LAM/MPI:              0:32             1:22
Full build of LAM/MPI:                  3:20             12:56

I did the build on athos, which is admittedly not the fastest machine (not only is it only 300mhz, it has limited memory; I should have used a hydra, I suppose, which would have been half the mhz of the intel machine and had a lot more RAM). But the build was about 4x faster (again, with the big caveat that the machine is doing little else at the time).
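For the record, here is the arithmetic behind "about 4x", as a small Python sketch. The times are the ones quoted in this entry; the helper names are made up for the example.

```python
def to_seconds(mmss):
    """Convert a 'min:sec' string such as '12:56' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (800mhz machine, Ultra 30) timings from this entry, in min:sec
TIMINGS = {
    "Run autoconf and friends for LAM/MPI": ("0:07", "0:23"),
    "Run configure for LAM/MPI": ("0:32", "1:22"),
    "Full build of LAM/MPI": ("3:20", "12:56"),
}


def speedups(timings):
    """Return how many times faster the first machine is for each task."""
    return {
        task: to_seconds(slow) / to_seconds(fast)
        for task, (fast, slow) in timings.items()
    }


if __name__ == "__main__":
    for task, ratio in speedups(TIMINGS).items():
        print(f"{task} {ratio:.2f}x")
```

The full build works out to 776 seconds versus 200 seconds, i.e. 3.88x; autoconf and configure come in lower, at roughly 3.3x and 2.6x.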

But these figures certainly do inspire me to do some development locally rather than remotely to nd.edu. Happiness all around!

September 6, 2000

T.P.R. Report / Initech

When I drove down here a few days ago, I noticed some water dripping behind my glove compartment. We didn't go out and have a good look at it until today. We picked up the floor mat (which was good and wet still), and it was soaked underneath with a healthy chunk of mildew growing on my bottom carpet.

Bonk.

I have an appointment on Friday morning to take the car in and have whatever it is that is broken fixed (I am a code wizard, not a car wizard).

Finally got my IO streams book from Amazon today -- I accidentally put the wrong apartment number on the "ship to" address, and UPS got really confused. I called yesterday and they re-shipped it again to the right address (no charge, whoo hoo!). Got the Office Space CD, too. Yummy (already ripped into MP3s, and I'm listening to them right now...).

ROMIO and MPICH released new versions today. Luckily, the new ROMIO is just about the same as the old one (configure/build-wise), so since I had the foresight to document what I did last time, I mainly followed the same steps and ROMIO seems to be integrated into LAM/MPI just fine.

\begin{bitch}

CVS third party importing sucks, for multiple reasons:

It does not record which files have disappeared or moved from release to release. That is, the initial import is fine. But when you import a new version over the old one, you would think that it would just snapshot the new one and keep the old one as just history. i.e., files that existed in the first version but do not exist in the second version should not show up upon checkouts. Not so.

For example, in the MPI 2 C++ bindings, we moved a bunch of header files from one directory to another. I did the 3rd party import in CVS of the new version, and then updated my local copy of LAM. Suddenly I had 2 copies of all the header files -- one in the old location, and one in the new location. Other than cvs remove'ing each old .h file, I didn't see any way to correct the situation. So I just blew away the old 3rd party imports (well, actually, I just moved them... never delete!!), and imported the C++ bindings as if it was their first import.

If you third party import a distribution tarball that uses automake, plan to be hosed. It screws up all the timestamps such that it tries to invoke automake and friends when you ./configure/make it. And since it's a distribution tarball, you don't have things like acconfig.h, so autoheader will fail. And it goes downhill from there.

The only solution that I found was to do a massive touch of all the files in the third party source directory tree such that every file in the tree has the same timestamp. Icky. Horrible. Shrudder. But it works.

But we shouldn't need something like this -- I'm open to better solutions (perhaps just including the tarball itself...? Hmmmm...!)

\end{bitch}
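The "massive touch" workaround described above might look something like the following in Python. This is a sketch of the idea only (the original was presumably a one-off shell command); the function name and the choice to skip symlinks are assumptions.

```python
import os
import time


def touch_tree(root, timestamp=None):
    """Give every regular file under `root` the same access/modify time.

    With identical timestamps, make no longer thinks the generated files
    (configure, Makefile.in, ...) are older than their sources, so it does
    not try to re-run automake and friends.
    """
    if timestamp is None:
        timestamp = time.time()
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # assumption: leave symlinks alone
            os.utime(path, (timestamp, timestamp))
            count += 1
    return count
```

Run it on the third party source tree right after the import and before committing; it returns the number of files it touched.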

Did a bunch of LAM work today, but I might have just found a new issue under Solaris. It seems that mpirun is hanging. Ugh!!! Was it something that I did in the extra synchronization that I added for SCO?

September 10, 2000

Singing backup chicken

Ahh... the Nebraska game.

It was an amazing game. I really did not expect that ND would play so well -- we were ranked 23/25 (depending on which poll you looked at), yet we stayed head to head with Nebraska (#1) for nearly the whole game. Our offense was a little off, but then again, Nebraska has a great offense. We had 2 amazing runbacks (one from a punt, the other from a kickoff) for 14 of our 21 points. At the end of regulation play, we were tied at 21.

We lost in overtime; we got a field goal, they got a touchdown (sadly, overtime has never been good to us). So we lost by 3 points. But it's a helluva lot better than the spread -- 13.5 points. It was a fantastic game. Tracy, Jim, Anna, and I were watching it in a local Damon's (sports bar). When Nebraska finally won, a few Nebraska fans started cheering loudly. I turned to them and said, "You just beat #25." That shut them up immediately.

So even though we lost, I can only picture it as a win. They won by a fluke (and a really, really fast quarterback); it really could have gone either way (and yes I would have been saying that if we had won, too). And they didn't play down to us; we played right on par with what the news media calls the #1 team in the nation.

We must go up in the polls for this (it doesn't look like they've been updated yet). It would be nice to see Nebraska go down, but I don't know if that will happen (FSU, #2, barely won against Georgia Tech -- who isn't even ranked -- yesterday; it looks like Georgia Tech had a pretty amazing game as well). Michigan (#3) had a pretty convincing win over Rice, so maybe...? Who knows. I've become convinced over the years that the two sports polls are based on a random function, anyway.

On a lighter side, Tracy, Jim, Anna and I went to a restaurant (can't remember the name...) after seeing "The Cell" (which I give about 2 minutes; it was... ok, but not good or great). We caught the tail end of the University of Louisville vs. Grambling football game. I've never seen comedy in football before, but this was definitely it. UL won the game 52 to nothing, and the score said it all. The Grambling players really looked like they were trying hard, but their attempts were just comical. I can only imagine that they don't get a lot of funding, or perhaps their coaching and practices are terrible, or... I have no idea. But it was the funniest thing that I've seen in quite a while. UL just stomped all over them (and I'm not even a UL fan!).

September 11, 2000

Do elephants sweat?

Wooooo-eeeeee.... the paper is up to 25 pages now (and I haven't written the majority of section 7 yet!). I spent the entire day revising it. Properly designing a software system is a lot of work. But (like I've said countless times before), it's cool stuff. There are some really delicious issues and problems that you would never expect from a plain ol' manager/worker problem. I think I've got one more major revision before I unleash it to some others to read.

Had some more interaction with the guys who are having rsh/lamboot issues. Seems like rsh is not the problem after all. It may be faulty handling of stderr/stdout processing. The guy was running some simulations on his cluster; he said that he would try some new code of mine when that finished. We'll see if we can finally solve this problem.

Blockbuster sucks.

They sent a threatening letter to me at my parents' house in Philadelphia claiming that I had not returned the Fight Club DVD to the Berkeley Blockbuster store for almost two weeks. They happened to mention that the matter had already been turned over to a collection agency. Great.

I checked with Lummy and he definitely remembers returning it (we rented 3 DVDs; Fight Club had to be back in 1.5 days, the others were 5 day rentals). We returned Fight Club before it was due, and watched the other 2 later. I was with Lummy on one of the "return to Blockbuster" trips, but not the other, and I couldn't remember which was which.

Anyway, I called the Berkeley store and told them that I was absolutely positive that I had returned the DVD. The guy looked it up in the computer and said, "Oh yeah... we found it on the shelf later." Over 2 weeks later, apparently!!

So I was about to be fined and have a big bad black spot put on my credit record because of some clerk kid's stupid mistake in Berkeley. Blockbuster was about to fine me without even checking with me (the letter claimed that Blockbuster tried to call and snail mail me, but I never got any messages or snail mail). What the heck is that all about? And then they send the final notice to somewhere that I haven't lived for well over a decade.

The whole thing kinda pisses me off. I don't know how excited I'll be to go back to a Blockbuster.

September 12, 2000

The cockpit? What is it?

I received 2 packages today -- how exciting! God, Internet shopping is great.

The first package was from Amazon, and it contained all the CD's that I ordered (I finally have all the CD's for the MP3s that I own -- some of which I have been looking for for quite some time. See yesterday's journal entry about the word "soundtrack" in internet music search engines... grr...): MI-2 soundtrack, Chemical Brothers/Surrender, Groove soundtrack, Go soundtrack, Fight Club soundtrack, Various Artists for the Masses. The ones that weren't already ripped are finishing MP3 encoding right now...

I'm listening to the Groove soundtrack. Sounds like hip stuff. Nothing earth shattering so far, but it's good coding background music. It's really heavy on the bass (even on my mondo sub-woofer's minimum setting!), so I can't turn it up very much because I live on the second floor of an apartment building. Since I like to have semi-loud music on while I'm coding/working, does this justify my saying "I need a house to support my coding style"?

The second package was the book Advanced Programming in the Unix Environment by W. Richard Stevens. It came highly recommended by fellow Llama Nick. This book has everything -- would that I had known of its existence before! It could have saved me much exploration and experimentation with pseudo-ttys, various IPC mechanisms, passing file descriptors, random issues with SIGABRT, and other sundry bits of Unix system-level things. <sigh> I was glad to see that I had gotten 5 of the 6 guidelines for daemon processes in Minime, though (I didn't set minimed's umask to 0 -- oops. I was very careful about every file that it opened, but setting the umask would be better).

I can't remember where I ordered this book from; I found it on www.bestbookbuys.com. I highly recommend this URL for anyone who is buying books off the web -- it saved me somewhere between $10-20 on this book.

Speaking of handy URLs, someone pointed out http://www.amazing-bargains.com/ to me the other day, particularly their section about buy.com. They always list some good deals for buy.com, like coupons for "$10 off any order of $50 or more" and whatnot. I wish that I had known about that a month or two ago -- I bought a PCMCIA network card from them. Ah well -- next time.

Still working on the paper. The text portion of the first half of the paper still heavily reflected that I originally wrote this as a list of bullets, and is requiring much re-writing. The second half was mostly ok 'cause I had already re-written much of it. :-)

Happy, happy, joy, joy...

I think we finally fixed the race condition in booting LAM. Many thanks to some helpful LAM users and their patience for helping slog through this obscure issue. We've got a few more tests to run to ensure that it's done, and I sent the new code out to the Debian user who initially reported the bug, but I think that I finally understand what the problem was, and how I fixed it.

I found a new <blockquote> attribute the other day -- type=cite -- that looks really cool in netscape (be sure to check this journal entry out on the web). Doesn't appear to do much over normal <blockquote> in lynx. I wonder what it will do in pine.

September 14, 2000

Mysteries of the milkshake

Exciting changes today...

Darrell called me with the joyous news that PacBell finally hooked up his DSL today (it only took 2 months. The most comical part of the saga was, after 1.5 months, after 2 house calls from PacBell technicians, Darrell got a call saying, "We finally figured out what the problem with your DSL line is. Your local Bell office doesn't support DSL.")

I spent about an hour or two with Darrell setting up our DNS servers. Darrell already had experience with this, so most of the pain and learning curve was avoided. Seemed pretty straightforward afterwards, but took a little understanding to get there. So Darrell and I are now secondary DNS servers for each other (kresge.com and squyres.com). We did some testing and it all seems to be working. Pretty cool stuff.

Darrell's with NSI (the evil empire), and he submitted his DNS change to them earlier today. They supposedly updated at 5pm EST, but as of now (12:23am EST the following day), nd.edu machines still don't see the change.

I'm with register.com, and it took a little explaining to them exactly what I wanted to do (had to do it on the phone). Turned out that it was their silly web interface that confused me, and we submitted my DNS change as well. They supposedly update tomorrow morning. Indeed, nd.edu machines don't see the change yet, but when I'm on my machines, "whois squyres.com" shows all the new stuff. Cool!

I've already added a few names to squyres.com -- introducing the new, improved JeffJournal! When the DNS change propagates out to the world, the JeffJournal archives will be located at the following URL:

If that isn't vain, I don't know what is. But hey, I only do it... because I can.
Had to do some screwing with my apache settings to get the virtual hosting stuff working with www.squyres.com, wedding.squyres.com, and www.fhffl.com. Learned some things about how to get Apache really confused today. Could be useful someday.

Arf -- just got a bounced message from nd.edu from an automated message on wedding.squyres.com. It seems that I had router.squyres.com as the first entry for that machine in /etc/hosts, which doesn't exist in DNS. Oops. Fixed.

On other fronts, I continued to be distracted by getting motivated to figure out what the numbered ports that showed up in netstat -a were on my 2 machines. Turns out that most of them had to do with NFS (which I used between my router and my desktop so that I can serve my MP3s from the big disk on my router to the xmms on my desktop).

I got further inspired to ditch NFS because I thought of a truly cool way to serve up my MP3s without NFS -- using http and the streaming capabilities of xmms (I already have a web server running, so...). I wrote up a minimalistic PHP script that allows me to navigate the directories and files in my MP3 directory tree. Clicking any of them invokes a PHP thingy to generate an .m3u MP3 playlist file on the fly, and send it to xmms. With the directory-browsing aspects of the scripty-foo, I can queue up multiple levels of MP3s:

Actually, I could have just said "I can enqueue any tree of files, to include the special case of a tree of one file."

It was surprisingly easy. It's truly cool. I may someday be inspired to make it a bit more aesthetic and have more options... but why?

xmms stops just short of offering a full set of remote controls from the command line (I had to add an appropriate application handler for .m3u in netscape to call xmms), but I guess it's sufficient.
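For the curious, the playlist-on-the-fly idea boils down to something like the sketch below. The real thing is a PHP script behind Apache; this is a Python illustration of the same logic, and the function names, the extension list, and the example base URL are all made up for the purpose.

```python
import os
from urllib.parse import quote

AUDIO_EXTENSIONS = (".mp3", ".ogg")  # assumption: what counts as a track


def collect_tracks(root, subtree):
    """Return track paths (relative to root, '/'-separated) under subtree.

    A subtree that is a single file is just the special case of a
    tree of one file.
    """
    start = os.path.join(root, subtree)
    if os.path.isfile(start):
        return [subtree.replace(os.sep, "/")]
    tracks = []
    for dirpath, dirnames, filenames in os.walk(start):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            if name.lower().endswith(AUDIO_EXTENSIONS):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                tracks.append(rel.replace(os.sep, "/"))
    return tracks


def build_m3u(root, subtree, base_url):
    """Build the text of a simple .m3u playlist: one streaming URL per line."""
    prefix = base_url.rstrip("/") + "/"
    return "".join(prefix + quote(t) + "\n" for t in collect_tracks(root, subtree))
```

The web server would send that text back with a playlist content type, and the browser's application handler for .m3u hands it to xmms, which then streams each URL over http.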

Ok, back to work now... the paper is really almost finished. I was halfway through the last code review when Darrell called me today...

(BTW, the jeffjournal client is fantastic -- it just informed me that I left a <CENTER> unclosed from line 37 before I mistakenly submitted it, causing all kinds of formatting madness, and potentially threatening the world's existence. We are pleased.)

September 15, 2000

How to Succeed In Coding Without Really Trying

nd.edu finally joins the rest of the masses in recognizing my new DNS server. Welcome to the new and improved JeffJournal! For all of you out there who bookmarked the JeffJournal in your web browser, it has now moved:

Had to re-rip some CD's 'cause their MP3s seemed to be a bit skewed. Sometimes they cut off right in the middle of a song or something like that. I attribute this to when I was ripping CDs on my laptop, which has limited disk space. Turns out that when grip runs out of disk space, it just merrily stops the current song and goes on to the next with no indication or warning. Hence, I believe that some percentage of my MP3s are flawed, so I think I'll have to re-rip some of them over the next few months.

I finally finished a first copy of the manager/worker paper yesterday. There really are some delicious complications in the whole aspect of Things that make it fun. I even wrote the whole paper without writing a single line of code -- it's 100% pure design. There's a good chance that I'll use that paper as a guideline to write a parallel vorbis encoder. Gotta practice what I preach, after all. And it can only make the paper better.

I missed an MPI talk at ND yesterday. Bonk. It sounded like it would have been interesting. :-(

Tracy and I won't be going up to the Purdue game this weekend; her travel schedule was too much this week. Oh well. :-\ Hopefully, the boys will rally with the loss of Arnaz and Irons and the Irish will still prevail.

I'm noticing that my bandwidth between my desktop and my router is really crappy -- I'm just copying over the MP3s that I ripped on my desktop and only getting anywhere between 47 and 69 kB/s. Ick. I see the collision light coming on on my hub a lot; seems like this may be causing too much binary backoff. Might be time to invest $50 in a switch...

Spent some time on LAM yesterday. I noticed an annoying security issue yesterday, and spent some time hacking around in the lamd and the rest of the user-level LAM libraries ensuring that all internal files that LAM uses are opened with "other" and "group" permissions zeroed out. And then it turns out that Solaris doesn't like to abide by the umask when it opens named sockets. Ugh. So I had to go the ssh route and move all the LAM sockets and temporary internal files into their own directory (which does abide by the umask) to guarantee security. Ugh.
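The private-directory approach can be sketched as follows. This is Python for illustration only (LAM itself is C), and the helper names are invented; the idea is just that a directory with no group/other permissions protects whatever is created inside it, even if an individual socket or file ends up with looser permission bits than intended.

```python
import os
import stat


def make_private_dir(path):
    """Create a directory that only its owner can enter, list, or modify."""
    os.mkdir(path, 0o700)
    os.chmod(path, 0o700)  # set the mode explicitly rather than trusting the umask
    return path


def create_private_file(directory, name):
    """Create a new file with "group" and "other" permissions zeroed out."""
    path = os.path.join(directory, name)
    # O_EXCL makes creation fail if someone already planted a file at that name.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
    os.close(fd)
    return path


def is_owner_only(path):
    """True if neither group nor other has any permission bits set."""
    return stat.S_IMODE(os.stat(path).st_mode) & 0o077 == 0
```

Named sockets would be bound at paths inside the same directory, so other users cannot even reach them regardless of the mode the OS gives the socket itself.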

That's all for now; more news from Washington as our reporters check in.

September 16, 2000

Do you Yahoo?

A good day. We beat Purdue with a last second field goal to make the score 23-21 in favor of the Good Guys. We watched the game at the local BW-3's, and met some subway alums there. I guess I haven't really watched too many games away from South Bend (where most everyone is an ND fan), and I haven't really met/talked to too many subway alums. They're interesting folk -- no ties to ND, but are completely rabid about ND and its football program. The people that we met were really nice and we had a good time with them. I'm sure that we'll see those folks again, as well as other subway alums here in Louisville (the NBC affiliate down here broadcasts SEC games, not ND games, hence we have to go to sports bars to see the game).

There were some Purdue folks in the bar, too, and they were dumbfounded when the field goal actually went in (to be fair, we were too :-). By the numbers, we probably should have lost that game -- I don't know for a fact, but I'd be willing to bet that Purdue has us beat in just about every stat. Our guys played well, but we lost two key players (QB on offense, and ?DB&? on defense), so both squads were critically short. The new QB stepped up pretty well, but it was his first college game and he made a few mistakes. Still, he did pretty well and I certainly don't fault him for anything. At the end of the day, he delivered, and we won the game. He's got lots of time to improve, and I'm certainly pleased with what he did today. Good job, Greg. Looks like the students were pretty pleased at the end of the game; they were all over the field in and around the players. Rock on.

So we'll see what happens in the polls tomorrow. Purdue was 11 or 12 or something, and we were 21 or something, and I think we'll both be 2-1. We'll see.

We went to dinner with Janna (Jim+Anna) again, which was fun. New microbrew here in town. Not bad beer, but a little too sweet for me. Good conversation, and much fun was had. Janna has a satellite dish, and next week's game is on PPV, so we'll be heading over to their place to watch it. Hmm... actually, checking the network schedules, it looks like it's on ABC. That would make it a bit more convenient...

I finished my paper the other day (I think that I mentioned this in a journal entry previously), and posted it to the vorbis-dev list yesterday, too, just for the heck of it. Finally got a response from someone today who said that it was good stuff. Good to hear, but they didn't have any ideas, suggestions, comments. Oh well.

Since my computer has been idle most of the day, I started running the distributed.net stuff. It appears that they're focusing on the OGR project. I don't really know what it is, but it appears that most of the keyspace has already been exhausted from the stats graph. It's really slow. Since I started the client last night around 11:30pm on my 800mhz machine, it's only done about 4.3 OGR packets. Wow.

I haven't been running bind for 72 hours yet, and they just released a new version. Apparently bind 9.0.0 has been released. I'm a lazy bastard -- I'll wait for the Mandrake RPM. :-)

September 17, 2000

Your spleen and you: do you have a good relationship?

Not much to report today. Spent a little time upgrading my PGP tie-ins to pine, so that it actually does things correctly (been meaning to do this for quite a while, actually). It will decrypt multi-part messages, messages that are signed, or messages that have additional content besides just encryption. Happiness.

Did some more organizing of my finances and finally got my credit card statement to balance with what is on my bank's web page. Woo hoo!

Signed up for a better AT&T plan today. The service is exactly the same, it's just an arbitrarily complicated pricing scheme to make plans seem different. It's amusing, though, 4 of AT&T's big plans (and don't consider these descriptions legally binding -- go to AT&T's web site for full descriptions) are:

$0.10/minute, any time of day. This is apparently what plan we were on.

The interesting thing is that AT&T marketing makes it sound like they have actually calculated the mathematical derivative for each plan. For example, one says, "You should use this plan if you are spending over $x.xx a month, or if you are spending over $y.yy, you should use this plan..."

But here's the kicker (as I'm sure all good, thinking people out there noticed): spending $x.xx on which plan?!? I hate marketing dweebs. Do people actually fall for this stuff?

Anyway, we did the math (i.e., compared the plans over our last 3 phone bills), and signed up for the $0.05/minute any time plan. Indeed, 2 months ago, this would have saved close to $40 on our bill. Yikes! (Granted, there were some pretty long wedding planning phone calls, but still...)

September 18, 2000

The Art of Barbering

I just got a haircut today in a local Louisville barber shop. I have a long-standing theory that you can tell a lot about a town from their barber shops.

Barber shops are a mostly male-oriented club. True, you'll see mothers in barber shops to bring their sons in for haircuts, and you'll even see the not-too-uncommon female barber (indeed, the barber shop where I went in South Bend had one male barber -- the owner -- and two female barbers). I guess it would be more correct to say that the clientele is almost entirely male.

Humorous anecdote: I went to my typical barber in South Bend a few days before my wedding to get a trim. The woman asked me if I wanted my normal military high-and-tight cut. I told her no, I was getting married in a few days and my bride-to-be told me that she wanted "some hair on my head lest flashbulbs reflect off my head and ruin all the pictures." An older guy was getting his haircut down the row from me. In a low, grisly voice, he said, "You're getting married? Come over here, boy, we gotta talk."

The conversations that flow around barber shops tend to reflect the popular attitudes of the area. Here in Kentucky, I hear about tobacco crops (they actually have pro-tobacco ads on TV here), the military, and University of Louisville and University of Kentucky football.

Frank's barber shop on the campus of Notre Dame is filled with ND memorabilia. Frank loves to hear about student perceptions on campus, football, the band, ROTC, or any other ND-related or military-related topic (he was in the military himself, in younger years).

At the Ft. Knox barber shops, the talk is actually fairly sparse. There's some chatter, but mostly people are there because they have to be there (regulation haircuts and all); it's part of the job. But there are some retired folk who still come on base for haircuts and the gossip with the barbers and soldiers.

The barber shops that I used to visit back outside of Philadelphia are much the same. Typically 40-60 year old male barbers who have the look and feel of someone who has seen and done everything. The ability to strike up a conversation about any random topic. Sports are common, the military is another. Politics, of course (especially with this being an election year), is a big topic as well.

My conclusion is that the barber shop is a social island in the midst of hustling and bustling metropolises. The pace tends to be a little bit slower there than the rest of life. Granted, South Bend and Louisville aren't huge cities, and neither are the suburbs outside Philadelphia where I would get my hair cut. There's typically some kind of talk going on about something, and -- especially in a small barber shop -- the barber knows many of the patrons by name and how they usually want their hair cut.

Indeed, I've asked most of my barbers why they chose to cut hair for a living. Most of them laugh and make some kind of remark about the never-ending demand (how often have you ever walked in to a barber shop and been seated immediately?), but then they have all said that it's for the people. Many had careers before becoming barbers, but left them for one reason or another and became barbers because of the wide variety of people that they would meet. Hence, they're using barbering as a vehicle -- it's not for love of cutting hair, for example -- to see a sample of the world that we live in. The local barber probably has a pretty good feel for the community around him/her -- probably more so than most. Indeed, the Art of Barbering (as I call it) seems to have little to do with cutting hair. It seems to be very similar to salesmanship, or bartending. Some people are good at it -- naturally easy to talk to, good listeners (yet still expressing their opinions in order to keep the conversation going), etc., etc.

This is hardly a startling conclusion by any stretch of the imagination. But I sometimes wonder what a long-term study of barber shops, their clientele, and the conversations that occur there would show. Who knows -- it might even be worth some kind of degree in Sociology or something. :-) But the barber shop is something that many of us take for granted and rarely notice. It's just something that you have to do once a month or so.

There was no point to any of the above. I'm just pointing out something that most of us take for granted, and that we rarely notice. No real reason.

The Art of Barbering Too

Absolutely true! In San Diego (I believe America's 6th largest city), the barber shops are remarkably similar to South Bend, or anywhere else I've been. (Ask Jason about Vitos... the cops... etc.)

There's just something about going to a place where they do your side burns and the back of your neck with hot shaving cream and a straight edged razor. (To me, there's something particularly Arun-esque about this line of conversation.)

From Arun:

Interesting comments, I hadn't really thought about it, but thinking back it must be quite interesting. I imagine the barber shops/beauty salons of Las Vegas Hotels must be especially interesting. I got my hair cut at one and in the short time I was there there were 3 wedding parties passing through in one stage or another.

This raises an interesting point -- are there [at least] two fundamental kinds of barbers? Those who have a handle on the local community and those whose community is mainly composed of transients (e.g., tourists)? And of the second type (I have to admit, I don't think that I've met any of that type):

Why did they get into barbering? The same reasons?

What do they yield from the Art of Barbering? It certainly isn't a feel for the local community -- there isn't one. What do they get a feel for? What are the conversations in their shops like?

And in this case, I suppose the Art of Barbering can be abstracted to a higher level, such as those who primarily interact with tourists (but then again, Vegas is truly unique!). For example, what are the differences between clientele of the T.G.I. Friday's in South Bend vs. the clientele of the T.G.I. Friday's in Vegas?

Again, this has no point. Just idle wonderings of someone waiting for X latency between squyres.com and nd.edu...

September 19, 2000

Goulash or spackle: you decide

My car looks fantastic!
I had to take it in to be detailed to get rid of the mildew smell from when my AC self-imploded (read: the output valves got clogged and all the water ran off into my front passenger footroom. Eeewww!!). I took the car in this morning, and when I went to pick it up, I was amazed: the car looks 5 years younger. They vacuumed and shampooed everything, and used the make-the-plastic-look-new stuff. They buffed and shined, and gave my car a complete exterior car wash. It looks amazing.

I could see people that I drove by gaping at my car, then touching their nose, pointing to my car and saying to their neighbor, "You see? That's what a 1993 Honda Civic is supposed to look like."

Spent too much time on LAM/MPI today. But I resolved some important bugs:

We finally got confirmation that we fixed the lamboot race condition. Hurray for the good guys!

I found a bug in the lamd today such that any new process that it forked (e.g., via mpirun) would inherit all the file descriptors of the named unix socket client connections that the lamd had open. Oops. The spawn code now closes everything except stdin, stdout, and stderr (which it replaces with whatever mpirun/lamexec gives it, anyway).
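The descriptor fix above can be sketched in a few lines. This is only an illustration in Python, not the actual lamd spawn code (which is C); the names close_inherited_fds and spawn are made up for the example, and the 1024 fallback is an assumption for systems that don't report a limit.

```python
import os

def close_inherited_fds(first_fd=3, max_fd=None):
    """Close every descriptor from first_fd up to (not including) max_fd."""
    if max_fd is None:
        try:
            max_fd = os.sysconf("SC_OPEN_MAX")
        except (ValueError, OSError):
            max_fd = -1
        if max_fd <= 0:
            max_fd = 1024  # assumed fallback when the limit is unknown
    # closerange() silently skips descriptors that are not open.
    os.closerange(first_fd, max_fd)

def spawn(argv):
    """Fork and exec argv; the child keeps only stdin, stdout, and stderr."""
    pid = os.fork()
    if pid == 0:
        try:
            close_inherited_fds()
            os.execvp(argv[0], argv)
        finally:
            # Only reached if closing or exec failed in the child.
            os._exit(127)
    return pid
```

The point is simply that the child throws away everything above descriptor 2 before it execs, so none of the daemon's client sockets leak into the new process.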

I made the show_help() function a bit more robust in that it will try harder (and smarter) to find the helpfile. It will even display a specific error if it finds the helpfile but can't open it (e.g., if the process is out of file descriptors). Indeed, we now save errno properly so that when we use the %perror or %errno tokens in the LAM helpfile, it will display the correct errno, not just the last one.

We still may be having issues with really large numbers of nodes, though. Theoretically, we should be able to go up to 1024 - (stdin, stdout, stderr, and a socket to the local lamd) ranks, since that's as many descriptors as the fd_set type used with select(2) can handle, but we seem to be falling way short of that for some reason. There's a user in Germany who is trying to use LAM with 528 nodes (he was thrilled when I gave him a copy of the 6.3.3 beta with lamhalt in it -- he says that a lamboot can take up to 10 minutes!). I am still investigating this.
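For the record, the arithmetic behind that ceiling, as a tiny sketch. It assumes FD_SETSIZE is 1024 (the figure quoted above; other systems may differ) and one descriptor per rank; the function name is made up.

```python
FD_SETSIZE = 1024   # assumed size of fd_set, per the figure quoted above
RESERVED_FDS = 4    # stdin, stdout, stderr, and the socket to the local lamd

def max_select_ranks(fd_setsize=FD_SETSIZE, reserved=RESERVED_FDS):
    """Upper bound on ranks if each rank costs exactly one descriptor."""
    return fd_setsize - reserved

if __name__ == "__main__":
    limit = max_select_ranks()
    print(limit)           # theoretical ceiling
    print(528 <= limit)    # the 528-node case should fit under it
```

So 528 nodes ought to fit with room to spare, which is why falling short of the limit points at something other than select(2) itself.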

An engineer from GE Aircraft Engines mailed me today, concerned about the [accidental] inclusion of the GNU license in LAM 6.3.2, because they want to use LAM internally. I told him that all was well -- its inclusion was accidental and I would never cut off my shuga-momma's company like that.

Other random acts of goodness:

Hooked Janna up with John's extra ND/Stanford tickets.

Saw a neat article today (from dad) about how Scott Malpass has really, really grown the ND endowment since he started managing it. Did you know that ND was one of the initial investors of Yahoo!?

Got into an interesting discussion with Arun and Rich yesterday about barber shops when Rich said something about "Arun-esque". This triggered a long forgotten memory about the word "Arunesque", which I shared with them. Long story short: "Arunesque" means "to celebrate", or "to perform a ceremony for".

Since they don't seem to broadcast News Radio down here, I have had to replace it with something else. The Drew Carey show seems to do nicely. I've always liked Drew Carey, and his shows are pretty funny. I highly recommend them to anyone who hasn't seen them -- I'd rate most of them at 17.5 minutes.

I took the most recent copy of LAM's inetexec.c (the code that uses rsh to spawn things on remote machines), C++-ized it, and started working on it to do tree-based boots, and to allow nodes to fail during the boot. I stole a bunch of minime code to do this as well -- the result will get merged back into minime before it gets merged back into LAM -- because I wanted to do it in a small system first. Minime isn't large, but it sure isn't small (12,000+ lines of C++ code).
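To give a rough idea of what a tree-based boot buys you, here is a minimal sketch in Python. It is not the inetexec.c or minime code; it assumes a simple k-ary tree where node i boots nodes k*i+1 through k*i+k, that all launches in a round happen in parallel, and it ignores failures entirely. All names are made up.

```python
def children(node, n_nodes, fanout=2):
    """Nodes that `node` is responsible for booting in a k-ary tree."""
    first = node * fanout + 1
    return [c for c in range(first, first + fanout) if c < n_nodes]

def boot_rounds(n_nodes, fanout=2):
    """Return a list of rounds; each round is a list of (parent, child)
    launches that could proceed in parallel."""
    rounds = []
    frontier = [0]  # node 0 is the origin node and is already up
    while frontier:
        launches = [(p, c) for p in frontier
                    for c in children(p, n_nodes, fanout)]
        if not launches:
            break
        rounds.append(launches)
        frontier = [c for _, c in launches]
    return rounds

if __name__ == "__main__":
    for parent, child in boot_rounds(7)[0]:
        print(parent, "boots", child)
    print(len(boot_rounds(528)), "rounds for 528 nodes")
```

A linear boot of n nodes takes n - 1 sequential launches; the tree needs only as many rounds as it has levels, which is the whole attraction.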

Tracy's music group at church had a little "congrats" reception for us last night. Free food and wine, plus they gave us a bunch of gift certificates. I love all the free stuff that you receive when you get married; I should do it more often. No, wait...

Miles to code before I sleep...

(I've pointed this out before, but I just love jjc. It pointed out 3 places where I didn't close my HTML tags properly, and let me go back and edit it before I submitted. With all the <code> tags that I used in this entry [which pine does not show, sadly -- see the web page at http://jeff.squyres.com/journal/], I accidentally repeated <code> instead of the proper closing tag a few times. Happy, happy, joy, joy...)

September 21, 2000

El Blockbuster sucketh

The saga continues.

Blockbuster sucks.

How much do they suck? Let me count the ways...

My dad mailed me today that I got a nasty letter from a collection agency demanding the return of the Fight Club DVD to the Berkeley Blockbuster. This is after I got a threatening letter from Blockbuster a while ago saying "return the Fight Club DVD or else". I had already called them and got it straightened out (I did return it on time -- they lost it... and later found it). See previous journal entries for the story so far.

So anyway, this collection agency is threatening to screw with my credit for some mistake that I had nothing to do with. I had to call the Berkeley Blockbuster store again to figure out what was wrong. The manager pulled up my account and said, "I see we cleared you on the Fight Club problem, but I see a late charge on Hot Boys..."

WHAT?!?!

I've never even heard of such a movie, nor does it sound like I would want to see it. Ever. I conveyed this to the manager and he sounded very skeptical.

"Did you report your card as lost?" he asked me.

"No -- I have it right here in my wallet".

Puzzled silence from California.

"Oh wait... I'm looking at someone else's account; they rented Fight Club as well. How do you spell your name again?"

<sigh>

So he finally pulls up my account. "Oops... looks like we marked you as credited here in the store, but no one notified the collection agency..."

September 24, 2000

Internet, internot

Bummer. We lost to Michigan State yesterday, and in the last few minutes of the game, too. Bonk. So much for the season...

We watched the game at Janna's house, and had a good time with them. We stayed for dinner. I hooked Jim up with a new version of WinAmp afterwards, and I have a bunch of his and Anna's CD's to rip this week.

Many errands to do today -- clean the apartment, thank you notes (no, really!), etc.

September 27, 2000

I am pepperoni

Heisenlocks are hard to fix (where "Heisenlock" == "a deadlock where you can't know the deadlock and its location at the same time", a la Heisenbugs). Particularly the ones that seem to move around.

How do you know when you have fixed it? You stop getting deadlocks. But if it only locked periodically to begin with (as is the nature of Heisenlocks), how do you know that you just haven't tested enough to run into a deadlock?

I pose this question because a) it's happening to me today, and b) it happened to me with PIPT. After months of testing, the PIPT decided to lock up right in front of our sponsors. After I finally figured out the problem (several days later, mind you), I noticed that I hadn't changed the problematic code in a long time. That is, the bug had survived for months without causing deadlock. But then it suddenly did. <sigh>

It's rare to encounter Heisenlocks, understand the whole picture, and say "Aaaahhhh.... yes, this is exactly the problem that I am looking for." Indeed, the code is typically so complex and the race condition so thorny that it is difficult to get the overall picture until after the fact.

Hence, we have one of Jeff's laws of multithreaded programming:

Easy race conditions are typically obvious to find. Heisenlocks tend to be caused by extremely subtle race conditions that usually "could never happen" because of x, y, and z, where one or more of x, y, or z (or, more likely, some previously unconsidered "tautology" w) is proven to be false -- typically after multiple days of hacking, around 3am amidst much wailing, gnashing of teeth, and caffeine.

I certainly do not believe in changing random things until something seems to work as a whole solution. Sometimes I am reduced to this behavior (e.g., when I run out of ideas), but I always work to pin down the exact reason for success/failure after I find something that "seems to work". It is crucial to understand why it works, lest you fix only a symptom of a problem, not the real problem. This is the only way to be sure to fix a problem rather than guess that it is fixed because it "seems" to be fixed.
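One way to pin down a reason rather than guess: for the particular class of deadlocks caused by threads taking locks in inconsistent orders, you can record "held X while acquiring Y" and look for a cycle, which exists whether or not the timing ever lines up badly. The sketch below is purely illustrative (Python, made-up names); it is not what I used on PIPT, and it says nothing about other kinds of races.

```python
from collections import defaultdict

class LockOrderGraph:
    """Records which locks were held when another lock was acquired."""

    def __init__(self):
        self.edges = defaultdict(set)  # held lock -> locks acquired after it

    def record(self, held, acquiring):
        """Note that `acquiring` was taken while every lock in `held` was held."""
        for h in held:
            self.edges[h].add(acquiring)

    def has_cycle(self):
        """True if some set of locks is acquired in conflicting orders."""
        WHITE, GREY, BLACK = 0, 1, 2
        color = defaultdict(int)  # every node starts WHITE (unvisited)

        def visit(node):
            color[node] = GREY
            for nxt in self.edges.get(node, ()):
                if color[nxt] == GREY:
                    return True  # back edge: a cycle in the lock order
                if color[nxt] == WHITE and visit(nxt):
                    return True
            color[node] = BLACK
            return False

        return any(color[n] == WHITE and visit(n) for n in list(self.edges))

if __name__ == "__main__":
    g = LockOrderGraph()
    g.record(["A"], "B")   # thread 1: holds A, takes B
    g.record(["B"], "A")   # thread 2: holds B, takes A
    print(g.has_cycle())
```

A cycle here means the deadlock is possible on every run, even if it only actually bites once in months.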

October 1, 2000

I have failed

I noticed that one of my students -- we'll call him "Fred" to protect the guilty -- had the following process running yesterday on one of the LSC machines:

fred pts/17 Tue 7pm 3:09 telnet rodrigues-8a.student.nd.edu

I am greatly saddened; all the Righteous have long since struck "telnet" from their working vocabulary, and save it only for debugging of ASCII protocols such as SMTP and HTTP, and use some form of encryption for normal remote access (e.g., ssh).

Alas, Fred, where did I go wrong? How did I not stress the importance of security? I feel like a parent who has just found out that their child has been a habitual drug user for multiple years.

Oh yea, the way of telnet is easy -- it is fast, universal, and yea, it may be ingrained in typing habits. But the path of Righteousness is never easy. Installation of ssh takes time (but is not difficult), and requires remembering to type "ssh" instead of "telnet" (half as many characters, I might add).

And so spoketh the great System Administrator in the Sky:

...He who uses telnet for personal use shall be damned in the fires of script kiddies. His boxen shall become IRC bots, and be owned by demons half his age. He shall be scoffed by his new owners as yet another useless academic. His boxen shall become slow and bogged down with new traffic, and there will be great wailing and gnashing of teeth. None shall hear his screams (for the Righteous do not look at unencrypted traffic).

Fred (you know who you are): you need help. If you don't get help from NDLUG, please, get help somewhere.

October 3, 2000

Mangos and Margins

After the whole hydra time sink, got some good things done today...

Officially re-opened the hydra for business today.

Someone noticed a minor error with parallel bladeenc last week, and I finally got around to checking it out (in between compiles of real work today). Turns out he found a bona-fide bug in the shutdown routines -- it only showed up under MPICH because LAM rocks (i.e., if you do a singleton init with MPICH, you get MPI_COMM_WORLD == MPI_COMM_NULL, which is icky). I noticed that I had a few unreleased things in parallel bladeenc, but I didn't release them -- I just edited a 0.92.1b4 tarball with the fix, and called it 0.92.1b5. Freshmeat announcement is in their queue. Maybe someday I'll test and release the unreleased stuff that I have in CVS, but not right now...

I hooked John up with SSL/IMAP on www.squyres.com (a.k.a. shipman.ws -- my first non-.com hosting!). I also hooked him up with authenticated and SSL-encrypted SMTP access -- pretty cool stuff. So he can relay through www.squyres.com to his heart's content, because he's fully authenticated using SASL, and all of his traffic (not just his IMAP traffic) is SSL-encrypted. Gotta figure out how to make pine do that (encrypt and SASL-ize SMTP traffic); he's using Outlook Express.

I hit the RedHat guys up for some free stuff for SC'2000. I hope it's not too late to get stuff from them...

Talked to Regina today, more about buying a house. She had some good advice.

Called and volunteered at my church. I'm such a great guy. ;-)

Turns out that I'll be leaving for ND Thursday morning and staying there for about 1.5 weeks. The Stanford game is this weekend, and then I'll be staying on to meet Rusty when he comes to campus next week, and for various meetings, etc., etc. Larry Augustin is coming to ND this Thursday, and I might get to meet him. Should be fun and interesting.

Had a pleasant experience with headsetzone.com today. I ordered a new telephone headset the other day (once you start using headsets, you'll never go back. They're geeky looking, but, man, they're fantastic! The telemarketer-grade ones are truly awesome [which is what I have]) since my current headset is getting fritzy. They called me today about my order because I ordered an AC adapter, not realizing that the amp already comes with an adapter. So they kindly whacked the extra adapter from my order before sending it on its merry way.

I think that .com's are starting to realize that service is very important -- you can't just put a bunch of products up on an https and expect people to buy.

Random question: what happens when you put version control meta directories under version control? Apparently, that's what one former LSC student tried to find out. I ran across this directory today by accident (line broken up for web/browser display purposes, and name changed to protect the guilty):

Do you think that God uses CVS? If so, what version are we? Are we a branch, or the main trunk? Can you imagine meeting a later version of yourself? Just think of all the new, cool features that you'd have!

A: "Ah yes, this is Jeffv1.7. The current version, Jeffv13.2 is much more advanced -- it has additional pincher claws, direct audio/visual/pseudo-sensing input feeds, extra-sensory perception (v7.2), electro-skeletal implants for strength and flexibility, web slingers (not spider-man like, these are the real thing), he's on the Space Football team as first string quarterback, etc., etc. Oh, and it can code like nobody's business."

October 5, 2000

Blueberry pineapples

Candles from Pier 1 seem to burn poorly. I will not buy any from there in the future. But then again, perhaps it is Louisville's great altitude above sea level...

Got all the nmap stuff working in my threaded booter. Cool stuff!

Tried to import boost into my project today so that I could start using GGCL and a cool progress meter class that they have, but I was sadly disappointed in the usability aspects of it. For one thing, it extracted itself in ".", not in a separate subdirectory. Then there are no README or INSTALL files, no Makefiles, no configure, no nothing. Just a bunch of files and you're left to figure out how and what to use. Disappointing.

I started a rant about this on the boost list, and one guy is being somewhat silly. I decided to wait a few hours before responding again just so that I don't really start slamming him; I am new on the list, after all.

I watched the Voyager season premiere tonight. Good stuff. Left some hooks for later in the season, too. Could be very interesting -- this is the last season, after all.

Brian reminded me that I totally forgot to put the XMPI hooks into LAM. Doh. So I spent an hour or two on that tonight. Adding a single function in LAM requires many things:

A new file in share/mpi with the body of the function

Modify share/mpi/Makefile.am to add the new file

A new fortran binding for the function in its own file in share/mpi/f77

Modify share/mpi/f77/Makefile.am to add the new file

If adding profiling versions of the function, add entries in share/pmpi/Makefile.am and share/pmpi/f77/Makefile.am

Add a new "block" type (essentially an enum for that function) in share/include/blktype.h; shift the hiwater block type up to accommodate the new function

Add a new string for that enum in share/etc/blktype.c

Add the appropriate prototype in share/include/mpi.h

Add the four name #defines in share/include/MPISYS.F (eight if doing profiling versions of the functions)

Write a man page for the function in its file in share/mpi

It's off to South Bend in the early AM tomorrow. Miles to drive after I sleep...

October 7, 2000

Caffeine-free Microsoft

Didn't get a lot done research-wise today, but it still seemed like a good day.

I made some progress in LAM; cleaned up a little code, made a fix that a helpful LAM user suggested, etc. We currently don't have a hope of compiling LAM with a C++ compiler -- it was originally written with pre-ANSI function declarations. As such, there are still billions of them throughout the code, and it would take a long time to convert them all to real ANSI declarations (which C++ compilers require). Don't quite know what to do here -- it doesn't seem like it would be easy to write a scripty-foo to automagically convert everything... Harumph.

Talked with Jeremy about boost things; reorganizing the directory tree, a potential build process, etc. I sent our ideas to a guy on the boost list who I was discussing this stuff with. He replied, but I haven't had time to look over what he said yet.

Talked with Arun about LAM progress. Seems like it is going well, but annoying mid-terms will halt its progress for about a week. Similarly with Brian and XMPI.

Went to Larry Augustin's talk today. No real shockers in his talk -- we've heard most of it before (open source will save the world, etc., etc.), but it wasn't a bad speech, I suppose. Others didn't like it at all. Oh well.

Had to make a command decision on the SC2000 paraphernalia today -- the company couldn't do beach balls in the time that we needed them. :-( So we opted for footballs; we'll see if they can do those in time.

Arun and I listened to "Slut" for several hours this evening. Wonderful. The song is not what you would expect at all -- it's quite hauntingly beautiful. I suppose that my image of the song would be shattered if I actually listened to the lyrics and found out that it's some kind of pig-worship satan song or something. It's amazing how I could listen to that song on repeat for hours on end and not be able to tell you a single word of what they were singing. It's that good.

I opened up the LAM/MPI CVS tree for anonymous read-only access tonight. We'll see if people actually check it out...

October 11, 2000

A reddish green

We're giving out cool freebies at SC'2000. The orders went in this morning:

The pocketknives got nixed. With extreme prejudice.

500 LAM LED-light keychains. They'll be translucent blue and have a white LAM logo on them.

900 mouse pads (I don't know what the hell we're going to do with the extras -- having 900 mouse pads in one place just sounds like an inherently dangerous operation. Are there FCC rules against that?). They're all LAM/MPI mouse pads, with the LAM logo and URL in the top right, "Dept of CSE/ND" propaganda (phone, fax, URL) across the bottom (Kogge paid for it all, after all), and a bunch of MPI function bindings across the majority of the surface area.

The cool thing is that we've got three flavors of mouse pads (300 each):

C

Fortran

C++

That is, they vary in the language of the bindings that are displayed on the mouse pad. We're actually predicting that the Fortran ones will go much faster than the C or C++ ones.

Anyway, it's all cool stuff (mainly working on the assumption that if it's free, it's gotta be cool). Should be a fun time down at SC'2000. A picture of our booth is available at http://www.indiana.edu/~rindiana/. A map of where we'll be located on the show floor is at http://www.sc2000.org/exhibits/floor.htm (scroll down to the bottom -- we're a purple booth, number R701).

October 13, 2000

Fuzzy ethernet

Some food for thought.

PBS is just plain sucking. It's unfortunately been flakey ever since we upgraded it. :-( I did find a bug in our AFS/PBS shepherding code a few days ago that resulted in tokens being allowed to expire during PBS jobs that ran longer than the length of your initial token (which I think it defaulted to 10 hours, regardless of what your real default is), but that was our fault, not PBS's.

Yesterday, there was one job that was "stuck" in the queue and wouldn't die. The job was long done and gone, but PBS thinks that it's in an illegal state, and won't let it leave the queue. Hence, the node that that job was on wasn't released. Today, there are many more jobs like that (but those jobs are still running). I have no idea what the problem is, and I'm kinda annoyed.

We asked again for PBSPro (i.e., the commercial version) -- we first asked about 3-4 weeks ago -- and the PBS guys replied that it was taking them longer than they thought to setup their online store (even though PBSPro is free for educational users). :-( I'm kinda hoping that PBSPro will fix some of this flakiness that we've been seeing. :-(

Rusty from Argonne was here yesterday. His talk was good; I'd seen most of the material before, but it was good stuff anyway. We had good chats with him about optimizing MPI collectives (there are some really cool algorithms for this out there..), the future of LAM and MPICH, MPICH's Abstract Device Interface (version 3), my threaded booter (I gave him a copy of it, too), MPICH's mpd, etc. We had dinner at the Lumsdaine Grill, because Someone forgot to get a babysitter so that we could go to the LaSalle Grill. Ah well -- it was a good home-cooked meal, so I shouldn't complain. :-)

I downloaded the ADI-3 document, and it's huge! Compared to the spartan RPI (request progression interface) approach in LAM, ADI-3 is gargantuan.

I just noticed a post on the Beowulf list -- someone posted LAM vs. MPI/Pro (a commercial MPI) vs. MPICH results. The TCP numbers are clearly in LAM's favor. This, obviously, is because LAM rocks. However, MPI/Pro and MPICH have VIA results (which are obviously better than TCP results)... we need a VIA device... See the results for yourself. LAM ROCKS!!!

I've been working on IMPI stuff this week. I got the IMPI attributes on communicators working (i.e., on MPI_COMM_WORLD -- since we don't do anything other than MPI_COMM_WORLD yet, we don't have to maintain these attributes on other communicators, which would take some additional bookkeeping, because relative rank order can change, etc., etc.). I also got MPI_Bcast working in fairly short order.

I noticed a good number of typos and one inconsistency in the IMPI standard. Hence, I am proud to say that I am personally responsible for every item in the IMPI errata document. Well, ok, I only helped discover the first one (an issue with the protocol hiwater/ackmark values), but I still had a hand in it.

This is all for the SC'2000 IMPI demo with HP and MPI/Pro -- we're going to run a GUI Mandelbrot program across all three MPI implementations. Should be pretty cool, actually. We had our second teleconference today, and things appear to be going well. We plan to test the stuff across the internet next week. HP and MPI/Pro have been using LAM to test their IMPI implementations. I gave them instructions for CVS access today, so that they can get the MPI_Bcast and color stuff.

I just can't help it -- LAM ROCKS!.

Seriously, though, it is very cool to be working on a project that matters. That is, LAM is probably only used by a few thousand people around the world (at most), but there are many devoted fans who use it every day. Indeed, many people's software relies on ours to function properly -- much real-world work depends on what I do in LAM. It's very cool.

The level of responsibility can be a bit scary at times (indeed, I remember the first time that I noticed a .mil site downloading LAM; I told Lummy about it, and he just smiled and said, "sleep tight!"). Real world stuff uses my code. Hence, if I fuck up, Bad Things can happen. For example, I know for a fact that companies like GE and Exxon use LAM/MPI.

But isn't this the level of responsibility that a good engineer should embrace? I think so. Being Careful about what you do is not just a state of mind, it is a way of life.

Saw a talk from Vince's advisor today about link-time optimizations. Interesting stuff. Similar to things that are available in Solaris (e.g., -O5, where multiple runs generate profiling feedback data that speed up subsequent runs), but it was neat to hear how it works. He was using it in conjunction with MPICH, so I set him straight in his ways -- since they're using TCP/IP, if they really want asynchronous message passing, they should use LAM since we can do it (via the lamd mode, which has its own tradeoffs -- the asynchronous message passing mode isn't free, so to speak).

He sounded intrigued, and said that he would get the latest version of LAM and give it a whirl. And so we progress, one user at a time, towards world domination...

Well, ND's network is going to start shutting down for maintenance in about 15 minutes, so I'm outta here. Next journal entry will be from home.

October 14, 2000

Calamari airlines

ND vs. Navy -- finally a fun game to watch.

Aside from two big mistakes the defense made late in the game (and to be fair, it was at least our second- or third-string players, who don't have too much experience), we dominated the game. Those are what I like: boring and dominating. This is the whole reason that the wave cheer was invented -- the fans need something else to do to occupy their time.

But CBS's coverage of college football really sucks. They don't get good angles, their camera operators get faked out and don't follow the football, they rarely show replays (even on penalties). And their announcers talk more about anecdotes than about the game that is being played. They suck.

NBC's games take forever, but you get the whole nine yards (hah!) with them -- tons of replays, game strategy speculation, etc., etc.

In other news, ND's network seemed to come back up without much of a hitch. I was on briefly at about a quarter of one this morning and it was back. And the latency from squyres.com to nd.edu seems to be a lot better (granted, there's no students on campus right now, so traffic in and out of nd.edu is probably pretty low). But at least I'll probably have good connectivity for the next week. :-)

October 15, 2000

Clairvoyance and Corn Flakes: Coincidence or Fate?

Last night, Tracy and I went to see a local production of Dracula. I'm a big fan of theater, especially after having done a bunch of productions in stage crew in both high school and undergrad college. The production was actually quite good -- it was theater in the round, with a fully-functional single set.

The technical setup was actually quite impressive (being an engineer and an ex-stage crew type, I tend to notice these things). I couldn't find the control room, for example -- it was that well hidden. Or perhaps the control room was distant from the actual production area, and the techies watch by video (I'm guessing here, but that would be a pretty cool setup).

This production had a few extra twists that separated it from others that I have seen. For example, Lucy had a female friend, Nina, who died before she did. Nina came back as a vampire and started attacking children around London.

Props to a bunch of the special effects, too:

Some various pyro, bangs, pops, flashes, etc.

Using deep sustained bass noise, very hard to hear -- the kind of sound that you subtly feel rather than hear -- that created a feeling of dread and fear. Very cool.

The professor killed Nina with a wooden stake through the heart while she was sleeping in her coffin. Since it was theater in the round, it actually happened right below me -- not 10 feet away. The stake actually appeared to go into Nina, and blood squirted everywhere. Again, very, very cool. That alone made the price of admission well worth it -- who wouldn't pay to see a beautiful vampire seductress screaming in the throes of death, with blood squirting everywhere?

Once or twice, a character had a sudden moment of clarity and realization. The clock in the corner of the study suddenly got very loud (tick, tick, tick), as if the focus of the world suddenly got very narrow. And then the ticks got subtly farther apart -- creating the illusion of slowing down time, and heightening fear.

Dracula "disappeared" at one point by means of what I assume was a hydraulic trapdoor in the floor of the stage (I caught a glimpse of it). He was surrounded by a cloak, which suddenly fell to the floor, and he was no longer in it (having been in theater for a while, I was proud of myself for anticipating the classic misdirection designed to make you look away from him for a second while his head disappeared downward -- no one else that I was with noticed it). Most excellent.

In the final scene, where they drive a stake through Dracula's heart while he's sleeping in his coffin (more blood squirting everywhere -- yummy), they kill him, and then close the coffin. A few seconds later, his hand pops through the top of the coffin in a feeble attempt to strangle the professor, who successfully evades his grasp. Seconds later, they open the coffin again to really kill Dracula, but all that is there is a skeleton. Cool!

All in all, a good production. The actress who played the maid was a little weak, but the badass transformation of Count Dracula to a Vampire (multiple times, too!) made up for it.

The Director's Cut of the movie The Abyss was on TV tonight; I hadn't seen it in quite a while. Most people aren't aware that there is a 10-15 minute sequence at the end that was chopped from the version that was released. It was all about war and violence in the human race (a sort of commentary on today's society), and how the water people almost killed everyone on the planet with enormous tidal waves. With this sequence, much more of the movie makes sense.

I'd advise renting it to those who haven't seen it -- I'll give it a rating of 10 minutes.

We finally finished all of our thank-you notes from the wedding today. Woo hoo! We had gin and tonics in the excellent ND drink glasses that Brian/Arun gave us.

And speaking of alcohol... I think Arun's proclamation of not drinking until Momar's 40th anniversary is a sham!! He admitted in his journal that he had Kahlua pancakes, and later had One Enormous SuperPancake with some kind of flavored liqueur in it.

Hence, I think Arun's thin guise of "not drinking" has fallen away -- we now see him for the closet alcoholic that he is. Was it really "Sprite" that he was drinking all Sophomore year (by the gallon, I might add)? Does he really like "water" and "Dr. Pepper" that much? I think not, gentle readers. Yes, it's true -- Arun was even kicked out of the 1996 Olympics (Bulgarian all around gymnastics team), for his excessive indulgence in what he called "pixie sticks", and "Mr. Pibb". Said Mr. Rodrigues at the time, "I just love pixie sticks and Mr. Pibb. Don't knock it until you've tried it! Now don't bother me -- I've got to go practice my Triple Lindie."

(...catch the rest of this exclusive story in a special expose section in this week's National Enquirer)

My fricken' router has frozen 5 times tonight. Destroyed a good uptime, too. It seems that one of the NICs is getting overloaded (I'm trying to ftp/scp/whatever 4.5GB from my router to my desktop, which hangs the machine after a while). Sucks!! I don't quite know what to do about this yet -- I need to get that data over to my desktop so that I can burn a CD of it. Arf!

In other linux woes, during one of my router crashes this evening, it caused the xmms on my desktop machine to freeze. So I did a "ps" to kill it. I found no less than 662 copies of xmms running. No joke.

My desktop has an uptime of over 37 days, and I've been logged in to a single KDE session for probably over half of that time. I guess there's some kind of leak in xmms that's causing that to happen. Weirdness. For example, I see that there are already 11 copies on my desktop now.

Some testing shows that a new one appears every time a new song starts. I'll bet that they are terminated-but-not-reaped threads (remember: linux emulates threads with duplicated processes). <sigh> Open source software can suck sometimes. :-(

Did some LAM work today. Turns out that I was a bit sloppy and checked some crap back into CVS that didn't work. Oops. :-( Caused Arun a bit of pain, too. Double oops. :-(

But it's fixed now -- it compiles (and seems to work) with and without IMPI support. I also added some stuff for XMPI to drop communicator name traces during MPI_Init for MPI_COMM_WORLD, MPI_COMM_SELF, and MPI_COMM_PARENT (if it exists). I added man pages for MPI_*_set_name and MPI_*_get_name, too, just for good measure. I've got to finish the IMPI extensions to MPI_Reduce tomorrow.

Found a new "hauntingly beautiful" song today. It's not quite "Slut", but it might be close. It's Tori Amos' "Carnival", from the MI-2 soundtrack. I've put it on repeat, but my router (which streams my MP3s to me) has been rebooting, so I haven't heard it continuously enough yet. I'll keep you posted.

October 23, 2000

You insult me.. and of course, my cane.

It's been a few days since I did a journal entry, mainly because I've been traveling. Let's catch up...

Left on Friday night to go to Chicago. Tracy and I flew Southwest from Louisville to Midway. Flying Southwest is an interesting experience. It's a cross between the best of "People's Express" (where you sat on milk crates in the hold, but they were damn cheap tickets) and the Orient Express (there's some really shady people on there, and most people don't speak English). Got to Midway around 8pm, picked up our Avis car, and drove to Jill's.

Seeing Jill was great -- Jill owns her own condo on the north side, right near the lake. We had dinner and caught up with Jill, which was much fun. The next day, we walked along Lake Michigan (very cool) and went to the Hogshead Bar to watch the ND vs. West Virginia game. The game itself was kinda sloppy; we had moments of brilliance, but all told, the final score didn't tell the story of the game. We won, but save a few critical plays, WVa almost beat us.

I randomly ran into some people that I knew at the Hogshead -- two of my old roommates, Mike and Brian (it was good to catch up with them), and an old CS grad named Dan (journal policy not to put in last names to protect the not-so-innocent). He works at a .com in Chicago called www.ubid.com. We chatted about that for a while. His brother is a froshy at ND, and is thinking about CS. Good for him!

After the game, Jill and Tracy and I ran to Marshall Fields to pick up a wedding gift for the reception that Tracy and I were going to that night (stoopid Marshall Fields -- they don't have their wedding registry online yet!!). Tracy and I raced up to Lincolnshire for the reception (the wedding was about a month ago, in Italy) and made it pretty much just in time.

It was fun -- I didn't know anyone (it was one of Tracy's co-workers who got married), but we saw a bunch of GE people that Tracy knew, and they were nice folk. We had a good bunch of laughs, and a good time was had by all. Hell, the booze was free -- how can you go wrong?

By the end of the evening, however, my ears hurt from the music. They had a live band, and they were actually pretty good -- it was a Benny Goodman orchestra-style band, but played all kinds of music. Their singers were quite good, and very lively (dancing on the dance floor while singing, etc., etc.). They even had a mixer boy, but I came to hate him because I saw him keep edging up the "master volume" slider. Bastard. I hope that his MPI programs rot in hell.

We flew back Sunday morning and got back here around noon. I did e-mail but was otherwise uninspired to do any work, so I lazed around and watched TV. A good Sunday. :-)

Bandwidth to nd.edu is sucking again. Well, it's not sucking, but it's certainly not nearly as nice as it was during break last week. For example, streaming MP3s from nd.edu to squyres.com is pausing all the time. Icky.

After having been gone for the weekend, I am shocked to discover that my Mojo level has fallen to about 850,000 (it was about 980,000 when I left). This amazes me -- I left my mojo server running all weekend, but I personally did nothing with it all weekend, and yet somewhere in there I spent about 130,000 mojo. How could that happen?

That's not the whole story, of course -- I do have about 100,000 mojo "coming in" (when people spend mojo with you, it doesn't necessarily come in right away; there's a credit system for totals up to 10,000 mojo -- see http://www.mojonation.net/ for more details), so I actually didn't lose all that much -- but it still seems wrong. That is, I have mojo going out at a much higher rate than it is coming in!

I hope that it's just still bugs in the system. It doesn't take an accountant to realize that even though my consolidated total isn't much less than when I started, you can't spend what you don't have, so if mojo [actually] is going out faster than it is [actually] coming in, you're screwed!

Did some more research into DSL for my church. They want to get DSL for the following reasons:

They have 3 separate computing resources right now that they want to consolidate into one bill:

The Youth Center, which is physically distant from the church's main administrative offices, uses e-mail, and has a $9.95/month Juno account.

The main admin offices have an AOL account at something like $21.99/month, with 7 e-mail accounts.

They have a web site that's hosted at a local company for something like $19.99/month.

This comes to a total of something like $52/month. DSL will at least double that, but there are other factors as well...

They only have a total of 8 e-mail accounts, but have at least 12-15 people who need e-mail. Hence, they're maxed out right now, and need to expand.

They only have so many phone lines at the church; when people are on the phone for e-mail or web, that's one (or more) phone lines that can't be used for regular business.

And actually, the admin offices are already wired on a LAN, so they're pretty well set up. After some preliminary investigation, prices in this area for 192kbps/SDSL (the church is technically considered a business, so they can't get the cheaper residential rates) are between $100 and $120/month.

Still need to contact a few more vendors (I'm doing it during lengthy compiles and/or network transfers nd.edu<-->squyres.com) to get some more options. It's not just the base bandwidth that they charge for -- they all have different services in terms of number of mailboxes offered (for free), how much web space they offer, whether there's a dialup line (for the Youth Center), etc., etc.

WHOO HOOO!!!

My boss from my army unit just e-mailed me -- he got me a tentative position in Army high performance computing; apparently I'll be in the "hacking" group. This could be interesting!

This is just the results from a few preliminary meetings that he has had with a group (in Aberdeen, MD, I think). We'll see where it goes.

But at least it looks like I won't be forced to go back to be a signal platoon leader somewhere. Whoo hoo!!

I've changed my "Dissertation" topic on the journal to "Technical", because I find that most of the "Dissertation" stuff that I send is only sometimes related to my dissertation work. Most often, it's just some technical stuff that may or may not be related to my dissertation, or anything at all, for that matter.

There's enough bad vibes in here to run a Voodoo factory

I did much work on IMPI today.

Lesson for the wise: never write/debug parallel programs with only two nodes. Always use at least three. Three is probably better than four, actually, if your program has to work for all general cases.

I already knew this, but I discovered it again the hard way today. I'm working with HP and MPI Software Technology on our IMPI demo for SC'2000; I thought that I pretty much had LAM ready to go on Friday. Today, I tried it with three clients (instead of just two, up in nd.edu) -- i.e., two clients in nd.edu and a client down here in squyres.com for a local display (the demo is a GUI plot of the Mandelbrot set -- the plotting is calculated in parallel, and the results are sent to the display master to be shown on X).

Everything worked great with two clients, but started barfing horribly with three clients. Ugh! I had to go around and fix all the places where I had made bad assumptions and whatnot.

So, kids, please don't program in parallel with just two nodes -- always have adult supervision and use three, four, or two hundred nodes.
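
Here's one made-up example of the kind of bad assumption that two nodes will happily hide. This is an illustrative Python sketch, not code from LAM or from the demo, and both function names are hypothetical: with exactly two ranks, "the other guy" is always 1 - rank, and that shortcut silently breaks the moment a third rank shows up.

```python
def peer_assuming_two_ranks(rank):
    """Hypothetical shortcut: only correct when there are exactly two ranks."""
    return 1 - rank


def peers(rank, size):
    """General version: every rank other than this one."""
    return [r for r in range(size) if r != rank]
```

With two ranks, both functions agree. With three, rank 2 computes a peer of -1, which is not a rank at all, while the general version returns ranks 0 and 1.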

It didn't help that there were actually other bugs in the demo code that we're supposed to run (the parallel Mandelbrot stuff was originally written by the MPICH guys and then modified by the NIST folks for specific purposes of the IMPI SC'2000 demo). I found at least two bugs today (remember: broadcasting pointer values across multiple architectures is meaningless) -- possibly more, but I think I've blocked them from my memory to prevent further trauma.

I also had a few bugs left in LAM -- the code for calculating host and client colors and sizes looked like a Darwinian experiment gone horribly wrong. I had to evolve that code into something better and greater -- to make it more than the sum of its parts. Now, it rocks with the rest of LAM.

I just can't help it -- LAM rocks.

It all seems to be working now. It's happily checked back into CVS, and hopefully I'll be done with that for a while...

Conversed with a guy at GE Aircraft Engines today. They're using LAM for somethingorother. He asked for a good feature on Friday (see his post on the LAM list); so I moved our discussion off the list and we'll iterate through a few things trying to get it right.

In related news, GE acquired Honeywell today. And "Just Jack" will stay on as CEO for an additional several months (he was going to retire next April, IIRC) until the end of 2001. You just can't go wrong with "Just Jack".

Past present participle future improbably never tense

(this is a few days old -- I started it before last weekend. So take all present tense to be past tense)

Learned some wisdom today. It was painful, so I'm going to share in the hope that others may save some time...

On the eternal quest to have "proper" Makefiles, we had quite an elaborate setup for dependencies in LAM/MPI (the automake stuff for generating dependencies is broken for non-GNU make). The only problem was, it didn't work for VPATH builds. We were somehow under the mistaken impression that you didn't need make depend in VPATH builds.

Sidenote: For those of you unfamiliar with VPATH builds, it's a slight variation on the GNU standard "./configure ; make all install" Scheme of Doing Things. It allows you to use one source code tree to build multiple binary trees. i.e., you download a random tarball, expand it to its source tree, and then run "./configure ; make all install" multiple times simultaneously. What's the benefit? For building on multiple architectures, and/or with different configure options, of course! If you think about it, this is a really handy mechanism.

It works like this (I slightly lied above): you expand the tarball, and make a new directory to build in. And then run configure (and make) from that new directory. For example:

(The final "gmake" is necessary because Sun's native make isn't VPATH enabled)

Hence, you can have multiple of these puppies running simultaneously, all from the same code tree. This is really handy in development, too, when you need to test on multiple architectures simultaneously.

But now I see the error of my ways (it took developing on Solaris and Linux simultaneously with the same code base to show me this piece of wisdom). Hence, I set about to make our depend target work properly for VPATH and non-VPATH builds. Easier said than done.

Although I already knew this, I have finally and firmly decided that make's rules for syntax (particularly quoting) SUCK.

We use the GNU tools automake and libtool to build LAM/MPI (the use of libtool doesn't actually matter here, I just wanted to use it to mention our sponsors -- buy GE products today). Now previous journal entries have shown how automake can be your friend, but automake can also be your enemy (very similar to power tools, in this respect). This journal entry has nothing to do with automake (buy GE appliances).

In our automake setup, we include a top-level Makefile.depend file that has our "depend" target. It was fairly lengthy and involved, and it applied to the whole tree, so this made sense.

For an hour or two, I tried to make it do VPATH stuff properly. This involved the following:

Getting the source file list

Running makedepend on all of the source files

Sounds pretty simple, eh? Not so, gentle reader, not so. Here's why:

First off, GNU make sucks. I don't know if this is a documented "feature" or not, but it certainly makes no sense to me. So when you have a list of source files (e.g., "BLAH = foo.c bar.c baz.c"), GNU make happily prefixes each of them with the VPATH for you.

Whoo hoo! This saves a lot of trouble of doing it manually. After all, none of the source code files are actually in this directory -- we have to add some kind of prefix to get to each of them.

However -- closer examination reveals gmake's suckage. The last file in the list does not get the VPATH prefix applied! Why? I have no idea. But it pretty much fucks up the whole scheme -- it's pretty useless to get all but the last one.

It's not ok if you only get five chicken McNuggets when you order the six-piece combo at the drive through. Heck, no. You get all six or it's throwdown time.

As such, I had to write code to a) strip off the VPATH prefix from each entry (if it was there), and then b) add it back on to every entry. Not that this was extraordinarily difficult (but escaping the sed expressions in the Makefile was a bitch...), but I shouldn't have had to do this.
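
The real thing was sed inside a Makefile, which I won't inflict on anyone. As a rough sketch of the same strip-then-re-add idea in Python (the function name and the sample paths are made up for illustration):

```python
def reprefix(files, vpath):
    """Strip the VPATH prefix from any entry that has it, then add it to all.

    This makes the list uniform no matter which entries were already prefixed.
    """
    prefix = vpath.rstrip("/") + "/"
    # a) strip the prefix wherever it is present
    bare = [f[len(prefix):] if f.startswith(prefix) else f for f in files]
    # b) add it back on to every entry
    return [prefix + f for f in bare]
```

For example, reprefix(["../src/foo.c", "../src/bar.c", "baz.c"], "../src") gives all three files the "../src/" prefix, including the last one that was left bare.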

With the re-VPATH-prefixed list of source files, you can run makedepend. But oops, it barfs. It seems that it can't find the file lam_config.h. Arrgh -- that's the one that configure generated via autoheader. It seems that automake isn't smart enough to add -I$(top_builddir)/share/include to CFLAGS -- it adds -I$(srcdir)/share/include instead. What the hell is the point of that?

(translation: automake is adding a -I for the source tree, not the build tree. But the config .h is always put in the build tree -- not the source tree. So I'm not quite sure what the logic is here)

So we have to manually add the -I for the build tree. Not nice -- we shouldn't have to do this -- but very easy to do, so move on.

All the dependency entries are for "VPATH/foo.o", and "VPATH/bar.o", etc. instead of "foo.o" and "bar.o". That is, we're building foo.o, not ../../foo-1.0.3/src/foo.o. Hence, the Makefile has to show the right dependency.

CRAP.

So we have to add some more sed mojo to post-parse the Makefile and strip out the VPATH prefixes from the generated dependencies.
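
Again, the actual fix was sed; this is just a Python sketch of the idea, with a hypothetical function name. It assumes each dependency line looks like "target: prerequisites" and only touches the target on the left of the colon, since the prerequisites really do live in the source tree.

```python
def strip_target_prefix(dep_lines, vpath):
    """Remove the VPATH prefix from the target of each 'target: deps' line."""
    prefix = vpath.rstrip("/") + "/"
    fixed = []
    for line in dep_lines:
        target, sep, deps = line.partition(":")
        # Only rewrite real dependency lines whose target carries the prefix.
        if sep and target.startswith(prefix):
            target = target[len(prefix):]
        fixed.append(target + sep + deps)
    return fixed
```

So "../../foo-1.0.3/src/foo.o: ../../foo-1.0.3/src/foo.c" becomes "foo.o: ../../foo-1.0.3/src/foo.c", and lines without a colon pass through untouched.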

Ok, run again. Seems to work this time. Let's try it on the whole source tree...

Barf-o-rama. One of the source directories in LAM has almost 250 source files in it. Adding "../../lam-6.3.3b44/share/mpi" to every entry in the list quickly overflowed the shell's buffer for a single variable. Hence, it just dropped all the additional filenames.

So I had to add a loop around the file list to only process about 20-25 at a time. <sigh> This really became painful at some point; I hurt.
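
The loop itself is nothing fancy. A minimal Python sketch of the batching (the real one was shell; the function name and the default batch size of 20 are just illustrative choices):

```python
def batches(files, size=20):
    """Yield successive slices of at most `size` filenames each."""
    for start in range(0, len(files), size):
        yield files[start:start + size]
```

With 250 source files and a batch size of 20, that works out to 13 separate makedepend runs, the last one with only 10 files.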

Trying once more... #@$%@#$%@#$%@#!!!!!

Since we're running makedepend multiple times, it only saves the output of the last run in the generated Makefile. Hence, it saves the dependencies of the last 20 or so files; all the previous dependencies are snipped each time makedepend runs.

Luckily, makedepend has a -f option to specify where to send the output, so we can save it in a temp file and tack on successive results to the end of the Makefile.

Try again.... <sigh> Still no love.

Now it's not ditching the previous results at all. Since makedepend isn't running on the main Makefile, it doesn't snip the previous dependencies. Hence, we have to do it ourselves. Redirect some input to ed to snip out all lines after "# DO NOT DELETE" (seems pretty ironic, doesn't it?) and catenate the new results on after that.
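
I did the snipping with ed, but the logic is simple enough to sketch in Python. This assumes the marker is any line starting with "# DO NOT DELETE" and that the results of all the makedepend runs have already been collected into one list; the function name is hypothetical.

```python
MARKER = "# DO NOT DELETE"


def splice_dependencies(makefile_text, new_deps):
    """Drop everything after the marker line and append fresh dependencies."""
    kept = []
    for line in makefile_text.splitlines():
        kept.append(line)
        if line.startswith(MARKER):
            break  # everything after this is stale dependency output
    else:
        # No marker yet (first run): add one so later runs can find it.
        kept.append(MARKER)
    return "\n".join(kept + list(new_deps)) + "\n"
```

Running it twice in a row with the same dependencies gives the same Makefile, which is the whole point: old dependencies get snipped, new ones get catenated on.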

Finally... it works.

That whole process actually took quite a while -- adding additional quoting for make (especially in the sed expressions) made it arbitrarily difficult. So somewhere near the end, I said fuck it, and moved the whole thing off to a Bourne shell script. It actually became much easier at that point -- I should have done that much earlier. The depend target actually became pretty small at that point; it just calls that script with a small number of arguments followed by the list of files (also as command line arguments to prevent single-shell-variable-overflows).

The moral of the story: it works now. It works for VPATH, it works for non-VPATH. If you want the script, it's available via LAM's anonymous CVS access -- it's config/run_makedepend. The depend target itself is in the top-level directory, a file named Makefile.depend.

Save the planet: reuse code. Feel free to steal/improve this depend target. Your country depends on it.

October 25, 2000

I love Kung Fu movies...

Some quickies...

Dad got the "LoveLetter" virus on all his 'doze machines at the store yesterday (it spread itself via mounted drives and went rampant across three machines). Viruses suck; it automatically overwrote all .jpg and .vbs files on all three machines. It's not quite clear where it initially came from, either. Dad had up-to-date virus protection, but he had an older version of Norton AntiVirus, and it wasn't automatically checking e-mails, so I suspect that this is where it came from. <sigh>

Possibly going to see "Rent" with Janna in December. That should be fun. I saw it in London, and laughed uproariously at "You can take the girl out of [New] Jersey, but you can't take the [New] Jersey out of the girl." Being from Philadelphia, this is enormously funny to me (we make fun of New Jerseyans all the time). But it's apparently an American joke, because no one else laughed.

An old ROTC cadet of mine (Trent) is now out of the military and working at GE Appliances. Small world.

Not sure if I'm going up to ND this weekend or not; should know by the end of the day.

The HP guy (CQ) found some bugs in my IMPI code for synchronous sends. Ugh. This is proving troublesome to track down...

The motherboard/PROM on the Airmics mail server is fried; it is crashing multiple times a day. Suckage. They're trying very hard to get the new server set up, but it just takes time...

October 29, 2000

There's no private property the LSC!

Many days, no journal entry. The usual nemesis is at fault: traveling.

I've been up here at ND for the latter half of this week. Mainly for SC2000 coordination (the freebie mouse pads arrived way early. Yay!) and other miscellaneous tasks. I also made my famous "hockey puck" chocolate chip cookies this week for the efforts of the Engineering Graphics department (ok, mainly because Joanne from EG said that I owed them cookies for their efforts). For the uninitiated, it is widely known that I make the World's Best Chocolate Chip cookies. They're roughly the size and shape of hockey pucks (hence, the name); none of these twice-the-diameter-of-a-nickel and paper thin kinds of chocolate chip cookies for me. Hell no. Soft and chewy in the middle with a 1 lb bag of chocolate chips in the mix just "so that there should be enough". One of these cookies can serve as a meal. A double batch made 12 cookies this week.

Anyway, that all went well, and we finished up our virtual posters for SC2000. I had to use some evil powerpoint animations in them, but they'll be ok. We still need a result graph from LAM/myrinet (more on this below) for the slides, but everything else is finished.

Sidenote: Myrinet is a proprietary network that runs at gigabit speeds. i.e., an order of magnitude faster than 100Mbps ethernet. You can run TCP/IP over Myrinet -- they provide a driver for it -- but it's at a significant cost in performance over "native" communication over the Myrinet hardware. "Native" communication is provided through a library called "gm". Hence, we're adding a "native gm driver" to LAM to utilize this ultra-fast communication over Myrinet in LAM directly, rather than relying on TCP/IP over Myrinet. This is what Arun has been working on since the beginning of the summer. We want to have [at least] a beta of this stuff working to show off at SC2000.

Arun and I tried to make a result graph for LAM/gm -- just a basic one showing "TCP over Myrinet is good, but gm over myrinet is better!", but unexpectedly got bad seg faults and couldn't produce anything. This consumed the rest of my Friday evening, and most of Saturday morning.

Before we could launch into extensive debugging, though, we had to otherwise finish up the slides. Got some good slides for LAM/gm (Arun), XMPI (Brian), and IMPI (me). After everyone else had left, Lummy wandered in (while I was still working on the slides; perhaps 6:30pm or so). Had a long chat with him about the future of LAM and whatnot. It was especially interesting with the prospects of MPICH's going through an entire re-write (with the focus on their ADI-3 work now -- already a 70+ page document!). MPICH 1.2.1 is probably pretty near the end of the line for that code base; MPICH 2.0 will probably have some elements stolen from MPICH 1.2.x, but will likely be mostly from scratch. This is really cool stuff, actually.

I spent the rest of the night upgrading the version of GM that we had. We reported what appeared to be a bug in gm to Bob -- one of the authors (a very helpful guy, actually), and he said, "you're using a really old version of gm -- you should upgrade and see if the problem just goes away". Ugh. How embarrassing! Turns out we were using gm-1.1.3, and the latest is gm-1.2.3. Oops.

myri.com is apparently connected to the world through a 300 baud modem; it took about an hour to download the 1.2.3 tarball (only a few megs). It took a few tries to get it installed properly -- we have really old Myrinet hardware (probably a few generations behind current stuff). Myrinet utilizes a kernel module in Solaris, so you have to take some care to build and install it properly. And compiling on the Solaris 2.5.1 140Mhz machines is just painfully slow. Ugh.

So I finally got everything up and running around 11pm or midnight. I ran some test programs, and finally decided that everything was working properly. Then I ran a simple test program through bcheck. Badness. Lots of "read from uninitialized" errors from within libgm itself. Crap!!

After a lot of source diving in libgm, I determined that the problem was a buffer that was supposedly being initialized by an ioctl() call into the gm kernel module. The upper libgm was providing the buffer and expecting the lower kernel module to fill it in. It took a lot of hacking around and source tracing in workshop to absolutely verify that the lower kernel module was, indeed, filling that buffer properly, but it remained a mystery to me as to why bcheck would think that the buffer was uninitialized. Worse yet, sometimes bcheck reported that everything was fine -- no read-from-uninitialized. Hmm.

Heisenbugs suck.

It didn't occur to me until Saturday morning that bcheck couldn't possibly know that the buffer was filled -- bcheck only monitors the process under debug; it doesn't monitor the kernel module at all. So it makes perfect sense that while the buffer is initialized by the kernel module, bcheck simply has no knowledge of it, hence, it reports it as uninitialized when upper libgm reads from it. Although this doesn't explain why bcheck sometimes reported that all was well, I'm 99% sure that this is what is happening. Bob later confirmed my suspicions, too.

Hence, I [effectively] added a memset() to the upper libgm code, and bcheck finally only reported Truly Bad Things -- similar to what we had to do in LAM when we know that uninitialized buffers are ok ("when you optimize code, all coding guidelines and rules are out the window, and painfully splattered on the ground below").

I then set about trying to debug a simpler example than NetPIPE (which is a de facto MPI latency/bandwidth benchmark program) -- the program that we were trying to use to get some result graphs for LAM/gm. I made a simple "hello world" ping pong MPI program, and tried to debug that. Arun came in around 11am or so, and we set about stepping through the internals of the gm progression engine inside LAM. Not for the meek.

It's good that Arun came, 'cause he wrote the stuff, and I wasn't completely familiar with it (indeed, I had only seen the internals once before -- when we had a code review about a month or so ago). So his explanations and rationale were quite helpful. We finally tracked down a repeatable kind of error in the simple ping-pong program, but then had to leave for the football game.

ND vs. Air Force. Wow. A real nail-biter, there. I can't believe that we won. It's horrible to say, but our offense really did not look good at all during the game. We had one decent drive, and it was full of 3rd and longs. The rest of our points were off lucky Big plays and the like. :-( Granted, I was in the stadium and didn't have the benefit of instant replay and the like, but it didn't look pretty from the student section.

Our defense was kinda shaky, too. We had some great stops a few times -- held them to 3 points at least twice, for example, a blocked field goal (which put us in overtime -- and later gave us the game), etc. But they were able to throw all over us all day. Our pass defense was just not good.

But in overtime (!) we managed to win the game. Air Force went first and we held them to 3 points. We then came back and got a touchdown, putting the final score at 34-31. Amazing. It's our first overtime victory -- we were previously 0 for 3 in OT.

Some other random points about the game:

Great flyby from 3 F-16s (or F-18s...?) during halftime. Well timed, and it was led by some 1LT who graduated from ND in '97.

There was some woman behind us who was clearly visiting some friends here at ND. Whenever she opened her mouth, stupid came out. Some memorable quotes:

(during the band's halftime tribute to the military, where there were various military people on the field with the band, the American flag and the flags of all four services were flying on the field) "Is this some kind of Halloween thing?"

"I just love that Leprechaun guy! I just wanna scoop him up and hug him!"

"So they're not really downs, are they? They're attempts at downs, right? So why does everyone call them downs?"

Saw Tony and some other JeffJournal fans after the game. Felt kinda silly, because I didn't recognize Tony right away (it's the beard! I swear it!) -- duh. But then later, I realized that I really hadn't seen Tony since last spring, and I felt [somewhat] better. :-)

Tracy and I went to see Pay it Forward afterwards. Not a bad flick. Not quite as complicated and intricate as I had hoped, but still not bad. So I think I give it an official vote of "sympathy".

This morning... back in the lab, and I think I've narrowed the problem down in LAM/gm to an unexpected receive. A pointer is not getting reset properly in the gm progression engine, and when an unexpected receive (definition below) comes in, the engine tries to search a linked list for a request that is no longer valid (and has actually already been freed). Hence, sometimes it works, and sometimes it doesn't.

I suspect that this is just an error from the "translation" of the TCP engine to gm. i.e., we literally copied the TCP progression engine and gm-ized it; I suspect that this bug is just an error in the gm-ization process. Hopefully, this will be the last Big Bug...

An "unexpected message" is one of the Big Concepts for MPI implementors. It is possible that a user does a send from one rank before doing a receive on the target rank. Hence, the message may actually arrive at the target before the necessary bookkeeping has occurred to set up to receive that message. When the target gets such a message, it files the message in the "unexpected" queue. When the matching receive is finally posted, it first checks the unexpected queue to see if the message has already arrived before going to check the message passing hardware for the message. There's a lot more to it than that, of course, but that's the gist of it.
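The bookkeeping described above can be sketched in a few lines of C++. This is only an illustration of the idea, not LAM's actual code: the Envelope, Message, and Matcher names are made up for the example, and real implementations also deal with wildcards, buffering limits, and much more.

```cpp
#include <cstddef>
#include <list>
#include <string>

// Hypothetical envelope: the fields used to match a send with a receive.
struct Envelope {
    int source;
    int communicator;
    int tag;
    bool operator==(const Envelope& o) const {
        return source == o.source && communicator == o.communicator &&
               tag == o.tag;
    }
};

struct Message {
    Envelope env;
    std::string payload;
};

class Matcher {
public:
    // A message came in off the wire. Returns true if a receive was already
    // posted for it; otherwise the message is filed as "unexpected".
    bool on_arrival(const Message& msg) {
        for (std::list<Envelope>::iterator it = posted_.begin();
             it != posted_.end(); ++it) {
            if (*it == msg.env) {
                posted_.erase(it);
                return true;
            }
        }
        unexpected_.push_back(msg);
        return false;
    }

    // The user posted a receive. Check the unexpected queue first; if the
    // message is already here, hand it over and return true. Otherwise
    // remember the receive so a later arrival can match it.
    bool post_receive(const Envelope& env, Message& out) {
        for (std::list<Message>::iterator it = unexpected_.begin();
             it != unexpected_.end(); ++it) {
            if (it->env == env) {
                out = *it;
                unexpected_.erase(it);
                return true;
            }
        }
        posted_.push_back(env);
        return false;
    }

    std::size_t unexpected_count() const { return unexpected_.size(); }

private:
    std::list<Envelope> posted_;     // receives waiting for a message
    std::list<Message> unexpected_;  // messages waiting for a receive
};
```

Both lists are searched front to back, so the oldest matching entry always wins; that is what keeps same-signature messages in order in this toy version.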

November 1, 2000

But it my leadership that got you in that dress

This is prep-week for SC2000 -- so most entries are likely to be technical. Deal.

The boys from HP have done it again -- they found a rather gaping hole in my IMPI implementation in LAM. Doh!

Quick explanation:

When a "long" message is sent across IMPI boundaries (where "long" messages are defined as longer than an agreed-upon number of bytes), it is broken up into 2 or more packets, where packets have a previously-agreed-upon maximum length. The first packet of a long message is sent "eagerly" (i.e., right away), and is marked as "first of a long" -- it is called a DATASYNC packet. When the receiver gets a DATASYNC, it allocates enough space for the whole message, does some other bookkeeping, and then sends back a SYNCACK telling the sender "go ahead and send the rest of the packets; I'm ready."

Messages in MPI are identified by the communicator that they are on (essentially, a unique communications space) and the tag that they use (a user-specified integer that distinguishes between messages). Messages that have the same source, destination, communicator, and tag (for purposes of this discussion), have the same signature -- meaning that multiple messages with these same characteristics would be judged by MPI to be the matching messages.

When you send a message in MPI, you have to receive with the same signature. Hence, the signature of a send and a receive must agree.

Note that the signature has nothing to do with the contents of the message. Two messages with the same signature may contain completely different data, and even be completely different sizes. More to the point -- the message signature is a user-specified set of attributes, so it's up to the user to assign meanings to them; MPI just provides a flexible way to distinguish between different messages with the signature mechanism.

MPI has a message ordering guarantee for single-threaded, non-wildcard operations. That is, two messages sent with the same signature must be matched in the order that they were sent by the receiver. That is, if I send message A with signature Z, and then immediately send message B, also with signature Z right behind it, A must be received before B. If you think about it, it's pretty intuitive, actually.

Something that I hadn't thought about before was that long and short messages have different protocols. Take the following example:

Send long message A with signature Z.

Send short message B with signature Z.

Given what was discussed above, only the first packet of A will actually be sent to the receiver, whereas the entirety of message B will be sent to the receiver (because it's shorter than the length of one packet). However, A has to wait for an ACK and then the rest of the packets from the sender before it can be fully delivered to the receiver.

My implementation of IMPI didn't take this into account at all -- it just served up messages as soon as they became available. It didn't take into account the fact that long messages may be "in progress" and a short message may sneak in before the long message completed, and thereby violate the MPI message ordering guarantee. Doh!

Hence, I had to spend the majority of today writing diagrams and flow charts, and then implementing a "gate" at the delivery end of IMPI such that it watches for long and short messages, and has a somewhat-complicated state machine to only allow messages by when long messages are not already in progress. If a long message is in progress, the just-received message (even if it's the first of a long, itself) is queued up. When the long at the head of the queue finishes (i.e., we sent the ACK back to the sender, and the sender sent us the rest of the packets in the message), the rest of the queue can progress until either the queue drains or the first of a long is encountered (then we have to send the ACK back to the sender, wait for the rest of the packets, etc.).
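A stripped-down sketch of that kind of gate, in C++. It assumes a single ordered stream of arrivals and at most one long message in progress at a time; the Kind, Arrival, and DeliveryGate names are invented for the example, and the real state machine in LAM is certainly more involved than this.

```cpp
#include <deque>
#include <vector>

// Hypothetical packet kinds: a complete short message, or the first packet
// of a long message (the DATASYNC).
enum class Kind { Short, LongFirst };

struct Arrival {
    Kind kind;
    int id;
};

class DeliveryGate {
public:
    // Something arrived from the sender. If a long message is in progress,
    // it has to wait in line, even if it is itself the first of a long.
    void on_arrival(Kind kind, int id) {
        if (long_in_progress_) {
            queue_.push_back({kind, id});
        } else {
            admit({kind, id});
        }
    }

    // The rest of the packets for the in-progress long message have arrived.
    void on_long_complete() {
        if (!long_in_progress_) return;
        delivered.push_back(current_long_);
        long_in_progress_ = false;
        // Drain the queue until it empties or another long message starts.
        while (!queue_.empty() && !long_in_progress_) {
            Arrival next = queue_.front();
            queue_.pop_front();
            admit(next);
        }
    }

    std::vector<int> delivered;  // ids in the order they reached the user
    std::vector<int> acked;      // long-message ids we sent a SYNCACK for

private:
    void admit(const Arrival& a) {
        if (a.kind == Kind::Short) {
            delivered.push_back(a.id);
        } else {
            acked.push_back(a.id);  // tell the sender to send the rest
            current_long_ = a.id;
            long_in_progress_ = true;
        }
    }

    std::deque<Arrival> queue_;
    bool long_in_progress_ = false;
    int current_long_ = 0;
};
```

With long A followed by short B, B sits in the queue until on_long_complete() fires for A, so the delivery order stays A, B.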

Not a simple undertaking.

After all this, I got it working with HP's IMPI implementation. I found a bunch of memory leaks in our proxy agent (the "impid") and fixed up all of those. There's still a ton of "blocks in use" when the impid quits, but those are all from the internals of Solaris and there's nothing that I can do about them. :-\

After fixing those, I released LAM 6.4a6 to HP and MPI Software Technology (the two MPI vendors that we have to demo this stuff with next week at SC2000).

I love bcheck. I can't imagine how I programmed before I discovered it. Go RTFM if you don't know what it is.

One thing that bugs the crap out of me, though, is our implementation of what is called the IMPI server. The IMPI server is basically used as a rendezvous point at the beginning of a run. All the MPI implementations meet there, have some coffee, exchange some meta information, and then go off and shake their booty.

Needless to say, this all contains lots of socket code. The server allows you to specify what port it sits on to listen for the MPI implementations to meet at, or take a randomly assigned OS port. It's frequently convenient to use a fixed port for repeated runs, so that you can just do !! (or up-arrow) in the server and client windows and not have to change the port number in the various command line arguments.

However, sometimes when one tries to fire up the server again, it complains that the socket is "already in use", and you can't reclaim it for several minutes while the OS times out. Result: you have to go change the port number in all the command line parameters, which is a pain.

The thing is, I don't know why it says that the port is "already in use" -- I don't know the conditions that lead up to this. Indeed, take something like sendmail or apache -- it can always fire up on the correct port (25, 80, respectively) no matter what state it was previously shut down in. This suggests that it's not a client action that guarantees that the port will be open, but a server action. But I'll be damned if I know what it is. :-(

If anyone has any insight here (and is still reading this :-), please enlighten me...
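For what it's worth, a common cause of this symptom is TCP's TIME_WAIT state: after a connection closes, the side that closed first holds on to the address/port pair for a while, and a plain bind() to that port fails with "address already in use". Whether that is what bit the IMPI server is an assumption; the entry never pins down the cause. Servers commonly avoid the problem by setting SO_REUSEADDR on the listening socket before calling bind(). A minimal sketch, assuming POSIX sockets (open_listener is a made-up helper name, not anything from the IMPI server):

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>

// Returns a listening TCP socket bound to `port` on all interfaces,
// or -1 on error. Passing port 0 lets the OS pick a free port.
int open_listener(std::uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    // Must be set before bind(): allows rebinding a port whose previous
    // connections are still lingering in TIME_WAIT.
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

This is a server-side action, which fits the observation about sendmail and apache. It does not let two live servers listen on the same port at once.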

November 10, 2000

Would you like a mouse pad?

There were some rocky parts, but I think we had a good SC2000 overall.

This is an epic journal entry. Cope.

Sunday

Some of us met in the lab where we gawked at the LAM and LSC shirts that Jeremy picked up on Saturday night. They rocked. The nd.edu network went out around 11:45 (note: this is important for later).

Long flights, a three hour layover in Midway (what a crazy place), arrival in Dallas. Lummy met us at the Dallas airport. ATA lost Arun's luggage, but we waited there for a while anyway. Got in, had dinner (which was Much Fun), and started some slides in Arun/Brian's room.

The hotel has high speed internet access, but nd.edu was down. Luckily, nd.edu's vBNS link was still up, so we could get in via Berkeley or Argonne. So life was still ok -- we could still get to our e-mail and do some work. No biggie.

Monday

We got to the exhibition floor somewhere around 9am. We appraised the situation, said hi to all the good IU and Purdue folks, and started to get our stuff together. The commodity link to nd.edu was still down, so we started downloading LAM and XMPI via Argonne.

Then.. BAM!!!

nd.edu's vBNS link went down.

And stayed down.

Life sucked.

As Arun said in his journal, epics have been written about less. We cobbled together [mostly] working versions of LAM and XMPI from backup and working copies at Berkeley and IU and our laptops. Ugh. We were all cursing Ameritech (supposedly the cause of nd.edu's outage, but I still blame the OIT).

It was a race against time to get all our stuff downloaded, assembled from the various repositories around the country, couple it together with some missing software from ftp.gnu.org, battle a shaky SciNET (the network on the SC2000 show floor -- it kept going in and out), and get it all working.

The deadline was 7:30pm -- my IMPI demo. We finally got enough downloaded, and I met with the HP people. We were further confounded by the fact that the union folks made us clear the aisles in order to lay all the carpet between all the booths. Hence, I couldn't travel to the HP booth to coordinate with CQ (HP's IMPI guy -- his real name is Asian, and probably unpronounceable to us Americans, so he goes by "CQ"). I finally got over there around 4pm, and we did some testing.

After a bit of futzing, we got it up and running with HP as the master and displaying on his machine (we had to download and install ssh because they didn't have it, and the IU demo machines didn't have telnet (yeah!)). But it all worked out.

After some more battling (battling low battery power, shaky SciNET connections, and pesky sales droids), we got it to work properly with the IU booth machines as the master. Whoo hoo!!

We also converted Matt from Purdue from MPICH to LAM. We reduced the complexity of his Makefiles dramatically, and showed him the goodness of lamboot, mpirun, etc. He said, "I'm a convert!". Another happy customer.

MPI Software Technology (MST), however, wasn't quite as lucky. :-( They didn't bring the right kind of fiber connectors to get on SciNET, and then the local Fry's was out of the right kind. Their IMPI implementation was not quite finished, either. I managed to download a recent copy (nd.edu came back up that evening) of LAM's IMPI distribution tarball. I downloaded a copy to their LAN and helped him get it up and running (they previously had some problems trying to install LAM, but I don't quite know why...). Rossen thanked me, and started debugging.

So my demo went off at 7:30pm and it seemed to go off well. I had a varying size of crowd watching. I was a bit annoyed, though, because literally at the last minute, I got switched to the other ImmersaDesk, and nothing was set up right. It took a good 10 minutes to get it set up right just so that I could bring up my slides. It was somewhat embarrassing because the NIST folks (the people who funded our IMPI work) were standing there waiting for me to start talking. But it eventually turned out ok.

We gave out a surprising number of LAM key chains (they were quite popular!). We walked around a bit and saw a few people, and it was generally pretty good.

We left there, dropped our stuff off back at our room, and went to the Beowulf Bash (which was conveniently in our hotel). It was pretty cool; when we got there, they were announcing that more beer was coming imminently (and it did :-). We chatted with Dave from Myricom (an ND grad) and swapped ND stories. I also chatted with Doug from Paralogic, Don from Scyld, and Dan from Scyld.

Dan chatted with all of us for a while -- they do some really cool stuff in Scyld for their clusters. They have an rfork() call that forks things onto nodes (and an associated rkill()), and do process migration all over the place. They directly load the BIOS to boot linux in 3 seconds, and then get everything else from the cluster master. I don't know all the details, but it sounds good.

I also chatted with Dan about the parallel MP3 encoder that I wrote a while ago (he downloaded it and was amazed that he downloaded something from a .edu site -- particularly the LAM/MPI site -- ran ./configure / make with his MPICH distribution, and it just worked). He also wanted to talk about a parallel ogg vorbis encoder, and wants to write a paper about it for Linux Journal (I think it was LJ -- can't recall offhand). This could be really cool. I think we might do it.

I sent Dan an e-mail later saying, "let's do it -- how do you want to proceed?" We'll see what happens. Also, Scyld is interested in LAM -- to do so, we would probably need to ditch the lamd. In such a case, Scyld would have to provide some services like process management (which I think they already do), an out-of-band messaging channel (which might be harder), potentially trace gathering, and name/value publishing. We'll see how this all works out.

After all the schmoozing, Brian and Arun and I had cigars downstairs and had a good chat about all kinds of things. Rock on.

Tuesday

Saw some MPI papers in the morning. Two were about one-sided implementations. The third was about... er... something. One guy presented results with LAM. Whoo hoo!!

We schmoozed all day. We officially ran out of key chains. We got several t-shirts from several companies, including a really nice button down shirt from Veridian (the PBS folks).

We talked to all kinds of people -- so many that I actually don't remember everything that happened that day. It was good. I do remember chatting with the Myricom folks quite a bit, though, and chatting with the PBS folks, NIST people, and HP.

I stopped by to see how MST was doing with IMPI. They were still having some problems, but I didn't have time to debug with him. I came back later and helped some more -- turns out that he wasn't zeroing out the upper 12 bytes in the IPv6 address, so LAM wasn't able to find a match in the source address. Hence, dropped packets. This turned into goodness; the MST/LAM ping pong tests started working.
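To make that failure mode concrete, here is a small C++ sketch. It assumes the address travels in a 16-byte (IPv6-sized) field with an IPv4 address in the last 4 bytes and the first 12 bytes expected to be zero; that layout, and the names Addr16, pack_ipv4, and same_address, are assumptions for illustration rather than anything taken from the IMPI spec or from either code base.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte address field, big enough for an IPv6 address.
using Addr16 = std::array<std::uint8_t, 16>;

// Pack a 4-byte IPv4 address into the 16-byte field. The explicit zero fill
// is the important part: skip it and the first 12 bytes are whatever
// garbage happened to be in memory.
Addr16 pack_ipv4(const std::uint8_t ipv4[4]) {
    Addr16 field;
    field.fill(0);
    std::memcpy(field.data() + 12, ipv4, 4);
    return field;
}

// The receiver compares all 16 bytes, so garbage in the unused bytes makes
// two otherwise identical addresses look different.
bool same_address(const Addr16& a, const Addr16& b) {
    return std::memcmp(a.data(), b.data(), a.size()) == 0;
}
```

A sender that fills in only the 4 address bytes produces a field that never compares equal to the zero-filled one, which is how a missing memset can end up looking like dropped packets.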

Dinner was with the Research@Indiana folks at Fish: An Upscale Seafood Restaurant. All us ND students sat together (except George, who sat with Jesus, 'cause they got there a bit after us). Our conversation was mostly about the GPL, licenses, etc. It was pretty good, all around. A good time was had by all, and the food was excellent.

Wednesday

Got in a bit early to set up the LAM and XMPI demos. We had some real problems. :-( We uncovered some bugs in XMPI at literally the last minute, so I canceled the XMPI demo, and we did just the LAM demo. We actually had some problems there, too -- we had problems making a user MPI program fail in a controllable way (we wanted to show the usefulness of running a LAM/MPI program under a debugger). But we finally got it, and it worked out ok.

However, we did have major problems with the Sun Workshop debugger -- we just couldn't get it to run. gdb didn't work, either. We had 4 UltraSPARC 10 machines to run down here, but they weren't quite set up the way that we were expecting. In particular, we asked for tcsh to be our default shells. But after some painful processes of elimination, we proved that the tcsh that was installed on those Suns was broken -- it caused gdb to fail, and it sometimes caused logins to hang and tcsh CPU usage to go to around 95% or so. VERY annoying, and very difficult to track down -- how often do you actually suspect the shell itself? No, you assume other things are wrong (like your . files, the OS, etc.). But switching to csh fixed everything. I've never seen anything like it before.

But we didn't figure this out before the LAM demo, so we actually ran on nd.edu machines and used gdb (firing up the Workshop debugger involved just way too much time). The demo and talk actually went well, though.

I talked with a whole bunch of people throughout the rest of the day -- we wandered the floor some more, talked to some ASCI people, Tony and company at MST, the Compaq sales guys, etc., etc. During my "booth duty" time, I chatted with lots of people about LAM/MPI and ND (including some people whose sons/daughters are currently at ND), and particularly with a guy from Sweden about LAM who mentioned that he wanted the ability to checkpoint LAM/MPI processes so that he could take his nodes down and do maintenance on his cluster. And then when he's done, restart the process and keep the MPI job going. I initially said no, you can't do it because of the "socket problem" (i.e., you can't checkpoint sockets -- more info below), but then I started thinking about it, especially with respect to the Condor checkpoint library (very cool stuff). We chatted about this for a while, and I ended up putting it in the background because other things were going on.

Spent a bit more time with Rossen and his IMPI. I don't recall what the exact error was, but we found it and fixed it, and after Rossen worked out the rest of the details, it later worked with LAM/MPI in the pmandel code. Woo hoo!

Spent a good amount of time debugging XMPI and LAM's demo (and figured out the tcsh/csh issues). After figuring out the csh problem, LAM pretty much fell in line right away. Brian and I spent the rest of the afternoon debugging XMPI and stayed after everyone left. We fixed up most everything and fixed up some nagging bugs.

Renzo called in the middle of this and we setup stuff for the BC game at ND this weekend. He's in Vegas this weekend, so no family dinner with Lynzo and the chunky monkey. Bonk! :-(

One of the problems was actually an error in Sun Workshop 5.0's <fstream> implementation. VERY ANNOYING. It turns out that using getline(fstream&, string&) to read in a blank line will start returning true for eof(). ARRGGGHHH!!!
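For comparison, here is how a line-reading loop is usually written so that it never asks eof() at all and instead tests the result of getline itself. On a conforming library a blank line simply comes back as an empty string. Whether this pattern would have dodged the Workshop 5.0 bug is unknown; the sketch only shows the expected behavior, and read_lines and split_lines are names made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line from `in`, keeping blank lines as empty strings.
// The loop condition is the stream state after getline, not eof().
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Convenience wrapper for trying it out on an in-memory string; any
// std::istream works, including an std::ifstream.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    return read_lines(in);
}
```

Calling split_lines("first\n\nthird\n") gives three entries, with the middle one empty.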

Once we figured this out, Brian and I left for dinner (around 8pm). We passed the Myrinet folks, and chatted with them for a while (lots of laughs -- we share the same exact feelings about writing software, users, distributing software, etc., etc.). They recommended an Italian restaurant for dinner.

Brian and I headed out for dinner, and I brought up the checkpoint/restart problem with Condor's library. We talked about this for a while (we were in one of those cool Italian restaurants with paper tablecloths, so we could draw on it with the provided crayons, etc. Very handy!). A good dinner, with good food. We caught a cab back.

Thursday

More LAM pimping. Had more good chats with Myricom/Bob Feldman; seems like we could have quite a future there. Near the end of the day, talked to InfiniBand people about using their stuff as a high speed fabric for LAM. Had a look at some other booths; talked to the NPACI people, who had some REXEC people, and shared some info about LAM (since REXEC has some common elements with LAM).

Went over to the RealWorldComputing booth; they have some cool stuff, including SCore MPI. Meant to look at that last year, but...

Then we talked to a few linux integrators, pimping LAM. One hadn't heard of LAM (bastards!), but the other was Linux Networks. "Hey Jeff... we talked last year" was the greeting. Amazing. And apparently, Dog and Brian had been there about 5 minutes previously. But we had a nice chat and he gave us t-shirts.

Then the expo was over. We cleared our stuff out of the Research@Indiana booth and went back to the hotel. As we were getting on the shuttle bus, I said to Arun, "hey... some Swedish guy came up to me yesterday and gave me a great idea about checkpointing MPI jobs in LAM..." and then I stepped on the bus. I heard behind me, "Hey... you're the LAM guys! We've been meaning to find you!"

Turns out that the Condor grad students were standing right behind us and heard me mention checkpointing and noticed who we were. It further turns out that they've been having similar ideas -- wanting a checkpointable/migratable MPI. So we chatted on the bus, and then chatted some more in the bar before they had to catch a cab back to the airport. REALLY cool stuff, and we think we can do it. There's some delicious complications, but the fact of the matter is: no one else can do this, and it would be truly fantastic if we could do it.

Condor wants a checkpointable MPI and one that they can schedule/migrate around in Condor, and we want a checkpoint/restartable MPI. This could be the start of a really, really cool collaboration. I'll jot down the notes that are in my head in a technical journal entry after this. I'm still brimming over with goodness about this; I actually think we can make it all work (and get a bunch of papers, become famous, and take over the world). How cool is that?

We then met everyone else from the LSC and went to dinner at the Spaghetti Warehouse in the West End. Good food, and good conversation -- a good time was had by all.

Yes, I would like a mouse pad.

I forgot to mention that I am Mouse Pad Pimp Daddy. We came to Dallas with 900 LAM mouse pads (300 C, 300 Fortran, and 300 C++). WE HAVE NONE LEFT!!! I think that I personally handed out about 700 of them.

November 13, 2000

On diatribes and dianetics

This weekend was good -- I got back to SBN on Friday evening and briefly stopped at Ed-n-Suzanne's for a most excellent tuna sandwich. I then met up with Renzo and we ended up going to Senior Bar, where we ran into lots more people, Stina, Jason [current 'bone section leader], lil'Putt, Jill B., Jason B., Deli, Catherine K., etc. It was a good time. We then hiked to my office to get the parking pass.

The next morning, I was blading to where Renzo and Schleggue were parked when I ran into Jill B. again. During the conversation, my phone rang; it was Renzo, asking where the hell I was. Oops! I was now very late. But I eventually got over there, and Schleggue, Renzo, and I had some good conversation before we ended up heading over to the Putt tailgater.

More fun was had there by all. Tracy eventually joined us (she drove up that morning), and we all headed into the game. We smuggled Renzo and Schleggue into the student section, which was cool. Mike N., Brian B., his fiance Dana, Jeremy S., and Katie M. joined us as well. It was a fun game; a few nervous moments, but we ended up stomping on the hated BC Eagles, so the day ended well.

Thinking that we were smart, we ordered Papa John's right after the game from the stands on the rationale that it would take forever to get the pizza and we'd be at Oak Hill long before it arrived. Indeed, the PJ person told me that it would be 60-90 minutes before the pizza came.

We ran into Vernon by the car, and invited him along. Jason Brost left a message on Schleggue's voice mail (apparently the #$@%@#$% wireless circuits get very busy in SBN during football games, and many calls don't get through, so they get switched to voicemail) indicating that he might drop by. So we decided that we didn't order enough pizzas. I called PJ back (it was 30 minutes after our first order at this point) to see if we could add another pizza to the order. The PJ person told me that the delivery guy had already left. DOH!!

So Tracy and I got out of the car (which was stymied in a long line of cars waiting to exit the Hesburgh library parking lot area) and started jogging to Oak Hill (me), and to PJ itself (Tracy). I didn't beat the pizza guy, but he went to the wrong address anyway, so he ended up coming back not long after I got there (which was a good bit before Renzo et al. arrived in the car). Good exercise to jog from the Hesburgh parking lot to Oak Hill, but God, I despise running...

Tracy and I went to mass at the Basilica the next day, but it was so crowded that we had to stand in the vestibule for the whole mass. After a brief trip to the Grotto, Tracy headed home, and I went to a SC'2000 roundup meeting at Lummy's. We chatted about LAM, SC2000, and future directions. Looks like Jeremiah, Ron, and Brian will eventually be joining the LAM Team. Woo hoo!!

Ron also mentioned an ANSI-izer tool that we could use on the LAM source code. Mmm.... I've been wanting to do that for quite a while. Since there are 900+ source files in LAM/MPI, the standing rule has been to ANSI-ize each file whenever you edit it (it's just too much to go through and do them all at once). But having a tool to do it would be fabulous...

Ron also mentioned the LXR, which we might use to create an annotated, self-referencing hyperlinked version of the LAM source code. That too, would be quite cool. Lummy's big on web-enabled groupware things, so we're probably going to explore a few of those for the LSC as well.

I drove home, took care of a bunch of emails and things that popped into my head while I was driving, and then watched the X files with dinner.

November 14, 2000

Candelabra

Ok, so I didn't spend much (any) time on the Condor/LAM stuff yesterday. I spent most of the day finishing up the Password Storage and Retrieval system (PSR) originally written by Dale Southard. We use it with our batch queueing system (PBS) to get AFS tokens when jobs are submitted, and to automatically refresh tokens before they expire so that AFS authentication lasts throughout the entire submitted job.

It's pretty cool stuff -- it uses public/private keys for storing the user's password and whatnot. I've made it fully automake-ized, cleaned it up a bunch, added it to CVS, fixed a few bugs, ensured that it works with both Transarc's proprietary development AFS libraries and the krb4 freeware AFS libraries, and updated the patch to the OpenPBS source code (it's dynamically generated now, too). I finished early this morning and sent it off to Dale for review, and to Bob at PBS so that he can give the patch a once-over.

Hopefully -- that will be it, and I'll be able to release it and get it out of my hair.

Today will be spent answering 3 old LAM emails and working on the LAM/Condor description:

Keith from Citifinancial: he has discovered that when in fault tolerant mode, if you mpirun before the lamd's have discovered that one of the other lamd's is down, mpirun will get the wrong information and sit forever trying to spawn a job on a node where the lamd is gone. Hence, deadlock. Need to fix this.

Dave from GE: wants to get the native signal/error handler fired when LAM intercepts a SIGSEGV, SIGBUS, SIGFPE. Seems like a reasonable request; need to work with him a little more to get the details right.

Patricia from Dec: thinks that she has found a problem with MPI_Intercomm_merge in LAM. Need to check this out; I think she sent a sample program that shows the error.

November 16, 2000

Winter is the finest 7 months of the year in Wisconsin

Been cleaning up LAM code for the past 48 hours. Trying to make it compile with a C++ compiler. You have no idea how painful it is.

And just when I thought I had a handle on it (I got liblam.a and libmpi.a and a bunch of supporting apps to compile cleanly), I moved into the lamd tree.

Oh, pain, pain, pain!

I'm in function pointer hell.

The original Llamas did everything in the pre-ANSI way, which was to simply declare a function pointer with the right return type, but with no arguments in the argument list. I guess this works...?

Part of the problem is that many of the lamd functions are supposed to return function pointers [effectively] to themselves. More to the point, they have to return pointers to functions that have the same signature as themselves. That is, function A has to return a pointer to function A (or a function that has the same signature as A).

After dinking around with this for quite a while, I sat back and thought about it, and it turns out that C/C++ can't do this legally. i.e., you can't declare a function that returns a pointer to a function with the same signature. It's a recursive problem -- trying to do so changes the return parameter type, which then changes the function signature, which then changes the return parameter type... etc., etc.

A more concrete example:

ret_type func_name(arg_list);

The goal is to have a function signature (call it func_sig) that encompasses all of that. However, func_sig must equal ret_type, which, if you think about it, can't be. Hence, C/C++ is unable to describe this abstraction.

This is actually very interesting (to me, at least), because I've never run across something that C/C++ just couldn't do because of its language specification. Sure, there are tons of things that C/C++ is not good at, but I can't recall ever running across something that it just couldn't do because of its language.
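The direct declaration really is impossible, but there is a well-known way around it: wrap the function pointer in a struct. The struct gives the recursive type a name, so the function returns the struct and the struct holds a pointer to a function of that same type. Here is a small sketch of the trick as a toy state machine. State, StateFn, idle, busy, and count_busy_steps are invented for the example, and this is not what the lamd code does.

```cpp
struct State;                          // forward declaration breaks the cycle
using StateFn = State (*)(int input);  // pointer to a function returning State

// The wrapper: a State is just "the function to call next".
struct State {
    StateFn next;
};

State idle(int input);
State busy(int input);

// Each handler returns (a wrapper around) a pointer to a function with
// exactly its own signature, possibly itself.
State idle(int input) { return State{input > 0 ? &busy : &idle}; }
State busy(int input) { return State{input == 0 ? &idle : &busy}; }

// Drive the machine over a list of inputs and count how many steps end up
// in the busy state.
int count_busy_steps(const int* inputs, int n) {
    StateFn current = &idle;
    int busy_steps = 0;
    for (int i = 0; i < n; ++i) {
        current = current(inputs[i]).next;
        if (current == &busy) ++busy_steps;
    }
    return busy_steps;
}
```

The cost is one extra `.next` at each call site, and the same structure works in plain C with a typedef in place of the using declaration.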

November 17, 2000

Extra thrifty lima beans

New version of Mojonation came out a few days ago. I noticed this because I suspected a memory leak in Mojonation because my router would become increasingly slow (although I never checked its memory usage... doh!) and swapping activity would become much more pronounced (I have a loud disk drive in that machine :-).

So I restarted mojonation today, and it told me that there was a new version available on the web site. Among other things, it fixed a memory leak. :-) We'll see how this bad boy performs now...

Additionally, Lummy sent around a hot tip about Linux's hdparm which allows you to tweak the performance of your IDE hard drives. I tweaked a bit on my laptop and got a good amount of speedup. Same for my router -- tweaked a bit and got some improvement (from about 4.something MB/sec to 6.something MB/sec). On my desktop machine, the performance increase was dramatic. I went from 4.83 MB/sec to 25.50 MB/sec! That rocks!

Per request, I created web archives for our LSC staff internal mailing list today. Some pearls of wisdom have been mailed across the list (C++ tricks, location of Friday lunch order files, etc.) and been lost. Web archives fix that.

I also made it a real mailing list instead of a sendmail alias. GNU mailman ROCKS.

I forgot to mention in the journal that a few days ago (or was it last night? Time has no meaning...), I formally released the Password Storage and Retrieval system (PSR) that allows OpenPBS jobs to run with AFS authentication. I also pinged the Condor guys about it (today) since I seem to recall that Dale said something about how they were interested in it. But I could be hallucinating.

Speaking of Condor, I mailed off the huge technical entry about LAM/Condor (curses -- it just occurs to me that I set the category incorrectly on last night's journal entry!) to the Condor folks. Erik says that he'll read it this weekend in depth and discuss it with the other Condors next week.

I wonder if they refer to themselves as Condors as we refer to ourselves as uber-auth^H^H^H^H^H^H^H^HLlamas.

Off to do some LAM debugging, and then more dissertation writing. Gotta get a skeleton together at the very least.

Got to Hell, Costas

I found Arun's The Moog Cookbook in my laptop as a leftover from SC2000. So I had to rip it into MP3s and have been enjoying it all day on my surround sound speakers. It's no "Slut", but it's not bad.

And of course, I'm gonna have to buy the damned CD now. Damn morals... arrghh...

Had a dentist appointment this morning. He tells me that all four of my wisdom teeth are gonna have to come out, as well as one more that's as rotten as a skunk roadkill in Alabama in the middle of July. And baby, that's rotten.

Spent the majority of the rest of the day finishing typing up my notes on Condor/LAM. I'll send those in a separate journal entry.

I did spend a little time looking into anti-virus software for my church. What a scam. You basically have to subscribe to anti-virus software these days -- pay a yearly fee for the privilege of continuing to get anti-virus updates. On the one hand, I can see how the company is continuing to provide a service, and that service should be paid for. But on the other hand, it's more like a tax -- if you run in the Windoze or Mac world, you need to have anti-virus software. Hence, you will have to pay whatever they charge. And it's not like there's tons of competition in the anti-virus world: there's essentially two companies, and their prices after 2 years of subscriptions are essentially a wash.

Don't let me get started on a rant here, but have you noticed how the whole security industry is founded upon the mistrustful nature of humans? Remember ARPANET? (of course, few of us "young 'uns" actually remember the ARPANET, but we've all read about it) There was no security -- everyone just trusted each other. There were no passwords, no secured protocols, no encryption. It just worked.

Such a system is inconceivable these days -- releasing the 'net to the rest of the world has brought out the worst in humans. Online scams, cracking, stealing of information, viruses -- it's all now commonplace and people almost expect it. Or, even worse, they have the attitude, "I don't have any important information -- no one would bother to hack into my system..." But that's a whole different topic; I digress.

So to combat this, the whole virtual security industry sprang up pretty much overnight. It's probably a multi-billion dollar business. And it can't even offer any guarantees. And it's all because humans suck, morally speaking. Especially the high-school punks who break in just for the sport of it, and don't realize that each of their pranks actually cost thousands of dollars. These kids don't even have a realization that what they are doing is wrong. It doesn't matter how easy it is -- it's still wrong. Just because I know that the Smiths leave their front door unlocked during the day doesn't mean that I actually walk into their house and start poking around.

And viruses. What the hell is the point of that? They're not directed attacks. They are potentially wide-spread attacks with massive collateral damage to innocent people who did nothing wrong other than open an e-mail attachment. Why? What could the virus writer possibly derive from that? Some kind of sick, twisted joy at the fact that their virus brought down hundreds of mail servers (e.g., Melissa), or wiped out thousands of hard drives around the world? My dad's hardware store got hit with a virus recently. It instantly went out across his Windoze network and infested 3 workstations. Luckily, the virus was fairly benign -- it only whacked all his .jpg and .gif files. But it could have been much, much worse. And that computer network is his livelihood -- if all that data goes away, he's screwed. All because some high school kid thought it would be fun.

I'm grossly stereotyping here, sure. So sue me, but I'm mad.

This may seem to be a bit of a stretch, but bear with me... I talked to a guy in GE Medical Systems one day -- he was a manager in their product development section. I told him that I was a computer scientist. He said he loved to get newly graduated comp sci majors working for him. He said that without fail, within the first month or so of all new comp sci hires, he would take them down to a hospital and show them real patients whose lives depended upon the software that they wrote. A bug, a simple seg fault, an overflowed buffer, a bad logic test, and someone will die.

So the things that we do on computers (as computer scientists) we tend to imagine all stays "in the computer", and it can be hard to realize that what we do actually affects real life. But it does. The medical systems example is rather extreme, but I even went off in a previous journal entry about how LAM/MPI is used in people's daily lives, and the things that LAM/MPI is used for are in even more people's daily lives. Indeed, my favorite example of one project that uses LAM/MPI is the US Naval Surface Warfare center (SWAPAR). They use some of the MPI-2 dynamic process management features of LAM/MPI to simulate large scale naval battles, and use that to help shape navy tactics and policy.

So what we do is real. It matters. And it matters when that punk releases a virus that goes off and destroys a few thousand random hard drives. It matters a lot to the people whose hard drives it crashed. And it offends me that others in my profession do these kinds of things.

But to end this very random and wandering diatribe on a positive note, the next time you're sitting in a movie theater watching some naval battle and some "military smart" friend tries to explain the actual tactics to you, just nod sagely, touch your nose, and say, "Yes, I know. I wrote the book that wrote the book. I am an uber-author. I am the alpha to this omega. I am a Llama."

Migrating racks of LAM

I've got a bunch of things that I want to put down about the possibility of making LAM/MPI checkpoint/restartable. I'll break it into multiple parts:

Some LAM terminology

The "checkpointing sockets" problem

Possibilities

lamd problems

Possibilities with Condor

Checkpointing without Condor

Making this portable

Other problems

Some LAM terminology

Since others will be reading this text, I'm going to throw in some LAM definitions that I'll be re-using throughout the text below:

lamd: The lamd is the LAM daemon that is run on every host in a "normal" LAM run-time environment. It provides several services to running LAM/MPI jobs, such as process control, an out-of-band messaging channel, key=value global publishing, a scoping mechanism, etc.

C2C: An acronym for "client-to-client", meaning that MPI communication goes directly from the source process to the destination process. This is usually via TCP sockets, but can also be via shmem or GM (myrinet), or whatever other network connects the MPI ranks.

nsend() / nrecv(): the function calls in the LAM/MPI implementation that are used for the out-of-band messaging channel. That is, MPI ranks can use nsend() and nrecv() to send messages to each other. These messages go from the source rank to the local lamd, then to the remote lamd, and then to the destination rank. Hence, the out-of-band messaging channel goes through the lamd, not through C2C channels.

LAM universe: one instance of the LAM/MPI run-time environment. That is, the LAM run-time environment is typically instantiated with the lamboot command and a file specifying a list of hosts. The LAM universe then exists among that set of hosts.
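To make the nsend() / nrecv() hop path concrete, here's a toy sketch in Python. None of this is LAM code: the Lamd class, its mailboxes, and the signatures of nsend() and nrecv() below are made up for illustration. The only thing it is meant to show is that an out-of-band message goes rank -> local lamd -> remote lamd -> rank.

```python
from collections import deque

class Lamd:
    """Toy stand-in for one host's LAM daemon (not the real lamd)."""

    def __init__(self, host):
        self.host = host
        self.peers = {}       # host name -> Lamd on that host
        self.mailboxes = {}   # local rank -> deque of pending messages

    def attach(self, rank):
        """Register a local MPI rank with this daemon."""
        self.mailboxes[rank] = deque()

    def forward(self, dest_host, dest_rank, msg, hops):
        hops.append(f"lamd@{self.host}")
        if dest_host != self.host:
            # Hand the message to the lamd on the destination host.
            return self.peers[dest_host].forward(dest_host, dest_rank, msg, hops)
        self.mailboxes[dest_rank].append(msg)
        return hops

def nsend(local_lamd, src_rank, dest_host, dest_rank, msg):
    """Send via the local daemon; returns the hops taken (illustration only)."""
    hops = local_lamd.forward(dest_host, dest_rank, msg, [f"rank{src_rank}"])
    return hops + [f"rank{dest_rank}"]

def nrecv(local_lamd, rank):
    """Pick up the next pending message for a local rank."""
    return local_lamd.mailboxes[rank].popleft()
```

With two toy daemons wired to each other, a message from rank 0 on one host to rank 1 on another passes through both daemons and never travels rank-to-rank directly.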

Here's a few assumptions that we make because of the LAM/MPI environment:

LAM/MPI is completely user-level. All processes belong to the user -- nothing runs as root. That is, each user has their own set of lamd's and user MPI programs.

LAM/MPI currently cannot "overlap" universes except in batch systems. By "overlap", I mean have multiple, different LAM universes of the same user on the same machine. i.e., while a user can run as many MPI programs as they want in a single LAM/MPI universe (and even have them share the same machines safely without interfering with each other), you cannot have multiple LAM/MPI universes on the same machine without a special exception. It will be trivial to make LAM be able to overlap universes in a Condor environment, but I felt that I should mention this.

The "checkpointing sockets" problem

So the Condor project has a library that can checkpoint a running program and start it up again at a later point. It can even migrate it to a different machine. That is, it serializes the entire image of the process (stacks, heap, program, data, etc., etc.) and dumps it into a file (or socket, apparently). The astute reader will recognize that things like open files will present a problem in this scheme -- particularly in the case of migration. i.e., if a process has an open file and it migrates to a new node, what happens with read() and write() calls in the process to that open file on the new node?

The answer is that the library leaves a "proxy" agent (I think their terminology for it is a "shadow process") back on the original node. So read() and write() calls on the new node are proxied back to the original node where the real operation takes place, and the result is piped back to the new node where the program is running.
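As I understand the idea (and this is only my mental model, not how the Condor library is actually written), the proxy scheme looks roughly like the Python sketch below. The Shadow and MigratedProcess classes are hypothetical, and a direct method call stands in for what would really be a network round trip back to the original node.

```python
import io

class Shadow:
    """Stays on the original node and performs the real I/O."""

    def __init__(self):
        self.files = {}
        self.next_fd = 3

    def open(self, initial=b""):
        fd = self.next_fd
        self.next_fd += 1
        self.files[fd] = io.BytesIO(initial)  # in-memory stand-in for a file
        return fd

    def handle(self, op, fd, arg):
        f = self.files[fd]
        if op == "read":
            return f.read(arg)
        if op == "write":
            return f.write(arg)
        raise ValueError(f"unsupported operation: {op}")

class MigratedProcess:
    """Runs on the new node; its file calls are shipped back to the shadow."""

    def __init__(self, shadow):
        self.shadow = shadow  # in reality: a connection to the original node

    def read(self, fd, nbytes):
        return self.shadow.handle("read", fd, nbytes)

    def write(self, fd, data):
        return self.shadow.handle("write", fd, data)
```

Every call pays a trip to the original node and back, which is tolerable for file I/O but is exactly the extra latency that hurts for sockets.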

This is all fine and good for most system calls -- i.e., intercept all system calls, shuttle them back to the proxy agent, and then pipe the results back -- but it doesn't work for sockets. More to the point, it could work with sockets (at least I think it could), but then performance on the sockets will suck, and that is unfortunately important to us in MPI-land (i.e., latency would rise dramatically, and there could be potential bandwidth issues as well, depending on the proxy implementation). Hence, we have "the socket problem".

The solution is to close all sockets before allowing an MPI job to be checkpointed, and then re-establish them after the job has been restarted. Multiple problems arise from this, though. The MPI job will assumedly still know where its sibling ranks were located (and could therefore reestablish sockets to them), but zero or more ranks may have moved -- so trying to establish sockets to the old addresses may not work anymore. LAM needs to become aware of which ranks moved and where they moved to.

This is particularly problematic with LAM's shared memory/TCP scheme. i.e., if rank X migrates, it needs to re-figure out whether rank Y is on the same machine or not. Specifically, it needs to re-initialize its entire connection table and either [re]connect its sockets, or [re]setup shared memory to communicate with Y. Even more generally than the TCP/shmem problem, this is definitely going to change the RPI somehow.
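A minimal sketch of that "re-figure out" step, in Python rather than anything resembling the real RPI code: given a (hypothetical) table of rank -> host, each rank recomputes which peers share its machine and picks shared memory or TCP accordingly. The function name and the string labels are made up for illustration.

```python
def build_connection_table(my_rank, locations):
    """Decide, per peer, whether to talk over shared memory or TCP.

    locations: dict mapping rank -> host name (hypothetical format).
    Returns a dict mapping peer rank -> "shmem" or "tcp".
    """
    my_host = locations[my_rank]
    table = {}
    for rank, host in locations.items():
        if rank == my_rank:
            continue
        table[rank] = "shmem" if host == my_host else "tcp"
    return table
```

If rank 0 starts on the same host as rank 1 and then migrates to rank 2's host, the table flips: rank 1 goes from shmem to TCP and rank 2 goes from TCP to shmem. That is why the whole table has to be rebuilt rather than patched.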

There are other issues as well -- how do we start up a LAM job under Condor? LAM currently uses a separate daemon process (the lamd) for a bunch of additional services, such as process control (fork/kill), an out-of-band message channel, and a global database for arbitrary key=value pairs (for MPI-2 MPI_PUBLISH_NAME). I guess it also functions as a scoping mechanism -- providing a "universe" for a single user.

Possibilities

For efficiency reasons, we may want to checkpoint/migrate only some ranks -- not all of them. Hence, there are two kinds of ranks: a rank that will get checkpointed (and possibly migrated), and a rank that will not. It seems to make sense to notify the entire parallel application (i.e., all ranks) when even one rank is checkpointed with intent to exit (e.g., because it will be migrated). So there are even two types of checkpoints: (a) one to just save the process's state (i.e., checkpoint the entire parallel application just for save/backup purposes), and (b) one to migrate one or more of the ranks to a different node.

We'll discuss (b) first (checkpointing for the purpose of migrating), because it lays the groundwork for (a).

Checkpointing for migration: the checkpointed rank

So it seems that LAM needs to take some actions before it allows itself to be checkpointed, and then immediately after it restores from a checkpoint. So if a LAM job can get some signal when it wants to be checkpointed (possibly via nrecv() from the local unix named socket, which we currently implement with SIGUSR2 so that the MPI process knows to go check the socket), a signal handler can be fired, read the message, realize that it wants to be checkpointed, flush and close down and invalidate all its communication channels (including the local unix socket to the lamd [or lamd-like underlying services], sockets, GM ports, shmem, etc.), and then checkpoint itself. This will require at least one new RPI function so that we can keep the RPI abstraction clean and apply this to all of our RPIs -- close/invalidate procs (with the assumption that no new communication will happen before we re-invoke _rpi_c2c_addprocs() to re-add all the communication channels).

The Condor guys tell me that there is a checkpoint_and_exit() function that, when called, dumps the state of the program out to a file (or a socket), and then exits. Very handy! When the process is restored, it just returns from this function. Ultra cool!

So after returning from this function, an MPI rank must obtain the [potentially new] locations of its sibling ranks. I'm thinking that this will come from an nrecv() from the underlying infrastructure (i.e., Condor) -- it will get an array of information saying where everything is (how to do different RPI's? GM ports vs. TCP addresses/ports, for example? Might have to re-init those as well; re-look for open GM ports, etc.).

That is, the run-time system that potentially moved the ranks in the first place will know precisely where all the ranks are, so it can provide the location information to each rank. Once this information is provided to each rank, the ranks can effectively re-do some of the stuff that they did during startup (contact their local "lamd", establish C2C communications with the other ranks by calling _rpi_c2c_addprocs(), etc. I'll explain why "lamd" is in quotes later).

Specifically, the sequence of events on a single MPI rank will be something like the following:

Receive SIGUSR2.

nrecv() a message indicating three things:

One or more MPI ranks is going to migrate.

Whether this rank needs to checkpoint.

Whether this rank is going to migrate.

Flush all C2C and local "lamd" communications.

Close down all C2C connections.

Close down connection to the local "lamd".

If this rank is to checkpoint:

If this rank is to migrate, call checkpoint_and_exit(). The steps below will commence when the rank has been migrated and starts up again, and returns from checkpoint_and_exit().

Return from SIGUSR2 handler and continue processing in user code as if nothing had happened.
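The sequence above can be sketched as a toy Python model. This is not LAM code: the Rank class, the notice dictionary, and the fetch_locations callable are all hypothetical, and the real checkpoint_and_exit() call is only recorded in a log rather than invoked. The reconnect half follows the earlier description (get new location data, then redo the C2C setup); using a plain checkpoint() for the save-only case is my assumption.

```python
class Rank:
    """Toy model of one MPI rank's checkpoint handling (not LAM code)."""

    def __init__(self, rank, locations):
        self.rank = rank
        self.locations = dict(locations)  # rank -> host
        self.c2c_open = True
        self.lamd_open = True
        self.log = []

    def on_sigusr2(self, notice, fetch_locations):
        """notice: {'checkpoint': bool, 'migrate': bool}, as read via nrecv().
        fetch_locations: hypothetical callable standing in for the nrecv()
        of new location data from the lamd / Condor."""
        self.log.append("flush")
        self.c2c_open = False
        self.log.append("close_c2c")
        self.lamd_open = False
        self.log.append("close_lamd")

        if notice["checkpoint"]:
            if notice["migrate"]:
                # Real code would call checkpoint_and_exit() here and only
                # return from it once restored on the new node.
                self.log.append("checkpoint_and_exit")
            else:
                # Assumption: a plain checkpoint() for save-only checkpoints.
                self.log.append("checkpoint")

        # Second half: runs after the restore (or right away if this rank
        # did not checkpoint at all).
        self.lamd_open = True
        self.log.append("reconnect_lamd")
        self.locations = dict(fetch_locations())
        self.c2c_open = True  # i.e., _rpi_c2c_addprocs() all over again
        self.log.append("reconnect_c2c")
```

Note that the same code path serves migrating and non-migrating ranks; the only branch is which checkpoint call (if any) gets made in the middle.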

I think that's essentially it. There's a bunch of details in there, of course, particularly in the re-initializing C2C connections bit, but that should all be resolvable with some clear and potentially clever re-entrant C2C init code. Hence, when we go through this checkpoint/migrate phase and re-establish C2C communications, we essentially re-initialize the C2C subsystem -- do the exact same thing as when we do it the first time. That would probably be the cleanest approach.

Checkpointing for migration: the non-checkpointed ranks

Upon further thought, I guess there is little difference between checkpointed ranks and non-checkpointed ranks. There could be a slight optimization in that it is really only necessary to send new location information for ranks that have migrated -- the old location information is sufficient for any rank that has not migrated. However, it may be simpler to have only one code path -- just receive all new location information.

However, the question does arise -- when one MPI rank out of a parallel job is migrated, what happens to the other ranks while the rank is in process of moving? There are two approaches:

Make the other ranks freeze and wait until the migrating ranks have been restored and C2C communications have been re-established. This certainly makes implementation of the MPI side easier -- the non-migrating ranks can just sit blocking on the nrecv() waiting for new location information. The underlying "lamd" can just delay sending the new location information until the migrating ranks have been restored.

Allow the other ranks to continue in the user program while the MPI rank(s) in question migrate. They would have to freeze at the first blocking communication involving the rank(s) that are being migrated. Any non-blocking communication can continue (e.g., Isend, Send_init, etc.), but would have to be "suspended", indicating that they just get put in a queue, and will only be attempted when the destination rank(s) are actually restored from migration and C2C communication has been restored to them.

This will add complexity to the MPI implementation, and it slightly changes the scheme presented above -- the non-migrating ranks will have to delay the second part of the scheme (i.e., starting with the nrecv() to get the new location information) until they get a second signal indicating that one or more of the migrating ranks are now ready.

This could get arbitrarily complicated -- take the case where N ranks migrate. What if they get restored at different times? i.e., if one rank gets restored much earlier than the rest -- does the underlying "lamd" signal the other ranks in the job with just the new location information for that one rank? Or does it wait for all N ranks to be restored before signaling everyone? The coarse-grain approach is clearly easier; the question is what actually happens most of the time: does Condor (and others) piecemeal restore migrated processes, or all at once?
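For the second approach, here's a rough Python sketch of what "suspending" non-blocking sends might look like, using the fine-grained variant where ranks are restored one at a time. The SendQueue class and its method names are hypothetical; a real implementation would live inside the MPI progress engine and deal with actual requests, not strings.

```python
from collections import defaultdict, deque

class SendQueue:
    """Toy model: queue sends to migrating ranks until they are restored."""

    def __init__(self):
        self.migrating = set()
        self.suspended = defaultdict(deque)  # dest rank -> queued messages
        self.delivered = []                  # (dest, msg) in delivery order

    def begin_migration(self, ranks):
        self.migrating.update(ranks)

    def isend(self, dest, msg):
        if dest in self.migrating:
            self.suspended[dest].append(msg)  # hold it; do not touch the wire
            return "suspended"
        self.delivered.append((dest, msg))
        return "sent"

    def rank_restored(self, rank):
        """Called on the second signal, once C2C to this rank is back."""
        self.migrating.discard(rank)
        pending = self.suspended.pop(rank, deque())
        while pending:
            self.delivered.append((rank, pending.popleft()))
```

The coarse-grained variant would simply call rank_restored() for every migrating rank at once, after the last one comes back.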

So this raises some interesting questions:

With the "easy" model of making all MPI ranks wait until all migrated processes are restored, is there really much of a difference in migrating one rank versus migrating all ranks? After all, they all block waiting for the one migrated rank to be restored, which hurts particularly if that one rank can't be restored immediately. For example, the MPI rank that was migrated was running on an idle workstation that suddenly became non-idle, forcing the MPI rank to migrate. But say that there are no more idle workstations available, so this MPI rank must wait in limbo for a while for another machine to become idle. But during this time, the entire rest of the MPI application must also wait. What happens to the accounting records during this time? Are Condor users "charged" with the time that the rest of their MPI ranks are blocking?

There is also the argument that most MPI programs tend to operate at least in some kind of lock-step. i.e., the MPI ranks are at least loosely synchronized (e.g., per iteration). So even if the non-migrating ranks are allowed to continue, they'll eventually block anyway because they'll try to communicate with a rank that is in process of migrating (or, by the domino effect, try to communicate with a rank who is blocking trying to communicate with a rank that is in progress of migrating, etc.), which could potentially (and usually!) eventually cause the whole MPI process to block anyway. More to the point: is there anything gained by allowing non-migrating MPI ranks to continue while one or more MPI ranks are in process of migrating? My gut feeling says no.

Hence, it may make sense to really only migrate the entire MPI process at once, or only migrate ranks when it is known that they can be placed immediately. This may not be possible, so it may be easiest to just make all MPI ranks block until migrated ranks are restored and C2C communication is restored. The accounting issue still needs to be addressed, though.

However, I have very little experience in the dynamic process migration area -- I'm curious as to what the Condor folks have to say about these ideas and questions.

Checkpointing for saving state (no migration)

For checkpoints that do not involve migration -- i.e., checkpointing just for the purpose of saving state -- it may or may not be necessary to close all communications channels. On the one hand, no rank is migrating, so it would seem silly to close and re-establish communications with the exact same location information. On the other hand, if we want to re-start the checkpointed process later, the re-started process will return from the checkpoint() (notice -- not checkpoint_and_exit()) function. If we re-start the process on an entirely different set of nodes (e.g., a PBS or Condor job is checkpointed and then later fails because someone powers off a node, so we restart the job in a later PBS/Condor job -- the ranks will be on entirely different machines and have a different topology), we will need to re-learn the location knowledge and re-establish C2C channels.

Using this argument, it's probably better to treat a backup/save checkpoint (even with no migration involved) as a checkpoint with all ranks migrating (per the procedures shown in the previous section), so that all ranks close all communications channels and then receive new location information from the underlying system (lamd/Condor) and then re-establish all communication channels.

This would allow the most flexibility for re-starting a job. That is, even if the job does get restarted from a set of migration files, it doesn't matter if it is on the same set of nodes or not -- it will re-establish all C2C communication channels and continue from where it left off.

lamd problems

The lamd is really helpful in standalone environments. But does it really make sense in a Condor (or other run-time system) environment? We mainly use the lamd for the following kinds of services:

Process control (startup, shutdown, abort)

Out-of-band messaging

key=value publishing

File transfer (mainly for non-uniform filesystems)

Scoping mechanism

Normally, each MPI rank is associated with a single lamd that is located on the same machine. They communicate through a named unix socket. When the lamd sends a message to an MPI rank, it pushes a message down the socket and then tweaks the process with SIGUSR2.

Note that there may be multiple MPI ranks per lamd -- it is common to run multiple MPI ranks on a single machine. In this case, they all share a common lamd (although the MPI ranks don't know or care that they are sharing a lamd).

It should also be noted that the out-of-band messaging can also be the primary message channel for an MPI job. That is, C2C communications aren't necessarily set up. It's a run-time flag to mpirun -- the user can specify to use the lamd for all communication instead of C2C. Although this imposes extra hops on all the messages (even MPI_Send / MPI_Recv messages), it can provide true asynchronicity for non-blocking messages. That is, LAM/MPI is single threaded, so it can only make progress on messages while it is inside of LAM/MPI function calls. In the "lamd" mode, once a message is given to the lamd, the lamd is a separate process, so it can make progress on the message independently of the main thread of control in the user program. While this may seem counterintuitive and incur too much extra overhead, several LAM users who rely on non-blocking message passing have told us that they can get significant speedup using this mode as opposed to C2C.
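Here's a tiny Python illustration of why handing messages to a separate agent buys you progress during computation. It is only an analogy: a thread stands in for the lamd (which is really a separate process), a queue stands in for the local socket, and lamd_worker() and run_demo() are names I made up.

```python
import queue
import threading

def lamd_worker(outbox, delivered):
    """Stand-in for the lamd: delivers messages on its own schedule."""
    while True:
        msg = outbox.get()
        if msg is None:  # shutdown marker
            break
        delivered.append(msg)

def run_demo():
    outbox, delivered = queue.Queue(), []
    worker = threading.Thread(target=lamd_worker, args=(outbox, delivered))
    worker.start()
    outbox.put("msg-1")  # like a non-blocking send: returns immediately
    outbox.put("msg-2")
    total = sum(range(1000))  # user computation, outside any MPI call
    outbox.put(None)
    worker.join()
    return total, delivered
```

The user code never has to re-enter the messaging layer for the two messages to make progress; the worker drains them while the computation runs.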

So LAM's normal model is that each MPI rank has a single lamd that it is associated with. This may be problematic with Condor (or any other run-time system) for multiple reasons:

If the MPI rank ever migrates off a given machine, the lamd will also have to be migrated with it. Hence, both processes will need to be treated as a single process by Condor, which I assume would create some special exceptions in the Condor code. This is not attractive.

Even worse, if multiple MPI ranks are sharing a single lamd, if one of those MPI ranks migrates and the others do not, what happens to the lamd? It would seem that we need to create a new one on the machine where the MPI rank migrates to, and then have the network of lamd's reorient themselves to include the new lamd. Or, if the MPI rank migrates to a node that already has a lamd, it can just join that lamd, and no new lamd is necessary. But this would seem quite complex to implement!
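The bookkeeping for that second case can be sketched like so (Python, purely illustrative). The lamds table and migrate_rank() are hypothetical, and the sketch deliberately leaves the old lamd in place even when its last rank departs, since what should happen to it is exactly the open question.

```python
def migrate_rank(lamds, rank, new_host):
    """Move one rank between per-host daemons.

    lamds: dict mapping host -> set of ranks served by that host's lamd.
    Returns True if a new lamd had to be created on new_host.
    """
    for host, ranks in lamds.items():
        if rank in ranks:
            old_host = host
            break
    else:
        raise KeyError(f"rank {rank} is not attached to any lamd")

    lamds[old_host].discard(rank)
    created = new_host not in lamds
    if created:
        # The network of lamds would have to reorient itself to include this.
        lamds[new_host] = set()
    lamds[new_host].add(rank)
    return created
```

Even this toy version has two outcomes per migration (join an existing lamd, or create one and rewire the daemon network), which hints at how messy the real thing would be.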

Hence, it would seem desirable to be able to ditch the lamd when running in some other run-time environment (such as Condor).

Possibilities with Condor

The upshot of our short conversation with the Condor folks is that a LAM/MPI program will need to interact with their "starter" somehow, or have a custom LAM/MPI starter written that knows things about MPI programs.

My first impression (and admittedly, I don't know much about how Condor works) is that the least-cost solution here would be to have a custom LAM/MPI "starter" that can mimic the lamd services. It would seem that Condor must already provide most of what we need; the starter can simply provide a translation between what LAM/MPI expects and the native Condor underlying services. Hence, the majority of LAM/MPI wouldn't need to change -- it just opens up a local unix socket to what it thinks is the lamd, but in reality it's a Condor "starter" (or whatever).

More specifically, some of LAM's calls such as nsend(), nrecv(), rploadgo(), rpdoom(), etc., can probably translate to Condor semantics without too much trouble. So if Condor can open a socket and effectively have an nrecv() implemented locally, it can receive local packets from MPI ranks, and then process and interpret them.
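In other words, the starter would act as a translation shim. A hedged Python sketch of the shape of that shim follows. Everything here is hypothetical: the packet format, the FakeBackend class, and its send/spawn/kill methods are invented stand-ins and are not Condor's API. I'm also assuming, loosely, that rploadgo() amounts to "start a process" and rpdoom() to "kill a process".

```python
class FakeBackend:
    """Hypothetical stand-in for the native run-time services."""

    def __init__(self):
        self.calls = []

    def send(self, dest, payload):
        self.calls.append(("send", dest, payload))

    def spawn(self, program):
        self.calls.append(("spawn", program))

    def kill(self, pid):
        self.calls.append(("kill", pid))

class StarterShim:
    """Pretends to be the lamd; translates requests to backend calls."""

    def __init__(self, backend):
        self.backend = backend

    def handle(self, packet):
        kind = packet["type"]
        if kind == "nsend":
            self.backend.send(packet["dest"], packet["payload"])
        elif kind == "rploadgo":
            self.backend.spawn(packet["program"])
        elif kind == "rpdoom":
            self.backend.kill(packet["pid"])
        else:
            raise ValueError(f"unknown request: {kind}")
```

The MPI rank keeps writing the same packets to the same local socket; only the thing listening on the other end changes.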

Admittedly, this would put more of a burden on the Condor folks, but I think we could help out a bit as well. :-)

Checkpointing without Condor

In a non-Condor environment, it would still be highly desirable to be able to checkpoint. Can we do this without the rest of Condor? I would assume that we could make it so. I think that the key for doing this outside of Condor would be a new pseudo-daemon in the lamd to handle these kinds of things -- to furnish the new location data, for example. We'll probably also need a command like rempirun to restart a checkpointed job. Possible scenarios include:

A separate LAM executable (mpicheckpoint) that can checkpoint a running MPI program to a set of rank files. The checkpointing will follow the same scheme as outlined above. A run-time flag can specify whether the job should stop or continue after the checkpoint. It might also be desirable to provide a LAM-specific API call for this as well (MPIL_Checkpoint(char* directory, int stop_flag) or something). Note: we're not talking about migrating here; see below.

A separate LAM executable (rempirun) can take a set of rank files from mpicheckpoint and restart the job on an arbitrary set of nodes. Note that this would not have to happen in the same LAM universe -- it could happen much later, for example, after the LAM universe that the original job was running in has been destroyed and a new one takes its place. Some extra condor-checkpoint-library bootstrapping is probably necessary to restart the job, but after that, it just uses the lamd to get the new location data, etc., just like it would in a Condor environment.

A separate LAM executable (lammoverank) can migrate one or more ranks to different nodes within the current LAM universe. This can work exactly the same way as it does in Condor. As mentioned above, this will require an extra pseudo-daemon in the lamd to know where ranks are moving and provide new location data to all the ranks.
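To show the shape of the first two commands, here's a toy Python sketch. It is nowhere near a real checkpoint: a small JSON record stands in for a serialized process image, the rank file naming is made up, and the round-robin placement in rempirun() is just an assumption to make the example complete. The function names mirror the proposed (not yet existing) executables.

```python
import json
import os

def mpicheckpoint(rank_states, directory):
    """Write one toy 'rank file' per rank; returns the file paths."""
    paths = []
    for rank, state in sorted(rank_states.items()):
        path = os.path.join(directory, f"rank{rank}.ckpt")
        with open(path, "w") as f:
            json.dump({"rank": rank, "state": state}, f)
        paths.append(path)
    return paths

def rempirun(paths, hosts):
    """Restore ranks from rank files onto an arbitrary list of hosts.

    Returns (restored states, new rank -> host locations).
    """
    restored, locations = {}, {}
    for i, path in enumerate(paths):
        with open(path) as f:
            record = json.load(f)
        restored[record["rank"]] = record["state"]
        locations[record["rank"]] = hosts[i % len(hosts)]  # round-robin
    return restored, locations
```

The point is that the restart only needs the rank files plus a fresh set of hosts; the new location table falls out of the restart and is what every rank would then receive before re-establishing C2C.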

Making this portable

There is desire to run LAM/MPI in other run-time environments (as alluded to in comments above) in addition to Condor. Scyld is an obvious target, since they have their own set of process control stuff (bproc) and whatnot. Scyld might be a bit more challenging because they seem to only support process control, not the other services that we need. Someone (Jeremiah?) suggested that we might be able to get away with one lamd somewhere in the system; I'm not quite sure that this would work, but it will definitely take a) further thought on the issue, and b) investigation of bproc and the rest of the Scyld infrastructure.

PBS is another obvious target (as well as any other batch schedulers). It would be nice to ditch the lamd in a batch environment, and rely on the batch system's underlying services for process control (the benefits are obvious, not the least of which is job accounting and guaranteed cleanup, a notorious problem for non-native support in batch schedulers), but the out-of-band messaging and global publishing still need to happen as well. PBS's TM can do the process control and can do the global publishing too (IIRC), but I don't think it provides any kind of out-of-band messaging. That will require more thought... Our initial ideas about PBS/TM (from a while ago) didn't include ditching the lamd, but perhaps this is a bit more natural extension of making this whole concept portable (i.e., replacing the lamd with underlying services, when available).

Or will a "one lamd" idea work here, too? Not sure how such an idea will work, but it's worth thinking about.

The real trick, however, will be to do this in a run-time-decidable way. That is, it would be nice, at run time to decide which underlying service to use -- native lamd, Condor, PBS/TM, Scyld, etc. That is, a user can take the same executable (assuming that their LAM was compiled for support for all of them) between all systems without having to recompile/relink. That would be nice, but not an absolutely necessary goal.
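A minimal sketch of run-time selection, in Python for brevity. The LAM_RTE environment variable and the select_backend() function are hypothetical (LAM has no such knob that I'm describing here); the idea is simply a table of whatever back ends were compiled in, consulted at startup.

```python
def select_backend(env, compiled_in):
    """Pick the run-time back end at startup.

    env: mapping of environment variables (e.g., os.environ).
    compiled_in: dict mapping back-end name -> implementation object.
    LAM_RTE is a made-up variable name; the default is the native lamd.
    """
    requested = env.get("LAM_RTE", "lamd")
    if requested not in compiled_in:
        raise RuntimeError(f"no support compiled in for back end: {requested}")
    return compiled_in[requested]
```

The same executable can then move between systems, as long as its LAM was built with support for the back end that the target system asks for.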

Upon a moment's reflection, from the proposed schemes above, the difference between native lamd and Condor would not be known to the MPI process -- if Condor truly emulates the lamd, there's no need to know. Whether or not the LAM has been compiled with checkpoint/migrate support is an entirely different issue (because I assume we'll need to get some Condor headers/libraries and some #if code for the checkpoint/migrate LAM code).

In order to make this workable for PBS/TM and/or Scyld (i.e., to keep the abstraction level clean), we'll have to implement lamd services in the lower levels of PBS/TM and Scyld as well. Hmm. I guess we'll have to cross the line into the root-level services earlier than we thought!

For PBS/TM, all the TM stuff is in one file, so extending that should be easy. But to do true messaging, it may take a bit more -- we may have to do some actual hacking in the MOM itself. It could be as simple as adapting the lamd's to fit in the MOM. We'll have to see. As for Scyld, I have no idea. :-)

Other problems

Voluntary vs. involuntary checkpointing. Is there much of an issue here? Probably not -- I don't see why involuntary checkpointing can't work just like voluntary checkpointing.

How about open files and whatnot? Particularly after a migration? Condor can proxy this stuff back to the original node, but does this make sense in a batch situation? What if we don't own those nodes anymore? This might be ok for Condor, but what about PBS / Scyld? It would seem bad for PBS. :-(

Are we trying to solve the "node goes down" problem? i.e., involuntary checkpoint at timed intervals (to files, not sockets...?), and if a node crashes at some point, we can rempirun the set of checkpoint files (which would seem highly desirable). But what about open files, etc.? If the node crashes, there's no Condor proxy to take the request back to on the original node ('cause it's down). So does checkpointing with the Condor library solve the "node goes down" problem? Or perhaps only in a limited scope (i.e., your open files won't be preserved)...? Granted, anything outside of the MPI API is outside the scope of what we need to worry about, but this does seem to be a "real world" concern that would be good to take care of. Even if it just means setting open file descriptors to -1 or NULL upon restoration of the process so that the job can know that the files are closed or something.

So what happens to lamboot and lamhalt under Condor? Do they effectively become no-ops (we can't ditch them, because users will still invoke them)? And then mpirun talks to various Condor services (for example) to do the things that the lamd would have done? One of the current functions of mpirun is to serve as a rendezvous point for the ranks so that they can all become aware of each other. Does this still need to be? It would seem that it would need to be changed somehow -- since the migration problem changes all the location information anyway, Condor itself must provide a way to get this information, potentially making mpirun's rendezvous point irrelevant.

Does this (running under Condor, PBS/TM, or Scyld) make sense with the MPI_COMM_CONNECT and MPI_COMM_ACCEPT models? i.e., how does a Condor job get more nodes? Or how do multiple Condor jobs join together? In vanilla LAM, only jobs in a single universe can join together. Will this be true in Condor (etc.)? More to the point:

What would it mean to allow multiple LAM universes together? What about the obvious security concerns with this?

How will a universe be defined in Condor? Will you have to (for example) ask for M nodes and start M different jobs and have them CONNECT / ACCEPT to each other?

If this is the case (still only connect within a single universe), is CONNECT / ACCEPT useful within a Condor context?

The same question applies to SPAWN -- does the user have to request a maximum number of nodes ahead of time? Or, when SPAWN is invoked, does this have to allocate nodes from Condor dynamically and then spawn on them? This scheme would seem attractive, but it may cause the MPI application to hang while waiting for nodes to become available?

In a dynamic environment like Condor, is dynamic processing useful at all, given that a SPAWN may have to block waiting for the underlying system to make nodes available? Does the whole MPI application (or, at least the ranks who invoke SPAWN) have to block waiting for this to happen? (no one has answered this yet -- it's not even defined in the MPI standard)

Summary

So these are my initial thoughts. In spite of all the unanswered questions listed above, I believe that this can work. Some trips Wisconsin<-->South Bend and some teleconferencing and a ton of e-mail will likely be necessary. But this is ultra cool stuff, and will be immediately useful to lots of people in the real world. Plus, we'll get lots of papers out of it, become famous, and one or two people might get degrees out of it. :-)

November 19, 2000

17 days and a wakeup

We effectively stomped on Rutgers yesterday. Woo hoo!!

We looked a bit sloppy at times; their quarterback was quite good, actually, although he was a bit too hasty and kept taking high-risk passes. So we kept intercepting them. :-) Aside from a few nervous points, it was a fun game to watch. Go Irish!

Spent some of yesterday playing with modules in LSC's AFS space. I preliminarily made up modules for PBS, LAM, MPICH, Workshop, and Forte6. We will probably make up modules for all the GNU stuff (although they'll be broken up into several modules -- the compilers and auto* and libtool, Gnome, and the rest of the GNU stuff, or somesuchlikethat). Lummy wants to go a bit hog wild and have our own copies of latex, X, etc. We'll see -- we've been trying to have a higher bandwidth discussion about this for a few days and keep missing each other.

This all precipitated because I'm genuinely worried about having all the GNU file utilities first in our path rather than the Solaris ones. If I want to work in Linux, I'll work in Linux. If I want to work in Solaris, I want to work in Solaris -- not Linux. I've been burned a couple of times by having the GNU stuff first in my path (ar, ranlib, make, etc.) rather than the Solaris stuff, and I don't want that to be. It just scares me, 'cause we'll end up coding for GNU-specificisms without even knowing it. And that will suck (that's one of my pet peeves: people who code for GNU-specific extensions and say, "just use gcc" everywhere. They don't understand what they are saying. Although I have personally discussed this with many people, I'll put it here in my journal to get it on the record: take the Alpha processor, for example. When you switch from Tru64 to Linux, you lose at least 10% of the performance [there are hard numbers to prove this]. And when you switch from custom compilers to gcc you lose at least another 10% of performance [I'm speaking of high-performance applications, of course]. gcc just doesn't have the punch on all platforms. Portability is only half the story).

Anyhoo, we're going to split it up somehow. The exact mechanism remains to be seen. Modules are pretty nice, actually, and surprisingly easy to set up and maintain. We've been meaning to do this for quite a long time, and we really should have done it a while ago.

Saw the movie "Bounce" with Ben Affleck and Gweneth Paltrow (sp?) last night with Janna and Tracy. Yes, it was a concession to the ladies (who wanted to see it). I'll give it a sympathy, but that doesn't really rate the quality of the movie because it's just not my kind of movie. So if you want an honest rating, go see it yourself.

Today will be spent putting together a real skeleton for my dissertation. I've started this a few times, but really need to carry through and actually put all the .tex into one place and start shaping it up to be a real dissertation.

November 21, 2000

Who needs green beans?

Here's some interesting factoids that I learned this morning while having a cavity filled:

Dentists' drill tips are made of a diamond/metal carbide. They spin at many thousands of RPMs, and when combined with a little spray of water, vaporize whatever they come into contact with.

The jaw nerves are split in half. So when they give you novocaine, it only numbs up half of your jaw/face. Right now, the right half of my chin all the way up to (and including!) my right ear is numb.

Modern cavity fills are multiple layered: I forget the name of the first one, then a "primer" layer, and then a bonding agent. The bonding agent (IIRC) is light activated -- so they have a "light gun" that shines a many-watt highly-intense light on the tooth to make the bonding agent cure. There's an orange shield around the nozzle so that the dentist can watch/direct the light without being blinded.

It's difficult to talk when half of your tongue is numb.

We have nerves in our teeth only for the sake of knowing when something is wrong. i.e., the nerves in our teeth only serve as warning indicators. Sharks do not have nerves in their teeth. Godgineer must have figured that since sharks lose teeth all the time (and promptly grow new ones to replace the lost ones), it would be less efficient to put the warning indicators in there. Since we humans only get two sets of teeth, having the "failure alert system" was a good engineering decision.

It feels really, really weird to drink something and only feel it on half of your tongue.

Dentist drills can go at different speeds, not only for the different types of work that they do, but also because it is possible to resonate within the jaw and within specific teeth. Hence, if a patient starts resonating with a given drill, the dentist can switch to a drill with a different set of harmonics. (No I'm not making this up; it happened to me this morning!)

My sister is hosting the big Squyres Clan Thanksgiving Dinner this year; just about everyone in the family will be there. She came up with the bright idea early yesterday afternoon to rent a PlayStation "for the boys", and called my brother-in-law at work to go rent one (apparently his work is literally right across the street from Blockbuster). So he popped across the street and found a PS. But wait.. it wasn't a PS... it was a PlayStation2!! They apparently only have one, and someone had returned it literally 5 minutes previously. So Rob rented it along with several games and took it home to hook it up.

He didn't go back to work.

It should be much fun!

I've been playing with modules in the LSC AFS space. I have them pretty much stable and working now. There's two distinct sets of modules: ones that are cross-platform (e.g., LAM, MPICH), and several more that are platform-specific (e.g., we only have SSL/pine compiled for sparc-sun-solaris2.6 and sparc-sun-solaris2.7). Loading the lsc module loads a default set for a given architecture -- the default cross-platform ones and then a platform-specific lsc module that loads any platform-specific modules that we have for that platform.
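The "default set plus platform-specific set" idea can be sketched in a few lines of Python. This is not the real modulefile syntax, just the lookup logic; the lowercase module names and the `default_modules` function are invented for illustration, with only the package names and the two Solaris platform strings taken from above.

```python
# Hypothetical module names; only the package/platform pairing
# comes from the description above.
CROSS_PLATFORM = ["lam", "mpich"]
PLATFORM_SPECIFIC = {
    "sparc-sun-solaris2.6": ["ssl", "pine"],
    "sparc-sun-solaris2.7": ["ssl", "pine"],
}

def default_modules(platform):
    """Return what loading the top-level 'lsc' module would pull in:
    the cross-platform defaults, then anything built for this platform."""
    return CROSS_PLATFORM + PLATFORM_SPECIFIC.get(platform, [])

if __name__ == "__main__":
    for plat in ("sparc-sun-solaris2.7", "some-other-platform"):
        print(plat, default_modules(plat))
```

A platform with nothing special compiled for it simply gets the cross-platform defaults.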

All in all, it's pretty neat stuff. Kinda annoying, though, since aliases aren't inherited by the shell. So you have to go through some extra hoops and hurdles to make that work right.

It's also kind of annoying that the IRIX machines on campus have their own modules, but use a much older version of the modules package. Hence, in order to interoperate -- and yes, this is counter-intuitive -- we have to use the older modules version, not the newer version. Go figure (using the newer module version with the older modules causes seg faults, but using the older module version with the newer modules works fine). So that causes some extra hoops and hurdles as well. Ugh. It would be nice if there was one uniform version of the module stuff across all of campus.

But they certainly do make it easy to switch between versions of things, and make maintaining packages easier because each package has its own discrete module.

December 1, 2000

Dave, I'm not really in the box

What, you expected some kind of regular entries? Pshaw.

As usual, my travel has interrupted my regular flow of journal entries. Here's one that summarizes the last week or so...

Flew back to Philly to my 'rents place with Tracy for Thanksgiving. We flew via Cleveland which was apparently getting snowed in when we arrived. So we diverted to Cincinnati and came back to Cleveland before being able to land. Luckily, the good folks at Continental were able to get us on another flight to Philly that night, so all was well.

Had a big clan gathering at my sister's in Allentown the next day, which was pretty cool. Everyone was there with the exception of my cousin Maggie. The PlayStation 2 was killer, too. I whomped my younger cousins at Tekken Tag Team, too, which was very cool, 'cause they have traditionally been much better than me at video games (go figure). "Who's your daddy?!?!", "Hey Chris, let me show you how to DIE", and "You're so weak, your momma tried to give you up for adoption and the Lemming family wouldn't even take you" were all popular phrases during this session.

We watched the traditional annual showing of "Airplane"; a Squyres classic. Brilliant movie.

"Give me Ham on five, and hold the Mayo."

"No thanks, we gave at the office."

"The cockpit? What is it?"

"That's when my drinking problem started."

"It's a damn good thing he doesn't know how much I hate his guts."

"Looks like I picked the wrong week to quit sniffing glue."

"I've got to concentrate... concentrate... concentrate..."

Spent part of Friday working on mom's computer at home hooking up DSL to it. Part of the problem is that mom's Windoze installation is really broken somehow. It takes over 6 minutes to boot (i.e., to get to the Windoze login popup). It seems like it's timing out while looking for something during bootup, but I never figured out what it was. The first time I installed the DSL software, it was really flakey. It's with Verizon DSL, and they use this weird (IMHO) PPP-over-ethernet stuff. So you still have to "dial up" to get connected to DSL. And the IP address comes over that, too, so it's not regular DHCP. This kinda killed my plan to hook up my Linux laptop to their DSL and do things like check mail, etc.

There might well be some PPPOE Linux software out there; I haven't had a chance to check yet.

Anyhoo, I got it working more-or-less properly, but it didn't help much that the hard drive on that machine is failing. Every time I ran scan disk, it would find more bad clusters. Not good. The PPPOE installation finally got so flakey that I removed mom's pre-existing Netscape and the whole PPPOE installation and started from scratch (gotta love the non-deterministicness of Windoze!), and that seemed to make it much happier. I installed the ZoneAlarm firewall, too, which was kinda neat. Since they don't have a fixed IP, and aren't "connected all the time", the chances of an attack are smaller, but are still there, so I installed it. It's not perfect, but it's not a bad firewall.

I was going to spend some time on Saturday trying to figure out why it takes so long to boot that machine, but I caught some weird 18 hour flu that's going around the north east right now, and it killed me for the whole day. Tracy and I were supposed to go out to a nice dinner that night, so that didn't happen, either. Bonk. But the Irish had a convincing win over USC to round out our season. So it's probably a pretty good chance that we'll go to a bowl. Woo hoo! I've been hearing Fiesta, but I haven't been following it too closely.

Dad and I used the small business pricing from Dell to order a new Windoze computer for Tracy (she currently has a P1-133 with 24MB of RAM, which is painfully slow) under the "Wayne True Value" auspices (since my dad owns a small business, they apparently don't check, but it is legal since my dad bought it, and I do lots of consulting for him). It's actually supposed to come today (1 Dec), according to UPS.

I also got plane tix back to Philly in about 2 weeks to install DSL at his store. He's got a LAN of machines that need to be adjusted and whatnot such that DSL will be safe to install (need to change all the IPs, harden up the unix server a bit, etc.). I won't be able to use linux as a firewall (my initial plan) because of the PPPOE issue, so I'll have to use Windoze 98's internet connection software (blech; although I will be looking for some Linux PPPOE software so that I don't have to do this).

Tracy and I flew back on Sunday morning without any major incidents.

I drove up to ND on Monday morning to get there in time for Kevin Barker's MS defense. It was all about the percolation model. Pretty neat stuff. It still performs poorly right now, but it's still in the early stages of development. He passed! Whoo hoo!!

Went to dinner with Kevin, his parents, and Shannon. It was good to see Shannon again; she's funny. She's also starting a PhD program (this Spring, IIRC) in Ohio; rock on! Dinner was good (at Basil's). Then I went back and hung out with Suzanne and Ed and watched a few episodes of Level 9. Not a bad show; it's got a good mix of techno geek stuff and action.

Kevin turned in his thesis the next day, and ended up staying that night as well, so a bunch of us took him out to the Mishawaka Brew Co for a few beers: Mike N, Dog, Jeremy, Ron, Shannon. Great conversation all around, and lots of laughs. Good to see/hang out with Kevin again. Perhaps he'll come back to ND; that would rock.

@#%#@%#@$ I had forgotten to fill out my reimbursement form for SC2000, so I had to wait until the CSE offices opened in the morning before I could leave to drive back home. I had to drive straight to my church where I'm doing some volunteer consulting with them for their various computer things (as I think I've mentioned before, they have a LAN with about a dozen windoze machines on it). We talked some more about DSL (we're putting it up to the budgeting committee in about 2 weeks), ordered a site license for Norton Anti-Virus, and discussed a few other random things. The anti-virus media should arrive in a few days; we planned on me coming back next week to install it on all the machines.

Planning for DSL involves a surprising number of details:

Moving their web site; it's on some local Louisville hosting service now, but DSL comes with 20 free MB of web hosting space

Moving their DNS name for the same reason

Re-training their web masters to use the new location, not the old location (should be easy, but...)

Change the IP addresses that each machine has; I think they're random right now. I'll have to change them to be 192.168.x.y or 10.x.y.z or whatever.

They use AOL for all their mail now; this DSL service comes with 20 free mailboxes.

Changing all their e-mail addresses to be @churchofepiphany.com (and decide what the format of the e-mail IDs will be; an internal policy decision for them).

Ensuring that everyone's address book and web bookmarks can be snarfed from the AOL software to the new software.

Setup/ensure that the dialup works for the one workstation that they have off site (this DSL provides a free dialup for remote users).

Shut down the AOL accounts fairly quickly after this all happens to prevent the two-email-address syndrome.

Shut down the Juno account that the off-site user is currently using for the same reason.

As a good engineer, I have to document everything that I do for the above. Most importantly, however, what needs to be documented is the firewall configuration. This DSL service comes with a Netopia router which can also act as a DHCP server and firewall. It's supposed to be easy to configure, but we'll see. This needs to be documented because I won't be there forever.

Some other projects that they may wish to investigate after the DSL stuff gets all happily installed (probably mid-late January):

Shared fax on the LAN (should be easy, I think).

Group scheduling of resources (conference rooms, the community center, etc.).

Random training classes, perhaps even some "intro" and "advanced" kinds of classes.

Had all four of my wisdom teeth out yesterday, as well as one more molar (he wanted to keep hanging out with the wisdom teeth, apparently). I was knocked out for the procedure. I think I pseudo-surfaced in the middle of it, 'cause I felt some rather strong forces (not pain, just pulling, etc.) on the right side of my jaw. It only took about an hour, actually. Apparently, my upper right wisdom tooth gave them a few problems (nothing major), but everything else went fine.

My jaw was fairly sore all yesterday; they gave me some mild pain killers and some antibiotics so that nothing gets infected. I go back next week to have my stitches removed. All in all, it wasn't nearly as eventful as I thought it would be (I guess I expected much more pain). My jaw is still fairly sore, and I'm not back on solids yet (chewing is somewhat painful), but that's supposed to go away in a few days.

It's funky, though -- I can feel the end of the line of teeth with my tongue where that last non-wisdom molar used to be (on the upper left). So I can feel the end of my tooth line, which I have never been able to do before. Funky.

I'll be heading back to ND next week to visit with Beman Dawes from the Boost group. He's coming to visit with Jeremy, Rich, and Andy. I'll tag along for usability and other kinds of user-concerns, but probably not too much in the design and other stuff.

December 2, 2000

Your *what* hurts?

It's not really accumulating, but it is kinda nice (I'm a cold weather person).

My jaw is doing ok; a bit sore, but manageable. I actually managed to have a few slices of pizza last night.

Tracy's new Dell 800MHz came yesterday, and I spent a good amount of time setting it up. Copied a lot of stuff from her old Windoze machine to this one (it was a pain in the butt to export/import her addressbook and message folders from Outlook 98 to Outlook Express 5.5.1 [Outlook 98 isn't available anymore, and Outlook 97, which I have on CD, doesn't do IMAP]). Finally got everything over, though.

One annoyance, though -- Outlook Express has some nice rule-filtering capabilities such as "take messages with such-and-such subject and automatically put them in folder foo". Very handy. But it doesn't work with IMAP inboxes! Why not?!

Got the latest distributed.net client and installed it on there, too. But it only seems to want to do RC64 -- it simply won't do OGR. Weird!

The new version of mojonation sucks. It keeps coming up with an error that causes it to lock up. That is, it only runs for about 5-10 minutes and then locks up (something to do with bad XML parsing). Woof. I also wonder why they don't use their mojonation-announce list to announce new versions; it's advertised on their site, etc., but I never get announcements from it. I only get new versions when I happen to notice them. Weird.

Just a few quickies today. Gonna spend some time on the ogg vorbis encoder and LAM today (-pty fix in Linux and still trying to get some reasonable fault tolerance issues worked out).

December 6, 2000

Kudos to you, sir. And kudos again!

These are interesting times that we live in.

A series of random things have been occurring. Hence, this will be a random journal entry, written while I eat my lunch.

I have discovered that the HTML element <HR> is not always centered by default, particularly when it is less than 100% of the width of the browser. I don't know if this is specified in the HTML spec or not, but I have found that KDE's Konqueror browser (which actually isn't a bad browser, surprisingly enough!) does not automatically center the following:

<HR WIDTH=50%>

which means that all the thousands of screaming jeffjournal fans out there that are viewing my journal archives in Konqueror are wondering why there are half-line separators on the left in their browsers. Oops.

I have now mended my ways, and write dramatic half-line separators like this:

<CENTER><HR WIDTH=50%></CENTER>

All is right within the world.

Along the same lines, it's quite tiresome to type out "<CENTER><HR WIDTH=50%></CENTER>" (particularly when you have to escape it to write it in example form so that you can read the "raw" HTML in HTML). So I think I need to add some special "escapes" to the jjc such that things like this are automatically done for me.

I'm thinking of escapes for:

Dramatic half-line separators

URLs will automatically be linked

In a wiki/doctext-kind-of-way, make lists easier (where "easier" == "use some abbreviated syntax that jjc will expand into the correct HTML for <UL> / <LI> / </UL>, etc.")

In a wiki/doctext-kind-of-way, make linking be easier (similar definition of "easier" as above)

In a wiki-kind-of-way, make <code> and <strong> and <em> be easier, 'cause I use them all the time.
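To make the idea concrete, here's a minimal sketch of what such escapes could look like, written in Python rather than the C++ that jjc is actually written in. The abbreviated syntax is entirely hypothetical (a line of `----` for the dramatic half-line separator, `* ` at the start of a line for list items, bare http/https URLs linked automatically); none of it is real jjc syntax, and the function names are made up.

```python
import re

HALF_RULE = "<CENTER><HR WIDTH=50%></CENTER>"
# Deliberately simple URL pattern; it will swallow trailing punctuation.
URL_RE = re.compile(r"(https?://[^\s<>]+)")

def link_urls(text):
    """Wrap every bare URL in an anchor tag."""
    return URL_RE.sub(r'<A HREF="\1">\1</A>', text)

def expand(text):
    """Expand the hypothetical escapes into HTML, line by line."""
    out = []
    in_list = False
    for line in text.splitlines():
        is_item = line.startswith("* ")
        if is_item and not in_list:
            out.append("<UL>")
            in_list = True
        elif not is_item and in_list:
            out.append("</UL>")
            in_list = False
        if line.strip() == "----":
            out.append(HALF_RULE)
        elif is_item:
            out.append("<LI>" + link_urls(line[2:]) + "</LI>")
        else:
            out.append(link_urls(line))
    if in_list:
        out.append("</UL>")
    return "\n".join(out)

if __name__ == "__main__":
    print(expand("Some links:\n* http://www.lsc.nd.edu/\n* plain item\n----\nDone."))
```

A real version would also have to HTML-escape the rest of the text and cope with nested lists; this one only shows the expansion step.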

Also, Lummy has managed to get SourceForge running on our web server. He claims that some of the features in their diary stuff are superior to jjc (not surprising). However, I might have to steal some of them and put them in jjc, since I've kinda grown attached to it.

Stealing such features, however, has a fairly low priority.

I'm actively working on a parallel ogg encoder (stole the oggenc code, adding a whole new parallel personality to it, and renamed it to poggenc -- oggenc effectively has a brother now). I re-read my white paper on generalized manager/worker using both threads and MPI (and wow, it was long!) to remember all the thoughts that I had about that. Found a few minor errors, and was annoyed to discover that my formulae at the end were just about entirely wrong. Math sucks.

So I've coded up a bunch of the framework so far, and have started classes for the input, worker, and output threads. I've added the necessary #define's for thread safety within Vorbis (which it isn't yet, in the way that I need for this -- it can't handle multiple threads simultaneously encoding on the same stream), and #define's for MPI. Seems to be going well.
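For the curious, the input / worker / output thread arrangement looks roughly like the following. This is a sketch in Python using the standard `queue` and `threading` modules, not the poggenc code: `fake_encode` is a stand-in for the encoder, `run_pipeline` is a made-up name, and it assumes chunks can be encoded independently (which, per the above, is exactly what Vorbis can't do yet on a single stream).

```python
import queue
import threading

def fake_encode(chunk):
    """Stand-in for the real encoder; it just reverses the bytes."""
    return chunk[::-1]

def run_pipeline(chunks, num_workers=2):
    """Encode chunks on worker threads and return results in input order."""
    work_q = queue.Queue()
    done_q = queue.Queue()
    results = []

    def input_thread():
        for seq, chunk in enumerate(chunks):
            work_q.put((seq, chunk))
        for _ in range(num_workers):
            work_q.put(None)  # one stop marker per worker

    def worker_thread():
        while True:
            item = work_q.get()
            if item is None:
                done_q.put(None)  # tell the output thread this worker is done
                return
            seq, chunk = item
            done_q.put((seq, fake_encode(chunk)))

    def output_thread():
        pending = {}   # results that arrived ahead of their turn
        next_seq = 0
        finished = 0
        while finished < num_workers:
            item = done_q.get()
            if item is None:
                finished += 1
                continue
            pending[item[0]] = item[1]
            while next_seq in pending:
                results.append(pending.pop(next_seq))
                next_seq += 1

    threads = [threading.Thread(target=input_thread),
               threading.Thread(target=output_thread)]
    threads += [threading.Thread(target=worker_thread)
                for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_pipeline([b"abc", b"de", b"f"], num_workers=3))
```

The sequence numbers are what let the output thread put things back in order no matter which worker finishes first.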

Happily hacking
Abstract turns into concrete
Parallel vorbis

Hacking hacking hacking...

I was manually archiving the web logs on www.lsc.nd.edu yesterday (really gotta finish automating that process someday...), and I did the normal "bzip2 combined_log". I did this on a Hydra node (400MHz UltraSPARC II). After a good many minutes, it didn't show any sign of finishing (the logfile was approximately 189MB).

I had anticipated it to take a while, but it was actually taking longer than I expected. The idea popped into my head: "I wonder how much faster the new Sun UltraSPARC III would be able to do this!" We actually have a SunBlade (750MHz UltraSPARC III) on loan from Sun (shhh!!!), so I copied the log file to its local disk and started a bzip2 of it (Solaris 8 seems to ship with bzip2 -- rock on).

After 50 minutes, the SunBlade finished. The Hydra node looked like it was about 1/3 of the way finished. Ouch. I then thought, "uh oh -- I don't know if these two versions of bzip2 are the same; am I comparing apples and apples?" So I checked. Oops -- the Hydra was using bzip2 0.9.0b and Solaris 8 (the SunBlade) had bzip2 0.9.0c -- both of which dated back to 1998.

So I went out and found that the current version of bzip2 is 1.0.1. I downloaded it to both machines and compiled it with "-fast -xarch=native -xtarget=native", and re-ran the test (both from the local hard drive, of course).

The SunBlade finished in 7 minutes flat. The Hydra node finished in about 15:30. Wow.

Morals of the story:

The bzip2 that we have (had) on AFS sucked. I recompiled the new one with optimization and put it out on AFS. I got the OIT to update theirs (Solaris 7 tree), too.

The bzip2 that ships with Solaris 8 sucks.

The SunBlade was slightly more than twice as fast as our UltraSPARC II. But that only naturally follows, 'cause its clock speed was almost twice that of the Hydra node.

Just a few interesting data points, nothing more.
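For anyone who wants the ratios spelled out, here's the arithmetic as a few lines of Python. The timings and clock speeds are the ones quoted above; the variable names are just for this sketch. Keep in mind that the old-vs-new comparison on the SunBlade mixes two changes (a newer bzip2 and optimized compiler flags), so it can't say which one mattered more.

```python
def seconds(minutes, secs=0):
    """Convert a minutes:seconds timing into seconds."""
    return minutes * 60 + secs

sunblade_new = seconds(7)       # bzip2 1.0.1, optimized build
hydra_new = seconds(15, 30)     # same build on the Hydra node
sunblade_old = seconds(50)      # bzip2 0.9.0c as shipped with Solaris 8

machine_speedup = hydra_new / sunblade_new     # SunBlade vs. Hydra, same bzip2
clock_ratio = 750 / 400                        # UltraSPARC III vs. II clock (MHz)
upgrade_speedup = sunblade_old / sunblade_new  # old bzip2 vs. new, same machine

if __name__ == "__main__":
    print(f"SunBlade vs. Hydra: {machine_speedup:.2f}x")
    print(f"Clock speed ratio:  {clock_ratio:.3f}x")
    print(f"New vs. old bzip2:  {upgrade_speedup:.2f}x")
```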

In Lummy's never-ending quest for good web collaborware, I found a bug-tracking system called RT, that seems to be a web-ified version of ANL's req system.

It seems to be pretty nice -- it has a bunch of features without being overly complicated (ever had a look at bugzilla? I can't even understand how to use that thing!). It doesn't do everything, but it seems to do most of what we need. Most importantly, IMHO, it has an e-mail interface (something that Jitterbug lacks), so the admins don't have to go to the web page to do quick-n-dirty bug tracking things.

I set it up on my router and let some of the guys in the lab play with it. General consensus was that it wasn't bad. I tried to get the CVS copy of RT going (has a bunch more features than the current stable release), but it seems to be not-quite-ready for prime time yet. We'll have to wait for that, I guess. :-)

Lummy said he might try and tie RT into the SourceForge that is running on lsc.nd.edu, but it's more likely that he'll just make some kind of primitive e-mail interface to the bug tracking system that is already in SF. Or, it's more likely that we'll all just bitch about it and nothing will get done. :-)

I noticed something annoying about the CVS version of Vorbis. Some background...

Vorbis is the music format. Ogg is the file format. That is, you pack vorbis data into .ogg files. There are separate libraries to do each. Additionally, there's a third library to write output to sound devices called ao. So to compile oggenc (the ogg/vorbis encoder), you need to configure it with:

--with-ao-prefix=DIR --with-ogg-prefix=DIR --with-vorbis-prefix=DIR

This seemed pretty silly to me, especially since you typically install all three libraries and oggenc into the same place. So I hacked up their .m4 files to check the $prefix if the corresponding --with-* option was not specified, and submitted it to the vorbis-dev mailing list. We'll see if the patches are accepted (they were really only a few lines of shell script; not rocket science).
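The fallback itself is simple enough to show. The actual patch was a few lines of shell inside the .m4 files; the Python below only illustrates the logic, and `resolve_prefixes` is a made-up name, not anything in the Vorbis build system.

```python
def resolve_prefixes(prefix, with_options):
    """Pick a directory for each library: the explicit --with-*-prefix
    value if one was given, otherwise the general install prefix."""
    libs = ("ao", "ogg", "vorbis")
    return {lib: with_options.get(f"--with-{lib}-prefix", prefix)
            for lib in libs}

if __name__ == "__main__":
    # Nothing specified: everything falls back to the install prefix.
    print(resolve_prefixes("/usr/local", {}))
    # One library lives somewhere else; the other two still fall back.
    print(resolve_prefixes("/usr/local", {"--with-ogg-prefix": "/opt/ogg"}))
```

So in the common case where everything is installed in one place, a single prefix is all you have to type.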

However, the vorbis-dev mailing list seems to currently be down. I see from the web archives that a few posts (including mine) have been sent since last Friday, but I haven't received any of them. I sent a few queries but have heard nothing back yet. Hmm.

Speaking of oggenc, I have now done much coding of a parallel version (the part above about working on poggenc was written yesterday; I just haven't submitted this journal entry yet :-). I have been following the design laid out in my white paper about mixing threads and MPI for multi-level parallelism, and it seems to be going well. Most of the infrastructure is done, and I'm just starting to code up the parallel aspects (shipping the audio data to remote nodes, shipping the ogg data back, etc.).

When I have some semblance of a working copy, I'll probably ping Dan at Scyld again.

I obviously didn't go to ND as planned this week to meet with Beman. I hear his trip was a great success and many great things were discussed, but I simply couldn't do 9+ hours of driving this week over a 2 day span; it was just too much. I guess I'll meet him some other time.

Tracy's new 'doze machine seems to really chug through RC5 packets. And it has recently decided to start doing OGR packets (I couldn't get it to do OGR before; go figure). The distributed.net client seems to suspend itself unpredictably, however. For example, I left the computer on since yesterday for the sole purpose of RC5 hacking, and turned off the monitor. This morning, I turned it on and the last activity in the distributed.net log was from yesterday.

Weird.

I watched the SciFi channel's Dune saga, parts 2 and 3 (missed part 1). Not bad. I saw the original Dune movie a while ago, and I guess I rate these two as about the same. Some of the special effects were good (in the new one), some kinda sucked. I read the original Dune book, but none of the sequels.

But I'll probably watch the sequel movies when they come out; I enjoyed this version of Dune.

I went to have my stitches out today from the oral surgeon (had my wisdom teeth removed last week). No big deal there. The doctor, coincidentally, is an ND grad (he mentioned it when he saw my ND varsity jacket). I knew that I liked him for a reason.

More interesting, however, was my drive to and from the doctor's office. It solidified my understanding of "every action has an equal and opposite reaction."

The office is a few miles away, and I basically take one road to get there (a fairly main traffic artery). There are many lights between my apartment and the doctor's office. On the way to the office, I only had to stop for one red light. On the way back, almost all the lights were red.

Everything seems to work out evenly.

However, I noticed that the muffler is going out on my car. @#%@#$%@#$!!!!

And of course, in the 45 minutes that I was gone (which was the only time I left the apartment during the business day all week), I missed an Airborne Express shipment from outpost.com (free overnight shipping!) with Turbotax. Quicken/Turbotax is the only reason that I use a Windoze machine with any regularity. For those who don't, and if you happen to have a spare 'doze machine lying around, I highly recommend them. They're great products (I wish they had Unix equivalents; there's gnucash, which, as I understand it, is more or less like a less-mature Quicken, but no equivalent for Turbotax).

December 13, 2000

But honey, there'll *always* be women in rubber flirting with me...

Went and saw the play "Rent" with Janna and Tracy last week. It's a good play, depressing and cathartic. I've seen it before (in London), but none of the others had. We were supposed to go to Tracy's work Christmas party afterwards, but the play wasn't over until about 10:30, and by the time we would have gotten there, it would have been over.

Went and had a beer with Janna afterwards, which is always cool. As Anna likes to remind me, "they're my only friends in Louisville" (which isn't far from the truth!). I haven't had much opportunity to get out and meet random people here in Louisville, but this doesn't really concern me. I think that after I get my Ph.D., I might try my hand at an adjunct faculty position at the University of Louisville or something -- something to get me in touch with the geek crowd down here. Who knows.

But until then, like I said, I'm not really too worried. I do have a small group of friends down here, and I've met several of Tracy's work colleagues; they're all nice folk.

Flew to Philadelphia at an absurdly early hour on Saturday morning. Drove from the airport straight to dad's hardware store and started to work on hooking his store's LAN up to Verizon DSL. This actually entailed several things:

There is one SCO unix server on the LAN, 3 windoze 98 machines, and 5 DOS machines (!). This was actually mentioned in a recent /. article -- the DOS machines are fully functional; there's really no need to replace them. They use attractive ANSI graphics to do inventory queries, price lookups, etc., etc. 3 of those 5 DOS machines are actually cash registers, and are quite functional (and have been for many years). All 5 machines are very dependable. Sure, there have been a few random quirks, but for the most part, they have served remarkably well over the years, and continue to do so.

However, whoever at True Value designed the network did so poorly. It seems like the IP addresses were chosen at random. As such, they were unsuitable for connection to the internet. So I had to convert all the IPs to be of the form 192.168.x.y (one of the approved private networking domains). Changing the Unix machine was easy. Changing the windoze 98 machines was also fairly easy. Changing the DOS machines proved to be a little more work -- their TCP driver is loaded dynamically from the DOS command line (it's a .SYS file loaded by a proprietary INET command for their TCP stack). But get this: the IP number, netmask, hostname, etc., etc., are all encoded in this .SYS file.

I had to resurrect the memories of how to alter the .SYS files out of long-term storage (the percolation model, but several cycles were wasted during stalls while waiting for the memories to surface). But in the end, I triumphed. Actually, I have to hand it to those old DOS programmers -- once I remembered how to do it, it wasn't too bad of an interface for the time (it's all command-line driven).

So I got everything on 192.168.x.y. Just for the heck of it, the Unix server was alone on 192.168.30.x, the 3 windoze 98 and 2 DOS "lookup" machines were on 192.168.20.x, and the 3 cash registers were on 192.168.10.x. Everything appeared to be working smoothly.

First glitch: Dad opened up at noon on Sunday (special Christmas hours; he's not usually open on Sundays). At 11:59am, I get an intercom call from him, "The credit card functionality in the cash register isn't working -- it freezes up." Doh! I had tested cash transactions and they all worked fine, but I hadn't tested credit card transactions (the cashier swipes a card in a slot that is built into the computer keyboard -- pretty slick, actually -- and the cash register makes some TCP or RPC calls to the Unix server [not sure which; I've never been privy to the internals of the True Value code] who has multiple modems that it uses to make the outgoing call to the credit card center, verify the data, etc. There's a little progress screen on the cash register [ANSI graphics, mind you!] during this time: "Looking for modem" / "Dialing" / "Sending" / "Waiting" / "Approved").

It took over an hour to figure out what was going on. Actually, my dad spotted the problem without realizing it. While rebooting (one of several reboots while we were trying to figure this out) to get around the "hang" produced by the faulty behavior, he said, "Hey look at this -- one of these status messages that zips by quickly in the beginning flashed a negative number. Does that mean anything?"

It turns out that the cash register was calculating its ID incorrectly. In the True Value system, cash registers are numbered sequentially from 1 (stupid Cobol programmers -- they must have married Fortran programmers!). Each physical cash register has a fixed ID that never changes. It seems that our 3 registers now thought that they had IDs of -17, -18, and -19. Doh!!

So even though the main cash-register-processing-routines in the Unix server (every transaction is transmitted back to the main server in the back room) were happily accepting most transactions from negative-numbered cash registers, it seems that the credit card authorization routines were saying, "Hey -- you're a negative number cash register. This must be a mistake. Go away." And therefore the cash register would hang, because it would either not get a response from the server, or it would get an error response that it didn't know how to interpret.

So that identifies the problem (always an important -- and frequently overlooked -- step). Now, what was the cause? The only thing that I had changed was the IP address.

No way.

Way.

It seems that the True Value programmers calculate the cash register's ID number off the IP address. More specifically, if the IP address is w.x.y.z, the cash register's ID is (z - 20).

No, I'm not kidding.

The /etc/hosts file on the Unix server with all the original IP addresses had the cash registers starting with x.y.z.20. I had changed them to be 192.168.10.1, 192.168.10.2, and 192.168.10.3. Doh!! Fixing them up to be .20, .21, and .22 solved the problem.
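A minimal sketch of the kind of derivation described above. The function name register_id_from_ip and the base parameter are hypothetical; True Value's actual code was never visible. It assumes the ID is the last octet minus 20, which is the reading consistent with the negative IDs (-17, -18, -19) that showed up once the registers were on .3, .2, and .1.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical reconstruction: derive a register ID from the last octet
// of a dotted-quad IPv4 address. The base of 20 is an assumption taken
// from the behavior observed in the store, not from True Value's source.
int register_id_from_ip(const std::string& ip, int base = 20) {
    int a = 0, b = 0, c = 0, d = 0;
    // sscanf returns the number of fields it converted; all four are needed.
    int fields = std::sscanf(ip.c_str(), "%d.%d.%d.%d", &a, &b, &c, &d);
    assert(fields == 4 && "expected a dotted-quad IPv4 address");
    (void)fields;  // avoid an unused-variable warning when asserts are disabled
    return d - base;
}
```

Under that assumption, 192.168.10.1 comes out as -19, while anything from .20 upward comes out as a small non-negative number again, which is why renumbering the registers to .20, .21, and .22 made the problem go away.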

How fucked up is that?!?

For the next part, for a long series of reasons that really aren't worth going into, we decided that my dad's windoze 98 desktop machine would be the DSL gateway into the LAN using Microsloth's Internet Connection Sharing functionality. So I first disconnected dad's machine from the internal LAN and brought up Verizon DSL on it. I installed a firewall, got everything working, etc., etc. They use PPP-over-ethernet as opposed to standard DHCP-style setup. Hence, it actually uses the Windoze dial-up networking functionality to establish a DSL connection to the internet. Really weird. I read some RFCs and position papers about this (PPPOE is actually standardized, and will be in the mainline Linux 2.4 kernel), but I still don't see the benefit. "You mean I still have to invoke kppp to activate my 'always on' connection?" It just seems weird to me.

So I activated the Internet Connection Sharing (ICS) stuff, and noticed that it changed the IP address from 192.168.20.x to 192.168.0.1. Hum. I'll bet that was for a reason -- default router on a network, etc., etc. So I looked it up in the online help -- sure enough, it says that you can use addresses in the range 192.168.0.2 through 192.168.0.253.

WHOA!! WTF?!? They changed a class C private network to a class D!! $%@#$%@#% I can see forcing the ICS server to be 192.168.0.1, but why the hell did they make a netmask of 255.255.255.0 instead of 255.255.0.0?!? That annoyed the hell outta me. And it had the following consequences:

I had to go change the addresses on the other windoze 98 machines to be 192.168.0.x.

Even worse, I had no DNS IP numbers to put in the Windoze 98 machines, since PPPOE "takes care of this for you" (yet another aggravating aspect of PPPOE -- DHCP can do this as well, but you can still manually override it on the client, if you want. You can't override it with PPPOE). Indeed, Verizon wouldn't give me their DNS server IP addresses for this very reason, "It's not supported". I managed to get them myself (whois and nslookup, no rocket science there), but they're not accessible from inside their internal DSL network. Arrggghh!!

Hence, I had to fully surrender the windoze 98 machines to the ICS setup: I had to set the clients to use DHCP to get an IP address (apparently the ICS server turns into a DHCP server on the local network), disable DNS, and enable a setting labeled "Use DHCP for WINS resolution", which apparently did the DNS resolution stuff.

I'm not sure what WINS is, but I thought it had to do with NETBIOS stuff. Apparently not...?

What further sucked was that these machines now had an automatic netmask of 255.255.255.0. Which means that they couldn't reach the Unix server, because it was on 192.168.30.1 -- i.e., outside the netmask range, so they were sending packets to the default gateway who had no idea what to do with them and probably dumped them on Windoze's equivalent of /dev/null.
So I had to put my unix server on 192.168.0.250. This is actually risky, because since the ICS server is a DHCP server, 192.168.0.250 is in the range of addresses that it is allowed to give out. Hence, I could have an IP address conflict. I can only hope that the DHCP server does the Right Things and sticks to low numbered IPs (there are only 2 clients, after all), and re-uses them when new DHCP requests come in.
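The reachability problem above comes down to a bitwise comparison, roughly sketched here. The helper names parse_ipv4 and same_subnet are made up for illustration; this is the standard netmask test, not anything taken from Windoze or ICS.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Parse a dotted-quad IPv4 address into a 32-bit integer (host byte order).
std::uint32_t parse_ipv4(const std::string& text) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    int fields = std::sscanf(text.c_str(), "%u.%u.%u.%u", &a, &b, &c, &d);
    assert(fields == 4 && a < 256 && b < 256 && c < 256 && d < 256);
    (void)fields;  // avoid an unused-variable warning when asserts are disabled
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Two hosts are on the same subnet when their addresses agree on every
// bit that the netmask selects. Otherwise traffic goes to the gateway.
bool same_subnet(const std::string& host_a,
                 const std::string& host_b,
                 const std::string& netmask) {
    std::uint32_t mask = parse_ipv4(netmask);
    return (parse_ipv4(host_a) & mask) == (parse_ipv4(host_b) & mask);
}
```

With a netmask of 255.255.255.0, a client at 192.168.0.2 and a server at 192.168.30.1 fail this test, so the client hands the packet to its default gateway. With 255.255.0.0 the same pair passes, and so does 192.168.0.250 under either mask.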

That just really burns me up -- that the stupid Microsoft programmers automatically assigned a class D netmask to a class C network. Only assigning 0.1 through 0.250 via DHCP would be fine, but the prohibitive netmask prevents [safe] interoperability with anything else that is not Microsloth on the same local network. I suppose that I shouldn't be surprised by this, but it still sucks. In hindsight, I should have done this with a Linux router and it would have been much easier and less time consuming. <sigh>

MICROSOFT SUCKS!!!

I also had to reconstruct my mom's windoze 98 machine at home. It was basically so broken that it required a full reinstall. To make matters worse, the physical C drive had bad clusters on it, and every time you ran scandisk, it would find more bad clusters. Hence, it was going bad slowly. Not to worry -- we had a second physical disk already in the machine. So I just swapped the two disks (there are 2 IDE interfaces in the box; the boot disk was the master on one by itself, the second was a slave on an interface with the CDROM) and reinstalled on the old D drive.

Whoops -- the machine now takes over 5 minutes to boot (!). There were three distinct locations in the boot where it appeared to stall -- waiting for something for which it apparently eventually would time out and continue. I only discovered today (i.e., after 2 days) that swapping the disks back to their original positions on the IDE interfaces (even though the boot disk is now different -- changed in the BIOS) eliminated two of the three delays in the boot time. I couldn't believe that this was true, so I swapped the disks and changed the BIOS settings back, and sure enough, it timed out in 3 places instead of 1. That's fucked up. The Intel architecture sucks.

And as for the last time out during the boot -- I have no idea what the heck that was. Dad has a total of 3 machines from this company, and the other two don't do this. It even did it after a fresh, clean install of Windoze 98, so it must be something in the hardware. That's fucked up.

Dad was a bit disappointed in the e-mail performance of Verizon's SMTP servers. It was really slow to send mail this weekend. Much slower than his old dialup account (56k). Indeed, several times it timed out and we had to click "send" again in his e-mail software.

A call to the Verizon help desk got a recorded message, "Verizon customers may be experiencing difficulty sending and receiving mail. We are aware of the problem and are working to fix it..."

It turns out that Verizon got heavily spammed. A /. article about it said that Verizon is convinced that it was deliberate and malicious. Apparently, they brought up more servers yesterday to try to cope with the load, but are still trying to automate the spam rejections.

Along the same lines, my dad does a lot of stuff for the ND Alumni Club of Philadelphia. One of the things that he does is send out mass e-mails to both the club members (it's a pretty active club, actually) and students on campus from Philadelphia (e.g., he passes along e-mails about rides home for students). Hence, he can send out an e-mail with several hundred BCC recipients.

His old ISP was a small local firm, and he got permission to do this. Verizon's max recipient count is 40. And especially in light of the spam problem they had this weekend, they were not interested in raising it at all. So I have to set up some special stuff on lists.squyres.com for Dad to relay his messages out. The problem is that he has a database with the e-mail addresses in it that he sends to, and it gets updated frequently (multiple times a week). So he needs an automated mechanism to import a whole new list of subscribers and completely ditch the old list. I'm thinking of some scripting with GNU mailman to make this work...

I hate Windoze. I am so glad that I don't have to use it on a daily basis. I can't imagine how people actually get work done with it. A very large company that I know (no names mentioned...) has their employees run "weekly updates" from the IT department on all their Windoze boxen (i.e., it runs automatically when you boot up). If it's Monday, it asks if it should run the weekly update. You can click on "No, please delay the update; do it later" up to 3 times. After that, it will run the update no matter what.

The updates routinely take around 2 hours -- if they run smoothly, which they usually don't. The update frequently hangs/crashes in the middle of the process (your only indication of this is if you happen to notice that the hard drive light stops blinking for an extended period of time, upon which you have to reboot and start the update over again). You also can't use the computer during that time. They don't schedule them to run when no one is there (e.g., 3am), because to save money on electricity, everyone is required to turn off their workstation at night. Hence, this weekly update procedure is guaranteed to make their computer unusable for at least 5% of their ANSI standard work week (2.5% when the standard gets updated to reflect common practice).

I just can't imagine having to work in an environment like that.

On the drive between my dad's store and my parents' home, I saw a most curious thing: a cell phone tower that is disguised as a pine tree. It is painted brown and has evergreen branches on it. It's still a dead giveaway because it stands out much taller than any of the trees around it, but it did cause me to do a double-take.

I have to say: this election is working out exactly as rmurphy4 predicted. The latest bit: FL legislature picking electors.

December 15, 2000

The Moog Cookbook

I swear that Perk's older brother works in the MailBoxes, Etc., here in Louisville.

It's either him, or someone that looks exactly like what Perk will look like in about 5-7 years.

It is 4:53pm.

Dad's network is finally alive again.

And I don't know why.

At time T, we were at state A.

It didn't work.

We changed one thing, theoretically moving to state B.

This, of course, entailed a reboot.

State B didn't work. So we changed back to state A.

And rebooted.

Suddenly it worked.

And before you ask, I'm quite sure that we only changed one thing, and then changed it back. Yes, something changed during that time, but it sure as heck wasn't from something that we did -- Windoze did something internally.

I think that this is what bugs me most of all about Windoze -- its nondeterminism. It doesn't matter how smart you are, nor how much you know about computers: sometimes it works, sometimes it doesn't. There are absolutely no guarantees about consistent behavior in Windoze.

I'm just bitter at the end of a long, frustrating day where I got absolutely nothing done. <sigh>

What if we read the space news?

I've been attacking the stack of mail in my inbox from my foray to Philadelphia.

Lummy asked me to step into the Boost discussions about directory structure and whatnot. They do not seem to understand the necessity of separating a source code tree from an installed tree. We'll see where this discussion goes.

Had scads of LAM mail, both from the list and people who mailed me individually. Plowing through all of that...

Got some good responses about parallel vorbis. Turns out that since vorbis is a differential encoding method, it will require the same technique that I used in parallel bladeenc -- sending some redundant input blocks to each processor in order to "build up state", so to speak. This is kind of a bummer; it means that the parallel output will likely not be diffable against the serial output. But some of the vorbis developers indicated that the general idea should be able to work. Hopefully, I'll be able to work on this later today.

Lummy bought me a webcam (Intel Personal Camera or something like that) for use in teleconferencing up with ND. I can use my headset with it, which is trez kewl. It's also detachable from the computer and can serve as a portable digital camera. It's not the world's greatest camera or anything, but it could prove to be useful. Brian and I had a few difficulties getting a netmeeting going on between squyres.com and nd.edu last night. Apparently, the fact that my Windoze box is on a private network behind my router is the problem. Brian/Pete found a kernel module for the NAT that should fix the problem, but I haven't been able to successfully compile/install it yet. We'll see how that goes; could be really useful in terms of communicating with the Home Office.

Windoze really really sucks. My dad, after I got DSL all working and whatnot -- his desktop computer is the Internet Connection master for the local LAN in his store, i.e., DSL comes into his machine and is routed to the rest of the local LAN from there -- was having problems with a tax program that he uses to pay taxes on the salaries for his employees. It's supposed to dial an external phone number through the modem and then Do Its Thing. But for some reason, it didn't work last night when Dad tried to do it.

So Dad called tech support for this program this morning. Their advice? "Yeah, we've had problems with people who have DSL -- even if you still use a modem, our software doesn't seem to connect properly. Let's try something; go to the control panel, network icon, and remove the entry 'Dialup Adapter'..."

Needless to say, this fucked up everything!

A bit of background: Verizon DSL uses PPP over Ethernet (PPPOE) instead of normal DHCP stuff. PPPOE uses dialup connections to establish connectivity (it's weird; supposedly "it's easier on the user, because they already understand the dialup-to-connect concept". I think it's just stupid -- DHCP can do everything that PPPOE does, and not have to go through an additional dialup step). So when the tech weenie had Dad delete the dialup adapter, that totally fucked up the DSL connectivity.

I've now been on the phone with my dad for over 3 hours and it still doesn't work yet (we've finally managed to get the DSL connectivity back, but the internet connection sharing to the rest of the LAN is not working again yet). Windoze absolutely sucks.

I'd be willing to bet that the first few steps of the tech support checklists in just about every Windoze-based support center go something like this:

Ask the following question: "Have you rebooted your machine since the problem started?"

If the user answers no, have them reboot and move on to the next caller.

Ask the following question: "Have you uninstalled and reinstalled the product that you're having a problem with?"

If the user answers no, tell them to uninstall/reinstall and move on to the next caller.

December 19, 2000

Who the hell do the "Citizens for Broadcasting Decency" think they are?

I finally got my muffler replaced yesterday.

That was $140 I didn't want to spend. Ugh!!

A series of random thoughts:

Getting my muffler fixed (actually, it's the pipe between the engine and the muffler, hence it wasn't covered by the warranty) and running a bunch of Christmas errands took a good portion of my day yesterday.

I talked to Darrell on the phone and found out that his parents live just a few miles from Tracy's parents down in Florida. It's a small fricken' world!

I love the ability to track my UPS packages on the web.

For those of you who keep bugging me, yes, I did screw up on a previous journal entry -- I meant to say class B/C networks in the entry about DSL and MS's internet connection sharing, not class C/D. It was late, and I was tired when I wrote that entry.

Turns out that MS netmeeting doesn't work between two different private networks. i.e., from my 'doze box behind my router (which has a 192.168 address) to another box behind a different router (which also has a private IP address, perhaps of the 192.168 variety). It's not a shortcoming of netmeeting itself, per se, it's a shortcoming of the underlying protocol -- H323. Bummer. However, with a special H323 NAT module in the IP masquerading stuff in Linux, you can do a netmeeting between my private windoze box and somewhere else directly on the net -- as long as the box on the private net is the one who initiates the call -- which is quite handy.

I wrote a schload of Christmas cards yesterday (will send them today) along with our obligatory Christmas letter. I'll post it here in a few days; gotta wait for people to get the snail mail version first. :-)

December 21, 2000

The Real Deal with Bill McNeal

PSR is the Password Storage and Retrieval system that we use with OpenPBS to get AFS authentication with PBS jobs.

We've had problems with our installation of PBS over the last several months (ever since we upgraded to OpenPBS, actually). It turns out that one of the components of PBS, the Mom (a daemon that runs on each compute node and manages the user jobs that are launched on it) was at fault.

Actually, it was our patches to the Mom that were at fault. We had to patch the Mom to include bits to launch some PSR kinds of things (first, a program to get an AFS token, second, a program to "shepherd" the user's job and re-up the AFS token before it expires. Doing this allows a user's job to run for much longer than the life of their token -- their token is magically renewed for them for the entire life of their job).

We used the popen() call to invoke these two commands. Unfortunately, we didn't think that popen() would have the child process inherit the open file descriptors from the parent. But it does. Doh!!

Specifically, the Mom has multiple sockets open, including one that it is listen()ing on. To make a long story short, having multiple processes sharing the same open socket is a Bad Thing, and it caused Ickyness in PBS's runtime because it typically disrupted PBS's internal protocols.

Adding the following code to the beginning of the PSR executables solved the problem:

for (i = 3; i < sizeof(fd_set) * 8; ++i) close(i);

However, I still think that this is not perfect -- not knowing the internals of the Mom, I think it is still possible to get a race condition where Badness can occur. This can happen if the PSR executable is launched and then some Event happens on the socket before the PSR executable is able to close it. I think the real solution is to make the sockets be close-on-exec in the Mom, but I'm not sure. I've mailed the PBS guys to see what they think.
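For reference, a rough sketch of what marking a descriptor close-on-exec looks like with the POSIX fcntl() call. The helper names are made up for illustration, and whether the Mom's sockets can safely be treated this way is exactly the open question above; nothing here comes from the PBS source.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Mark a descriptor close-on-exec so that programs started through exec()
// (including the shell that popen() runs) do not inherit it.
// Returns true on success.
bool set_close_on_exec(int fd) {
    assert(fd >= 0);  // callers are expected to pass an open descriptor
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Report whether close-on-exec is currently set on a descriptor.
bool is_close_on_exec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

The appeal over the close() loop is that the flag is set in the parent at the moment the socket is created, so the child never holds the descriptor at all and there is no window between launch and close.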

If you don't already use inilib, you need to. It will save your life! I classify it in the same category as the STL -- you could write something to do the same thing, but why?

inilib is a C++ library that reads and writes .ini files. While this in itself is unremarkable, its cool aspects include:

Simple 2D array-like accessors. For example:

foo["section"]["key_name"] = keyvalue;

Small API; easy to remember and use

Automatic write-upon-destruction semantics (if desired)

Script-like automatic type conversion semantics. This is truly cool. By abusing some of the properties of C++ on the back end, we can do things like this:

More to the point, you can use the inilib objects like Perl or PHP objects -- all the type conversions are automatic and safe. This is utterly cool.
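To give a flavor of how that kind of automatic conversion can be built in C++, here is a small sketch. The Value class and value_demo function are hypothetical stand-ins and NOT inilib's actual API; the sketch only assumes the general trick of storing text and using a templated assignment plus conversion operators.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a stored .ini value; this is NOT inilib's API.
// The value is kept as text and converted on demand.
class Value {
public:
    Value() {}

    // Assigning any streamable type stores its text form.
    template <typename T>
    Value& operator=(const T& x) {
        std::ostringstream out;
        out << x;
        text_ = out.str();
        return *this;
    }

    // Implicit conversions back out; the target type picks the operator.
    operator int() const { return parse<int>(); }
    operator double() const { return parse<double>(); }
    operator std::string() const { return text_; }

private:
    template <typename T>
    T parse() const {
        std::istringstream in(text_);
        T result = T();
        in >> result;  // yields 0 when the text does not start with a number
        return result;
    }

    std::string text_;
};

// Usage: the same stored value reads back as whichever type is asked for.
inline void value_demo() {
    Value v;
    v = 42;
    int n = v;
    std::string s = v;
    assert(n == 42 && s == "42");
}
```

One design note on the sketch: initializing a new variable (int n = v; std::string s = v;) resolves cleanly, but assigning into an existing std::string can be ambiguous with several conversion operators in play, which is part of why this counts as abusing the language.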

So anyway, go start using inilib. You'd be surprised how often you want to save a config file from your program; inilib just works.

My sister got engaged last night. It was a typical dinner-romantic-walk kind of proposal, but I'm sure that Alan delivered it with style. Needless to say, Terry accepted. Woo hoo! So we'll have another Squyres wedding in the next 1-2 years. Alan's a good guy; I think he'll make a great addition to the family. Now we get to meet his family (who all live in Indiana, not too far from ND, I might add!).

December 22, 2000

I believe in straight lines

A Motley Bag o' Notes today.

By my logs, K-Mart calls are definitely on the rise (my home phone number is the same as K-Mart's, but with 2 digits swapped). Must be because of the holiday season. The callers are also getting more and more polite -- most people are saying "sorry". Interestingly enough, there's four standard responses from people during a given K-Mart call:

Ok

Thanks

Sorry

[click] (i.e., hangup)

I've been keeping logs on this in a flat text file. Someday I gotta hack up a PHP script and a MySQL database to do this online so that everyone can see the numbers...

Bob from Veridian (is that what they're called these days?) e-mailed me with a better solution to the PSR problem with PBS. Apparently, there's an internal PBS MOM call named fork_me() that does all the Right Things to fork a child process. This is orders of magnitude better than the popen() that we use now. Oops. I'll have to go back and fix that one up...

Progress on poggenc is coming along swimmingly. I now pass the input .wav data all the way through the five distinct states in the state machine (input, input queue, encode, output queue, output) -- there's three separate progress bars to watch (eye candy!).

The progress bars, themselves, turned out to be an interesting sub problem -- you want to update them all the time, but only want to actually display the value periodically. And you only need to display it if it's different than last time. But then you run into a problem when you have more than one file. So here's a sample progress line for one file:

foo.wav |********75%** ||********73%** ||********70%* |

How exactly do you show the progress of multiple files (it's quite possible that multiple files are being processed simultaneously) without having to link in a curses library? Showing one line is simple -- you just output a \r instead of a \n (I think that even works in 'doze as well). But with multiple lines, without the ability to make the cursor go "up" a line, you can't do it.

So I punted for now and just have it redisplay the whole thing again if there's more than one file being processed simultaneously. This isn't the main focus of the work, after all. :-)
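A rough sketch of the single-line case described above: track progress on every call, but only redraw when the displayed number actually changes, and redraw in place with \r. The names render_bar and ProgressLine are invented for this example and are not poggenc's actual code; the time-based throttle is left out.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Draw one bar between '|' characters: stars fill the completed fraction,
// and the percentage is written over the middle of the bar.
std::string render_bar(int percent, int width) {
    assert(width >= 4 && percent >= 0 && percent <= 100);
    std::string bar(width, ' ');
    int filled = width * percent / 100;
    for (int i = 0; i < filled; ++i) bar[i] = '*';
    char label[8];
    int len = std::snprintf(label, sizeof label, "%d%%", percent);
    bar.replace((width - len) / 2, len, label);
    return "|" + bar + "|";
}

// Tracks progress continuously but asks for a redraw only when the
// whole-number percentage differs from the one last displayed.
class ProgressLine {
public:
    ProgressLine() : shown_(-1) {}

    // Record progress; returns true if the visible value changed.
    bool update(long done, long total) {
        assert(total > 0 && done >= 0 && done <= total);
        int percent = static_cast<int>(done * 100 / total);
        if (percent == shown_) return false;
        shown_ = percent;
        return true;
    }

    // Redraw in place: '\r' returns to column 0 without advancing a line.
    void draw(const std::string& name, int width) const {
        int percent = shown_ < 0 ? 0 : shown_;
        std::printf("\r%s %s", name.c_str(), render_bar(percent, width).c_str());
        std::fflush(stdout);
    }

private:
    int shown_;
};
```

The calling loop would do something like "if (line.update(done, total)) line.draw("foo.wav", 14);", so the terminal only gets touched about a hundred times per file no matter how often update() is called.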

Things still to do for poggenc (in no particular order):

add overlap of inputs. This is mainly a function of the input queue; need to add some extra logic to save a few readsets of the input from the tail end of every dequeue of the input and prepend them to the next dequeue.

add vorbis/ogg processing.

MPI stuff (only does threading for now).

juice up the eye candy. If it's not worth watching, it's not worth running.
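The input-overlap item in the list above could look roughly like this. Chunk, split_with_overlap, and the use of plain ints for samples are all assumptions made for the sketch; the real input queue works on readsets of .wav data, and how much overlap the encoder needs is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One unit of work for an encoder. The first `warmup` samples repeat the
// tail of the previous chunk and exist only to build up encoder state.
struct Chunk {
    std::vector<int> samples;
    std::size_t warmup;
};

// Split `input` into pieces of `chunk_size` new samples, each prefixed
// with up to `overlap` samples carried over from just before it.
std::vector<Chunk> split_with_overlap(const std::vector<int>& input,
                                      std::size_t chunk_size,
                                      std::size_t overlap) {
    assert(chunk_size > 0);
    std::vector<Chunk> chunks;
    for (std::size_t start = 0; start < input.size(); start += chunk_size) {
        // The first chunk has nothing before it, so it gets no warmup.
        std::size_t lead = start < overlap ? start : overlap;
        std::size_t end = start + chunk_size;
        if (end > input.size()) end = input.size();
        Chunk c;
        c.samples.assign(input.begin() + (start - lead), input.begin() + end);
        c.warmup = lead;
        chunks.push_back(c);
    }
    return chunks;
}
```

Each worker would feed the warmup samples through the encoder and throw away whatever output they produce, keeping only the output for the new samples.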

xmms continues to crack me up. I think I've mentioned this before here in the journal, but there's a thread leak in it such that every song it plays launches a new thread. This thread never dies. And since in Linux threads are implemented as processes, you can see how many threads are running.

As of right now (9:52am), I have 331 xmms processes on my Dell desktop.

And Bill, don't let those fat bastards in Congress stick it to you

I have declared today "Annoying Female Vocalist Day", or AFVD, for short.

It's all queued up in xmms -- I'm set for hours of uninterrupted AFV's.

Sidenote: That is a definite benefit of digital music; you can just queue up hours and hours of music and then not have to futz with it. It's pretty much the same reason that they came out with 3- and 5-CD players. But with digital music, you can queue up [more or less] an infinite amount of music -- you're not just restricted to 3 or 5 CDs' worth. There's been many a workday where I've queued up more than 18 hours of music when I start working, and then don't bother with my music for the rest of the day.

Sidenote: I love my telephone headset. It has a noise-canceling mike; I can have my music on fairly loud in the background and the person I'm talking to on the phone can't hear it at all. It's also stereo -- it has 2 earphones, contrary to most headsets. I find this to be extremely useful. It also allows sounds to be piped in from my computer -- so I can play MP3s directly over the telephone. While not an amazingly useful feature, it has been practical once or twice.

Back to AFVD.

We're starting out with Alanis Morissette; I've got about 1.5-2 hours of her queued up; Supposed Former Infatuation Junkie and Jagged Little Pill.

Then on to Bjork -- both Telegram and Debut. That voice; my God -- how did that happen? Hearing these albums always makes me feel like I want to donate large sums to children's charities.

We pour salt into the wound by following up with Ace of Base. True, AoB isn't completely female, but they satisfy both requirements of a) having female vocalists, and b) being annoying. We've got The Bridge and The Sign from AoB.

Then Erasure. Ok, not female at all. But he sounds female, and is definitely annoying. I Say, I Say, I Say.

The pain stays alive with Jewel -- Pieces of You and Spirit. Annoying to the max.

The agony continues with Loreena McKennitt, The Book of Secrets. There's that one cool song on there that has a really bizarre-o video with some midgets running around, but other than that, it's annoying.

Nina Hagen comes next. Just ask anyone in the LSC (particularly Dog) -- Revolution Ballroom is one of the worst albums of all time. This is why you've probably never heard of her.

P.J. Harvey's Rid of Me -- so aptly titled -- prolongs the horror. I think there's one song on the album with a few quiet parts so that they can pass it off as multiple cuts to the radio stations (after all, if you're not Floyd, they're not going to play 20-30 minute songs).

Suzanne Vega's 99.9F rounds out the mix. Just like P.J., I think there's one song on this album. Consider: monotone singing with a guitar. Need I say more?

So it's going to be a Very Long day. I'll have to rely on my coding skills and embroil myself deep in hackery to keep from having my spirit crushed. It will be a true testament to my abilities if I can come out of this day and still be sane.

December 31, 2000

The saddest of all keys

Just for posterity's sake, I'll make a brief journal entry.

Trip to FL to Tracy's parents for about a week was good. It was warm. We did nothing, and lots of it. I did take my laptop and spend the better portion of a day in the warm sunshine working on parallel oggenc, though.

Came back up here to a few inches of snow -- it's nothing compared to what other parts of the country are currently experiencing, but it's very unusual for Louisville to have this much snow for so long!

The last two days have been almost nothing but parallel oggenc coding/debugging. I've been pretty active on the vorbis-dev list since I got back; much discussion has occurred. Monty tells me that he's got a branch where the thread-safe encoding stuff is partially done (whoo hoo!!!); can't wait for that to become mainstream.

I've gotten pretty far with poggenc, and it appears to almost be working. Things yet to do:

Output ogg data appears to be getting hosed somewhere in the output queue. It varies from completely bad output to occasional "blips" in the output audio, but I think they're symptoms of the same problem.

MPI stuff hasn't been written yet; it's all threaded right now.

My stats displays will have to be re-thunk a bit. Right now, there's 3 separate displays (done that way on purpose; one for input, encoding, and output). But it might make it easier to have a single object for all three (rather than three objects) because I think I want to have an "oggenc compatibility mode" stats display where there's only one progress bar so that programs like grip can run poggenc without having to analyze a different progress bar.

All the stream shutdown bookkeeping isn't written yet; I don't think there's much, but I'm concentrating on getting it working before I make all the memory cleanup at the end work properly.

The first item is the focus of my current work; it's been a bear to find so far -- I can't manage to track it down somehow. We'll see...

Heading out to a New Year's party later tonight which should be fun. I'll probably be up at ND later this week (not for sure yet, but likely).

January 1, 2001

My car just hit a water buffalo...

Of 1004 processes running on my desktop, 897 of them are xmms.
Even with the thread leak in xmms, I have 107 processes running on my desktop. I'd like to see Windoze do that.

Er... actually, no I wouldn't.

I caught one of the lead developers of Ogg/Vorbis (Monty) on #vorbis (IRC) today -- it marks the first time I have ever used IRC, actually. Amusingly enough, when I tried to run one of the stock IRC clients that comes with Mandrake, it fired up Gnome for me! I'm a KDE user (for no particular reason; I won't participate in any WM religious wars). So now I somehow have both Gnome and KDE running simultaneously. Amusing.

Monty answered a question that has been causing me fits for 3 days now (audio output incorrect when queueing up ogg packets for later writing, see yesterday's journal entry). That should fix up the Big Problem with poggenc.

I'm redoing the stats bit right now; more robust, less flakey, and displays a spinner reliably now. Almost done. After that's done, I'll incorporate the fix from Monty and see if that does the trick (it should; I tried it in a different context). Then to finish up the bookkeeping issues and test for memory leaks (I'm thinking that there are many...). bcheck is your friend.

Monty also tells me that the thread safe stuff won't be on the main CVS trunk for a week or two yet; I'll have to put in the duplicate-input stuff in the input queue. I was hoping to avoid that. Oh well.

Until about 3-4 weeks ago, I thought she was working on 2 models: 1 gas, and 1 electric. But she's really been working on about 90 different models! Yes, nine-zero.

Cool! (GE even did some kinda cool stoopid-browser-trix things on that website, too)

They just started going down the production line a few weeks ago (which is pretty cool in itself); they won't be available in stores for a little while yet.

My wife rocks.

I've been doing a lot of development work with poggenc. The first generation is essentially finished -- I'm currently working on plugging up the last few memory leaks. I have found at least 1 bug in the Sun Forte 6.1 STL implementation -- std::vector::resize() causes a read-from-uninitialized error. Doh.

poggenc is still threads-only (no MPI yet). I thought that I knew a lot about threading before I started this, only to discover that I didn't know jack about threading. My original design had many locking bottlenecks, such that encoding with multiple threads (or even one thread!) had so much overhead that it was slower than hell. I had to redesign a bunch of the interfaces and reduce the numbers of locks necessary by a lot in order to get the processing time down to a reasonable level.

Still, however, it's less than linear speedup with multiple threads on SMPs. Of course, nothing can exhibit perfectly linear speedup, but this isn't close enough for my liking. I'll continue to investigate that.

I started some web pages to explain how this works, with the idea that some of this text can be morphed into dissertation-quality text afterwards. i.e., the web pages are a dry run for a dissertation chapter.

Saw an old Army cadet of mine this past weekend; Brent and his wife Aimee (I hope I spelled her name correctly). He was one of my Airborne plebes; I beat up on him as part of his training (and he's a better person for it! :-). It's a small world -- he now works for GE Appliances here in Louisville. It was good to see him again, and to hear what he ended up doing in the Army, and what he's doing now. Ironically, he outranks me -- he finished as a Captain, while I'm still a 1st Lieutenant. Life is amusing that way...

He's working on an idea with an old commander of his who is at the Army War College. It's an overhaul of the Army's evaluation system. It's pretty cool, actually. There's a web and technology component (which is why he asked me). He asked if I could help, and I probably will throw a bit of advice their way (contributing to the open source/freeware cause, of course), but I don't have time to do any actual programming for them. Ah well. :-\

Over the past few days, we've (me-n-Andy) been coordinating a trip up to the University of Wisconsin/Madison for a visit with the Condor folks. We've got it all set on the first week of February, but I forgot my @#$#@$% dentist appointment that week. Arrghh!! Tomorrow, I've gotta see if I can get it rescheduled (my dentist isn't open on Mondays).

Other than that, it looks like it's going to be a great trip; I'm going to give a talk on LAM. After a little discussion (we've got a mailing list set up for the LAM and Condor folks for ongoing collaboration), we decided to split my talk into three parts:

MPI vs. PVM: theoretical / practical reasons, with a few small code samples

Talk about how the lower layers of LAM work (daemon-based stuff, etc.)

An intro to what we're hoping to do with a Condor + LAM collaboration, what I've tentatively nicknamed "Lamdor" (like the name?)

It should be a good time.

Speaking of the Goodness of LAM, there's a Linux Integrator company (Aspen Systems -- http://www.aspsys.com/) who wants to install an 800 node Beowulf with LAM and Myrinet 2000. How cool is that?!

LAM: Lust for Glory!

I visited ND last week for a few days. The lab is a total disaster with water damage and whatnot. However, I heard the most sensible idea for solving the problem that I've heard in years: instead of trying to fix the roof, they're going to essentially install an upside-down umbrella in the attic under the roof to catch all the water that seeps in from the roof. This water will be funneled to a new drain pipe that they installed inside Cushing. That's right -- they drilled a hole through the floor of 325 Cushing, and also through the floor in the room below us, and will be installing a massive drain pipe from the attic all the way down to the ground floor and outside, so that the leaking water can flow all the way from the roof to the outside, safely.

Engineering wise, it's actually pretty cool.

While I was at ND, I managed to grab Dan from Scyld on the phone. We had a good chat. He's very pleased with the progress on poggenc, and we talked about LAM/Scyld as well. We think we came up with a hack for LAM/Scyld. It's not perfect, but it will [hypothetically] allow:

LAM to work on Scyld machines.

An RPM of LAM to be distributed that will work on both Scyld and non-Scyld machines (decision is made at run time).

We'll see how that works out.

Also while I was at ND, Dog and I met with Paul and Johanes to "turn over the keys" of the Hydra. Dog and I are now no longer the primary caretakers of the Hydra -- Paul and Johanes are. Of course, we'll be in a transition mode for a while; Paul and Johanes will probably have to consult us with any problems with PBS for some time. But at least we've started the transition.

Two things I have to do before I am fully out of the loop:

Integrate the Maui Scheduler and QBank software into PBS. This is because Rich Sudlow has finally decided to take us up on the CTC deal where ND HPCC users get 10% of the cycles of the hydra per month. To do this, we need an allocation-tracking program (QBank), and a scheduler that can interface with it (Maui). I'll install this stuff, and tell Paul/Johanes about how it is set up when it is done. Hydra PI's and students will either get an unlimited monthly allocation, or an allocation so large that they cannot spend it all. All the HPCC users will share a common allocation that amounts to 10% of the hydra cycles per month.

Interestingly enough, the Maui scheduler did not compile under Solaris. It was a handful of small items that were "wrong". I corrected them and sent a patch to the Maui scheduler list. The author was very grateful and promised to include the fixes in the next release. How cool is that?

Finish the PRS once and for all -- there's some calls to popen() that need to be replaced with formal fork()/exec() stuff (for various technical reasons). This is of lower priority, but it does need to get done eventually.
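For reference, the popen()-to-fork()/exec() swap mentioned in the last item can be sketched roughly as below. This is not the PRS code; run_and_capture is a hypothetical helper name, and the sketch assumes a POSIX system and a caller who only needs the child's standard output and exit status. The main practical difference from popen() is that no shell is involved, so the arguments are passed through untouched.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run args[0] with the given arguments (no shell),
// append its standard output to `output`, and return the child's exit
// status, or -1 if something went wrong on our side.
int run_and_capture(const std::vector<std::string>& args, std::string& output)
{
    if (args.empty()) return -1;

    int fds[2];
    if (pipe(fds) != 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        // Child: route stdout into the pipe, then exec the program.
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        std::vector<char*> argv;
        for (const std::string& a : args)
            argv.push_back(const_cast<char*>(a.c_str()));
        argv.push_back(nullptr);
        execvp(argv[0], argv.data());
        _exit(127);  // only reached if the exec failed
    }

    // Parent: read until EOF, then reap the child.
    close(fds[1]);
    char buf[4096];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        output.append(buf, static_cast<std::size_t>(n));
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as run_and_capture({"echo", "hello"}, out) would leave the command's output in out. Error handling is deliberately thin here (an interrupted read(), for example, is treated as end of input).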

In Army news, after a weird sequence of events, it looks like I'll be heading down to ARL/STB (Army Research Lab / Software Technology Branch) Atlanta for one more 2 week stint before they get shut down. I have to do my annual 2 week tour before 1 March, so I could go down there imminently. It depends on the trip to Madison and my dentist appointment; we'll see what happens there.

Also, my PMO (personnel management officer... took a minute to remember that) at AR-PERSCOM (Army Reserve Personnel Command) sent me an e-mail at the end of the day saying that she's got a line on a new position for me in ARL since STB is being shut down. I'll be talking to her tomorrow about it. This likely means that I won't be heading back to be a BSO (Battalion Signal Officer) for some combat unit after this 2 weeks. Wooo hoo!

January 11, 2001

I'm ballooning my ass off up here!

A quickie tonight, 'cause I'm tired and want to go to bed. Typos be damned.

I sent a broad list of suggested meeting topics to the lamdor list today. We'll see what the Condor folks think. I also sent my lengthy discourse on all my thoughts about Lamdor that I wrote about a week or two after SC2000. I was amazed to see that I had written 638 lines of text. Good God! I talk a lot.

I have spent 3 excruciating days tracking down a LAM problem on AIX 4.3.3 in 64 bit mode. Craig Stewart and friends down at IU ROCK, by the way. I absolutely needed an AIX 4.3.3 account somewhere to track this down, and after calling/e-mailing everyone I could think of to no avail, Lummy suggested the IU folks. They got us an account within a matter of hours. Amazing.

This all came up because some guy (Shahryar) in ibm.com e-mailed the LAM mailing list saying that he was having problems getting LAM to work under AIX 4.3.3 in 64 bit mode. It turns out that his boss is an old LAM guy from Ohio State, so we felt obligated to help him out. :-)

Amusingly enough, over the span of 3-4 days, I have more e-mail from Shahryar (141 from him) than I have with any other single LAM user. The next contender is Keith from Citibank, of which I have 117 e-mails -- but that was over a period of several months. The next largest number of mails that I have from a single LAM user is 39.

Wow.

Huge props to Darrell and Rich for helping me figure this out. There were deep discussions today in e-mail about kernel-level stuff (I have to admit, it was somewhat amusing to watch a BSD guy and a SYSV guy duke it out). There were many others that helped find this bugger, too -- thanks to everyone.

Here's an e-mail that I sent about it earlier today:

Subject: Bloody AIX!

For the past several days, I have been struggling with an issue under AIX 4.3.3. This may affect you in the future (it has to do with blocking and non-blocking sockets), so I thought I'd pass it on.

January 12, 2001

Radio broadcasting, wrought iron smelting... it's all pretty much the same thing

A few things that I forgot to mention last night...

As of 7pm on the 10th, I had 788 xmms instances running on queeg. xmms finally crashed yesterday; I'd assume that we were over 800 when it finally died at 12:41pm yesterday (the 11th). Right now, I have 98 xmms instances on queeg.

I need to write an automated scripty to monitor this so that I can keep a record of the most number of xmms instances; probably a simple cron job that appends the date and the number of xmms instances to a flat file would be fine.

...done. cron will fire this thingy up every 5 minutes. Because when you have an 800MHz machine, it's important to bog it down with utterly useless crap. Oh yeah, I need to write that PHP K-Mary Phone Call tracker, too, with a MySQL back-end and automated report generators...
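The record-keeping part of such a monitor is simple enough to sketch. The code below is not the script that cron runs; it is a hypothetical C++ version of the same idea. It assumes the input is one command name per line (the kind of thing `ps -e -o comm=` prints), and the names ProcessTally, tally_processes, and format_record are made up for illustration. The percentage is truncated to a whole number.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Counts for one sample: how many processes matched, out of how many.
struct ProcessTally {
    int matching;  // lines equal to the target name
    int total;     // all non-empty lines
};

// Read one command name per line and count the ones equal to `target`.
ProcessTally tally_processes(std::istream& in, const std::string& target)
{
    ProcessTally t = {0, 0};
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        ++t.total;
        if (line == target) ++t.matching;
    }
    return t;
}

// Build the single line that would be appended to the flat file:
// "<timestamp> <matching> <total> <percent>%".
std::string format_record(const std::string& timestamp, const ProcessTally& t)
{
    std::ostringstream out;
    const int percent = t.total > 0 ? (100 * t.matching) / t.total : 0;
    out << timestamp << ' ' << t.matching << ' ' << t.total << ' '
        << percent << '%';
    return out.str();
}
```

Appending the returned line to a file every few minutes gives a history that is trivial to sort for the high-water mark.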

Got one response back from one of the Condor guys already about my really long summary of what we need to do for Lamdor. Cool!

It looks like Jeremiah (of the clan LSC) will be forced to make his LSC Friday Lunch scripty thing in a webified doo-hickey. It'll be good for him; he'll have to learn PHP, which will, fundamentally, make him a better person.

PHP makes the world a better place.

Johnny hooked me up with an account on his MS Exchange server at home the other day. I used MS Outlook 2k to hook up to it. Why would I do such a thing?

Well, Outlook is actually not a bad program, truth be told. It has many nice features. That being said, I don't know if I'll ever be able to use a GUI mail client because I'm so conditioned to a green-screen mail client, but... My sister Robin needed some advice on enterprise-wide calendaring; having an account on an MS Exchange server where I could step through each window with my sister over the phone (her company uses Outlook/Exchange as well) was quite helpful.

I think we'll try to have a "welcome to the internals of LAM" party/meeting next week with Ron and Brian. Arun will likely be there, too, since he's really only been exposed to a small portion of the LAM internals. I'll have to think about what I want to talk about, and how to orient the guys to a source tree of 80+ directories and 950+ files.

An annoyance that we ran into in LAM the other day had to do with libtool. libtool can be your friend, but it can also be your enemy.

It seems that -- at least in some environments -- libtool does not like source filenames with more than one '.' in it. i.e., "foo.c" is ok, but "foo.bar.c" is not ok. It's some kind of regexp problem inside libtool somewhere (I tracked it down once) -- they just made the bad assumption that there would only ever be one period in the filename, the one that separates the basename from the extension.

We had a handful of filenames in LAM that had two dots. I don't know why, but libtool barfs on these only in some environments. I didn't bother figuring out why; I know that I've seen this before and that I somehow managed to fix it (of course, I can't remember how, now). But on the rationale that when we start distributing LAM with libtool-enabled builds, someone will run into this problem, Arun and I just went through and renamed all the "bad" filenames using s/\./_/.
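The renaming rule is easy to state in code. This is only an illustration (the actual renaming used the substitution shown above), and libtool_safe_name is a made-up name. Note one difference: s/\./_/ replaces the first dot only, which is enough for names with two dots, while this sketch replaces every dot except the last one so that names with three or more dots also come out with a single extension separator.

```cpp
#include <string>

// Replace every '.' except the last one (the extension separator)
// with '_'. Names with zero or one dot come back unchanged.
std::string libtool_safe_name(const std::string& name)
{
    std::string result = name;
    const std::string::size_type last = result.rfind('.');
    if (last == std::string::npos) return result;
    for (std::string::size_type i = 0; i < last; ++i) {
        if (result[i] == '.') result[i] = '_';
    }
    return result;
}
```

So "foo.bar.c" becomes "foo_bar.c", and "foo.c" is left alone.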

This is kind of annoying in CVS, because there's two ways to do it:

Go muck around in the CVSROOT and rename the repository files manually

Use CVS to remove the old filename, and then use CVS to add the new filename

Either way is ucky, but the latter preserves the history in case you need to roll back to an older version, so that's what we did (with comments in the logs about where to find all the previous versions of these files, since they now have CVS versions of 1.0.1).

January 14, 2001

Honneysuckle matchheads

Still working on parallel oggenc. Ugh! There's some internal massive memory leak that is proving incredibly elusive (it must have something to do with the way that I'm invoking the Ogg/Vorbis API incorrectly...). However, I have proven to myself that I have plugged all of poggenc's holes.

C++ can be really helpful. I have a templated buffer pool class; it is used to allocate and then recycle buffers so I don't have to new / delete forever.

In order to prove that nothing is getting lost in this templated class (i.e., everything eventually gets deleted), I had to put some couts in the destructor (shh!). But since the class is templated and used in many cases, seeing a general: "X buffers remain unaccounted for" is not helpful -- I need to know which instance has buffers that remain unaccounted for.

which shows the real type of the templated instance. Very cool, and very useful. <typeinfo> is your friend.
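A minimal sketch of the idea, not the actual poggenc class: BufferPool, acquire(), release(), and outstanding() are hypothetical names, and the pool here is deliberately not thread-safe. The only part that matters is the destructor, which uses typeid(T).name() to say which instantiation leaked. The string that name() returns is implementation-defined (some compilers print a mangled name), so treat the exact output as compiler-dependent.

```cpp
#include <cstddef>
#include <iostream>
#include <typeinfo>
#include <vector>

// Hypothetical pool: hands out heap-allocated T objects and recycles
// them instead of calling new / delete for every buffer.
template <typename T>
class BufferPool {
public:
    BufferPool() : outstanding_(0) {}
    BufferPool(const BufferPool&) = delete;
    BufferPool& operator=(const BufferPool&) = delete;

    ~BufferPool() {
        for (T* p : free_) delete p;
        if (outstanding_ != 0) {
            // typeid(T).name() identifies which instantiation this is.
            std::cout << "BufferPool<" << typeid(T).name() << ">: "
                      << outstanding_
                      << " buffers remain unaccounted for" << std::endl;
        }
    }

    // Reuse a recycled buffer if one is available, else allocate.
    T* acquire() {
        T* p;
        if (free_.empty()) {
            p = new T();
        } else {
            p = free_.back();
            free_.pop_back();
        }
        ++outstanding_;
        return p;
    }

    // Hand a buffer back to the pool for later reuse.
    void release(T* p) {
        free_.push_back(p);
        --outstanding_;
    }

    std::size_t outstanding() const { return outstanding_; }

private:
    std::vector<T*> free_;     // recycled buffers, owned by the pool
    std::size_t outstanding_;  // handed out and not yet released
};
```

With several instantiations alive at once (say, one pool per queue element type), each destructor message names its own T, which is what makes the leak report actionable.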

Brandon (and some others I think), a ND CSE senior, has written the ultimate Palm Pilot killer app: it plays the ND fight song, alma mater, and the Victory Clog. It's still in beta, but I managed to snag a copy of it and it seems to mostly work. Brandon says they're still working on it, and will send me a copy when they hit 1.0.

It's not like I know a bazillion people who would want an app like that or anything...

Saw some movies this weekend:

What Women Want: A good flick; watched it with Tracy and Janna. There was stuff in there for both men and women. Got a bit mushy towards the end (it is a romantic comedy, after all), but all in all an enjoyable movie. I give it 12:30.

Keeping the Faith: Quite amusing -- saw this one on video. It's with Ed Norton (Fight Club!), Ben Stiller, and Jenna Elfman. This one, too, slowed down a bit towards the end, but there was a good supply of one-liners to make it enjoyable. I give it 15:00, partly on the strength that Ed Norton rocks 'cause of Fight Club, Ben Stiller is just really funny, and Jenna was really hot.

It seems that Arun is going to show Eraserhead for the movie club this week. What a horrendous choice. Eraserhead has the dubious honor of being the only movie that I have ever returned to a video store without watching it in its entirety. It was too fucked up for me -- I turned it off somewhere about halfway through. Granted, that was at least 15-17 years ago, but still, I have memories of that movie sucking Big Time.

I'm waiting for a bcheck run of poggenc to finish (it takes quite a while, even with a small sample) that will hopefully shed some light on my memory leak woes.

Miron, head PI for the Condor project, sent some wisdom to the Lamdor list today: let's just concentrate on getting LAM jobs to run in Condor before we do all the checkpoint/migration stuff. I was under the impression that we had to do the checkpoint/migration stuff to get LAM to run under Condor, but Erik informs me that they have a static scheduler that allows things to run uninterrupted, and therefore not have to have checkpointable/migratable code.

This is good to know -- it makes a nice, clean abstraction break between these goals (getting LAM to work in Condor, and getting LAM to be checkpointable/migratable).

I had 487 instances of xmms running on queeg a little while ago -- 85% of all processes on queeg were xmms. However, that caused xmms to eat up over half of my RAM, which was really slowing things down. So I had to kill and restart xmms.
However, X itself still consumed about half of my physical memory even after I killed xmms. Perhaps there's some gradual memory leak in the X server as well. Who knows. I restarted X and all was well (X had been running for about 30 days; while that's not perfect, I suppose it's [fairly] forgivable).

Tracy and I contacted a realtor (on the recommendation of several co-workers of Tracy's) and started looking at houses on Saturday. We're going to look at more tomorrow.

It was actually surprisingly fun.

I never thought that I'd be able to walk into a house and say, "Nope. This one won't do," and actually mean it, and have reasons for saying it other than just being cocky, flippant, and arrogant.

Damn, I'm getting old.

Ah! The bcheck run is done. Back to coding... squishing little buggies...

January 20, 2001

A complaint about the complaint box. Delicious.

As I was driving home from Notre Dame yesterday, I drove south into a snow storm, which is really odd. Normally, it's the other way around -- you go North to get snow.

Had a good couplea days at ND.

I had a rockin' LAM pow-wow with Arun, Brian, Ron, and Dog. Dog was more of an observer, but he has been an official Friend of LAM for quite some time. When he gets some Spare Time(TM), he does need a Master's project, so it's possible that he'll do something in LAM. We'll see.

We discussed all the things that are Going On in LAM, and came to a few decisions:

The next release of LAM will be 6.5, not 6.3.3. Mainly PR reasons, but also to signify that this is quite a big change since [the currently available] 6.3.2.

First order of business this semester is to get 6.5 out the door. There's one or two issues that I'm going to look into this weekend, and then start giving tarballs to Ron and Brian for formal testing.

Ron will probably start looking into Totalview support. That will be way cool; having a real parallel debugger that supports LAM.

Brian is going to start looking into IPv6 support. This could give us some really cool things, such as optimized collectives (using IPv6's native multicast ability), security in the lamd (using IPsec), etc.

Arun's going to finish the Myrinet RPI. He's having problems with long messages right now; hopefully that will get fixed Real Soon Now. He'll likely look into the VIA RPI after that, and dabble a bit in compression at the RPI level. This is an interesting sub-note: I think I had the inspiration to use compression in MPI during a drive between SBN and Louisville. Sometimes it's not worth it, but sometimes it may make a huge difference in terms of bandwidth. It would be our ringer for ping-pong tests. :-)

We'll probably have a series of quicker sub-releases (hopefully!) that incorporate major new features. e.g., 6.5.1 may have Myrinet support. 6.5.2 may have Totalview support. 6.5.3 may have some TCP RPI optimizations (e.g., tiny messages, fixed linked list handling). And so on. We can't really do this now because the 6.5 tree is very different than the 6.3.2 tree.

Didn't get to see Ed-n-Suzanne too much; maybe we'll have to do dinner one of these times when I go up there. Cleo went barking crazy when I came home both nights. I think that Cleo's non-barking acceptance rate is a complicated function. There are multiple factors:

Whether I initially come in during the day or at night (day, 1 = day, 0 = night)

Whether Cleo is there when I initially arrive (only if during the day) (at_home, 1 = home, 0 = not home)

Whether I come home at night by car or by foot (car, 1 = by car, 0 = by foot)

How many days I have been there (days)

Phase of the moon (moon, fraction from 0 to 1)

These factors have led me to the following equation (too bad mathML isn't yet implemented anywhere...):

January 22, 2001

"Slut" has been playing continuously for 2 days

Found two great houses over by Janna, and we were all set to sit down and slog through the details of deciding which to get, and then found out that neither of them have DSL availability.

Arrgh!!

I'm doing a bunch of LAM work right now to enable Ron and Brian to start the release process for LAM 6.5 (did I mention in the journal already that we're going to call the next release 6.5 instead of 6.3.3? It's a long complicated story [e.g., where's 6.4?], but there are definite reasons for everything. To summarize: there's been major changes since 6.3.2 such that we didn't feel that an increase in the release number was sufficient to describe the enormity of the change. It isn't quite as revolutionary as should indicate a major number change, so we settled for a minor number change. There.).

I've added a whole schload of programs to the lamtests test suite, and added a few more canonical example programs to our "examples/" directory -- something I've been meaning to do for a while. We have some good examples already, but none are the "standard" examples that are typically used in MPI, like the pi approximation program and the ring program.

Now I'm briefly diverting to write a few man pages (we have a bunch left unwritten for MPI-2 functions, so we've divided them up into groups and assigned them to various Llamas. Tackling them a few at a time is a good way to whittle the number of unfinished pages down to a small number, as the limit goes to 0). Mostly MPI-2 dynamic functions for me.

After that, I'll finally get around to fixing MPI_COMM_SPAWN and MPI_COMM_SPAWN_MULTIPLE -- there's something wrong with using app schemas such that you still have to give a process count on the root or something (you shouldn't have to). And I think the error code reporting is futzed up somehow (lamteam advised me of this about a month ago or something. Not a huge deal since errors typically cause aborts, but it is possible that someone could set the error handlers to return and expect to get valid error codes back).

Then if all else looks good (oops... looks like I have a seg fault in one of the new test programs...), I'll hand the tarballs over to Ron/Brian to begin the release process.

Long live LAM!

There are currently 144 copies of xmms running on queeg, which is 63% of all processes.

January 24, 2001

Choco-latte dead-head sickers

I just had a lengthy journal entry about how LAM 6.3.3b52 is officially dubbed "release candidate 1". Happiness all around.

Unfortunately, I hit ctrl-c at the jjc prompt, and all was lost. Doh. Gotta put something in jjc to prevent that from happening in the future. :-(
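One way to guard against this, sketched under assumptions: catch SIGINT, set a flag, and let the prompt loop decide what to do (ask for confirmation, save a draft, whatever). None of this is jjc code; install_sigint_guard and consume_interrupt are hypothetical names, and how this interacts with readline's own signal handling is not covered here.

```cpp
#include <csignal>

// Set by the handler; polled by the prompt loop.
static volatile std::sig_atomic_t g_interrupted = 0;

// Do the bare minimum inside the handler: just record the event.
static void on_sigint(int) { g_interrupted = 1; }

// Install the handler so that ctrl-c no longer kills the client
// outright. Returns true on success.
bool install_sigint_guard()
{
    return std::signal(SIGINT, on_sigint) != SIG_ERR;
}

// Returns true once for each batch of ctrl-c presses since the last
// call, and clears the flag.
bool consume_interrupt()
{
    if (!g_interrupted) return false;
    g_interrupted = 0;
    return true;
}
```

The prompt loop would call consume_interrupt() after each input attempt and, if it returns true, offer to keep the entry in progress instead of exiting.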

Suffice it to say that we're starting the formal LAM release process. I put some way-cool centralized error reporting stuff in the lamtests module (there was a lengthy explanation of it in the Journal Entry That Is Now Deceased; it's too late to re-type it all now), and generally expanded the testing base. This actually resulted in finding a few more bugs and minor memory leaks for obscure cases in LAM (which is a good thing -- yay for testing!).

I will, however, re-print an excerpt from a LAM user that I got today:

"I wrote you a while ago regarding C++ extensions for MPICH. By now we've switched to LAM. Feature availability convinced us to do so... :)"

I replied to her that all Right Thinking people use LAM. Resistance is futile.

Now that Brian and Ron will run with LAM's release process, I'll head back to poggenc... By the looks of vorbis-dev, Ogg/Vorbis beta 4 is pretty close. There's still some broken things in terms of building in non-gcc/non-Linux/non-shared-library environments, so I'll keep bitching about those. :-)

There are currently 413 xmms instances running on my machine out of a total of 497 processes. 83% of the jobs on queeg are xmms.

January 25, 2001

It tastes exactly like licking a shag rug

xmms finally clobbered queeg today.

There were 536 xmms instances on queeg out of 623 total processes -- 86%. Things were running at an absolute crawl when I came back from dinner. So I had to kill xmms, ending the 6 day streak of playing "Slut" continuously. <sigh>

I got access to a friend of a friend's BSD box for some LAM testing. He's quite a nice guy (his name is Todd), and has come through for LAM a few times before. Never underestimate the friendliness of fellow programmers on the internet.

Kudos to you, Todd! And kudos again!

And Kudos to Craig down at IU for getting us AIX access! And kudos again! Er... actually... he got us AIX access... perhaps we should be cursing him...?

All for the glory of LAM.

IRC is actually fairly interesting. The Ogg/Vorbis developers hang out in a channel on irc.openprojects.net, so I pop in there periodically to ask questions, etc. BitchX is an amazingly powerful program; I'm sure that I only understand about 2% of its functionality.

I am so fed up with ROMIO. It turns out to be pretty broken on *BSD platforms. Words cannot express.

January 27, 2001

Are you Doobie Keebler?

Ying: The LAM release cycle is under way. After some struggle, I solved a bunch of issues with the build process that had to do with automake and bizarre timestamps. We're up to LAM 6.3.3b55.

Yang: In chatting with some Ogg/Vorbis developers on IRC this evening about some problems that I have been having, it turns out that doing a parallel Ogg/Vorbis encoder simply may not be possible due to the nature of the Ogg/Vorbis encoding algorithm.

More details to follow. I don't fully understand the encoding process, so I don't completely grok what they told me; need to sleep on it.

February 3, 2001

All I ask is that you obey me like the Will of God

Last Wednesday, I came out in the morning to my car and noticed a huge crack in it. It was fine on Tuesday night. It doesn't look like an impact crack; perhaps it was thermal stress...?

Thursday afternoon, when I drove in to South Bend, I drove up to Chez Costech and ran over a bottle that I didn't see. Not only did I get a flat, but the bottle managed to gash the side of my tire (which isn't repairable), so I had to buy a whole new one. The folks at Basney Honda were quite nice and hooked me up (they didn't even charge me labor, which was nice -- Kudos to the "Jeff" guy who worked there!), but it was $65 that I didn't particularly want to spend.

I'm worried because these things typically come in threes, and I have 3 more long drives ahead of me (to Madison, from Madison, and back to Looieville).

Whatever I did, Herman, I'm sorry.

In other Big News, Tracy and I have finally decided on a house. Here's a breakdown of the big details:

2300 square feet

2 floors

4 bedrooms (all on second floor)

Laundry room upstairs

Entryway off front door is open all the way up to the second floor; the stairs go around the edge

Sitting room on first floor

Dining room

Big kitchen

Great room

2 car garage

Patio out back

Basement

We picked out the cabinets and countertop last week, and put some "good faith" money down on the house so that the builder would customize it for us. Yes, it's a brand new house -- not even complete yet. Tracy's working on picking out colors and carpets this weekend.

We expect to close by the end of the month (Tracy worked out these details after I left for SBN, so I don't know them offhand). We'll spend the next month cleaning and moving in and whatnot (we'll kinda be taking our time with this), and probably move in by the end of March.

Woo hoo!

And now for some quickies...

Went to the Keenan Revue with Arun, Perk, and Co. Was quite fun. Some of the skits were really funny. I won't give anything away here, but the wheelchair bit was my favorite.

Lummy and I are heading to Madison tomorrow to visit the Condor folks. Should be a great trip; I'm pretty excited about it. I'm giving a talk there on Monday afternoon; I need to finish it!!

All told, Arun, Jeremiah, Brian, Raja, and I spent probably about 3-4 hours discussing quoting and shell escaping rules for LAM on Friday. Wow. In the end, we decided to punt, and only allow simple stuff -- no quoting will be allowed. Maybe someday.

Brian gave a talk on IPv6 at LSC lunch yesterday, which was quite informative. When LAM 6.5 gets out the door, he'll be looking at supporting IPv6 with LAM, and doing some cool things with collectives with it. We'll see how that does.

Arun and I spent some time yesterday with the Myrinet RPI and tried to make it work right (still a problem with long messages), but got sidetracked wondering if it would be worth it to replace the state machine that we don't fully understand with our own state machine. A few hours and a full white board later, we decided to stick with the one that we already have and try to make it work. A re-write will be necessary, but not right now.

Arun and I also decided that he'll next move on to do Totalview support for LAM rather than VIA. It is both RPI work and helps move Arun up a level in the LAM code -- gives him greater exposure. Plus, when it's done, it will be immensely helpful for debugging LAM itself.

That's it for now -- I somehow have whacked my journal client up here on AFS, so I'm using one on queeg via DSL, and somehow backspace doesn't work in Emacs, which is quite frustrating.

Scalable LAM (SLAM): port some of the technology from Minime back to LAM, including the tree-based booting and TCP connection caching.

I returned to Louisville on Wednesday to find that Bell South had taken a back hoe to my phone lines. I was without telephone and internet for over a day. It sucked.

I'm really displeased with boost, and aside from the Boost Graph Library (BGL), I'm not going to use it anymore. More below.

Everyone in the LSC is now scattered around different offices in Fitz/Cushing. They're cleaning out 325 so that the carpet can be replaced and the room can be cleaned (after the annual winter/roof/water leaking/mold disaster).

Lumsdaine just formally announced that he's going to after this semester.

Ok, now for some longer explanations.

Wisconsin trip was good. Met several of the Condor folks, including Miron Livny (the head PI up there). He's a good guy, and really smart. He's the only other professor type whom I've met who views software the same way we (Lumsdaine/the LSC) do:

He actually wasn't as big on checkpointing as we were; he's more interested in bringing MPI into the dynamic computing world, where faults can (and do) happen. It took all of about 15 minutes to figure out how to run LAM under the static Condor scheduler. It will take a good deal longer to figure out how to [re]define MPI semantics to work in a dynamic environment where nodes can fail. Then it will take a little time to implement those in LAM.

After getting LAM to work in their static scheduler, we decided that the first step would be to get LAM to run in a "debugging / interactive" mode in Condor. That is, do something like:

The first step reserves up to 16 nodes (however, one or more may disappear at any time if a user returns to their computer, etc.). The last step releases any of the nodes that are still left.

Definitely an imperfect scheme, but a good first step -- and necessary to do anything more complicated.

This is going to be good stuff!

I got really annoyed with Boost today. The following is essentially an e-mail that I sent to Jeremy, Andy, and Rich about my experiences with Boost today:

Boost has a long way to go before it becomes usable for the Common User. As it has been all along, boost is really only suitable for its own developers and a handful of other hard-core geeks. Boost needs to become a lot more friendly towards the user before it can hope to become widely accepted.

Software needs to suck less, and boost is not fulfilling that requirement right now.

Here's my story, and why I'm annoyed at boost:

I used a "progress" class from boost in some of my dissertation code. It just prints a simple ASCII progress bar, from 0 to 100%. Handy for some long-running computations. I could well have written this myself, but decided to use boost, well, for pretty much the same reasons I love the STL (it's already written, it just works, etc., etc.).

I haven't touched this code in a month or two, and when I resurrected it today, I figured that I should go get the latest version of Boost, because I saw that there were some BGL fixes, and I'm finally getting to the point where I'm going to need the BGL. This turned out to be a Big Mistake.

I downloaded boost_all.zip (which still doesn't have a version number in the name) from boost.org. There was no indication that this was an unstable release, so I figured that I'd be safe.

I unzipped it (sigh; why still no tar.gz version?) and inserted it into my source tree on my linux box here at home. Four unexpected and Bad things happened:

File locations have changed. Specifically, there are .cpp files that no longer exist. Normally, as a library user, I wouldn't care one bit about this. But since boost provides no build interface, this is highly relevant to the user.

I grumbled a bit, but modified my Makefile.am's to adjust.

boost/config.hpp seems to be broken with g++ 2.95.2 -- g++ complains of bad preprocessor mojo. It seems that g++'s preprocessor doesn't allow you to split #if statements across multiple lines with "\".

I edited boost/config.hpp to fix the error.

boost/cstdint.hpp is broken for the same reason. There may be others that are broken in the same way; I was only using the "progress" class, and this is apparently all that they used.

I edited boost/cstdint.hpp to fix the error.

boost/timer.hpp (from Beman himself!) does not protect #include <limits> with #if BOOST_NO_LIMITS. g++ does not have <limits>. Hence, broken.

After poking around a bit to figure out why this was happening, I edited boost/timer.hpp to fix the error.

After cooling off a bit, I concluded the following:

Although boost is continually evolving and filenames/locations are going to keep changing, until a build and/or install mechanism is in place, users will continue to get burned by .cpp files moving, etc. This needs to be fixed. Soon.

I realize how hard it is to distribute software, especially catching all the minor details before you ship a tarball -- but didn't anyone test g++? Having [at least] three separate things broken on the latest g++ version seems like a glaring oversight.

There may be some rationale here that I'm not aware of that makes this "ok" (there's nothing in the documentation that I saw about this), but Joe User is going to download boost_all.zip and just expect it to work on his Linux boxen. He's certainly not going to comb through the mailing list archives looking for why it doesn't even compile.

Other than the BGL, I'm not going to use boost anymore. I will use the BGL because it's probably safe to assume that there's some LSC-inspired sanity in that package (i.e., I have much higher faith in that package vs. the others because I trust the people involved), but I'm going to write my own "progress" class. It's just not worth my effort to use any other part of boost.

Jeremy replied that there had been some confusion among the boosters -- the version that was released today was released before it was ready. Although this may be true, the problems that I was having were unrelated to the problems that he was talking about. So I stick by my statement that other than the BGL, I won't be using Boost anytime in the near future.

I dug up some old minime code today (the threaded booter) and familiarized myself with how it worked again. I then dove into LAM and added hooks for threads and whatnot (nothing will be checked in, of course, until after 6.5 is released!).

I took a good long look at the current lamboot mechanism in LAM and had to sit and doodle out several designs before I got one that elegantly meshed the assumptions that are built into LAM with the new ideas from the threaded booter. I'll try to implement it tomorrow; it will be a real pain without the STL and C++ strings. :-(

Tracy and I went out to the house today. I took loads of pictures, and we measured out several rooms for planning purposes, etc. I can't decide which of the two bedrooms at the end of the 2nd floor hallway will be my office. I may have to visit the house in the morning and see how the sun glare is in the front one (that's my first choice, but if the sun glare is too much, I'll have to use the back one).

I bought Tracy a new cell phone this week because a) Valentine's day is next week, b) she'll be on jury duty all week, and c) she lost her old cell phone about 3-4 weeks ago.

I found some cool undocumented features from the sales guy. Most of which are scary looking codes with no explanations, so even though I know they're there, I won't touch them for fear of breaking my phone. One cool one, however, lets me program the top line of my phone's "idle" display to say whatever I want. This is extremely helpful because Tracy and I now have identical phones...

And of course the day after I bought Tracy her new phone, she found her old one.

Figures.

Do you ever wonder where the subjects come from for my journal entries?

Most are movie or TV quotes. Some are totally random and off the top of my head.

I've been working on integrating the multi-threaded tree booter into LAM (nothing will be CVS committed until after LAM 6.5 is released, of course). I've had some interesting (and frustrating) problems, but it seems to be going more-or-less well.

When I originally wrote it, it was outside of the LAM framework, so I re-wrote/copied some of the LAM stuff for basic network services and whatnot, frequently putting it in a C++ kind of context (using the STL, making basic objects, etc.). So I've been stripping that stuff out and reverting back to LAM's C interface for these services.

It's coming along swimmingly.

DSL is getting installed in my church; they responded with a telephone installation date of 22 Feb, 2001. They're already up on a LAN; they use individual modems to connect to AOL right now, which is terribly inefficient. DSL will be a Good Thing for them.

With all the hullabaloo about ssh1 this week and last, I upgraded to OpenSSH. Took a little bit of pain, because I need the AFS token passing support, so I had to compile it myself. What isn't obvious is that "OpenSSH" is a BSD-specific application. You have to get "Portable OpenSSH" to run on linux (or anything else) machines.

With some futzing, I got the AFS stuff to work.

Then I started mucking around with SSH2. Took a bit more futzing to get that to work.

Important fact: I don't know if I selected this during installation or if it's a Mandrake default -- you have to configure OpenSSH with --with-md5-passwords to get password authentication on the server side to work properly.

After all that (I was using Portable OpenSSH 2.3.0p1), I was randomly getting "authentication response too long" errors when I tried to connect to an openssh server. I asked Todd about this (he's a FreeBSD guy), and he mentioned that they "had problems with RSA authentication somewhere around 2.3.0".

So I got the latest CVS copy of Portable OpenSSH (which is version 2.3.2), and all seems to be well. I don't know if it was the client or the server that was whacky, but I suspect it was the client -- I couldn't connect to an openssh 2.3.2 server with it either (same error: authentication response too long). I don't know what the difference is between 2.3.0 and 2.3.0p1. On my 'drake 7.2 laptop, I have RPMs installed for openssh 2.3.0, and they seem to work just fine, so perhaps p1 broke something...?

But the CVS copy seems to be working, so I'm happy with that.

I may have to switch to gnome. I caught a bit about "Evolution" on /. the other day -- it looks like a free version of MS Outlook. Very cool. But it has lots of dependencies, and seems fairly gnome-specific.

I'm not inspired to try it at the moment, but I might well be upgrading all my current linux boxen (3) after I graduate to whatever latest/greatest stuff is out there, which may include switching to gnome, etc.

I actually only use KDE right now because it was the default when I installed linux on my laptop. Not having previously used KDE or Gnome before, I took KDE simply because it was the first in the list on the login screen.

Nina and Joe from the LAM list made a good suggestion (Nina indirectly asked it about 2 weeks ago, and we never got to it... oops) today that I put into the main-line LAM tree so that it will be released in 6.5.

I added a "-s" option to the lamboot command. Normally, the stdout/stderr of the LAM daemon on the node where lamboot is run is left open. This is so that LAM's internal "tstdio" package can function properly. tstdio is an emulation of normal stdio, but it works in a parallel environment, and funnels everything back to the lamd on the node where you booted.

Anyway, we normally leave stdout/stderr open on the local node for this reason. The stdout/stderr on all remote nodes is closed. However, Joe and Nina both wanted to do:

rsh somenode lamboot hostfile

It's important to remember that rsh requires two criteria before quitting:

The application that it launches finishes (lamboot in this case)

stdout/stderr from the application that it launches and all of its children are closed

This makes sense, actually; normally you'd want to see the output from all the children processes that you rsh over to some node, and wouldn't want rsh to finish before they did, because then you wouldn't see all the output.

But in this case, it causes rsh to hang. Since 99% of LAM users don't use tstdio, I added "-s" that will force the closing of stdout/stderr on the local node, so that "rsh somenode lamboot -s hostfile" will allow rsh to complete.
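For the archives, here's a rough sketch of why rsh hangs and why closing stdout/stderr fixes it. It is Python rather than LAM's C, and none of it is LAM code: the two little scripts are hypothetical stand-ins for lamboot (the launcher) and lamd (the lingering daemon), and seconds_until_eof plays the part of rsh by reading the launcher's output until end-of-file. It assumes a POSIX-ish system with a Python interpreter available.

```python
import subprocess
import sys
import time

# Hypothetical stand-in for a daemon such as lamd: it just lingers for a while.
DAEMON_SRC = """
import os, sys, time
if sys.argv[1] == "close":
    # Point stdout/stderr at /dev/null so no inherited pipe stays open.
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, 1)
    os.dup2(devnull, 2)
    os.close(devnull)
time.sleep(float(sys.argv[2]))
"""

# Hypothetical stand-in for lamboot: start the daemon, print a line, exit.
LAUNCHER_SRC = """
import subprocess, sys
subprocess.Popen([sys.executable, "-c", sys.argv[1], sys.argv[2], sys.argv[3]])
print("booted")
"""


def seconds_until_eof(mode, linger=2.0):
    """Play the role of rsh: read the launcher's output until end-of-file.

    mode is "keep" (daemon inherits stdout/stderr) or "close" (daemon
    redirects them to /dev/null, roughly what a "-s" style option would do).
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        [sys.executable, "-c", LAUNCHER_SRC, DAEMON_SRC, mode, str(linger)],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    # read() returns only once every process holding the pipe has closed it,
    # which is the second of the two rsh criteria described above.
    proc.stdout.read()
    proc.wait()
    return time.monotonic() - start
```

In "keep" mode the call should take about as long as the daemon lingers, even though the launcher itself exits right away; in "close" mode it should come back as soon as the daemon has redirected its output.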

More information than you wanted, but I wanted it archived in my journal. :-)

I seem to have years' worth of data in my palm pilot datebk. Rich Murphy suggested trying the "purge" option.

February 19, 2001

Are you going to cry right here, or run to the bathroom?

I saw the last episode of News Radio last Thursday (the one where everyone but Dave and Matthew go to Jimmy's new radio station in New Hampshire). The next night, they ran the first show -- they're starting the cycle over from the beginning. Marvelous!

I love the differences between the first few episodes and the rest of the show:

The break room is a production room

Joe is played by some other actor (and I don't think the character's name is Joe)

Beth's character is a decidedly different personality than in later episodes of the show

There's probably other differences, but those ones jump out at you.

Network performance to nd.edu has been pretty bad this week. It was especially bad when the OIT screwed our entire switch and dropped everything down to half-duplex 100Mbps.

Ugh.

I got write access to the Ogg/Vorbis CVS repository last night. I've complained about so many build process problems that Monty (the lead developer) finally just gave me write access to go fix them myself (he's actually tried to give me write access before, but I refused). I committed a few minor patches, but a few issues still remain in debate. It's a bit more complicated because all this stuff has to compile on windoze (which I won't spend any time on), and because beta4 is due out imminently. Hence, everything is frozen except for bugs.

I've actually got several patches that I submitted long, long ago but were never applied (mostly build-process things) that I will likely apply after beta4 is released. The fact that you still can't build with native Solaris compilers without modifying the included Makefile.am's is still a sore spot that I plan to fix (they include the foolish GNU dependency generating thingy that breaks building for all non-gcc compilers).

There's also heated debate about the use and implementation of getopt_long(). It's silly, but still important. Ugh. Again, this probably won't be Really Fixed until after beta4.

Not that I have time for any of this, anyway...

I've been doing lots of dissertation hacking this week. I'll save that for a second journal entry; I have some important facts to report there.

Not too much else has been going on -- I've been really concentrating on dissertation stuff this past week. I hear that Arun and co. hung a "Lust for Glory" sign underneath ND's "Engineering Week" sign. Most excellent. Arun is promising pictures.

February 26, 2001

We have IPv6 telephones

Whew! What a week.

Tracy and I got caught in the snowstorm out east last week. It took over 3 hours to drive from Philadelphia to Baltimore (and we used the non-congested roads) -- a drive that normally takes 1 hour 45 minutes. Top speed in Alan's range rover was about 40mph. Woof!

And how about Southwest airlines -- $55/each for Tracy and me to fly to Baltimore from Louisville for a grand total of $220 (round trip). That was literally an order of magnitude cheaper than all the other airlines.

More house stuff is going on. We close in T-7 days if all goes well. The builder is working feverishly to finish the house by next Monday. Tracy and I went to the house again this weekend and took a bunch more pictures -- have a look if you care:

Things are looking really good; they've painted the house and added all kinds of things. Kitchen cabinets go in tomorrow. Carpet and kitchen floor go in on Wednesday or Thursday.

I've got a bunch of details to work out this week, including life insurance, an updated will, getting a certified check for the closing costs, details with the builder, etc., etc. Will be busy, and it does detract from the time that I spend dissertating, but it will be worth it.

Network Solutions is the root of all evil.

I got a really stupid and poor-english speaking tech help person on Saturday morning when I called to ask why www.lam-mpi.org still didn't resolve in DNS. Suffice it to say that an extremely frustrating 30 minute conversation ensued. After several hours of cooling down, I figured that what that guy had told me couldn't possibly be correct (i.e., I had to wait to ensure that it was a factual response, not an emotional response), so I called again.

I sat on hold for about an hour (apparently the weekend staff is pretty small!) before I got someone. This guy was actually very intelligent and generally knew a lot more about his job than the first guy. We finally figured out the problems and he set some things up to make everything work nicely. lam-mpi.org is now propagating around the world and will soon come to a DNS server near you.

There still seems to be some kind of hitch between ns1.lam-mpi.org and ns1.squyres.com -- the DNS on squyres.com can't seem to do a zone transfer from the main DNS server. I think it might have something to do with Curt's firewall setup.

Chatted with Brian and Arun this weekend about the upcoming release and general LAM Things; we're really, really close. The last few beta tarballs have all been caused by AIX. AIX sucks.

Word to the wise: AIX's "make" is broken. Use GNU make instead. Unbelievable...

Much overhauling had to be done to LAM's web pages to get them to appear nicely on www.lam-mpi.org. In the end, we decided to start a whole new CVS module for these pages (which actually makes a lot of sense). Some of the directory structure has changed, and we had to fix a whole lot of broken assumptions that version 6.3.something would be the most current version. All the tutorials and the MPI implementation list database are now part of www.lam-mpi.org. 6.5 is the default version of LAM/MPI. Oodles of fixes and changes to the web site.

I'm using checkbot to check that all the links and whatnot in the web site are correct. It's a handy little tool. It takes an hour or so to run, so it conveniently leaves me to do other things, and mails me when it's done. It dumps its output into HTML so it's convenient to view in a browser.

Woo hoo! I guess that since I have CVS write access to Ogg/Vorbis, I'm officially an author as well. Or am I? Either way, I'll take credit when things go Right, and disavow myself if/when things go wrong. ;-)

Interestingly enough, they changed their license from LGPL to BSD because too many .com'ers were afraid of the GPL. This is actually exactly my stance on the GPL for LAM/MPI. We're in discussions right now as to what to change our license to (right now we have a proprietary ND license, but it says you can do anything you want with the code -- it's more or less the BSD/Artistic license, but ND lawyers wrote it). So someday we'll change the LAM/MPI license, and probably to a stock license so that people understand it easily, but probably not until we go to IU.

I got my master/slave tree-based booter running yesterday.

Very cool. When it's done, it dumps the final tree that it used into GraphViz format so that you can view it as a jpg or postscript or whatever. The tree that it uses changes every time I boot on the helios cluster -- new downed nodes, and/or timing between when one slave steals work from another forces changes in the overall tree structure. It's very cool.
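A minimal sketch of what dumping a boot tree in GraphViz's DOT format could look like. This is Python, not the booter's actual code; the function name tree_to_dot, the `children` mapping, and the node names are all made up for illustration.

```python
def tree_to_dot(children, root, name="boot_tree"):
    """Return a GraphViz DOT description of a tree.

    `children` maps a parent node name to the list of nodes it booted
    (a hypothetical representation; leaves may simply be absent).
    """
    lines = [f"digraph {name} {{"]
    stack = [root]
    while stack:
        parent = stack.pop()
        lines.append(f'    "{parent}";')            # declare the node itself
        for child in children.get(parent, []):
            lines.append(f'    "{parent}" -> "{child}";')
            stack.append(child)                     # visit the child later
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical four-node tree: n0 booted n1 and n2, and n1 booted n3.
    print(tree_to_dot({"n0": ["n1", "n2"], "n1": ["n3"]}, "n0"))
```

Saving the output to a file (say tree.dot) and running "dot -Tps tree.dot -o tree.ps" turns it into postscript; other -T formats work the same way.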

I need to drop in the final bits to make it do a complete lamboot and launch a lamd after the tree is made. This won't be too hard. But I'll probably work on the IMPI chapter of my dissertation first:

February 27, 2001

Dave and Lisa have been conducting a secret office affair for the past several months

A darkness has fallen over the land.

Brian found some unexplained failures in LAM testing yesterday. These failures occurred on multiple platforms in different places. The failures were that test programs just "hung" for apparently no reason.

This is not good!

We're tracking this down, but it will take some time. :-(

Some quickies:

I'm dissertating on my IMPI chapter. It's basically the MPIDC paper, but I've added a bunch more sections and topics, and so far I've added one more figure (might possibly add more; after all, a picture is worth a thousand words). Even with just this one chapter, my dissertation is up to 41 pages. Woof! This has actually been the majority of my time recently.

Thank goodness for comments in code! Writing about IMPI (i.e., something that happened over 24 hours ago) has been difficult because I have to refresh myself with exactly how all this stuff works. Thank goodness I put in [sometimes lengthy] comments in all the IMPI code so that I can have a hope of remembering what I was thinking when I wrote that code.

My church (Epiphany) had their DSL router installed yesterday. I'm going to stop by on Thursday and see if I can get them up and running.

House stuff seems to be progressing well. T-6 days.

We finally decided that we goofed on the Cisco proposal yesterday. Lummy and I were wandering through a bewildering array of Cisco equipment, trying to decide what we should ask for, and finally realized that we had no clue -- we should have involved a Cisco sales rep long before this point. Since yesterday was the deadline for proposals, we decided to punt and do a proposal later, from IU.

Back to dissertating...

queeg has been up for 72 days without rebooting.

There are 926 xmms instances running on queeg out of 1005 processes total -- 92%. I predict an xmms crash in the not-too-distant future.

February 28, 2001

Don't mess with the guy with the way-back machine, Dave

Did much LAM work today, and very little dissertation work. :-(

The good news is that pending some final approvals and Arun finishing his part of the FAQ, and Brian making the RPMs, we should be ready to release! We're waiting on the approval of some of the Llamas, and then I might release it to the Linux distributors (RedHat, SuSE, etc.) and let them give it a final test -- just to ensure we didn't do anything stupid.

I did find one bug tonight; not in the MPI functionality itself, but in the new program lamnodes, which I fixed up. Amazing that it lived so long. We'll have to think of a way to test that kind of stuff -- our current test suite really only tests MPI things, not LAM things.

I'm downloading the new Mandrake 8.0 beta; I'll install it on my laptop. It has some things in it that I really want to try out (Eazel, Nautilus, Evolution, KDE 2.1, etc., etc.). I haven't used my laptop seriously in a while, so there's nothing important on it -- this is a perfect opportunity to wipe it clean and try out 8.0.

Not much else exciting happened today. Did a bunch of house followup stuff, and, as I've done before, I'll spare you all from the boring details. Suffice it to say that everything is proceeding right on track, and things look good for Monday. T-5 days.

March 1, 2001

You've got a heart of stone, Dave; would you like to do the honors?

I spent much longer than I intended over at my church installing DSL today.

As a result, I missed the weekly LAM meeting. Brian and Lummy were on the road at IU, anyway, but Arun was around. Oops. I am evil. :-(

It turns out that we got a misconfigured DSL router. Luckily, I had brought my windoze laptop, and was able to sit down in the basement of the church's administration building with the router and talk on the cell phone to tech support. It took about 3 hours (with long periods of, "Hmm... let us try some things and call you back") before they got the router going -- they did it all by remote admin. Kinda cool, actually. I think they even flashed the BIOS to upgrade to the latest rev.

But in the end it all worked out. So I had to reset the TCP settings on all the machines in the building and add virus protection to the small number of machines that didn't already have it (they never had net access before, so it didn't matter until now).

Today also reaffirmed my belief that Netscape 6 is a piece of junk.

The story that follows is a bit extreme in what I was trying to do, but suffice it to say that Netscape performed horribly, and IE performed adequately.

Two of the staff members at the church have fairly old machines: pentium 133's with 16MB (!) of RAM, and run Windoze 95. They even had the original version of IE (3.something, IIRC). Most web sites won't even work with that version of IE anymore (including windowsupdate.com -- IE 3.something has no java, no javascript, no CSS, etc.), so I decided to upgrade them to the latest IE.

Well, the 3.something IE couldn't view any sites to download the latest IE. Which is actually pretty ironic in itself. :-) But frustrating, nonetheless. :-(

So I decided to download netscape instead. You can only get netscape 6 these days (or at least you can only get netscape 6 from an IE 3.something browser). So I downloaded and installed Netscape 6.01.

S L O W C I T Y ! ! !

The machine was swapping so hard that it literally took minutes for any mouse action (hover, single click, double click, etc.) to take effect. Even if you left the computer alone and did nothing, it would continually swap thrash like a bad habit. The hard drive was so busy that it was like listening to a mini gang war inside the chassis (complete with munchkin-like cries of "help me!" and "oh God NO!").

"Painful" is not enough to describe how slow it was. "Agonizing" starts to describe the feeling. I mean, the computer was not just slow, it was in its death throes. Navigating to and downloading the IE installer took over 45 minutes. Incredible.

So I installed IE 5.5 and it worked great. I mean, yeah, it's still a pentium 133, but we had no constant thrashage as there was with netscape 6.0.

So I uninstalled netscape 6.0 faster than Notre Dame shorted its VA Linux stock, and ran with IE 5.5. It does pain me to say it, and I've said it before, but IE really is a better browser these days. I've even been woefully unimpressed with Mozilla on linux. Blech. I have hopes for Konquer and Natulius (sp?); I haven't gotten around to trying the newest versions yet.

Speaking of which, on a lark, I tried the new Mandrake 8.0 beta on my laptop last night.

I wanted to see KDE 2.1, the new Gnome, Evolution, and Nautilus, as well as Linux 2.4.

Are there still bugs?

Oh yeah, baby. Lots of them. Here's some of my experiences:

No matter what I did, I couldn't get PCMCIA recognized. It seemed to be a PCMCIA-enabled kernel, but then again I'm no expert here (this was my first experience with 2.4).

Someone really needs to scrub their default-selected list of apps. On their "features" announcement page for 8.0, Evolution and Nautilus are two of the highlighted apps. They're not installed by default -- you have to go into "individual package selection", find them, and select them. And once you do install Evolution, it doesn't run because it's looking for an earlier version of some shared library than is installed by default (and trying to install the earlier version breaks all kinds of dependencies, etc., etc.). pine isn't even installed by default. WTF!?

DrakConf, their all-in-one system configurator, seems to a) be missing tons of stuff that was in 7.2, and b) have oodles of debugging output, some of which contains "ERROR: blah blah blah". How could they release this?

I do have to admit, though, that Aurora, the new GUI boot sequence, is cute. I tried to use DrakConf to change the orientation of the boot icons, to no avail (the change didn't seem to want to commit).

Props to 8.0 for correctly determining my video hardware for X configuration. I didn't get far enough to see if it ID'ed my sound hardware properly.

After these problems, I determined that I had better things to do with my time and re-installed 7.2 on my machine. I'll wait for a stable 8.0. It seems like they only released this beta to meet some deadline. So unless you're a 'drake developer, and/or into pain, I'd advise waiting until 8.0 is stable.

Don't get me wrong -- 8.0 looks promising, but it needs work before I'll use it.

March 4, 2001

Ooohh... secret keys.

Spent a good deal of yesterday afternoon trying to make Brent's new Dell box dual boot Linux/Win2k.

Unfortunately, I ran out of time before I was able to get it to work. I could boot either linux or win2k on its own, but could not make the machine offer a working choice between the two.

Brent mentioned the virtues of DirectTV over cable. In particular, the cheapness of DirectTV over digital cable (~$30/month vs. ~$50/month -- comes out to $240/year, which is nothing to sneeze at). My only question (which I didn't think of until later, of course) is, "How do you get NBC/CBS/ABC? How do you see Friends?"

Ok, that's 2 questions. Cope.

I had to leave to meet Janna and Mel and Ruben (Muben? Rel? Hmm. Will need a conglomerate to refer to them...) for dinner at Judge Roy Beans (I was actually an hour late... oops). A good dinner; was good to see them all. Mel/Ruben just had their first child 3 months ago, and this was essentially the first time they had been out since (Mel's mom was in town and was babysitting). They said that it was the first time they had been speaking in complete sentences since Lydia was born.

A good time was had by all. Our waiter was definitely an induhvidual, though.

After I got home, I did a few web searches and bugged Lummy; found some solutions that should allow Win2k and Linux to dual-boot properly.

I spent a little time this morning improving my journal program -- something I've been meaning to do for quite a while. I used to type all the HTML for my entries, and I've been getting quite sick of it. So I put in a bunch of shortcuts, and a cool function-based general mechanism:

"=====" is turned into my favorite centered-half-line <hr> thingy.

The <em> and </em> tags can be abbreviated with "_". So _foo_ turns into <em>foo</em>. For example, this is emphasized.

The <strong> and </strong> tags can be abbreviated with "*". So *foo* turns into <strong>foo</strong>. For example, this is boldface.

The <code> and </code> tags can be abbreviated with "[" and "]". So [foo] turns into <code>foo</code>. For example, this is code (pine doesn't render code properly :-\ )

Anything between { and } is a function call. Everything from { to } is replaced with the output of that function call. The first word (quoting, of course, applies) is the function name; the rest are arguments. The only function implemented right now is "a" (or the synonym "href"). If there is one argument, "a" does:

Any of the above special characters (_, *, [, ], {, }) can be escaped so that you can still have them in the text.
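The shortcuts above could be sketched roughly like this. It's Python rather than the C++ that jjc is written in, and it is only a guess at the mechanics, not the real code. Assumptions: backslash is the escape character (the entry doesn't say which one is used), the "=====" rule emits a 50%-wide centered <hr>, and the function table holds a made-up "upper" function instead of the real "a"/"href", whose output isn't shown here.

```python
import shlex

# Hypothetical function table for the {name args} mechanism. The entry's real
# "a"/"href" function is left out because its output is not shown in the text.
FUNCTIONS = {"upper": lambda *args: " ".join(args).upper()}

TOGGLES = {"_": "em", "*": "strong"}        # same character opens and closes
BRACKETS = {"[": "<code>", "]": "</code>"}
HR = '<hr width="50%" align="center">'      # assumed rendering of "====="


def call_function(body):
    """Run a {...} body: the first word is the function name, the rest args."""
    try:
        words = shlex.split(body)           # shlex honors quoting
    except ValueError:                      # e.g. an unbalanced quote
        words = []
    if not words or words[0] not in FUNCTIONS:
        return "{" + body + "}"             # unknown call: leave it untouched
    return FUNCTIONS[words[0]](*words[1:])


def render(text, escape="\\"):
    """Expand the journal shortcuts in `text` into HTML."""
    out, open_tags, i = [], set(), 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            out.append(text[i + 1])         # escaped character, copied as-is
            i += 2
        elif text.startswith("=====", i):
            out.append(HR)
            i += 5
        elif ch == "{" and "}" in text[i:]:
            end = text.index("}", i)
            out.append(call_function(text[i + 1:end]))
            i = end + 1
        else:
            if ch in TOGGLES:
                tag = TOGGLES[ch]
                out.append(f"</{tag}>" if tag in open_tags else f"<{tag}>")
                open_tags ^= {tag}          # flip the open/closed state
            else:
                out.append(BRACKETS.get(ch, ch))
            i += 1
    return "".join(out)
```

One obvious limit of a toggle-based scanner like this: an unescaped underscore inside a word (a file_name, say) flips emphasis on, which is presumably why escaping matters.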

And perhaps most importantly, I have accidentally lost journal entries (sometimes long ones, too) because I hit ctrl-c or ctrl-d at the wrong time, and the journal program quit. So I put in some signal handlers to catch these kinds of things and save anything that has been typed so far to $HOME/dead.journal.entry (homage to $HOME/dead.letter, of course). And when the journal client starts up, it looks for $HOME/dead.journal.entry and, if it exists, asks if you want to preload it into your rant.
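Here's a small sketch of the save-and-recover idea, again in Python and again not the real jjc code. In Python, ctrl-c shows up as KeyboardInterrupt (courtesy of the default SIGINT handler) and ctrl-d as EOFError from input(), so catching those two stands in for the signal handlers. The lone "." end marker, the function names, and deleting the file after recovery are all assumptions made for the example; asking the user whether to preload is left to the caller.

```python
import os

DEAD_ENTRY = os.path.join(os.path.expanduser("~"), "dead.journal.entry")


def read_entry(read_line=input, dead_path=DEAD_ENTRY):
    """Collect lines until a lone "." (a hypothetical end marker).

    Returns the finished entry, or None if typing was interrupted, in which
    case whatever was typed so far is saved to `dead_path`.
    """
    lines = []
    try:
        while True:
            line = read_line()
            if line == ".":
                return "\n".join(lines)
            lines.append(line)
    except (KeyboardInterrupt, EOFError):
        # ctrl-c or ctrl-d: save the partial entry instead of losing it.
        if lines:
            with open(dead_path, "w") as f:
                f.write("\n".join(lines) + "\n")
        return None


def recover_dead_entry(dead_path=DEAD_ENTRY):
    """Return the saved text and remove the file, or None if none exists."""
    if not os.path.exists(dead_path):
        return None
    with open(dead_path) as f:
        text = f.read()
    os.remove(dead_path)                    # assumed: don't offer it twice
    return text
```

At startup a client would call recover_dead_entry(), and if it returns text, ask whether to preload it before calling read_entry().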

These are extremely useful features. I rationalized spending time on them this morning in that it would allow me to be Lazy -- it saves me time in the future.

(actually, typing this entry found two minor bugs that I've now fixed. Rock on!)

Those who use jjc, lemme know if you want a copy of this stuff.

T-1 day until we own a house.

Woohoo!!

But today, back to dissertating...

I've got 707 xmms's running on queeg, out of 795 total. That's 89%.

Coincidentally, queeg has been up and running without reboots for 77 days. The other day, the history in one of my kterms was well over 4000 (I think I've killed that kterm since then).

March 5, 2001

Chez Squyres

WE OWN A HOUSE!

Pictures, of course, at http://jeff.squyres.com/pictures/.
The walk-through went very smoothly this morning. We really only had a small list of things that the builder will fix next week. And we have a full warranty on the house, so any other things that we find over the next few weeks will be no big deal.

March 10, 2001

Matthew wants to know if he's going to be fired

Journal entries are going to be a bit sparse over the next few weeks as I frantically scramble to finish my dissertation by all the deadlines.

Some quickies:

Went to the OSCAR meeting with Brian. I gave a talk on LAM, and expressed the general rocktitude of LAMness. I now understand what these guys are trying to do, and how it is different than Scyld -- it's essentially what Bill Saphir's group at LBL was trying to do with BLD (Berkeley Lab Distribution). That is, put a "Beowulf 1" cluster on a CD. Sort of an ExtremeLinux 2. Scyld is more like "Beowulf 2", where they have single-system image, etc. If they succeed, I think it will be a good thing. To make a long story short, we got what I wanted: both LAM and MPICH will be included in the OSCAR distribution.

I had a few long talks with the Scyld guys while we were there. We finally realized exactly what we have to do in LAM to make LAM work on Scyld. All told, it's probably about 20 lines worth of code changes (including configure.in changes). It's really easy. The talks with the Scyld guys were good, also, because it seems that they didn't know much about parallel run-time systems. Likewise, we didn't know much about Scyld. So it was a most excellent exchange of information. Brian is setting up a mini-Scyld on 3 of our alphas this weekend, and will be doing the initial LAM porting to Scyld.

We decided that the initial port of Scyld is easy. But how to do more than that, and take advantage of some of Scyld's nice features will take a lot more thought and discussion. It kinda changes the ballgame when you have many machines that are emulating one single machine; there are many... er... "issues" is not the right word. Perhaps "definitions that need to be created" would be accurate.

Anyway, the Scyld guys seem like a cool, hackish, fun-loving group. It will be good to work with them.

I also had a long chat with Patrick from Myrinet. I don't think that I can repeat much of what was said (not that it's secret, I just don't think that the conversation was intended for public consumption), but needless to say he's happily waiting for LAM/gm. Rock on!

The conference was at the NCSA in Champaign/Urbana, IL. We took a tour of their machine room (actually, machine building). It was most impressive. There were only 2 SGI machines there, but each with many nodes (hundreds? I don't remember). They had a test x86 cluster as well; perhaps 64 nodes or so. And then a brand-new multi-hundred x86 cluster that was just being installed by IBM. Fully connected by Myrinet. It was huge. We also saw their Itanium cluster (IA-64); yes, these machines actually do exist. They were still under NDA, so the NCSA folks really couldn't say much about them, but they hinted that their performance was fabulous.

On my drive home from IL, the realization hit me that my dissertation is literally due in about 2.5 weeks. Ugh.

I've spent the rest of this week writing and dissertating. I'm still writing the software, too. poggenc is coming along; I think I've worked out most bugs in the serial code (I re-vamped just about all of the thread stuff last night and made it much simpler, but I think there's still a bug or two left -- you can periodically get an audible "click" in the output. It's running through bcheck right now; hopefully, bcheck will tell me where my error is...). I'm writing the MPI code right now; after all, the whole idea here is to mix MPI and threads. Gotta get this stuff done...

SLAM is coming along as well, but I can't work on two things at once.

I had to be at the new house yesterday when the GE fridge, washer, and dryer were delivered. It's very nice to have a non-coin-operated washer/dryer! Our new bed was delivered today as well. Tracy and I have taken several carfulls of stuff over to the house; our apartment is getting emptier and emptier. The moving truck comes in 6 days (i.e., Friday) to move all our furniture and whatnot.

The good news is that our phone number finally propagated through Bell's computers and whatnot, and it finally appeared "DSL capable" on Wednesday. I ordered DSL for my new house from the seminar room where the OSCAR meeting was being held (it was fully wired; Brian and I were logged in with our laptops); very cool. The order is progressing smoothly, but they can't tell me when it will all be setup. It may take up to 3 weeks. :-( Hopefully, it will be less than that.

March 11, 2001

The world will just have to wait for a lighter shade of whitewalls

Well, I did it today.

I finally gave up on parallel ogg/vorbis. I think Lummy will be telling me, "I told you so..." pretty soon.

I did some tests with the serial encoder to prove that it just won't work. I'm not sure if it's by design, or if it's a bug, or if I did something stupid in my tests. But I don't think it's the last one -- I think it's one of the first two.

The first step to parallelizing anything is to split it up into lots of little chunks so that you can process those chunks independently, and hopefully simultaneously. Ogg/vorbis provides a nice way to split the work up into chunks.

However, from my tests, it seems that if you process these chunks in any order other than sequential, the resulting output is different. i.e., you get nondeterministic output since all the chunks will effectively be running in some different order every time you run the program. BONK!!!
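To make the problem concrete, here is a toy illustration. It has nothing to do with how ogg/vorbis really works inside (I don't know whether the real cause is carried state or something else); it just assumes a hypothetical encoder whose output for each chunk depends on whatever chunk it processed before, which is one way you end up with order-dependent output.

```python
def encode_chunks(chunks, order):
    """Toy stateful encoder: each chunk's output depends on prior state.

    `order` is the sequence in which chunk indices get processed; the
    results are returned in chunk-index order so runs can be compared.
    """
    state = 0
    out = {}
    for idx in order:
        # The carried-over state is what makes processing order matter.
        state = (state * 31 + chunks[idx]) % 251
        out[idx] = state
    return [out[i] for i in range(len(chunks))]


chunks = [3, 1, 4, 1, 5]
sequential = encode_chunks(chunks, [0, 1, 2, 3, 4])
shuffled = encode_chunks(chunks, [4, 3, 2, 1, 0])
print(sequential == shuffled)  # False: same chunks, different output
```

With threads grabbing chunks in whatever order the scheduler feels like, every run is effectively a different `order`.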

So I sent some mail to the vorbis-dev list this morning about this problem, but haven't gotten any response yet. I think the developers might be traveling or something...

So I formally decided to ditch that from my dissertation. Which means I need another sample application for that chapter.

I spent the day de-ogg/vorbisizing the code and turning it into a real framework. I also scavenged the TIFF reading/writing code from the old PIPT, and threw in a bunch of extra datatype handling functions to boot.

I named the whole thing "Son of PIPT" (or "Son of PIMP", for those in the know). It now generates a libpipt.a and libsop.a, and I have one sample application that compiles/runs right now. The application just provides a function (actually, a class) for input, a class for the worker, and a class for the output. Then it invokes the SOP engine to do the rest.
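The shape of that is roughly the following. This is a Python sketch of the idea only, not the real SOP code (which is C++); every class and function name here (`RangeInput`, `SquareWorker`, `DictOutput`, `run_engine`) is invented for the example.

```python
import queue
import threading


class RangeInput:
    """Input class: hands out the work items."""
    def __init__(self, n):
        self.n = n

    def items(self):
        return iter(range(self.n))


class SquareWorker:
    """Worker class: processes one item independently of the others."""
    def process(self, item):
        return item * item


class DictOutput:
    """Output class: collects results keyed by item index."""
    def __init__(self):
        self.results = {}

    def write(self, index, result):
        self.results[index] = result


def run_engine(source, worker, sink, num_threads=4):
    """Fan the input out to worker threads and funnel results to the sink."""
    tasks = queue.Queue()
    for index, item in enumerate(source.items()):
        tasks.put((index, item))
    lock = threading.Lock()

    def loop():
        while True:
            try:
                index, item = tasks.get_nowait()
            except queue.Empty:
                return  # no work left for this thread
            result = worker.process(item)
            with lock:  # serialize access to the shared sink
                sink.write(index, result)

    threads = [threading.Thread(target=loop) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The application only writes the three small classes; the engine owns the threading, which is also where MPI hooks would presumably slot in later.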

It took literally all day to get this far, but it seems to be working. Now I'll need to code up a real application and add in the MPI hook-ins (it's only threaded right now, but I did all the MPI groundwork last night in the ogg/vorbis code, so it shouldn't be too hard to add the MPI proxies into the general framework).

On the other hand, Brian did a fair amount of research into the laptops that we can get for IU. The Dell Inspiron 4000 with external keyboard, mouse, and monitor looks really promising. If Lummy gives the go-ahead, I'm gonna sign up for one of those puppies!

March 16, 2001

What was the final score on Dave's coffee cup today?

"The warriors may get all the glory, but engineers build societies."

A great quote; I may have even put it in my journal before. Amusingly enough, it is from Belanna Tores (probably spelled all wrong) from Star Trek Voyager. I'm not really a trekie, but I do enjoy the shows. Belanna's character said that in the rerun that was on tonight.

Brandon Moore and ..... released their Palm Pilot app that plays ND school songs (the fight song, the alma mater, and the victory clog). It totally rocks. You must go download it. Now.

In the words of Arun, "if you don't have a palm pilot, go buy one so that you can use this app. And Jeff Squyres is a God." Ok, I'm paraphrasing.

I sent a message about this palm pilot player to many, many ND grads. I got several positive comments back fairly quickly. Including one from Renzo, who, within minutes, submitted a bug report. What a geek. :-)

Dan, a friend of mine who graduated with me back in '94, had a good quote:

"Had an amusing inbox incident today. After receiving your original e-mail announcing the palm pilot app, I received multiple copies of that same e-mail from different sources over the next 2.5 hours. There's got to be some way we can make money off of Jeff's Notre Dame connections..."

Anyway, this app rocks. Go download it. Now.

Stop reading -- go download it!

No, really!

Did much moving stuff today: we hired a service to come over and move all of our furniture to the new house. So I spent all morning doing that. The apartment is getting really barren.

The three guys who came to move the furniture were quite friendly; I got along quite well with them. Over the course of the morning, I told them that I was "a computer geek", and told them that I was finishing up at ND.

We actually had a lot of laughs, talked about a variety of subjects (including other "professional students" that they knew -- one of their nieces apparently is about 4-6 hours away from about 3-4 majors; she just can't make up her mind), and they did a pretty good job moving all of our stuff.

After they had moved all the furniture into the house, when they were getting ready to leave, one of them asked me, "So are you like [my niece] -- you just can't decide on what major to finish on?"

"Oh no, I already have three degrees -- I'm working on my fourth. I have undergrad degrees in English and Computer Science, a Master's in Computer Science, and I'm working on finishing my Ph.D. in Computer Science."

He just looked at me.

"Hey, I told you I was a geek -- you just didn't know how much!"

He kept looking at me.

He finally said, "Yeah, but you just seemed... normal."

I laughed. Really hard. :-)

DSL should be installed at the house next week. The apartment is so barren that I'm not too excited about spending time there (even though it has DSL), so I'll probably head up to ND next week. Yes, I'm just using Notre Dame for its high speed^H^H^H^H^H^H^H^H^H internet access.

My uncle e-mailed me asking about DSL. Indeed, DSL would have saved them much money this past month. A great quote from my uncle:

"You came this close // to being minus two cousins when we got the phone bill a few days ago..."

Brian is going to Sandia this summer to do All Manner of Things LAM. Well, he's actually being funded to do specific things in LAM (fault-tolerant things), but either way, it's Rockin' cool. Brian also won some DOE fellowship the other day. Talk about being set... that just rocks.

Off to do some more moving things, and then spend our first night at the house (there's no bed left in the apartment). Woo hoo!

March 24, 2001

Queeg rides again

Queeg rides again.

After successfully setting up DSL last night, I went over to the apartment today to get queeg (my desktop, for those of you who don't know). The plan was to leave my router box over at the apartment for a few days so that squyres.com could continue to be served properly while I was switching IP numbers.

Unfortunately, I forgot to bring a screwdriver, so I couldn't open up the router to get the extra NIC out of it (I have a computer back at the house to be the new router, but I needed a second NIC for it so that it could become a router). So I made a command decision and just took the router down and brought it to the house.

After playing around with settings for a while, I finally got the networking going again, and I now have a nice LAN going in the house. The router is now in a closet, along with my new telocity modem. I don't think that they will put out much heat such that being in an enclosed space will be a problem, but I'll keep an eye on them.

I moved the DNS for squyres.com and waynetruevalue.com to register.com -- I think that their server farm is much more resilient than my and Darrell's home computers, and there's probably much less latency to them, too. I've been burned a few times with DSL outages, or stupidity on my part such that badness happened to squyres.com unexpectedly, so I figured that hosting it at register.com (it's a free service, since I bought the names from them) was a much better idea. So since I unexpectedly took the old squyres.com DNS server off the net today, it will take a little time to propagate around the world, and there is a void where everyone else thinks it should be right now, but that will be fixed soon enough. 48 hours at the most.

I see that all my names are now on register.com's DNS servers, and they're starting to propagate, because the whois info is starting to propagate. I've already had a few calls from family members, though, asking why mail was going awry. Oops. Sorry -- I really didn't plan it this way. :-(

I called Ed and told him to update www.fhffl.com to point to my new IP address, too. I see that he's already updated his register.com DNS entries, too, so that's just in propagation delay right now.

Queeg is now happily running away in his/my new home.

One bad thing, though -- I accidentally plugged the wrong power supply in to my linksys switch and fried it. Arrgghh!! The plugs are all the same size... Luckily, I have an old hub lying around, but its performance is already sucking (since I'm streaming MP3s from my router to queeg). I'm just gonna have to bite the bullet and go buy a new switch. Grumble...

I'm playing The Matrix soundtrack, good and loud. This sub-woofer really rocks. I've heard it good and loud in the LSC before, but my office is a much smaller space than the LSC, and that just makes it all the more powerful. I can feel the woofer vibrating in the carpet. Yummy!

There are a measly 19 copies of xmms running, out of 92 total for a wimpy 20%. We'll keep playing MP3s to pump this stat up. :-)

I demand an immediate and comprehensive investigation into the disappearance of office canes!

Quick journal entry today:

I was up at ND all week to spend some quality time in the library, and to touch base with Brian and Arun w.r.t. LAM/MPI.

Got a whole bunch of good stuff from the library from the various journals and whatnot for my dissertation.

Scyld stuff for LAM is turning out to be more difficult than we thought (imagine that). Biggest source of problems: Scyld is "single system image" because it doesn't force you to have a common filesystem (and when everything in unix is a file, this is kinda critical).

Lots of Myrinet progress -- still not complete yet, but I think we're down to the "mopping up the details" phase. Long and short sends seem to work. There's a few issues with fork, and I think MPI_IPROBE doesn't work properly (possibly MPI_PROBE as well), but it's getting to be releasable.

HP has graciously given us a machine to play on (both for LAM testing and for IMPI testing), but it's taking a while to get it setup -- first they forgot to add an account for us, and now we're waiting for C/C++/Fortran compilers and HP/MPI to be installed. Hopefully, it will be ready by Monday.

LAM finally works on HP-UX 10.20.

Got home last night and fired up DSL at the house -- it was easy to setup on Linux! I plugged in the modem, plugged in ethernet to my laptop, and fired up a browser to the 10.x.y.z address that my install book said to do. I agreed to the TOS, typed in my phone number, and the modem installed itself (it seemed to look up configuration and whatnot from central Telocity servers and whatnot, keyed on my phone number). Pretty slick! The router then rebooted, I forced Linux to re-DHCP (just removed my card and put it back in), and I was live on the net! All in all, I was damn impressed -- you don't get much easier than that.

Ironically, today, Tracy wanted to get her e-mail, so I figured I'd just plug ethernet in the 'doze machine and let it DHCP from the modem and it should be good to go. Nope. There's a whole separate (and non-trivial) install procedure that I really didn't want to go through. Weird that the Linux install was easy, but the Windoze install was complicated. Must go get my backup linux router box from the apartment today, and setup NAT properly, etc., etc.

So doing more house stuff today, including setting up my office and bringing queeg and my new router over. I'll have to leave the old router running at the apartment for a few days to give DNS a chance to propagate the update of the IP addresses nicely (can take 1-3 days to go all the way around the world).

There are currently no xmms processes running, 'cause I never bothered to setup sound on my laptop. ;-)

March 28, 2001

I am a cipher wrapped in an enigma, smothered in secret sauce

This is somewhat annoying.

DSL has gone out today, sometime between 10-11am. I called Telocity probably about 30-45 minutes, to see if they had an estimate as to when it would be back (thinking that it was a general outage). Turns out that it's just me -- there's some kind of glitch between me and Telocity.

It appears that I have connectivity to my DSLAM (if I get this right: "digital subscriber line access ....". Ok, I don't remember the last word. It's the routing point at my local phone company (Bell South) to Telocity). Anyway, Telocity can reach my DSLAM, but not me. I'm not sure what I can reach -- I can ping my modem, but I don't know if the DSLAM is supposed to respond to a ping or not (it doesn't).

Two things concern me:

Yesterday, I called Telocity to disconnect my old DSL line (i.e., the one in the apartment). They said that they had to confirm this through e-mail; they'd send me an e-mail, and I would have to reply to it. This kinda struck me as strange, especially since she asked me for my e-mail address. So even if they're trying to get a solid confirmation from the customer, who's to say that I'm still not a malicious person with a random hotmail account handy that I could use to "forge" the real user's authorization. It's like the "Sir, I cannot accept your credit card because it is not signed" urban legend.

Anyway, I got the e-mail this morning and replied to it, being careful to specify my old phone number. I'm kinda concerned because my new DSL service shut off about 2-3 hours later... I called up Telocity billing a little while later, and they swear that this is not the case. But I'm not [yet] convinced...

When BellSouth installed our phone a few weeks ago, they ran a cable from the pedestal at the street to the side of our house. Literally. And they left that cable sitting on our lawn. They finally came back and buried the cable today. My phone still works, but did they jar some secondary connection that disconnected my DSL service? Hmmm...

All this is pure speculation, but the timing is suspicious. Meanwhile, it's at least 7 hours later and I still don't have connectivity. Arrgghh!!!

I took the opportunity to do a whole bunch of house stuff, and fixed up a few nagging issues in the jjc code, and I did some more work on my taxes (good and complicated -- ugh!). But I wasn't productive in terms of dissertation stuff. :-(

I did read some papers from a SIAM conference that Lummy went to recently (he gave me the CD of the proceedings). Some of the papers made me mad; at least one author claimed that MPI sucked because they were using a "pure MPI code" on a cluster of SMPs and the message passing performance on each node sucked.

DUH!!! MPICH doesn't have a TCP/shmem device. This is a well known shortcoming of MPICH. Did the authors use LAM/MPI, which has a TCP/shmem device? No, they just claimed that MPI sucks, only because one aspect of one implementation of MPI sucks. (don't get me wrong, LAM/MPI certainly has its own weaknesses; this issue just highlights one of MPICH's weaknesses)

Indeed, MPICH is just about the only MPI that doesn't have a TCP/shmem device; LAM/MPI does (as I pointed out above), as well do all the vendors MPI implementations. The authors didn't try any other MPI implementation -- only the one that would support their theories.
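The whole point of a combined TCP/shmem device is dead simple, and a few lines show it. This is a cartoon of the idea and not LAM's actual code; `pick_transport` and the rank-to-host table are made up for the example.

```python
def pick_transport(hosts, src, dst):
    """Use shared memory between ranks on one node, TCP otherwise.

    `hosts` maps an MPI rank to the name of the node it runs on.
    """
    return "shmem" if hosts[src] == hosts[dst] else "tcp"


# Four ranks on two dual-CPU SMP nodes (hypothetical layout).
hosts = {0: "node0", 1: "node0", 2: "node1", 3: "node1"}
print(pick_transport(hosts, 0, 1))  # same node
print(pick_transport(hosts, 0, 2))  # different nodes
```

A TCP-only device sends the rank 0 to rank 1 message through the network stack even though both processes sit on the same box, which is the on-node performance hit that paper was blaming on MPI itself.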

Had I been to this conference, I surely would have pointed this out to the authors. I still might well mail the authors. Grrr....

I got a carpet chair mat thingy today for my office. I can now roll my chair with ease. Yummy. :-)

I've been using TurboTax for my and Tracy's taxes. I went out and bought "TurboTax/State" today to do Tracy's Kentucky taxes (I was an Indiana resident last year, and already have the TurboTax for Indiana/2000). Taxes are complicated. Taxes suck. Some of my friends are convinced that taxes are illegal (they have a great story line/rationale as to why this is so), and they just don't pay taxes. Although their rationale is quite convincing, I'm not quite to the point where I won't pay taxes. :-)

Right now, I'm just trying to figure them out. Ugh. Very complicated, because there's 6,000,000 special rules and exceptions. Yes, I'm 29 years old, and my dad has always done my taxes before this. Is that pathetic?

Hell, no! It saved me oodles of time up until now. :-)

But I figure that now that I'm married and have a home, I should probably try to figure out these taxes things...

I had to reboot queeg today; it had mysteriously dropped off the LAN (right about when DSL went out. Hmm...). So there's only a measly 55 copies of xmms running right now, out of 124 total, for a grand total percentage of 44%. The high point earlier today (i.e., pre-reboot) was 339 copies out of 413 total processes, for a total of 82%. Not impressive, but nothing to sneeze at, either. Give it a few more days, and we'll have respectable xmms-crashing numbers again...

March 30, 2001

And my special "spooky" version of the Hokey Pokey

Been doing housework and dissertation work.

Had another DSL outage for several hours again today. Foo. Turned out that power cycling the DSL modem fixed my woes. Arrgh!

Saw a great quote in a mindless action novel that I was reading the other day. A bunch of Americans have colonized the moon, and the Russians have flown several soldiers to the moon to take over the American colony. The stereotypical Russian soldier is squaring off with the lead American:

American: I must warn you, Major. We have a secret weapon. You and your men will surely die.

Russian: A crude bluff, Mr. Steinmetz. I would have expected better from an American Scientist.

American: Engineer -- there's a difference.

And, of course, the engineers go on to defeat the soldiers and save the day. God bless America.

pine has been annoying the crap out of me lately, so I gave wanderlust a whirl today (an emacs mailer that can do IMAP, SSL, threading, filtering, archiving, online and offline operation [which is truly cool -- pine really needs that], and make your dinner). Wow -- it's complicated. Perhaps even more complicated than mutt.

I couldn't get some of the basic functionality working properly (sometimes the index and the messages wouldn't match, or wouldn't display at all). So I gave up and decided that I'll spend quality time with it after I defend my dissertation. It looks like a really powerful mail program, but I just don't have time to spend with it now...

Nothing else too exciting to report. We turn over the keys to our old apartment tomorrow. Woo hoo! We sold 2 of the 3 room air conditioners; the third will stay in the apartment (it's a big momma-jomma) -- the owners said that they could probably sell it for us.

Much dissertating to do this weekend...

During my DSL woes today, I rebooted queeg, so I've only got 78 xmms's running, out of 157 total, for a pathetic 49%. Woof.

LAM/MPI supports portions of the Interoperable MPI (IMPI) standard in a separate distribution -- the 6.4 series (also downloadable from the LAM/MPI web site). It is expected that the IMPI extensions will eventually merge into the 6.5 series.

XMPI 2.2 will not work with LAM/MPI 6.5. A new version of XMPI will be released soon that will include support for LAM 6.5.

The full source code for LAM/MPI is available for download. Linux RPMs for all three of LAM's message passing engines (pure TCP, combined TCP/shared memory with spin locks, and combined TCP/shared memory with semaphores) are also available.

All downloads are available from LAM/MPI's new web site (please update your bookmarks and links accordingly):

This list is for questions, comments, suggestions, patches, and generally anything related to LAM/MPI (in order to control spam, you must be a subscriber in order to post to the list). Web archives of the lists, as well as individual and digest subscriptions are available. See the following URL for more information:

This is a low-volume list that the LAM Team uses to announce new versions of LAM/MPI, important updates, etc. Public posts are not allowed. Web archives of the lists, as well as individual and digest subscriptions are available. See the following URL for more information:

I'm off to astound the world with more feats of adequcocity

Some quickies:

Spent the entire weekend (and I do mean entire weekend) doing taxes and old Army paperwork. My taxes seem to be a bit complicated this year 'cause it seems that I did an improper IRA conversion to a Roth IRA, which I now have to undo. Ugh! The Army paperwork is mostly really old stuff that I should have done a long time ago. I have to send all that stuff to Georgia first, and then they'll sign stuff and send it on to the proper destinations. Oops. :-(

Tracy took care of doing a final walk-through of our old apartment (a.k.a., "the hell hole"), and it's now finally out of our possession. Woo hoo! We sold 2 of our old room air conditioner units, and still have the Big Monster one left -- we left it in the apartment, and the owners said that they would have the new renters call us to work out a deal.

I'm going to get a fax machine for home. It's been really, really annoying having to have Tracy fax stuff for me, and the frequency has increased lately, such that it's gone beyond "one or two faxes in a while", such that I really should be doing them at home and not on the company bill.

The ND women's basketball team rocked last night and won the national championship. That's just so cool! Many congrats to them (like any of them will ever see this :-) -- they did themselves and ND proud.

LAM now seems to pass all tests. I'm going through a checklist before releasing. Brian and I are synching up later today, so it's quite possible that we're finally going to release LAM 6.5 today. Woo hoo!!!

Finally saw Gladiator this weekend. Not a bad flick. I give it 10 minutes. Probably would have given it more, but it was way too hyped up for me.

xmms crashed earlier today, but much earlier than it usually does. There were only 481 copies running (out of 555 processes total). <shrug>

Just to clarify (some have asked) -- I was not affected by the Northpoint DSL shutdown. Telocity is my ISP, and they use many "routing" companies (including Northpoint). The "routing" company around here is Covad, so I wasn't affected. My recent DSL woes have been caused by increased solar flare activity (a catchall to blame random occurrences on).

That's it for now.

There are currently 34 xmms processes running out of a total of 112 (30%).

Why's your mom taking the SATs with you?

Forgot to mention -- I had to change my clocks for daylight savings time for the first time in many years. I now live in a part of the world that actually participates in daylight savings time, so I have to switch my clocks twice a year.

April 4, 2001

Matthew just told me to go fetch his lunch

We released LAM 6.5 and 6.4a7. I included the press release about this in a prior journal entry. There was much rejoicing.

But then I checked my e-mail this morning and saw that some guy claimed that he couldn't start parallel jobs under Linux with LAM 6.5. And then 3 or 4 others chimed in saying the same thing. Nooo!!!!!

Needless to say, this is a software developer's nightmare: discover a bug immediately after a big release. It took a while, but I tracked the bug down to a faulty test in LAM's configure script. And it only seemed to affect Linux (it had to do with pseudo-tty behavior). Arrrgghhh!!!

Someone else also found a legitimate bug in the C++ bindings. It's amusing because the C++ bug has been there for quite some time, but it just happened to be found on the heels of the Linux pty Big Bug.

So I released 6.5.1 and 6.4a8 this afternoon.

Hopefully, things will be ok now.

No, Trond from Redhat just e-mailed me and says that all the tests are failing on his Linux 2.4.2 machine. Arrgggghhh!!!! In all fairness, we've never tested on 2.4.2, so I'm kinda hoping it's just some kind of stupid difference between linux 2.2.x and 2.4.x. He's going to give us access to his machines tomorrow to give it a whirl. We'll have to see.

Ugh. All of this made today be pretty crappy.

I finally had my windshield replaced the other day. It cracked itself quite a while ago after a particularly cold evening. I just came out one morning and there was a 2 foot crack across my windshield. It clearly wasn't impact damage of any kind; it just appeared there. So I assumed it was thermal damage.

Anyway, now that I have a garage, I finally called USAA to start a claim on my windshield. It didn't cost me a dime, and they had a guy out here the very next day to replace it.

I watched him do it -- it was fairly interesting. Lots and lots of sealant to keep those windshields in place, and keep water out. The guy told me that Saturns were probably his least favorite windshields to replace (this is all this guy does -- replace glass in cars; he's been doing it for 12 years, so I would guess that he pretty much knows what he's talking about) because they have a larger curve than most, and it makes it a bit more difficult to get the new windshield in, etc.

The main sealant that he used to hold in the new windshield was some caulk-like stuff that he put around the frame before he put in the new windshield. When he was all done, he told me to wait about 2 hours before driving because the caulk would need time to cure. The windshield would still stay in place if I needed to drive, 'cause there's other clips and strips and various insidious devices holding it in place, but apparently (and I didn't know this beforehand) the purpose of windshields is not only to keep wind and rain and whatnot out of the car, they are also to keep passengers in the car in the event of a collision. And if I drove before the caulk cured, in the event of a collision, the windshield could pop out.

So that's an interesting engineering issue -- making caulk-like adhesive and a plate of glass that is strong enough to hold up to several hundred pounds of humans and other loose objects in the car, assumedly all moving with a very large momentum. Woof!

I've had to get my DoD "secret" clearance updated. Apparently, my last background investigation was done in 1990, and they're only good for 10 years. So my original clearance has expired. I got a packet in the mail for my reinvestigation. I had to download some questionnaire program (Windoze only, of course) that asked a zillion questions about my history.

One of the things that it asked was all of my addresses for the past 10 years. After some thinking about this, I was surprised to discover that I have lived at 16 different addresses over the past 10 years (including my current address). Wow. No wonder I hate moving!

I also had to be fingerprinted. This seems kinda weird, especially since I've been fingerprinted before (when I entered ROTC). A person's fingerprints never change over their life -- they expand a bit, but my understanding is that the unique characteristics of the whirls and whatnot stay the same, albeit they typically grow in size as your hands get larger. So why did they need them again? Who knows...

I went to my local police station and was surprised to find out that they only do fingerprinting on the third Thursday of every month (no joke). I know that ND security department does it if you just walk in (Brian had it done for his DoE clearance about 2-3 weeks ago). However, my local police department gave me the number to some adjoining precincts, so I called them, and one of them does it every day.

I went today and had it done. The officer who took my fingerprints says that they do about 10-15 a week, for all kinds of different agencies. DoD (department of defense), DoE (department of energy), FBI, various and sundry banking and trading firms, etc., etc.

I stopped at the mall to get Tracy's birthday present (next week -- I know I'm safe, 'cause she never reads my journal :-). I parked at one side and had to walk clear across the mall -- through various department stores and whatnot -- to get what I was looking for. How annoying. And there's all the chatty folks in the aisles in the mall with the mini store displays selling cell phones and sunglasses and watches and portable walrus scrubbers, each one of them feeling the need to ask you if you want their particular product as you walk by.

Don't get me wrong -- the mall is the quintessential symbol of American capitalism (Suzanne and Rich -- don't you dare start quoting facts at me here, I'm on a roll), but with all those stores all selling essentially the same items (are clothes from the Gap really much different than Tommy Hilfiger clothes?), I just have to ask myself: why?

How did I possibly enjoy going to the mall when I was a teenager? Oh, wait -- I didn't.

Maybe I'm just one of those people who likes to go get what they want and not have to bother with 16 billion choices. Maybe I was just in a pissy mood because someone found a real bug in LAM earlier today (this was most likely it). Oh well. End of topic...

My aunt gave me the e-mail addresses of my cousins Pat and Chris the other day. I mailed them, but they haven't replied yet, the little weasels. I'm sure they've seen my mail -- they're the ones who were almost sold into slavery to pay off the excessive AOL telephone bill last month, so I'm sure that they're online all the time...

Tracy's parents are visiting her grandmother in Illinois this week. They're stopping by this weekend to visit and to see the house.

Must go; have been doing LAM stuff all day, and no dissertation work. Ugh!

There are 449 xmms's running on queeg, out of 552 processes total -- 81%.

April 5, 2001

This is unbelievable

As I mentioned in a previous journal entry or two, I am in the process of filling out a background check for the re-upping of my "Secret" DoD clearance. This is a periodic and normal thing.

I downloaded the software from their web site (which had strong encryption export control warnings all over it), and filled out all the questions. At the end, it spits out a .zdb file.

Here's the part that astounds me: they tell me to e-mail this file to them. They claim that this file is encrypted and it's safe to e-mail (I even spoke to two different people on the phone who claimed that the .zdb file is encrypted). However, there are some major flaws with this claim:

The very same web site that allowed me to download the "user" version of the software also had the "security manager" version of the software. This version decrypts .zdb files. So just anyone in the world can download the decrypting software and compromise my .zdb file.

I copied my .zdb file to my linux box and ran "file" on it. It said that it was a ZIP file. No way... Yes way. I extracted all the files in it and was horrified to see my social security number in plain text in multiple files. Some parts of the files actually did appear to be encrypted, but if just anyone can download the security manager version of the software, what does that matter?
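The check described above is easy to reproduce. What follows is only a sketch under stated assumptions: it builds a made-up archive in memory (the member names and the fake number are invented for illustration, and nothing here reflects the real .zdb layout), confirms that it is a ZIP container, and scans each member for text shaped like a social security number.

```python
import io
import re
import zipfile

# Pattern for text shaped like a US social security number (NNN-NN-NNNN).
SSN_SHAPE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def plaintext_hits(blob):
    """Return the names of archive members that contain SSN-shaped plain text."""
    stream = io.BytesIO(blob)
    if not zipfile.is_zipfile(stream):
        raise ValueError("not a ZIP container")
    with zipfile.ZipFile(stream) as archive:
        return [name for name in archive.namelist()
                if SSN_SHAPE.search(archive.read(name))]


def make_sample():
    """Build a made-up archive; names and contents are purely illustrative."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # A member that stores a (fake) number in the clear.
        archive.writestr("person.txt", "name=Example ssn=000-00-0000")
        # A member of opaque bytes, standing in for data that looks encrypted.
        archive.writestr("opaque.bin", bytes(range(200, 256)))
    return buffer.getvalue()


sample = make_sample()
print(sample[:2])              # ZIP containers start with the bytes PK
print(plaintext_hits(sample))  # members that leak SSN-shaped text
```

A scan like this only shows that something is readable in the clear; it says nothing about how strong the encryption is on the parts that do look scrambled.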

The best part of this is the two agencies who are running this show. Their names are: "Defense Security Service" and "Security and Counterintelligence Management Office". You would think that with names like this, they would have a clue about data security.

And they wonder why people are talking about a digital Pearl Harbor...

April 8, 2001

Tublecane

Tracy's parents were here for the weekend. Partly to visit, partly to see the house, and partly so that Tracy could go shopping w/ her mom.

Tracy's dad and I did some house things, like install a new programmable thermostat and put pellets around the foundation of the house to repel ants and mites. The joys of being a homeowner.

I think Tracy and I set a record for going out to dinner for 4 days in a row. Woof. Thursday, we were both in snitty moods and didn't feel like cooking (that was the day I found out that my background check stuff wasn't encrypted). Friday was dinner w/ Janna at an English Pub a few minutes up the road from us (fish-n-chips, of course -- it's Lent!). Saturday was with Tracy's parents because Tracy didn't want to cook for her mother. :-) Today was with Tracy's parents again because Tracy's birthday is tomorrow.

Tracy's parents leave tomorrow morning. It was a quick visit.

There's been lots of good discussion on the llamas list about All Manner of Things LAM. Good advice and tips and whatnot from the llamas.

Trond from RedHat is still running into weird LAM failures in some esoteric circumstances. We haven't been able to duplicate his issues.

I'm heading up to ND for this upcoming week tomorrow morning.

Been getting ready for my meeting with my dissertation committee this week (no, I'm not stressed...). I'll be spending some more quality time in the library this week, and will spend time writing writing writing...

Got some blinds for my office instead of these sheets that are hanging in the windows now. I'll hang them when I get back from ND.

(can't remember if I mentioned this in a previous journal entry or not) My new linksys switch came last week. I'd been waiting for quite a while, and kept checking on the status of it at amazon.com. The weird thing is that it said "Delivered", but I never saw it at my door. Hmm. So on Wednesday or Thursday, I called UPS and punched in the tracking number. They said it had been delivered on Monday... to the back door. Doh! Sure enough, it was sitting at my back door. It had been sitting there for 2-3 days. Gotta remember to check for those back door deliveries, I suppose. :-)

Gotta run now. There are 947 xmms instances running on queeg, about 93% of all processes.

April 14, 2001

Strange... the Garelli 5000 had exactly the same problem

Whew.

I'm behind on journal entries. Let's catch up:

I was at ND for most of this past week. The main goal was to have a synchronization meeting with my Ph.D. committee (which was on Wednesday). I drove to ND on Monday morning (Tracy's parents and I left at about the same time). Spent a good amount of time up at ND refining my presentation for my committee meeting and rehearsing with Lummy. All in all, things went pretty well, and my committee was pleased with my work. They made a few suggestions and clarifications which change a few things that I had planned, but they're not too big of a deal. Rusty drove over from Argonne for the day, and it is always good to talk with him.

About 20-40 miles out of Louisville, I realized that I had forgotten my rollerblades. Doh!! I guess I'll be walking to ND all week...

Only Brendan, Brian, and I went to wings. It seems that BW3's has deleted our RLYBAD account on the trivia game. This is truly the end of an era -- we have had the RLYBAD account for years, and now it's gone. RLYBAD is dead -- long live RLYBAD!

OO Stamtish was fun. I only stayed for an hour or so. The new crew is working at Sr. Bar, so the opportunities for free stuff are now severely limited.

Went out to dinner w/ Dog on Wednesday night (which started with me stopping by his office and chatting around 6 or 7pm, and, an hour later, I said, "hey, let's go get some dinner"). Dog is good people. Also had some good conversations with Curt. Curt is also good people.

Chatted with Rich about his work -- he was frantically trying to finish his Ph.D. proposal by this Thursday. He's working on multithreaded message passing systems; we talked a bunch about how LAM works and whatnot. He seems to have three main choices to do his work:

Use Sun/MPI, since it's thread hot. But ND still hasn't installed it (even though they've owned it for at least several months)

Use LAM/MPI, but it has the major drawback that it's not thread hot, so Rich would have to make it thread hot. Not a minor task.

Write his own message passing system from scratch.

We talked a bit about LAM/MPI and how it worked, and some general message passing things (who knew that it would ever be so incredibly hard to get bytes from point A to point B? It's much harder than one would think...) I tossed the idea out that he could use LAM without using MPI -- I pointed him at all the Trollius man pages and whatnot, and explained how he could get the use of the daemons without using our MPI layer, etc., etc. Who knows -- that might prove to be a workable solution for him.

LAM meeting on Thursday was good; Brian has had some success with Scyld. Although it's not quite what we want it to be yet, it does work. We'll probably make it a bit more slick before releasing it. Arun had done a little more on the Myrinet mop-up, but not much. After the meeting, I helped him add some environment variables which will allow the user to specify (at run time) the tiny and short message boundary sizes. This is an important tuning knob, and it turned out to be a little tougher to implement than we thought because we were using compile-time constants to size some static arrays. But we worked around it and it seemed to work; Arun's going to finish the testing this week.
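For the curious, the general shape of that kind of run-time tuning knob looks something like the sketch below. It is in Python rather than LAM's C, and it is not LAM's code: the variable names EXAMPLE_TINY_BYTES and EXAMPLE_SHORT_BYTES and the default sizes are hypothetical, made up for illustration. The point is that the sizes are read once at startup with a fallback default, and the buffers are then allocated from those values instead of from constants fixed at build time.

```python
import os

# Hypothetical defaults and variable names, invented for this sketch.
DEFAULT_TINY_BYTES = 1024
DEFAULT_SHORT_BYTES = 65536


def read_size(name, default, environ=None):
    """Read a positive integer from the environment, else use the default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        return default          # unparsable setting: keep the default
    return value if value > 0 else default


def message_boundaries(environ=None):
    """Return (tiny, short) boundary sizes, keeping short >= tiny."""
    tiny = read_size("EXAMPLE_TINY_BYTES", DEFAULT_TINY_BYTES, environ)
    short = read_size("EXAMPLE_SHORT_BYTES", DEFAULT_SHORT_BYTES, environ)
    return tiny, max(tiny, short)


def allocate_buffers(count, tiny):
    """Size the buffers at startup rather than from a build-time constant."""
    return [bytearray(tiny) for _ in range(count)]


tiny, short = message_boundaries({"EXAMPLE_TINY_BYTES": "2048"})
buffers = allocate_buffers(4, tiny)
print(tiny, short, len(buffers[0]))
```

Falling back to the default on a bad value is one design choice among several; failing loudly at startup would be just as defensible.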

Arun has made his decision to fade away from the LAM group, mainly since he will be staying at ND when we go to IU, and his future involvement with LAM is probably going to be pretty limited (if at all). He'll continue to answer LAM mail through the end of this semester, and get Myrinet out the door, but that will more or less be it. Sadness. :-(

I had one more meeting w/ Lummy on Thursday before I drove home. We talked about my committee meeting from the day before, clarified a few things, and set a few directions. I've got to write code code code to get the final polished version of my "manager worker" code out (although we decided that, strictly speaking, "manager worker" is not what this program does, so I have to come up with a better name, such as "distributed multithreaded data parallel framework" or something).

Drove home Thursday afternoon / evening. Sheesh, gas is expensive!

Went to Epiphany on Friday morning to have a look at some of their e-mail woes (I just switched a small "test" group of them over to Outlook Express with the DSL-provided e-mail). OE seems to keep freezing up on them, which is pretty surprising and disappointing (it's the most recent version of OE on Win 95 and 98 machines). I think what's happening is that when OE launches, it launches two windows -- the main window and a separate "checking your mail..." window. It even shows up as two items on the 'doze task bar. The "checking your mail..." window then prompts for their password.

However, sometimes OE puts the main window on top of the "checking your mail..."/password window. And therefore the user doesn't see it. So they start using OE, even though the "cym..."/password window is still there and waiting. OE is configured to check for their mail every 10 minutes. It seems that if they either manually click on "send/receive" or if the 10 minute timeout expires and OE tries to check for mail, it gets confused because the "cym..."/password windows are already open, and hangs. Weird. And lame.

It seems to have a simple workaround -- always put in the password right away, even if it goes to the back (i.e., bring it to the front and put in your password). We could check the "save your password" box so that the issue never comes up, but I'm not a big fan of that -- I prefer users to have to think about security once in a while. Plus, it means that anyone could walk up to their computer and access their e-mail. This is probably not a big deal in a Church staff environment, but there are enough random people walking through the offices in a given day that it is something to consider. I hope we don't have to do that, but we'll see.

I got home around 2pm (did a bunch of other maintenance, too, since most people had taken Good Friday off), and spent the entire rest of the day doing taxes with Tracy. Ugh. The federal stuff was essentially done (just one or two minor corrections), but the state stuff was extremely confusing. I bought the Indiana and Kentucky programs for TurboTax. Indiana was quite good -- it did all the Right Things for our tax situation (although it did have the annoying "feature" that if you started going through the interview and navigated to somewhere else in the middle of the interview, you couldn't re-start the interview where you left off -- you essentially had to ditch that data and start the interview again. Very annoying). The Kentucky program, however, sucked. Kentucky allows four filing statuses: single, married filing separately on this one form, married filing jointly, and married filing separately on different forms. Because of our particular tax situation, we needed to do the last option. But Ttax didn't have that option. After grappling with this for several hours (combined with finally figuring out that we needed to use that last option), we finally just looked up the relevant forms in Ttax and filled them out manually.

Figure this one out: Tracy, who had income in Kentucky for last year, filled out a 2 page return with a single additional schedule for itemized returns. A total of 4 pages. Me, who had no income in Kentucky for last year, and who was not even a resident in Kentucky last year, had to fill in a 2 page tax return combined with about 10 pages of stuff from my federal tax return -- all so that I could say that my tax owed in Kentucky was zero. Gotta love taxes...

That's it for now. I have a separate journal entry brewing about the whole Chinese/American plane collision thing.

There are 139 xmms processes running on queeg right now out of 211 total (66%). When I came home from ND on Thursday night, xmms was frozen, so I had to kill and restart it.

An editorial

A few words about this whole American plane colliding with the Japanese plane thing... I am not a diplomat. I am not a statesman. I am not wise in political ways. These are just my thoughts; they have no correlation to any official positions that I hold, nor are they related -- in any way -- to any of my employers. These are also not well-studied conclusions; they are just my personal thoughts.

From our point of view, it seems that the Chinese pilot was clearly the aggressor. Our pilots claim that the plane was flying straight and level on autopilot and the Chinese pilot approached at high speeds, multiple times (getting as close as 3 feet from the left wing at one point). The third time, the Chinese pilot apparently misjudged his approach, which resulted in the crash.

This appears to be consistent with the facts that the Chinese plane is much more maneuverable than the American plane. Indeed, I know that if I was the pilot of the American plane, I'd be flying straight and level on computer autopilot for two reasons:

specifically so that I could claim that I was not the aggressor.

since any foreign pilot would be a variable (regardless of their degree of aggression or not), an unarmed plane only has one defense -- be completely predictable and hope that the other plane doesn't hit you.

This only makes sense.

It would also be incredibly stupid for the plane to have been in Chinese airspace. While I certainly have no knowledge of that plane's specific mission, I find it hard to believe that such an electronically noisy plane (and therefore easily observable by the Chinese) would have intentionally ventured into Chinese airspace during their mission (i.e., before the collision) without permission when we are not at war with them and with no means of defense. So I find it hard to believe that they were not in international airspace. Did they skirt the border? Perhaps. But were they in Chinese airspace? I doubt it.

Is this really what happened? I would tend to think so. I know some US military pilots, and I'm pretty sure that their reactions would be pretty much what I said above.

But was it really the case? It certainly makes no sense for the American plane to intentionally swerve into the Chinese plane. But did the American plane unintentionally swerve into the Chinese plane? If so, the Chinese plane:

clearly must have been too close to the American plane for accepted safety limits (i.e., the Chinese pilot had no time to react to prevent the collision), or

was far enough away (i.e., should have had time to react), but the pilot was so inattentive that he didn't notice the American plane lumbering towards him

Either way, the Chinese pilot would share at least some of the fault.

So what really happened? It's hard to say, and I wonder if the public will ever really know what happened. There are multiple factors which influence any situation:

Take 10 people who were all direct eyewitnesses at the scene of an accident, and you'll still get multiple different versions of the story.

We (the public) accept pretty much whatever the media says, even though the media distorts just about every story reported.

And let's not forget that it is possible that the American government is covering up the details of the "real" story. As much as my patriotism doesn't want to acknowledge this fact, it certainly could be the case -- the scientist in me has to concede that point.

So what really happened? I don't know for sure, but I'm inclined to believe some form of what the American pilots claimed. Are all the details exactly right? Perhaps not. But what they say generally makes sense, whereas the Chinese version doesn't.

As for the Chinese accusations about how the American plane landed without permission; technically speaking, they are correct -- the American plane had no permission. But they also had no choice. I do believe the American pilots saying that they had broadcast mayday multiple times and did a 270 degree rotation around the field -- the international signal for "in distress and not in touch with the tower". The fact that the Chinese authorities didn't acknowledge these signals is something that they have not answered.

I initially shared my fellow citizens' outrage at the Chinese keeping the plane.

But then someone reminded me of the fact that we did essentially the same thing a few years ago when a Russian pilot defected and landed a MiG in Japan. We examined that plane thoroughly before we sent it back to Russia -- in crates. So it's hard to fault China for doing what it did (in terms of keeping the plane). It is incredibly advanced and secret technology, and it literally fell into their possession.

Granted -- the situation is slightly different than what we did (someone gave it to us rather than a forced landing), but the larger picture is the same: an advanced piece of technology came into their possession in a way that the owners did not intend, and the owners want it back.

As for the technology itself, the crew has said that they were able to destroy all the sensitive stuff in the plane before the Chinese boarded. I'm quite sure that all their non-physical codes were able to be destroyed (computer records, access codes, etc.) as well as any code books and whatnot. Indeed, even if they hadn't, I'm quite sure that as soon as the plane announced its intention to land in China, Pacific Command started the process of changing all relevant codes. This is standard procedure -- even in the event of a possible compromise, all codes must be changed immediately. So I'm not concerned there.

But as for the machines that ran the plane, and the specific crypto devices and other kinds of secret technology on the plane -- I have no idea whether those kinds of things have self-destruct mechanisms that can be activated in the event of capture. I hope so. I'm sure the crew did their absolute best to render any technology in the plane unusable and unstudyable by the Chinese. They are all experts in their respective fields, and are intimately familiar with the machines that they fly with. We have to trust them. And I do; they apparently had at least 10-15 minutes while still in the air to potentially start the destruction process (although it's not clear that they could start the process until they landed; the plane was pretty badly crippled), and they apparently had about 15 minutes on the ground to complete these procedures.

So I have to concede here -- this is probably not outlandish for the Chinese to do. Especially when you look at the fact that we've done just about exactly the same thing. That plane is now just a pawn in the big chess game of foreign diplomacy, whether we like it or not.

That being said, I'm making a big assumption here: that the Chinese pilot did not intentionally hit the American plane so as to force it to land in China. This is a possibility, so I have to mention it, but I would think that the tensions between our countries were not strong enough (before the incident) to precipitate such an action. Indeed, it would be extremely difficult, if not impossible, to plan and execute such a maneuver and guarantee that the American plane would still be able to land (i.e., that it wouldn't be destroyed). Indeed, there are other ways of forcing a plane to land rather than a mid-air collision. So I don't think that China did this on purpose -- it doesn't make sense.

Even more importantly than the plane is the crew. I think that this is what most Americans (myself included) are most inflamed about.

Keeping the crew was quite stupid (that's a gut reaction there). Yes, I can see the Chinese's political reasons for keeping them (and I do think it was political more than anything else -- there's no military reason to keep them), but that doesn't stop me from being angry about it. They needed to silence the American version of the story while their own version was spoon-fed to their public (their media is under even tighter control than ours), keep bargaining leverage in the situation, save face while claiming to wait for an apology from the U.S., and delay as long as possible so that they could keep their scientists and technicians working on the plane.

It was also in their best interests to keep the crew safe and relatively comfortable until they were returned. Think about it: if they had harmed any of the crew and didn't eventually have them killed (or otherwise kept from communicating with American officials), the American story would come out eventually, which would have been a political disaster for China (particularly with the pending trade deals and UN stuff). The crew had to eventually be returned and in perfect health with no mistreatment.

Sure, the crew were questioned. That is to be expected. I'm also sure that the Chinese officials knew that they would get little to no new information from our crew, because I'm quite sure that unless the crew were drugged (or otherwise coerced, but as discussed above, physical violence was not an option), they wouldn't voluntarily give any sensitive information away.

They gambled that they could hold the American crew for quite a while before the American government would take a hard line. And they were right. Will there be any repercussions? Maybe. But certainly less than if they had injured/killed any of the crew. It's hard to see how a new president would be able to take a hard stance and have direct retributions against a foreign power, particularly when that president was insistent upon negotiation for the release of the crew.

That being said, I have to admit that I'm extremely happy that no military action was taken. It would have been a more-stupid-than-normal reason to go to war. Don't get me wrong -- the crew is very important -- you never leave a crewmember behind. But going to war over the fate of 24 people is just not good statistics. The old adage, "the needs of the many outweigh the needs of the few" is highly relevant here; while I'm extremely happy that the crew is home safe, I think that they too (being members of the military) would have understood if the process had taken longer and/or gotten ugly. Military members assume a certain risk when defending the American freedom; we all know this and acknowledge it when we do our jobs. While everyone is happy that it didn't come to that in this case, it certainly could have.

All that said and done, I welcome home the crew of our plane. Thanks for defending our country. It's said too infrequently, particularly when your everyday job can have you end up being held by a hostile foreign power. Thanks for keeping us free.

April 16, 2001

Oh the irony

WWWWWHHHHHOOOOOOOO HHHHHHOOOOOOOOOOOOOOOOOO!!!!!!!

I was just informed via e-mail that I won the ND SGI Award for Computational Sciences and Visualization in the College of Engineering at Notre Dame.

It comes with a nice prize check, at least some of which I'll be spending on wireless networking for my home. I'll get the award at the graduate student awards banquet in May (this is my second time -- I won a GSU Lifetime Achievement Award a few years ago).

April 18, 2001

C'mon Dave -- tap waits for no man

'tis the season for quickies.

My niece apparently loved the Barbie VW bug that Tracy and I sent her for her birthday (in all fairness, Tracy found and picked it out -- I only gave final approval).

Progress is being made on my dissertation code. My committee wanted to see a more hierarchical structure for the "manager worker" (gotta think of a better name), which I have been spending the last three days on. It's actually complicated just to launch the thing -- you have to provide a map file (using inilib, of course). The startup protocols are getting complicated. It's interesting, 'cause I've never thought of using MPI for "startup protocols" before, but that's exactly what I'm doing -- spawning a bunch of sub-tasks, and then exchanging a bunch of startup meta information to set up the structure of the main computation. Kinda cool.

The more hierarchical structure will actually make the IMPI tests easier, I think.

Since I still had more prize money, I decided to get a DVD player as well (shh... haven't told Tracy yet; I know she doesn't read this, so I'm safe). We've been talking about a DVD player for quite a while, and could never quite rationalize buying one. Now that I had some "free money", it seemed to make sense. Outpost canceled my first order because they decided to stop carrying that model, but they still do have a refurbished version of the same model (for the same price). Go figure. So I ordered that one; it should be here Friday.

Tracy dropped and broke her Palm Pilot last week, so we had to order a new one for her. Got it at http://www.staples.com/, and used a coupon that we found at http://www.amazing-bargains.com/ and got it darn cheap without paying shipping. Sometimes, I just love the internet. It was actually supposed to be here today, so I guess it will show up tomorrow.

I helped Tracy with some Excel wizardry last night that apparently saved many hours of tedious work for her (it was a non-trivial formula that did some lookups and cross-referencing, handled errors, and generally could make a nice short-order lunch if you needed it to). Yay me! :-)

Darrell tells me that I need a "Previous" button on my journal so that it will go to the previous [time increment] in the web version of the jjc. It's on the to-do list, but probably won't be any time soon.

Still don't have a lawn mower, so I had to pay someone to cut the lawn again. Ugh. Not that I'm looking forward to mowing my lawn, but I am looking forward to not paying someone to do it. Err... well, maybe I'm looking forward to not having to pay someone to do it (let's not exclude the possibility of Jeff sometimes getting lazy and choosing to pay someone to mow the lawn).

Apparently, in my editorial about the whole "spy plane" incident, I said "Japanese" instead of "Chinese" at least once. Doh. I'm a stupid American. I do know the difference, and I'm quite sure which two countries were involved in this incident. I'm just a stupid typer, apparently. :-(

Back to the dentist tomorrow, hopefully for the last time in quite a while. Woof.

Thunder over Louisville is this weekend; a big airshow followed by the nation's largest fireworks show -- supposedly even bigger than the D.C. 4th of July show. It's over the Ohio River. My old Apache unit usually puts in an appearance at the show as well (although I've heard their part is usually quite lame -- I have to say I'm not surprised :-).

I added a quickie feature to my MP3 web caster -- I can enqueue my entire audio collection at once into xmms. I have a lot of music. I enqueued my entire "alternative" section and found that I have 1806 MP3s. Putting it on "random play" makes for quite a nice mix, and I never have to think about what to play next -- for about a week (or until xmms crashes, which usually comes first).

Gotta run. There are currently only 21 xmms processes running on queeg (out of 93), because it crashed earlier today, when 948 xmms processes out of 1018 (93%) were running.

Mandrake 8.0 was released yesterday. Brian managed to find a really fast mirror and we finally managed to download both CD images to nd.edu. It took quite a while, but I finally downloaded them to squyres.com and made CDs from them.

I'll probably eventually install 8.0 on my laptop. Installing it on my desktop will probably wait for a little while; when I can interrupt my work for at least a full day and not worry about it (perhaps after I hand in my dissertation...). Although I really would like to move up to KDE 2.1.1 (I'm on KDE 1.something now), and I would like to see Gnome's Evolution...

In other Linux distro news, it seems that LAM 6.5.1 made the cutoff for RedHat 7.1. Woo hoo!

Another entry is coming about my dissertation code. Those who aren't tech-heads can ignore it, as it will be quite geek-filled with details.

April 21, 2001

Sir, we can't eliminate the line item for our oxygen supply

Today was the Thunder over Louisville festival.

It's a big air show and enormous fireworks display over the Ohio River between Indiana and Kentucky. Our house is about 15-20 miles from the river; as I was out in the garage building our new gas grill, I could see a bunch of the military planes fly by on their route to the show. I saw some tankers and some fast movers (have no idea which; I'm not a Zoomie). It was actually kinda cool.

I didn't see my old Apache unit on the TV coverage, but that doesn't mean they weren't there. They've apparently been in the show in years past (I still haven't made it to a show yet; perhaps next year). They had the B-1B Lancer close the military part of the airshow; the TV coverage simply didn't do it justice. That is one Big Friggen' Plane. The only time that I have seen one was at the Chicago air show two years ago, and it was enormous. Louder than a hum-vee stuck in the mud, too (and that's loud).

We watched the fireworks on TV, too. It was pretty amazing and I'm pretty sure that the TV coverage didn't do it justice, either. They close the bridge over the Ohio river and shoot off fireworks from both there and barges in the river. By my watch, the show was about 25-26 minutes of nonstop fireworks. Amazing stuff.

And just think -- someone probably wrote software to design (and execute!) fireworks shows. You gotta take into account ignition time, launch time, height and flight time, explosion time, fade time, timing to the music, etc., etc. Think of the database of ordnance that drives that simulation. I think the field names alone would probably make airport security monitors shudder. Think you'll ever see that on Freshmeat? ;-)
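Just to make the timing idea concrete, here is a toy sketch of the core calculation: working backwards from the moment a shell should burst to the moment it has to be ignited. Everything in it is an assumption made up for illustration (the field names, the shell names, and the numbers); real show-control software surely tracks far more than this.

```python
from dataclasses import dataclass


@dataclass
class Shell:
    name: str
    fuse_delay: float   # seconds from the ignition signal to launch
    flight_time: float  # seconds from launch to burst
    fade_time: float    # seconds the burst stays visible


def firing_schedule(cues):
    """Turn (burst_time, shell) cues into a sorted list of (ignite_time, name).

    burst_time is when the burst should appear, in seconds from the start
    of the music; the ignition time is found by working backwards.
    """
    schedule = []
    for burst_time, shell in cues:
        ignite = burst_time - shell.flight_time - shell.fuse_delay
        if ignite < 0:
            raise ValueError(shell.name + " cannot burst that early")
        schedule.append((ignite, shell.name))
    return sorted(schedule)


# Made-up shells and cue times, purely for illustration.
peony = Shell("peony", fuse_delay=0.5, flight_time=3.0, fade_time=2.0)
willow = Shell("willow", fuse_delay=0.5, flight_time=4.5, fade_time=5.0)

# The willow bursts later than the peony but has to be lit first.
print(firing_schedule([(9.0, peony), (10.0, willow)]))
```

The fade time is carried along but not used here; a fuller version would use it to keep bursts from overlapping on the same patch of sky.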

Tracy and I tried out our new grill tonight and it works great. Needless to say, I was reading the cooking tips as I was cooking, so I did it entirely wrong, but since burgers and hot dogs are pretty hard to screw up, it all went well. Turns out we got the wrong grill cover, though (it's too small) -- I'll have to exchange that next week.

I promised Janna a good meal cooked off the grill next weekend for letting us use their 4runner to bring the grill from my local True Value hardware store (of course) to our home.

A helpful LAM user (Chris at Advanced Data Solutions -- they do oil field simulations and whatnot) has been encountering failures on his system. He sent some test code, but I still haven't been able to duplicate his problems. Hmm.

We've just changed the LAM model for handling signals in user code -- LAM used to install signal handlers for SIGSEGV and the like during MPI_INIT. Now we don't install any (except for SIGUSR2, which we need for IPC kinds of things); the lamd just notices that the child process died due to a signal from the exit status and sends that back to mpirun. mpirun prints out a pretty error message and then kills the entire parallel app (this used to be done from the signal handler in the user code itself, which was usually reliable, but there were some problematic cases where it would go haywire). The code in mpirun has been bullet-proofed to make it much more robust, too. Unfortunately, we forgot to take into account MPI_COMM_SPAWN and friends -- this new plan doesn't take this into account at all.

The problem here is that there is no mpirun waiting to receive status/death messages when a child process dies (RTF_WAIT is not set for MPI_COMM_SPAWN processes). Hence, if a child encounters a signal, no one will kill the rest of the parallel app. Hrmm.
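Just to make the "notice from the exit status" part concrete, here's a minimal sketch in Python. It is not LAM code; describe_exit and run_child are made-up names, and it assumes a POSIX system, where the subprocess module reports "killed by signal N" as a return code of -N.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable death notice.

    On POSIX, subprocess reports "killed by signal N" as -N.
    """
    if returncode < 0:
        return "killed by signal %s" % signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_child(code):
    """Run a tiny Python child and report how it ended (POSIX only)."""
    proc = subprocess.run([sys.executable, "-c", code])
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # A child that kills itself with SIGSEGV, standing in for a crashing rank.
    print(run_child("import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"))
    # A child that exits normally with a non-zero status.
    print(run_child("raise SystemExit(3)"))
```

The catch described above still applies: someone has to be waiting on the child to get that status, which is exactly what is missing for spawned processes.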

The easiest solution would seem to be to turn the signal-handling code back on in MPI_INIT for spawned programs (it's still there -- we left it there for backwards compatibility; you can manually activate it with the "-sigs" option to mpirun). This is somewhat inconsistent with mpirun'ed programs, but will users notice?

A better solution would seem to be to have an mpirund pseudo-daemon in the lamd that can "emulate" mpirun, just for these kinds of purposes. Heck, mpirun itself can utilize it so that the interface is the same everywhere. I certainly won't have time to get to this until I finish my dissertation, though...

I got notice that DFAS has deposited the first of four back payments into my checking account. It's the reimbursements for hotel and rental car for one of the two AT's that I did in Atlanta. While it's really just a reimbursement, it feels like free money. Must resist urge to spend it on house stuff...

:-)

I just noticed an annoying bug in jjc -- it evaluates underscores (i.e., emphasized HTML) before functions. So if you use the HREF function and have a URL with an underscore, jjc will change that into <em>, and complain that there's no ending tag. Grr... gotta change that order of evaluation...
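Here's a rough sketch of the "functions first, underscores second" ordering, in Python rather than jjc's C++. The markup is hypothetical and NOT jjc's real syntax: I'm assuming _text_ for emphasis and HREF(url, label) for links, and render is a made-up name. The trick is to stash each finished link behind a placeholder so that the emphasis pass never sees the URL.

```python
import re

# Hypothetical markup, NOT jjc's real syntax: _text_ is emphasis and
# HREF(url, label) is a link function.
FUNC_RE = re.compile(r"HREF\(([^,()]+),\s*([^()]+)\)")
EMPH_RE = re.compile(r"_([^_]+)_")
PLACEHOLDER_RE = re.compile(r"\x00(\d+)\x00")


def render(text):
    """Expand functions first, so underscores inside URLs are never touched."""
    stash = []

    def expand(match):
        url, label = match.group(1).strip(), match.group(2).strip()
        stash.append('<a href="%s">%s</a>' % (url, label))
        # A NUL-delimited placeholder hides the finished HTML from the next pass.
        return "\x00%d\x00" % (len(stash) - 1)

    text = FUNC_RE.sub(expand, text)
    # Only now look for emphasis; the URLs are safely out of the way.
    text = EMPH_RE.sub(r"<em>\1</em>", text)
    # Put the stashed links back.
    return PLACEHOLDER_RE.sub(lambda m: stash[int(m.group(1))], text)


if __name__ == "__main__":
    print(render("see HREF(http://x.org/a_b_c.html, this page) for _details_"))
```

One side effect of this particular sketch is that underscores in the link label are not emphasized either, since the whole link is stashed. Whether that's desirable is a separate question.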

Jimmy has fear? A thousand times no!

Yesterday, I sent an e-mail to the xwrits author (xwrits is a program that periodically pops open a "take a break" window on my screen to give my wrists a rest) with the following suggestion for his excellent program:

Thanks for xwrits! I think it has saved me much pain... literally.

I have a suggestion.

I have found it convenient to schedule breaks with xwrits, not just for my wrists, but also for taking care of non-computer-related stuff (take out the trash, feed the cat, make sure that the laundered money showed up in my swiss bank account, etc.).

As such, it would be convenient to know (at least approximately) when the next xwrits break is coming.

So when one of my henchmen says, "Hey boss, we got a big player on table 6; you wanna fix a 'special' deck for the next shuffle?" I can say, "Yeah, sure -- I'll do it in about 7 minutes when I have to take a break, anyway."

Here are some random ideas on how it could work -- any one of the following methods of showing the time left until the next break would be great (or something even better / easier to implement would be fine as well!):

a dockable toolbar app (for KDE or Gnome or whatever)

a small window that could sit in the corner of my desktop somewhere -- independent of the main xwrits popup window

a specific keystroke (in any context, which could be kinda hard) could bring up a window for 2-5 seconds before automatically closing

Such functionality would be most useful, if possible.

Thanks!

A helpful LAM user found a really obscure name clash with LAM's tputs() function and the termcap/ncurses tputs(). Kudos to Glenn!

I mentioned a few days ago that a LAM user was having a problem that I couldn't duplicate. It turns out that this might be due to Linux's implementation of TCP/IP -- it seems that after long periods of inactivity when there is data ready to be read on the receiver side, the sender will declare "timeout" and kill the socket. Doh! It's not clear if this is Linux's fault, or is part of the design of TCP/IP, or if Linux's timeout value is just short. Either way, it would be a real pain for us to put in proper heartbeats on the TCP RPI. Arf. I'm hoping that TCP/IP has a "stayalive" functionality that will do its own low-level heartbeats, but I'm not hopeful. Gotta check Stevens sometime soon about this...
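For reference, the sockets API does have a per-socket keepalive option, SO_KEEPALIVE, and Linux adds a few TCP-level knobs for tuning it. Here's a minimal sketch in Python; enable_keepalive and its default values are made up for illustration, and whether keepalive probes would actually cure the timeout described above is an open question, not something this shows.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Ask the kernel to send its own low-level probes on an idle connection.

    idle:     seconds of silence before the first probe
    interval: seconds between probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning options are platform-specific, so only set the ones that
    # this platform's socket module actually defines.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Without the tuning options, the default idle time before the first probe is typically on the order of hours, so just flipping SO_KEEPALIVE on may not be enough for a short timeout.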

2 level decomposition of my dissertation code seems to work, but I think there's a minor memory leak in the RelayOut class. Should be easy to fix, but it's late and I'm tired.

April 25, 2001

I'm just not white like you Dave

It appears that the fan on the CPU in my router is dying.

All morning I was hearing weird whirrs and clicks and whatnot. I couldn't locate the source of the noise, so I assumed that it was outside. It only dawned on me to look in the closet where my router lives after an hour or two. I finally realized that the noise was coming from my router machine itself.

I popped the cover and after trying to figure out what was making noise (first candidates were the disks), I finally determined that the motherboard was vibrating to varying degrees, resulting in the rattling sound. It was pretty trippy until I realized that there is a moving part on the motherboard -- the CPU fan.

Flicking the fan with my finger resulted in realigning whatever was rattling and silencing the racket. Oh yeah, it also rebooted my router. :-\ <sigh>

Anyway, no matter how much I flicked the fan, it would always start making noise again a few minutes later. Hence, I'm assuming that the fan will die in fairly short order. Where does one get new CPU fans in Louisville, KY?

As I mentioned yesterday, I got my WAP from http://www.outpost.com/, but still have no wireless network card (it's on back order). I only hope that Outpost stays in business long enough for them to ship the card to me -- they had the best price on it, and I avoided shipping costs.

Did I mention that Tracy and I picked out a grandfather clock? It's a wedding present from my parents. We finally have a house to put it in. It's the McConnell clock from Howard Miller. If all goes well, it will be here in a few weeks. Cool.

I've been working on a paper for SC2001. We'll see if it gets finished in time for submission (extended abstracts are due this Friday). Ugh -- stress! The topic is Tucson -- the software framework that I detailed several days ago (of which, several of those details have already changed :-). The name "Tucson" came from the fact that "it's on the road to Phoenix!" The name Phoenix, of course, refers to the ability of the framework to resurrect processes when they die (in a fault tolerant kind of way).

It's funny. Laugh!

Yes, you can groan now.

While modifying a figure for this paper, I just learned that xfig has a "freehand" mode in the line tool. I had no idea! It's rather amusing, actually, if you are in grid "snap to point" mode -- "freehand" is actually quite blocky. :-)

I'm reading a book by Arthur C. Clarke and Stephen Baxter that I randomly picked up in a bookstore a few weeks ago called The Light of Other Days. The premise is that technology is invented in the late 21st century that allows remote viewing of any location as well as any point in the past. The philosophical implications are staggering -- no privacy at all anymore.

I just ran across a great quote:

"This isn't the 1990's, Mary. Software development is a craft now."

One can only hope that software development gets much better than it currently is! :-)

Speaking of software sucking, it looks like Telocity's Atlanta routers are hosed again. Riddle me this: DNS works fine (which makes sense, because I'm using their DNS servers -- assumedly their internal network is fine; it's just the connection to the rest of the net that is suckin'). So I can lookup IP addresses of anyone I want. I just can't traceroute to them. BUT -- I can telnet to port 80 of www.excite.com. Even though UDP and ICMP traceroute packets to www.excite.com die at the Atlanta Telocity routers.

Weird, man. Weird.

There are 129 copies of xmms running, out of 206 total processes (62%).

April 27, 2001

Soon the super karate monkey death car will park in my space.

I just learned something about my phone the hard way.

I called a local hardware store earlier this morning. Later, I called Lummy. Since I call Lummy not infrequently, I have him on speed dial, so I hit the speed dial button. After I hung up with Lummy (his voice mail, actually), I remembered something that I had forgotten to leave in my message for Lummy, so I hit "redial". It called the hardware store, not Lummy. Oops. That is, it redialed the last number that I had punched in, not the last number that I called.

I guess that pretty much makes sense, since I could just as well hit the "Lummy" button instead of "redial", and redial theoretically is better suited to remembering something that you don't necessarily have stored somewhere (i.e., the last number that you punched in). But it still surprised me.

These new-fangled telephones. I just don't get'em. But I heard that they have the internet on computers now. What will they think of next?

I saw an article about how major cell phone companies (Ericsson?) are delaying their rollout of "3G" phones (third generation) due to software glitches. I'm not surprised. I have a fairly simple Audiovox phone which is pretty handy, but it definitely has its share of what are assumedly software glitches. I've even had to "reboot" my phone at least once (take the battery out for several seconds).

It's fairly reliable, but I would imagine that the software inside is actually fairly complex, and therefore susceptible to the "software quality sucks" rule that seems to be the norm of today. :-(

Went out and plunked down a few hundred on a mower today. Woof. Also got a trimmer. I'm out of books to read, so on the way home I stopped at a local Books a Million and got two new Orson Scott Card books (I'm in the middle of the Homecoming series), a Fatboy Slim CD (ripping MP3s now...), and the Fight Club DVD.

Must spend the rest of the day on the SC paper...

There are 345 copies of xmms running, out of 425 total processes (81%).

April 29, 2001

The Secret of Management

Clean Fatboy Slim? Wha...?

I just noticed that I got the "Kiddie's Clean Version" of the Fatboy Slim CD that I just bought. What an outrage! Darn you, Tipper! I want all my golly-darn profanity and holy smokes swearing! And dang it all if the razzem frazzem songs don't not obscenely suggest that I should have my cake and eat it too, gumdangit!

...aw, bleep it.

In all honesty, I'm curious to know what the difference is. If anyone has the normal one, or has heard both versions, drop me a line and let me know what was changed. Thanks.

All in all, Halfway Between the Gutter and the Stars was somewhat disappointing. Not spazoidal enough. So on the way to the hardware store, I bought some more CDs. I got the new Poe (Haunted), another Fatboy Slim (On the Floor at the Boutique), and a compilation called Louder Than Ever -- Volume 1 (which may turn out to be a rap CD, which I didn't realize -- I was looking for random techno. My criteria for selecting the CD was that it was a compilation, I didn't recognize any of the bands, and it had words like "Da", "Club", and "remix" in the titles. The fact that one of the songs on this CD was humorously named "What U See Is What U Get" was just a bonus [what is a WUSIWUG?]. Regardless, it seems that my CD-selecting filter may need a bit of tweaking...).

The Boutique CD is pretty cool. Good and spazzy. I give it 10 minutes.

I haven't had a chance to listen to the Poe CD yet, but it's got that song Hey Pretty that they play on the radio. I have another Poe CD, and it's good slow stuff; I expect this to be similar.

Tonight was the "Secret of Management" episode of News Radio. It featured both Ft. Awesome and a ball pit. Coincidence? You decide.

Just got my orders for my Army duty this summer.

Arrgh... they're specifically not giving me a rental car! (I've had one the last two times I went down there) This could be a real drag, depending on where my hotel is. Last time, my hotel was fairly close, but the first time I went, my hotel was a good 15-20 minute drive away including a fair amount of highway driving.

This'll be my last tour down there in Atlanta; the place is closing down. Slowly. (like all government entities -- when the decision is made to close an office, it takes months or years to actually do it). I'll have to find another unit after this. I've looked around a little, but nothing has come up yet. I'll have to resume my search, but as with most other things, probably not until post-dissertation...

We submitted our SC2001 paper with 26 seconds to spare. 26 seconds later and the server would have disallowed our submission.

And who said that Computer Science wasn't exciting?

It's now Sunday. Yesterday, I paid my dues and re-joined the Lawn Mowing Association of America (LMAA). It seems that over the past 11-12 years, my membership had lapsed.

I spent much money at the hardware store for various home things (hose, sprinkler, more blinds, a spreader, etc.). Woof. Tracy spent a good deal on internal furnishings, too. Now I understand why the US economy is so dependent upon the

Janna came over for dinner last night; we made steaks on the grill. It was much fun. Jim thought the wireless stuff was way cool (still waiting for my Orinoco card... it should be here sometime this week).

I put up some more window blinds today, and we'll likely order the rest of our window stuff Real Soon Now. I might well make a "boring" category for journal entries that have to do with Home Stuff. That way, those who don't give a damn can just delete it without reading it. :-)

DSL was out most of yesterday, and about half of today. Woof. Telocity just can't seem to configure those routers in Atlanta properly. At least, that's what I'm assuming -- traceroutes stop in Atlanta for some reason. I can usually get to some sites (e.g., http://www.excite.com/), but not others (e.g., http://www.yahoo.com/ and http://www.nd.edu/). Arrgh!

There are 32 copies of xmms running, out of 109 total processes (29%).

May 2, 2001

Dave, do you think Ernest Hemingway ever gave a reading that bad?

I was working on my laptop -- I had just installed Mandrake 8.0 and was playing with my new wireless card. It mostly works, but not entirely.

emacs seems to have arbitrarily wide wrap lengths; 78 or 79 chars in text mode. Gotta figure out how to change that.

Aurora is a cute GUI boot screen. However, it only seems to want to run in "NewStyle" mode, not "Traditional" mode (I'd prefer the Traditional mode, because it still shows each item as it starts, whereas the "NewStyle" only shows a small number of amorphous icons at the bottom of the screen and alternates highlighting them -- I have no idea what it means).

I haven't tried to play MP3s yet.

The fonts are somewhat icky. Took me a while with playing around with my konsoles to find a reasonable font. The konsole scrolling is pretty slow, too -- it wasn't slow before.

I set the non-framebuffer kernel to be the default; it seems that it's slightly faster video (e.g., scrolling in konsoles).

Konqueror is cool, but it crashes a lot. And the crashes are repeatable, too. So I'm still stuck with Netscape...

My wireless card was detected and installed properly without me having to do anything. Cool.

pine wasn't installed by default. Dunno why; it's possible that I deselected it somewhere during the install (but I don't recall doing that...). It was on the CD, so I just RPM installed it. It had built in support for SSL and IMAP, so it rocked right off the bat.

dig also wasn't installed (same disclaimer). I found bind-utils on the 'drake CD and installed that, too.

I don't know anything about postfix, so I uninstalled it and put in sendmail instead.

It seems that there's a new wireless NIC driver called orinoco_cs (as opposed to wvlan_cs, which 'drake installed for me by default). However, 'drake didn't compile orinoco_cs as a module, so I set about compiling my own kernel. Needless to say, on my little laptop, it took several hours. I used the config file that 'drake supplied, and just tweaked a few values. It finally all built, but when I rebooted with it, I got a mysterious "cannot boot from that root device" error. Dunno what that's all about. I restored my original modules and all was well. I'd really like to use the orinoco_cs driver, though, 'cause it works with WEP and whatnot. Might have to play a little more there...

In the meantime, my DSL has been really crappy this week. It was out for about 3/4 of this past weekend, and multiple times through Monday and Tuesday. It was out for about a half-hour this morning, and is now out again.

This really sucks. Especially when I'm trying to do stuff in nd.edu -- I just get cutoff. Arrghh!!

I started a primitive script to log how much I'm offline, and when. It just does a ping to the Telocity DNS servers, a ping to www.lsc.nd.edu, and a ping to www.excite.com every minute. Why Excite? Well, it seems that I'm never offline to them. Just about everything else goes (www.yahoo.com, for example), but Excite seems to stay reachable. The problem almost always seems to be the Telocity routers in Atlanta (traceroutes stop there).
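For the curious, the idea looks roughly like this. This is a Python sketch, not the actual script: the log format, the function names, and the telocity-dns.example placeholder (standing in for the real DNS server address) are all made up, and the ping flags assume the Linux iputils version of ping.

```python
import subprocess
import time

# The three targets named above; the Telocity DNS entry is a placeholder.
HOSTS = ["telocity-dns.example", "www.lsc.nd.edu", "www.excite.com"]


def ping(host):
    """Return True if one ping gets an answer (Linux iputils flags assumed)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "5", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # No ping binary available; treat the host as unreachable.
        return False
    return result.returncode == 0


def log_line(now, hosts, probe=ping):
    """Build one log record: a timestamp plus up/down for each host."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    states = ["%s=%s" % (h, "up" if probe(h) else "down") for h in hosts]
    return stamp + " " + " ".join(states)


if __name__ == "__main__":
    # Meant to be fired once a minute (e.g., from cron), appending to a log.
    print(log_line(time.time(), HOSTS))
```

Passing the probe in as a parameter is only there so that the formatting can be checked without touching the network.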

I wish that whoever was maintaining those routers would get their act together!

I learned something about xmms's thread leak -- it only happens when streaming files to it via http. It doesn't happen when playing files from a real CD, or MP3s from a local filesystem.

Weird, eh? Must be a bug strictly in the http streaming code.

There are 240 copies of xmms running, out of 326 total processes (73%).

May 4, 2001

You wake up at SeaTac, SFO, LAX

I didn't tell Tracy that it was coming, of course (they called yesterday to setup a delivery time). So she was extra pleased to see it when she came home. It's the little things in life. :-)

Today is the Oaks race in Louisville. It's basically the locals' version of the Kentucky Derby -- it's a Big Event. Tomorrow, Louisville will be absolutely flooded with billions of people from outside of Louisville, so there's a lot of local pride wrapped up in the Oaks race.

We got invited to go through GE, but I had to turn it down so that I could work work work... Maybe next year.

I did some rough stats on my monitoring so far and found out that DSL has been up 44% of the time since I started monitoring (Wed, May 2, 10:11am). Granted, that's really only 2 days, but as Holly said, "That's a poor IQ for a glass of water!"

Ok, it was much funnier when Holly said it. And it made sense, too.

The point is that in the past two days, DSL has been down more than it has been up. And it has cost me a lot of work. Bonk. :-(
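The "rough stats" are nothing fancy. Assuming one up/down sample per minute (the sample format and both function names here are made up for illustration, and the numbers in the demo are invented, not my real log), the arithmetic is roughly:

```python
def uptime_percent(samples):
    """Percentage of samples in which the link was up.

    samples: an iterable of booleans, one per check (True means up).
    """
    samples = list(samples)
    if not samples:
        raise ValueError("no samples")
    return 100.0 * sum(samples) / len(samples)


def outages(samples):
    """Lengths of each consecutive run of "down" samples.

    With one sample per minute, each length is an outage in minutes.
    """
    runs, current = [], 0
    for up in samples:
        if up:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    # Made-up data: 44 good minutes followed by 56 bad ones.
    demo = [True] * 44 + [False] * 56
    print(uptime_percent(demo), outages(demo))
```

The outage lengths are arguably more useful than the percentage, since one long outage and many short ones hurt in different ways.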

And since it's BellSouth's problem, it's not like switching to a different DSL carrier will fix the problem -- they all use BellSouth since BellSouth is the local telco. Double plus unbonk. :-(

I accidentally killed xmms earlier, so the stats there are pretty low right now.

It is unusual for me to send so many journal updates in one day -- normally, I leave the journal window open for quite a while and let it accumulate, but given how flaky DSL is, I'm going to submit now so that I know it gets recorded properly...

There are 17 copies of xmms running, out of 104 total processes (16%).

Let's start the Fire Marshall debate!

I forgot to mention one thing about reinstalling my laptop...

When I reinstalled, I selected to use XFree86 3.something instead of 4.0.3. This resulted in much better X performance -- the Konsole scrolling issue that I was complaining about in a prior journal entry was nonexistent. Indeed, it was back up to performance levels that I was used to.

However, I'm pretty sure that I was previously running XFree86 4.something before I installed 'drake 8.0 (i.e., under 'drake 7.2), and I didn't have these issues.

Oh well. What the heck do I know about the difference, anyway? Nothing. It works great, and is back being nice and fast, so that's all that I care about.

:-)

There are 489 copies of xmms running, out of 574 total processes (85%).

May 11, 2001

I really have no idea, Dave. I've been stone-cold drunk since about 8 this morning.

Oops. The last rant should have been under the "technical" category.

Mary had a great response:

If a manager doesn't spawn, it would be shot. At the very least, its demons should be exorcized. Get thee to a rectory.

A few weeks ago, I found the Andromeda software package for streaming MP3s from a web server. It was much slicker than the thingy that I was using, so I installed it (trivial install -- just a single .php file). It works nicely. It only lacks one feature -- the ability to enqueue arbitrary directory trees (something I only recently added to my thingy, but quite handy).

I pinged the author with my thoughts, not expecting much (per most freeware development, IME). He actually responded, and we had a good chat (via e-mail, of course).

He contacted me a few days ago with a beta for the next release of Andromeda. The big new feature is playlists. We found a few issues, and he fixed them. We also discovered that there's an inherent limitation of cookies that at least I wasn't aware of. Cookies have a maximum length on Apache servers -- about 8k. That is, the sum total length of all cookies given to a given server must be <= ~8k (remember that all cookies are given on one HTTP request header line). Apparently, IIS allows a bit longer than this.

This is a big bummer, since Andromeda was storing playlists in cookies. Either way, there's a finite limit for the playlist. Bonk!
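To get a feel for how quickly a playlist eats that budget, here's a back-of-the-envelope sketch in Python. The cookie name and layout are made up (this is not how Andromeda actually encodes its playlists), and I'm assuming the limit is Apache's default LimitRequestFieldSize of 8190 bytes applied to the whole Cookie line.

```python
from urllib.parse import quote

# Assumption: Apache's default LimitRequestFieldSize, in bytes.
APACHE_FIELD_LIMIT = 8190


def cookie_header(cookies):
    """Build the single Cookie: header line a browser would send."""
    pairs = ["%s=%s" % (name, quote(value, safe="")) for name, value in cookies.items()]
    return "Cookie: " + "; ".join(pairs)


def fits(cookies, limit=APACHE_FIELD_LIMIT):
    """True if all the cookies together stay within the header limit."""
    return len(cookie_header(cookies).encode("ascii")) <= limit


if __name__ == "__main__":
    # URL-encoding makes paths longer: every "/" and space becomes 3 bytes.
    print(quote("/music/a b.mp3", safe=""))
    # A hypothetical playlist cookie holding 200 file paths, "|"-separated.
    playlist = "|".join("/music/artist/album/track%03d.mp3" % i for i in range(200))
    print(fits({"playlist": playlist}))
```

Note that the encoding overhead counts against the limit, too, so the number of songs that fit depends heavily on how long (and how punctuation-laden) the paths are.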

We pondered over this for quite some time, actually. There's just no good way to do this without some form of server-side storage (files, a database, sessions, whatever). And to do that properly without allowing a DoS, you have to have both a login and some kind of finite bound on the playlists anyway.

Urgh. :-( (one of the wonderful points about Andromeda is that it's a single .php file with no extra storage required). Adding this complexity is not attractive.

Indeed, I think there is a real missing chunk of software that allows client/server stuff without a database -- flat files only. Such packages would be extremely useful when you are running your software on some ISP's web servers, and database usage costs extra. Flat files would be a bit more bookkeeping, and probably less efficient, but if you need a non-high-performance web package, what would it matter?

Indeed, I have found that I am using the word "indeed" a lot lately.

I blame Arun.

Epiphany continues to have problems with Outlook Express. Bobbe in particular is having a horrid time. OE is doing random things. Sometimes it freezes on the splash screen. Sigh.

I think her machine has just degraded to the point of being non-functional. It's a Windoze 95 box, several years old. I think that 'doze itself has just degraded enough to the point of non-determitiveness (is that a word? Probably not). It probably wasn't helped by the fact that I got all the latest "updates" from Microsoft. Ugh.

I really don't want to reinstall the whole machine. Particularly since that machine has all the databases and whatnot that have all the parish records, etc. Ugh.

So my solution is to loan them one of my old machines so that Bobbe has something to use for e-mail. At the same time, their fiscal year starts in July, and they'll be replacing that machine. So this stopgap is good enough for now.

At the same time, they got some donation money to get a new machine to replace one of their other machines -- a P100 with 8MB of RAM, IIRC. You have no idea how painful it is to use the machine (it's on the desktop of one of the church staff members). They gave $1500 to get a new computer.

They're Gateway folk, so I perused the GW web pages, and noticed that they were running P4 specials. Since we had to use all the money, we ended up getting a 1.3GHz P4 with 128MB of RAM. Way more than necessary. But then again, perhaps it just means that this machine will last 4 years instead of 2.

I've been reading "Exceptional C++ : 47 engineering puzzles...".

I think Kevin, Jeremy F., Arun, and Brian and I will use this book as the basis for an e-mail version of C++ Friday Lunch. Perhaps doing one puzzle a week or so. I created a GNU mailman list for this on my DSL router, but had to reconfigure DNS to make this happen. It'll take 2 days to propagate around the rest of the world before we can really start.

I'm heading to ND next week. It'll be Arun's last LAM meeting, and graduation is that weekend. My specific purpose is to attend the graduate awards dinner to receive the SGI HPCC award (and prize check -- whoo hoo!!).

Lots of discussion on the OSCAR lists this week. Summary of decisions:

Move OSCAR development to sourceforge

Have 4 lists: oscar-announce, oscar, oscar-dev, oscar-core. The first three are typical open source lists, the last is "members only" for administrative kinds of things.

Interesting discussion occurring about how to have multiple MPI implementations on the cluster. I had a really long proposal which I thought was elegant, but then someone pointed out that it was functionally the same as modules. Duh. But modules are good things, so if we put modules in OSCAR, by associativity, that will be a good thing.

You know that you have a large uptime when the average history in your command windows is around 4500 commands.

Excellent! The Lone Gunmen tonight used a song off one of the Fatboy Slim CDs that I just bought -- Weapon of Choice. That song is cool. Seeing it on the Lone Gunmen was double extra chocco latte cool.

There are 456 copies of xmms running, out of 530 total processes (86%).

That's a good ploy, Dave, to pretend that the ship is sinking.

Linux really sucks sometimes.

I'm working heavily on Tucson, and since yesterday morning I've been fighting a bug where the manager wouldn't spawn children properly. LAM/MPI would return an error and say that the rpcreatev() (one of the underlying functions under MPI_COMM_SPAWN that is used in LAM to actually spawn a remote process) had failed.

I couldn't figure out why this routine was failing -- it's used successfully in many different places. It's used in mpirun itself, and isn't failing there, for example. So why is it failing in MPI_COMM_SPAWN?

I tried to use gdb and ddd to track the problem down, but gdb kept seg faulting. Sigh. Linux debuggers are generally useless. I was reduced to printf debugging in a multi-threaded, parallel program. Do you have any idea how painful that is? Sigh.

It took me quite a while, but I finally figured out what the problem was.

Each LAM client has a global structure named _kio that contains, among other things, the PID of the process that is using LAM. That is, each MPI program has to call MPI_INIT, which, in turn, calls the internal LAM function kinit, which opens a socket to the local LAM daemon and does some other bookkeeping things. One of the things that it does is cache the PID of the kinit-calling (i.e., MPI_INIT-calling) process on this global _kio struct. That way, if you fork and then invoke a LAM function in the child, it will know that this process is not registered with the LAM daemon and can therefore throw an error.

Note that only some MPI functions will end up doing this compare-the-PID thing. One class of examples are MPI functions that need to send out-of-band (OOB) information, such as MPI_COMM_SPAWN.

This scheme actually works fine, and has prevented me from doing stupid things in the past.

However, it has caused me much grief over the last 24 hours because Linux implements threads as processes. Hence, each thread has a different PID. End result: MPI_COMM_SPAWN will end up comparing the thread's PID with the one cached on _kio. If they don't match, boom.

This is a problem if any thread other than the one that invoked MPI_INIT invokes these MPI functions. i.e., even if we guarantee that only one thread is "in MPI" at any given time, the current scheme in LAM will fail because each thread has a different PID.
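The cache-and-compare idea itself is simple. Here's a toy version in Python: the names _kio and kinit come from LAM, but the KioClient class, is_registered, and registered_in_child are invented for this sketch and bear no resemblance to the real C code. It shows the case the check was designed for, a forked child; on a system where getpid() returns something different in every thread, the very same comparison would also trip for threads.

```python
import os


class KioClient:
    """Toy stand-in for LAM's global _kio structure (illustration only)."""

    def __init__(self):
        self.cached_pid = None

    def kinit(self):
        # Remember which process registered with the (imaginary) daemon.
        self.cached_pid = os.getpid()

    def is_registered(self):
        # The compare-the-PID check: does the caller match the registrant?
        return self.cached_pid == os.getpid()


def registered_in_child(client):
    """Fork, run the PID check in the child, and report the result (POSIX)."""
    pid = os.fork()
    if pid == 0:
        # Child: exit status 0 means the check passed, 1 means it failed.
        os._exit(0 if client.is_registered() else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    kio = KioClient()
    kio.kinit()
    print("parent registered:", kio.is_registered())
    print("child registered: ", registered_in_child(kio))
```

The parent passes the check and the forked child fails it, which is the intended behavior. The trouble is only that a thread can look just like that child.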

ARRRGGHHH!!!

I don't quite know how to solve that in LAM yet (there's probably some way to get a unified PID for all the threads in a single process... need to look that up...), but I do know how to solve it in Tucson: force all MPI calls to be in a single thread. What a pain.

ARRRGGHHH!!!

There are 377 copies of xmms running, out of 460 total processes (81%).

May 12, 2001

Dave, we're *not* sinking!

I started the C++ Friday Lunch list today.

I subscribed everyone. We'll start next week after everyone has a chance to get the book.

Been working on Tucson heavily.

It took a lot longer to do the MPI queue than I thought. Particularly with respect to arrays of requests. Every time I thought I had it right, I realized that the abstractions were just slightly off, and that would cascade into a whole chain of side-effects and whatnot.

Urrrghhh...

Took quite a while to get it right. I think I've got it right now -- it all compiles -- but I'm too tired to try it (it can't possibly work -- it's hundreds of lines of code that's all brand new). I'll debug tomorrow.

I really want to have it working -- or at least major parts of it working so that I can have some kind of reportable results on Tuesday for my meeting w/ Lummy.

I'm seeing some really weird cron behavior on queeg. Until now, I thought the problem was with my script somehow and so I ignored it. The problem is that I sometimes get double entries in my checking-DSL-connectivity log. That is, it's fired up by cron every minute to check my DSL connectivity. Sometimes I get an entry in the log at xx:xx:59 and xx:xx:00.

I thought my script was just mucking up somehow (it is actually somewhat complicated), so I never bothered to check, because both entries in the log were correct. But today I noticed that cron itself is actually launching the script twice.

DSL dropped out twice today, each for <= 30 minutes. But still annoying, nonetheless. Same old problem -- packets stopping in Atlanta. Gumdangit, BellSouth!

Can't get to anything, though -- not even Excite.

<shrug>

Stupid Linux thread model. I know that I saw a web page once that went through it and said why it was a good thing that threads are different processes (other than "it was an easy hack"). I did some web searches and can't find it.

<shrug>

This is going to be a problem for LAM itself when we make it multithreaded, because of what I described in a previous journal entry. I did find the function pthread_atfork, though, and I think it can be used to fix this problem. There will have to be a cached value of getpid(), and at fork time, we'll have to zero out the cached value.

This can work. I haven't fully thought this out yet, but I'm quite sure that this scheme can work. It may require an additional configure test, too, which may be a bummer, but possibly not.
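Here's the shape of that idea, sketched in Python rather than C. Python's os.register_at_fork (3.7 and later) plays the role that pthread_atfork would play in LAM; cached_getpid and _reset_cached_pid are names made up for the sketch, and this is an untested-in-LAM illustration of the scheme, not a fix.

```python
import os

# 0 means "not cached yet" (no real process has PID 0).
_cached_pid = 0


def cached_getpid():
    """Return the process's PID, calling getpid() only on the first use.

    Every thread that goes through this function sees the same cached value,
    no matter what getpid() would have told that particular thread.
    """
    global _cached_pid
    if _cached_pid == 0:
        _cached_pid = os.getpid()
    return _cached_pid


def _reset_cached_pid():
    """Fork handler: the child is a new process, so forget the parent's PID."""
    global _cached_pid
    _cached_pid = 0


# Run the reset in the child after every fork (analogous to the "child"
# handler argument of pthread_atfork).
os.register_at_fork(after_in_child=_reset_cached_pid)


if __name__ == "__main__":
    parent_pid = cached_getpid()
    pid = os.fork()
    if pid == 0:
        # The cache was zeroed, so the child re-reads its own PID.
        os._exit(0 if cached_getpid() != parent_pid else 1)
    _, status = os.waitpid(pid, 0)
    print("child saw a fresh PID:", os.WEXITSTATUS(status) == 0)
```

Since the child ends up with a different cached value than the one recorded at init time, the existing "are you registered?" comparison keeps working across forks while no longer caring which thread is asking.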

xmms crashed earlier today. I notice that I have xmms 1.2.3, and 1.2.5pre1 was announced on freshmeat today.

There are 98 copies of xmms running, out of 173 total processes (56%).

May 23, 2001

Nice "Big House" humor, sir

GNU Mailman is smart. I created the cfl list and added a bunch of people to it, then sent a few posts across it. Pete then asked to be on cfl, so I added him, and bounced all three posts to him.

Oops -- I bounced one of them to cfl, not to Pete! Doh! But GNU Mailman must have recognized that it was a duplicate (or a resend), because it didn't resend it across the list.

Pennsylvania has free birth certificate copies for military members.

Cool!

I ordered 3.

It's like when I was driving back to SBN from Ft. Knox one night after duty (I had to be in SBN for a meeting first thing in the morning or something), and was still in my BDUs when I stopped for dinner at a McDonald's. I ordered a Big Mac combo meal and pulled out my wallet to pay. The manager walked up and said, "Meals are free for military members". "Cool!", I said, "Can I super-size that?"

My passport expires soon. I went to http://www.firstgov.gov/, found the passport page, and downloaded the renewal forms. They're expired. They expired April 30, 2001. <sigh>

I went to the post office to get renewal forms instead. They had the same out of date forms. <sigh>

I found some more old bugs in jjc.

I found a nonterminated C string. I can't imagine how that didn't cause jjc to crash all the time.

I also freed some static memory.

I also found an endless loop when <> was in a rant (i.e., an empty HTML tag).

They're all fixed now. Anyone who uses jjc, lemme know if you want a new copy.

Actually, it still appears that there's a little problem with jjc identifying which line unterminated HTML or special characters are on. I'll fix that one someday. Not today.

Epiphany got their new computer -- a w2k box. I put it together and got it running. It came with Office XP when we explicitly asked for Office 2000. We ordered Office 2k; should be here in the next few days.

The high tension power lines over the exercise track in my neighborhood have a distinctive hum.

I went to ND last week to receive my SGI award. I saw Dr. Eileen there, which was pretty cool. She's actually at IU/Bloomies now, and came back for graduation weekend.

While driving back from ND, I chatted with my C-* Terry for a good 45 minutes about her wedding, furniture, etc. It was a good chat.

I stopped in Indy to see Kelly and Matt on the way home. We had lunch and were generally silly for several hours. I met the crazy brown dog, who actually is crazy, brown, and a dog. They're moving to Chi-town. Good for them, bummer for me!

I'm in Bloomington (Bloomies); I met a bunch of people in the CS department today, saw Lindley Hall, the student union, the woods, got lost on the campus, parked illegally, had lunch with Todd, got a guest key, planned equipment purchases and layout, and did other nefarious deeds.

May 25, 2001

Stinkbutt

I think that the rat-bastard ice cream man is trying to kill my spirit.

It seems that entirely different muscles are required for running vs. roller blading. This is quite unfortunate. Why can't they be the same? IMHO, roller blading is much more fun than running. Running is so boring.
Unfortunately, the army isn't quite modern enough to offer competitive roller blading as part of their standard physical tests (just imagine: the soldiers of tomorrow blading around on the battlefield on special track-mounted foot adaptors... what an edge!), so I still have to run at my test next month.

So I decided that I had better start actually running rather than blading for exercise. Needless to say, I was smoked within minutes. But being a stubborn idiot, I pressed on for quite a while (mainly because my army duty is only a few weeks away). I ran around my neighborhood a bit, and did the exercise track (situps and pushups) down by one of the two lakes here.

And there's a big-ass hill between the exercise track and our home -- and it's downhill the wrong way. Yes, I have to run uphill to get home. Woe is me! I guess it makes me a better person.

Anyway, the ice cream truck, playing its loud jingle-jangle pied piper music, came down the street just as I was dying up the hill towards home.

Did I say "dying"? I meant "running".

Several thoughts entered my head, almost simultaneously:

I shouldn't have any ice cream; I'm trying to lose some weight here!

I don't have any money with me.

I wonder if he'd give me a ride home.

Woe was me. He even stopped for a bunch of kids that I went by so that they could run screaming into their houses, "MOM!!! The ice cream man is here!!!" (reminded me of the old Eddie Murphy ice-cream man schtick. "It's like sprinkles.").

But I survived. Without ice cream. I have declared the ice cream man to be my nemesis. It's a battle of wills between us. I will prevail.

May 28, 2001

1-800-J-JAMES

More quickies. Some are technical. Cope.

I discovered today why grip sucks. I previously have had problems with grip refusing to rip a track or two. For example, it wouldn't rip a track at the end of Fatboy Slim's On the Floor of the Boutique. I had always assumed that the CD was defective. Today, I was ripping a CD that Tracy had just bought, and ran into exactly the same problem. grip reported the time for the track as 5:37, while the CD jacket reported it being something like 2:50. Hmm. I tried three different CD drives and they all did the same thing. I put the CD into a real CD player and the track played fine. Hum!

So I ripped it manually with cdparanoia, and it ripped fine (which is weird, because grip uses cdparanoia to rip). Then I encoded it with bladeenc, and the resulting MP3 is fine. I did the same with the Fatboy Slim track, too. I found the problem -- each of these two CDs has an "enhanced track" at the end, which screws up the next-to-last track somehow. grip not only specifies the track for cdparanoia to rip, it also specifies the sectors. So somehow grip is getting the wrong sectors, which causes it to fail. If I give just the track number to cdparanoia, it works just fine. Weird.

Internet connectivity has absolutely sucked for the past 72 hours. To ND especially. I am guessing that the networking upgrade that ND did on Saturday morning may have mucked things up... but IIRC, they were just replacing some UPSs, not changing any configurations. Hmm. But then again, there could just be lots more traffic on my DSL segment due to the holiday weekend. I dunno. I've been seeing 50-60% packet loss to nd.edu.

I found, by accident, today that the latest versions of xmms fix the thread leaks. Turns out that it was apparently leaking sockets, too. I was doing some Tuscon testing and noticed that a ps took 10-15 seconds to complete. This is because there were so many dangling threads (arrgghh... stupid linux thread/process model!). So I went and checked http://www.xmms.org/, and sure enough, there was a new version. I got the latest (1.2.4), and it fixed the problem. I noticed that they released 1.2.5pre1 recently, so I grabbed that, compiled it (with ogg support, of course), and it seems to be working fine. Check out the xmms stats at the end. Amazing!

ogg seems to be coming along. I've been rather inactive in it while trying to finish the dissertation. I updated my CVS copy of it today when I compiled xmms; there's a DOS file that doesn't compile 'cause of preprocessing badness (trying to have a multi-line macro with a carriage return after the '\' causes the preprocessor to be unhappy). Monty just checked in what sounded like much audio goodness (I don't follow much of that stuff, but it sounded good... hahaha... very punny...).
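
The preprocessing badness comes from DOS line endings: a backslash only continues a macro when it is the very last character before the newline, and in a CR LF file a carriage return sits in between. A small sketch for spotting (and fixing) such lines, in Python; the function names are invented here and the sample macro is hypothetical, not the actual ogg file.

```python
def broken_continuations(source: bytes):
    """Return 1-based line numbers where a backslash is followed by CR LF."""
    hits = []
    for number, line in enumerate(source.split(b"\n"), start=1):
        # After splitting on LF, a DOS line still carries its trailing CR.
        if line.endswith(b"\\\r"):
            hits.append(number)
    return hits


def to_unix_line_endings(source: bytes) -> bytes:
    """Rewrite CR LF as LF so the backslash is again the last character."""
    return source.replace(b"\r\n", b"\n")


# Hypothetical two-line macro saved with DOS line endings.
sample = b"#define SWAP(a, b) \\\r\n    do_swap(a, b)\r\nint x;\r\n"
```

Running broken_continuations(sample) flags line 1; after to_unix_line_endings(sample) nothing is flagged.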

I'm getting an account on the American Museum of Natural History's 260-node cluster; they have problems over 229 nodes with LAM. Will be a good place to test lamtree, too. But I must graduate...

For days, I've been looking for a memory leak in Tuscon. My sample passthrough test app was allocating memory at an alarming rate -- but only in the root master process. The children all look like they were nicely memory bounded. I finally found the problem today -- it wasn't a memory leak at all. Turns out that my input thread stupidly allocated space for the entire input file at the beginning of time rather than ask for a bunch of buffers, fill and enqueue them, ask for more buffers, etc., etc. i.e., the whole concept of buffer pooling. Nope -- I just asked for buffers for all the data up front. Duh. :-\
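
For contrast, a minimal sketch of what buffer pooling looks like: a fixed handful of buffers that the input thread fills and enqueues, and that get recycled once consumed, so memory use stays bounded no matter how big the input is. This is Python, not Tuscon code; pump, buf_size, and num_bufs are names and defaults assumed purely for illustration.

```python
import io
import queue
import threading


def pump(stream, buf_size=4, num_bufs=3):
    """Copy a byte stream through a fixed pool of num_bufs reusable buffers."""
    free = queue.Queue()    # empty buffers waiting to be filled
    filled = queue.Queue()  # (buffer, byte count) pairs waiting to be consumed
    for _ in range(num_bufs):
        free.put(bytearray(buf_size))

    def input_thread():
        while True:
            buf = free.get()            # blocks until a buffer is recycled
            count = stream.readinto(buf)
            if not count:               # end of input
                filled.put(None)
                return
            filled.put((buf, count))

    reader = threading.Thread(target=input_thread)
    reader.start()

    output = bytearray()
    while True:
        item = filled.get()
        if item is None:
            break
        buf, count = item
        output += buf[:count]
        free.put(buf)                   # hand the buffer back to the pool
    reader.join()
    return bytes(output)
```

For example, pump(io.BytesIO(b"hello world!!"), buf_size=4, num_bufs=2) moves 13 bytes while only ever holding two 4-byte buffers.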

May 30, 2001

I'm sorry, that's just the way we do things around here -- the new guy has to sit next to Matthew.

"I love the smell of dropped packets in the morning."

My DSL connectivity still sucks. 50-60% packet loss to just about everywhere. Arrgh.

I saw a few episodes of the FX X-Files Memorial Day Marathon. They were showing some of the really classic X-Files episodes, like the monster/Cher Halloween episode, the X-Cops episode, the Queen Mary/Nazi episode, etc.

Classic.

Tuscon seems to be working! Added some simplification functions that I rolled into a "simple" example.

By the end of the day, I had broken Tucson again, all in the name of making the user interface better...

I finally got sick of this horrendous connectivity and called Telocity to report the problem. It seems that Telocity was bought out last week by Hughes Electronics, and is now DirectTVDSL. Hmm. I don't know whether to be worried or not...

The technician lady that I got was totally clueless. She asked me if I was frequently deleting my "catch files". It was only 10-15 minutes later that I realized that she meant "cache files" (which has absolutely nothing to do with the problem, which is why I didn't realize what she meant at the time).

The connectivity problems that I was having were mostly to the rest of the internet. Even though my route to other Telocity machines goes through Bell South, that appeared to be working [mostly] ok. For example, my DNS connectivity was [mostly] ok.

She ran some ping tests between me and her and said, "I'm not seeing this 50-60% loss that you're talking about..." I tried to explain that my Telocity connectivity was fine, but connectivity to the rest of the internet sucked. She said, "but I don't have any connectivity problems to Yahoo (for example)". <sigh>

She finally came up with a 2% packet loss between me and her, and decided to report that. She was convinced that this was the Big Problem. I made her put down that I was seeing 50-60% loss out to Yahoo, even though I think she didn't believe me. Arrghh...

[several hours later] The recorded message on Telocity's tech help line says that "customers in the southeast may experience intermittent service... there is no estimated time to repair, but it may take up to a week to resolve these issues..." ARRGGGHHH!!!! Hopefully it will be less than that. :-(

I got my account on the American Museum of Natural History cluster today. They punched a hole in their firewall for ssh for my IP (fixed IP/DSL does have some advantages...). It appears to be related to the Zoology department... hum! I wonder what they use such a big cluster for.

But my connectivity sucks; I couldn't really do anything.

The Army finally authorized me a rental car for my AT today. Woo hoo!

DSL connectivity finally came back around 4-5pm. Everything appears to be normal now.

June 3, 2001

I whipped Joe's ass

My kingdom for quickies!

Really able to simplify the Tuscon user interface. Now only need a "kernel" for input, worker, and output -- all buffer management (which is not trivial) is handled by the boilerplate input, worker, and output engines.

And it all works!!

Happy dance....

ps shows that I typically have between 60-100 processes running on queeg (i.e., just under the username jsquyres). Wow -- I guess I'm a busy guy!

Tuscon works in multi-level with no changes. Cool...

Army cushy job no more. I've been told that I must wear a uniform (class B) during my two weeks! Schnikies!! I suppose I shouldn't complain, though. They mentioned "distance learning" as what I might be doing. I guess I don't quite know what that means. We'll see...

Brian put in stuff to get LAM to compile pseudodaemons separately. Woo hoo!

Still having problems with myrinet RPI on Chiba City. Discovered that the head node doesn't work, but then again, there are some cases where the myrinet RPI doesn't work on the interactive nodes either.

Turns out that the tiny and short sizes were mixed up upon initial assignment, so short messages were effectively getting tagged as envelopes, which then caused short messages to get interpreted as envelopes, which just led to Badness. Arrgh!

Side effect that we didn't think of before, though -- since the user can override the tiny and short message sizes, what happens if log2(tiny) == log2(short)? Hmm. Need to investigate the gm tagging mechanism a bit more...
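
To make the worry concrete, here is a small sketch under a loud assumption: that messages are sorted into power-of-two size classes, with the class being the log2 of the size rounded up. The entry does not confirm that this is how the gm tagging works, and size_class and classes_collide are invented names, not gm or LAM calls.

```python
def size_class(nbytes: int) -> int:
    """Smallest k such that 2**k >= nbytes (assumed power-of-two bucketing)."""
    return max(nbytes - 1, 0).bit_length()


def classes_collide(tiny: int, short: int) -> bool:
    """True when two user-chosen limits land in the same size class."""
    return size_class(tiny) == size_class(short)
```

Under that assumption, limits of 512 and 1024 fall into classes 9 and 10 and stay distinguishable, while 600 and 1024 both fall into class 10 and cannot be told apart by size class alone.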

Dog was here. He saw the house, went to a convention, and then we all went to party at Janna's. It was much fun -- it was to celebrate Jim's 30th and getting his MBA.

Dog and I talked a bit about LAM and his MS this morning before he left.

June 10, 2001

He's got a billion dollars! He could hire Steve Forbes as his cleaning lady.

I was sitting in the Louisville airport today, waiting for my flight to Atlanta (I'm so happy with direct flights. I'd like to thank the person who invented them. Props!). I saw three women in BDUs (Battle Dress Uniform, for those of you who are uninformed -- or Army fatigues). One of them was addressing a group of obviously college-aged kids. She met them at the end of the terminal and was directing them where to go to catch some bus.

It wasn't until I saw her yellow and black armband that said "CAMP CHALLENGE" that I realized that the kids were all ROTC cadets reporting to Ft. Knox for fun in the sun. The women in BDUs were all spec-4s. I remember being confused by that rank -- they're not privates, they're not corporals, and they're not sergeants. "What is that funny little thing on their collars?", I used to wonder.

It took me a while to catch on that it was a real rank, not just an unusually uniform (yes, excuse the pun) patch of black on so many different people's BDUs. I was always the slow one in my family.

Great line: "She makes coffee nervous."

I'm here in Atlanta for my Army two weeks. Don't know what I'll be doing yet; I got some vague mention of "setting up distance learning", but that's all they told me before I got there (did I mention that before in the journal? If so, cope).

Suzanne sent me the recipe for her mother's Lebanese casserole. It has to be the most yummy food on the planet, I think. She makes it not-infrequently when I go up to SBN. Someday, I'll actually try to make it myself.

I found an outdated PHP header file that caused the search functionality off the main LAM mail archives page to break. A helpful LAM user pointed it out to me today. Oops!

I got my hair cut in preparation for my Army time. I wish the barber lady had left a little more on the top. Ah well. It will grow back.

My hotel room has ethernet. Thank goodness. I probably won't use it every day, since I start work tomorrow and will have internet access from the lab, but I have to say that it was quite convenient today.

I fixed a bunch of dumb errors in the lamtests suite today that caused a bunch of tests to hang when you ran them on more than 2 nodes. Thanks to a helpful LAM user for pointing this out.

That same user also pointed out that the spawn tests failed in the non-uniform filesystem case. Oops!

I recently agreed to buy 3 CDs from BMG in order to get 9 free. One of the three I even already owned (it was a special offer -- 3 specific CDs). The thing that I forgot was that BMG's selection totally sucks. I had to browse their entire "Modern Rock" section until I could find 9 CDs that I sorta kinda would probably get if they were free. i.e., they definitely weren't on my "yeah, I'd like to get that" list.

I'll definitely quit BMG after I get these nine.

Another great line: "She's gone out to meet a bunch of bikers. Big ones. Full of sperm."

Time for sleep. Gotta report to duty tomorrow; I'm defending your country through distance learning!

June 13, 2001

Can that thing measure a New York Minute? 'cause Jimmy could walk through that door any minute. And this is New York City.

Defending your country...

I got my new Army black beret today. It was made in Canada, thankyouverymuch. The official start day for these berets is Friday. What a pain in the butt. You have to shave them, shape them, etc., etc. They're actually quite difficult to wear properly. That's why the French are so uptight, for example.

I think everyone in the Army is gonna look dumb for a while while we learn how to wear berets.

I was driving around Atlanta today taking care of administrivia (getting an ID card, parking permit, etc.), and I heard "Weapon of Choice" by Fatboy Slim on the radio (it's probably a remix of something else -- it is Fatboy Slim, after all -- but I have no idea what the real song is). Even though I've heard this song many times, I never paid too much attention to the lyrics.

I was amazed to hear a Dune reference:

Walk without rhythm, and it won't attract the worm.

Trippy.

Played with qmail -- a replacement for sendmail. Seems quite powerful and flexible. I have to say that it's non-trivial to install; it requires additional users, groups, directories, etc. I guess I've grown accustomed to sendmail over the years, and am now quite comfortable with it.

qmail's strengths seem to be that it scales much better than sendmail (it claims to, anyway), is 100% secure (there's an open $5000 challenge for anyone to find a security problem with it), works nicely over NFS (when you use it in "maildir" mode), and lets users make their own arbitrary mailing addresses (e.g., jsquyres-foo@hostname).

I guess it makes me somewhat nervous, 'cause it wants to work in maildir mode, where -- if I understand this right -- it has a separate directory for each user's mail and puts each mail message in a separate file. This actually makes a lot of sense, and there are a lot of good reasons for it. But I'm not sure that pine can read the messages that way (although it might -- gotta dive into that a little more, I suppose). I dunno if imapd can, either (same disclaimer).

I guess I'm just uncomfortable 'cause it's a large new system -- no mail server can be small and simple to use; they're just too complicated. Maybe I'm getting old. ;-)

----

The next day...

----

I just noticed why I never heard the "Dune" reference in "Weapon of Choice" before. I just heard the song again on the radio. The voice is quite clear and easy to understand. I always listen to the song in MP3 format on my computer. And my computer actually has fairly good speakers.

But the voice on the MP3 is distinctly degraded and difficult to understand. Proof positive (to me, anyway), that MP3 really does degrade the sound quality. I'm tempted to encode this song in ogg/vorbis and see if it's any better. Hmm...

The last time I was here, I discovered that my hotel charges $18/night for parking. Wow! Yes, I can get reimbursed for that, but that ultimately comes out of all of our tax dollars, so I went looking for a different garage within walking distance of my hotel. I found one -- it's the parking garage for America's Mart. It's actually right across the street, but who's counting?

They have a max of $7/24 hours. And it's literally right down the street from my hotel. It's less than 1/3 of the cost! I'm not only defending our great country, I'm saving taxpayer money as well. So I parked there all last time I was here.

As an added bonus, if I left early enough in the morning, I would get there before the attendant was on duty, and therefore I could drive out without paying (the gate is up when there's no attendant on duty). I never noted carefully enough when the attendant came on duty to figure out how to always avoid paying.

I was wondering if the same conditions would apply -- the last time I was here was about 1.5 years ago. And lo, the conditions are the same -- I parked in the garage last night and left this morning for $7. The attendant was there at 7:45. I'll have to see if I can leave before then tomorrow...

I've noticed a curious phenomenon with my hotel room.

When I come back at night, the volume is turned up 100% on the TV. I think the maids must be severely hearing-impaired.

I had dinner at Planet Hollywood last night.

I had dinner at the Hard Rock Cafe tonight (they're right across the street from each other).

Per diem rocks.

(I'm such a simple man...)

I now realize why I tend to stay late when I'm working down here (aside from not knowing anyone else in Atlanta, only having working clothes, and having free high-bandwidth internet connectivity at work) -- the traffic in downtown Atlanta between 5-7pm sucks. Stop-n-go traffic made a normally 5 minute drive about 20-25 minutes.

Ugh!

On the way out tonight, I dropped my laptop. Aaahhh!!!

The CD drive cover broke off.

Where I previously had one vertical line on the screen, I now have about a dozen.

:-(

I think it's now definitely time for a new laptop.

:-(

I've started watching Witchblade -- the new series on TNT. Very bad -- I don't have time for it.

It has very cool photography. Lots of matrix-like effects. They had a very cool scene in the first five minutes of the show tonight with two dirt bikes jumping right at each other, mixed with several bullet-time spins at stopped time. Trez kewl. They play a lot of good music, too.

Indeed, TNT is headquartered here in Atlanta -- I don't know if it's the headquarters, but there's a TNT complex right next to the edge of the GA Tech campus where my building is. There was a multi-story Witchblade poster on the side of the building yesterday.

Windows ME (but IU has a site license for Win2K, so I might format and install that)

No MS Office; IU has a site license for that as well

Port replicator

External speakers

External mouse

External keyboard

Lummy said that I can have one of the extra monitors that came with the machines that we got for the machine room, hence, it wasn't in the actual order. It's shipping directly to Bloomies; hopefully it'll be there next week sometime.

Have you ever had a surreal experience involving C++, three peacocks, a kumquat, and a 1977 $10 bill?

Neither have I.

Just curious...

Looks like Lummy and I are traveling to Sandia to see Brian's gig and talk to the folks down there. We'll give some kind of talks about LAM and what we want to do with it w.r.t. fault tolerance, etc. Don't know the exact composition of the talks, nor the exact travel days, but it's likely to be the early part of the week of the 25th.

I've had a bunch of really interesting phone conversations and e-mails w/ Brian about this FT stuff w.r.t. LAM. It's very cool stuff. He's doing Great Things with the lamd.

More LAMisms:

Dog has started doing LAM stuff. (Did I mention this in jeffjournal before? Short term... what?) He'll be adding TM stuff, which will only benefit PBS right now, but hey, perhaps others will implement it as well. He'll also be adding compile-time and run-time parameter checking disabling, and measuring to see if that actually makes a difference or not.

I finally made a breakthrough in the Myrinet RPI last night, and I think that I may have found the last problem (that I'm currently aware of, anyway). The tag/size from a long message ACK was getting trashed before the actual body of the long message was sent back in an obscure race condition because the tag/size was stored in a temporary buffer. This race condition only exhibited itself during long message all-to-all tests. Woof. The solution was to save the tag/size immediately upon receipt -- not difficult at all, but it took forever for me to understand what was going on, what was going wrong, and exactly why the tag/size was getting trashed. I'll run all the tests on all three Myrinet systems that I've got access to and see if I can manage to get a beta out this weekend. Of course, as soon as I type this, I probably doom myself to finding other problems, but we'll see...
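
The shape of that bug, and of the fix, can be mimicked in a few lines. This is only a single-threaded simulation in Python with made-up names (HEADER, scratch, arrive, demo); the real race lives in the Myrinet RPI and is not reproduced here. The point is just the difference between remembering where the tag/size lives in a reused temporary buffer and copying the values out the moment they arrive.

```python
import struct

HEADER = struct.Struct("!II")       # (tag, size) as two unsigned 32-bit ints
scratch = bytearray(HEADER.size)    # one temporary buffer, reused for every arrival


def arrive(tag, size):
    """Simulate an incoming header landing in the shared temporary buffer."""
    HEADER.pack_into(scratch, 0, tag, size)
    return scratch


def demo():
    ack = arrive(7, 65536)            # the long-message ACK shows up
    aliased = ack                     # buggy: only remember where the header lives
    saved = HEADER.unpack_from(ack)   # fix: copy tag/size out immediately
    arrive(9, 16)                     # something else arrives before the body is sent
    return HEADER.unpack_from(aliased), saved
```

demo() returns ((9, 16), (7, 65536)): the aliased view has been trashed by the later arrival, while the saved copy still holds the ACK's tag and size.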

Network Solutions really sucks. We've copied the DNS zone files from nd.edu to cs.indiana.edu, and they're up and available. I tried to use the NSI web interface thingy to change their pointers for the two top-level DNS servers for lam-mpi.org (and .net and .com), but it wouldn't work. So I called them. I spent over 2 hours on the phone, and they still aren't changed. The first woman that I spoke to was... well, let's just say that she was unhelpful. She finally transferred me to "second level support", where I sat on hold for an hour before giving up and hanging up. That was 2 days ago. I haven't had the strength to call back yet. I feel bad because we continue to impose on the good will of Curt while this continues (it's his DNS server that we're using in nd.edu), and it's of no other fault than the fact that NSI sucks.

I'm still defending the USA down here in Atlanta. We're doing a massive IP number reorganization tonight -- the changes go live in DNS at 1630 EST. We have a whole class C network to ourselves, and are using less than half of the available IP addresses. So we're shifting all the IP addrs down to the lower 128 and letting GA Tech have the upper half back. So I'm going to go around to all the machines tonight and reset their IP numbers.
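
The arithmetic of that split, sketched with Python's ipaddress module. The network 192.0.2.0/24 is a documentation-only range used as a stand-in (the real addresses are not given here), and split_in_half and needs_renumbering are names invented for the example.

```python
import ipaddress


def split_in_half(cidr):
    """Split a network into its lower and upper halves."""
    network = ipaddress.ip_network(cidr)
    lower, upper = network.subnets(prefixlen_diff=1)
    return lower, upper


def needs_renumbering(host, lower):
    """A machine must move if its address is outside the half being kept."""
    return ipaddress.ip_address(host) not in lower


# Stand-in /24; the lower /25 holds addresses .0 through .127.
lower, upper = split_in_half("192.0.2.0/24")
```

With this stand-in, lower is 192.0.2.0/25 (128 addresses) and upper is 192.0.2.128/25; a machine at .200 needs a new number, one at .17 does not.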

We're moving our servers around, too -- the old mail server is going to be retired (although it will stay on for the next few days, while the new DNS information propagates around).

I did a nessus scan the other day to determine what IPs were being used and which were not (DNS really didn't match what we had at all), and I found an unpatched IIS on one of the windows servers. Gulp. I immediately told my boss about it, and it started what can best be described as a political free-for-all brouhaha.

Suffice it to say that it took about 24 hours to get the machines powered off, and some people were very unhappy about that. They're going to need to be reinstalled, 'cause I'd find it extremely unlikely if no one has cracked them yet -- the IIS doors were wide open, with bright blinking neon lights, "Crack me! Crack me!".

Ah well. It's good to know that good technical sense finally prevailed over the political disputes. What will happen with those machines in the long run has yet to be determined, but at least they're off for now.

June 17, 2001

We're the pros from Dover

I heard Bjork on the radio on the way in this morning. You don't hear her stuff much on the radio anymore.

The movie MASH was on TV yesterday, and I watched part of it from my hotel room. MASH has to be one of the funniest movies of all time. Ever wonder where Hot Lips got her name? You gotta watch the movie to find out.

"Goddamn Army jeep!"

I think the football segment of the movie has got to be one of the best football movies ever. "I think their ringer just made our ringer."

I continue to have less and less hope in the current myrinet code. It seems to be a sinking morass of race conditions. I fix one, and another one appears. The next one inevitably shows up in an innocuous single test failing in the test suite. After tracing it down, it typically turns into a conglomeration of events that ends up in some memory location being used twice. <sigh>

This is completely the fault of stealing from the TCP RPI -- the myrinet RPI reflects many of the same assumptions that the TCP RPI reflects, most of which aren't necessary when using gm for communication. The central assumption that has caused the most Badness is that when you read() from or write() to a TCP socket, you may or may not transfer all the data that you expect to transfer.

Since most of the time you're not allowed to block, you have to have extensive bookkeeping to remember exactly where you were in reading/writing a given message. The next time you enter the state machine, you have to try to continue reading/writing from where you left off.

Hence, each socket has some state associated with it -- pointers for the current message being read/written, and how many bytes are left. This stuff is all redundant in myrinet, because it doesn't send/receive partial messages. Hence, when you send, it's sent. When you read, it's read in its entirety. However, the current code still uses these pointers that are associated with each "socket" (actually, we call it a "process"). It dawned on me while I was walking in that this could be the root of much Badness in the myrinet RPI. I'll spend the rest of today investigating getting rid of all of that stuff and see if I can send/receive directly from the MPI request in question rather than use all these temporary pointers/bookkeeping that is attached to each process.
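
For anyone who has not written this kind of code, here is roughly what the TCP-side bookkeeping looks like: a non-blocking send may take only part of the message, so each pending message carries an offset that says where to resume on the next pass through the state machine. A Python sketch using a local socket pair; PendingWrite and transfer are illustrative names, not anything from the TCP RPI.

```python
import socket


class PendingWrite:
    """Bookkeeping for one outgoing message: the data and how far we've gotten."""

    def __init__(self, payload):
        self.view = memoryview(payload)
        self.offset = 0

    @property
    def done(self):
        return self.offset == len(self.view)

    def advance(self, sock):
        """Write as much as the socket will take right now, without blocking."""
        while not self.done:
            try:
                self.offset += sock.send(self.view[self.offset:])
            except BlockingIOError:
                return False    # socket is full; resume from self.offset later
        return True


def transfer(payload):
    """Push payload through a non-blocking socket pair; return (data, passes)."""
    sender, receiver = socket.socketpair()
    sender.setblocking(False)
    receiver.setblocking(False)
    pending = PendingWrite(payload)
    received = bytearray()
    passes = 0
    while len(received) < len(payload):
        passes += 1
        pending.advance(sender)             # may stop partway through
        try:
            while True:
                received += receiver.recv(65536)
        except BlockingIOError:
            pass                            # drained for now; go around again
    sender.close()
    receiver.close()
    return bytes(received), passes
```

A transport that always delivers whole messages needs none of this: no offset, no resume, and so no per-process pointers to get clobbered.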

T-5 days left on my current army tour.

I have to spend some quality time with my beret tonight and get it into shape.

I spent an hour on hold with Network Solutions yesterday before I finally got someone. Fortunately, the guy that I got was actually fairly clueful. We managed to get the top-level DNS servers for lam-mpi.org|net|com changed to the IU servers.

The change apparently went in at 5am this morning; it'll take a day or three to propagate around the world. No domain that I have access to can see this change in DNS yet (LBL, MCS/ANL, ND, GATech, Telocity), so I hope it's propagating... I guess that was only about 6.5 hours ago, so it may not have been picked up by any of the respective local DNSs yet.

June 23, 2001

Can that thing measure a New York Minute? 'cause Jimmy could be walking through that door any minute. And this is New York Ci

Much has happened.

The easiest format is quickies [sound effect: mad crowd cheering]

I'm officially an IU post doc. Can't remember if this has been in the journal yet. Pay starts very soon. Real money -- woo hoo!

I'm back from my Army 2 weeks. Literally moments after I turned in my rental car and got in the shuttle to go to the airport, it started raining. Hard. So hard that it was darn near impossible to see out the windows. Could I have timed that any better? It is unlikely that I will return to Atlanta for future ATs; not only are there only 3 unix machines (part of the AT that I just completed involved ramping down their Unix side), there is also some question as to the future of that specific office. I've got some contacts that I'll be following up with to see if I can get another computer-type posting (as opposed to being a battalion signal officer somewhere). We'll see how that goes.

Contrary to my last 2 AT's, I sent off my Army travel/pay paperwork immediately.

My trip home from Atlanta was otherwise uneventful. The plane was somewhat late in taking off from Atlanta, but that was no biggie.

Tracy picked me up at the airport and we went out to dinner. We ended up at a table right around the corner from Janna and some of Jim's MBA study friends. We ended up going back to Janna's and watching the new version of Charlie's Angels. Holy cow, did that movie suck! I give it 25 feet. It was so bad that parts of it were really funny, but it wasn't bad enough to be funny enough to be a worthwhile movie. It was just plain bad. They tried a whole bunch of Matrix-like special effects which were technically competent enough, but (for example) I don't think that the actors/actresses carried off the harness work well at all. They even left the door open for a sequel, but I highly doubt that that will happen.

(Editor's note: If you don't know the internals of LAM, skip the following item as it will make less sense to you than a one-eyed dog looking at a "hidden eye" picture that contains a Hindu translation of the Rosetta Stone) Brian has found a really troubling bug in the lamd w.r.t. some new code that I put in recently for sending back the routing table in a call to ldogetlinks(). I recently changed the code to split up the routing table to only send a portion of the table at a time because the nsend() glue in the lamd does not packetize -- it truncates over 8k. This was a problem when the routing table was over 8k (i.e., lots of nodes in the LAM universe). However, my changes were somehow causing failures sometimes on RedHat 7.1 (consistently in ldogetlinks() in tping, but not in ldogetlinks() in any other program). Much weirdness. We decided to punt on this for now, and put the old code back. Brian's work in the lamd will soon give us fully-packetizing nsend(), anyway, so the point will be moot.
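
The splitting idea itself is simple, whatever went wrong on RedHat 7.1. A sketch in Python, assuming "8k" means 8192 bytes; MAX_PACKET and packetize are invented names and this is not the lamd code.

```python
MAX_PACKET = 8192  # assumed meaning of "8k"


def packetize(payload: bytes, limit: int = MAX_PACKET):
    """Cut payload into consecutive pieces of at most limit bytes each."""
    return [payload[start:start + limit] for start in range(0, len(payload), limit)]
```

A 20000-byte table goes out as pieces of 8192, 8192, and 3616 bytes, and joining the pieces on the far side gives back the original table, so nothing is lost to truncation.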

Monty (of Ogg/Vorbis) just replied to me that the obstacles inside the Vorbis engine to making it parallel are only going to get worse, because of new things that they want to do, etc. (insert math mumbo-jumbo here). Bummer. He thinks it's still possible, though, but it will require some API work in libvorbis to support this. So the door is not closed yet; we'll see how it works out.

I spent today catching up on paperwork and snail mail that accrued while I was defending the country.

Target has a fairly nice and fully-functional web site. I just went there to buy a wedding present for Ken/Amanda (their names don't really combine well). I was pleasantly surprised.

queeg rebooted sometime while I was in Georgia because of a power blip. Bonk. His uptime is now only 9 days. Unfortunately, I don't have any records of how long he was up before that.

Some machine in indiana.edu got hacked this past week. Someone on the ND CERT list sent the press release around; this makes two hacks this year. Hmm... imagine if ND had a press release every time they got hacked! (the implication here is that it would overflow ND's PR department)

I downloaded the newest stable bladeenc and am re-ripping a few of my CDs. I re-ripped the Weapon of Choice song from Fatboy Slim (the one that I complained about a few days ago here in the jjc -- I never noticed how bad my MP3 was until I heard it on the radio, and the quality was much clearer).

I've been hearing a lot of Fatboy Slim on the radio and in movies these days. It's probably because of my support and the taglines that I've been giving him here in the jjc. You're welcome, Normy.

I'll be visiting IU this upcoming week sometime (gotta fill out paperwork) and then go visit Brian and crew down at Sandia in New Mexico next week.

June 25, 2001

I took Miller and Johnson and squished 'em together and picked 'em apart and got... "Monsoon".

And so it goes.

I'm reading Robert A. Heinlein's Stranger in a Strange Land. A good book. I think I grok it.

When I debarked from my plane to Louisville on Friday afternoon, I turned my cell phone on. I wondered how long it would take for new voice mail messages to show up. When I turned it on, it showed no new messages. About 20 seconds later, BEEP!, and my messages arrived. Even though I'm familiar with how the technology works, it's somehow amusing to me that the simple act of turning on my cell phone causes a database lookup on a voice mail server somewhere in the depths of the Verizon network.

Interestingly enough, the "you have voice mail" indicator lit up while I was in Atlanta last week. It used to only do that when I was in Louisville -- you could get voice mail from anywhere, but your phone would only alert you to new voice mails when you were in your "home" area. I wonder if that's nationwide now.

Jortney are now using squyres.com as a temporary home for their domain (and therefore e-mail) while they move into their new home. John is distressed because there's no broadband available where they're moving to. Sucks to be him. :-(

We bought patio furniture today. Woo hoo. (I didn't have too many opinions on this stuff; Tracy mostly picked it out) We also finally got blinds for our great room. I am actually pleased about that; it's the last window that we still had sheets hanging on.

I finally bought a headset for my cell phone today. It comes with a real headset-style over-the-head band thingy, but also converts to a clip-on-the-ear thingy. My C*'s called me today on my cell phone, so it proved to be an excellent opportunity to try it out. It works great, and is much more comfortable than holding the cell phone up to your ear, especially for those who are on the phone for non-trivial amounts of time. The only disadvantage is that it's a bit too big and fragile to shove in my back pocket with my cell phone if I want to go out, so it's really only useful for in the car. Perhaps I'll just keep it in my laptop case; it's not too inconvenient to switch to the headset during the middle of a call. They do have the small plug-in-your-ear kind that has a separate clip-on mike that you could shove in your pocket with your cell phone, but I generally don't like those things.

I converted the LAM ldogetlinks fiasco to use a single nsend. After thinking about the problem some more, I'm not sure that multi-threading the lamd would have fixed this problem. Hence, I just changed the protocol outright to be simpler (albeit less efficient). Hopefully, this will fix all of our woes (RH 7.1 tping and MPI_COMM_SPAWN_MULTIPLE).

Night fell, and the sun rose again. A new day.

Must work on my OER today, and then continue to work on the Myrinet RPI (haven't been able to work on that since mid-last week or so). I started to re-write it from scratch, and was coming up with a much, much simpler model, but was forced to ditch all of that because it would break our compatibility with our shared memory RPIs. Arrgh...

June 27, 2001

Good thing we didn't do any therapy, Dave

Emacs "C-x v =" is your friend.

Went up to IU yesterday. I got all my paperwork sorted out, and got my accounts set up in indiana.edu. Sooner or later, jsquyres@indiana.edu will start working (I think it works now, but I'm not sure where it's forwarding to...). I saw several of our new machines (several Sun Blade 100s and some big Dells for linux and win2k). Jeremiah and Ron seem to be establishing themselves nicely at IU.

We discussed the LAM license issue for a while. Lummy wants (in order) one of the following: Clarified Artistic License, Apache, BSD/MIT. I'm not too personally fond of the CAL -- it seems to be a bit restrictive-sounding on the issue of distributing binaries. Apache is not GPL-compatible, so I think we need to ixnay that one because it might lock us out of some linux/BSD distributions. I would not mind a BSD/MIT license. We'll see how this plays out.

I found out that my Rasmus number is 2. It turns out that Todd went to both high school and university with Rasmus (the author of PHP). Todd even had an account on Rasmus' BBS back in their high school years. It's a small world.

I have switched to having my pine config on the IMAP server. I have found that there are three different places that I typically access my mail from: my desktop, my laptop, and one of the workstations at school (which share a common filesystem, so it doesn't matter which one it is). So whenever I update my pine configuration, I have to update it in three places. This has proved to be annoying, and I rarely remember to do it. End result: when I run pine on an nd.edu machine, I don't have much of the setup that I'm accustomed to. Bonk. So I uploaded it to the IMAP server and made an alias ("gpine") for the lengthy command line that is necessary to fire up pine and retrieve my config from the server. Seems to work quite nicely. Now, if only pine would support disconnected IMAP operation...

Still haven't figured out how to access whale.cs.indiana.edu (the CS IMAP server) -- it doesn't seem to accept my password.

June 29, 2001

You have nice hands, Dave

Ick. My last entry was an example of good formatting gone bad. jjc even warned me about it, but my fingers just acted by themselves (really, I swear, officer!) and submitted the entry anyway. This is the same entry, but with the formatting fixed.

I had to delete the last entry from the jjc database as well, so that it didn't show up on the web page. Urgghh...

Great quote in text talking about the history of /bin vs. /usr/bin vs. /usr/local vs. /opt:

Manuals for these programs are present for one funny reason: Steve Bourne ran a cron script that checked /usr/bin for new/updated programs each night. If there was no manual or the manual had not been updated, the binary was removed by the daemon.

We finally got the project name "oscar" at SourceForge. Mike from IBM is filling up the site today and tomorrow. Finally!!

I was suddenly hit by the urge to hear "Echoes" by Pink Floyd. It's playing right now. Mmmm.....

My paper got rejected from SC2001. Bonk. From the reviews, it was apparently mainly because I didn't have any results in it (they only wanted an extended abstract -- full paper to be submitted later). I was right up against the word limit as it was, so I put a blurb in there about "results will be included in final paper". Both Lummy and I thought that would be ok for the extended abstract. Apparently not. <sigh>

Tracy and I built our new patio furniture in the rain yesterday. The furniture was delivered to our back patio during the day. Shortly after Tracy got home, it started raining. Oops -- all the furniture is still out there, and is in cardboard boxes! So we decided to just build it right then and there. When was the last time you built patio furniture in a thunderstorm?

Had more interesting discussions today with Brian about multi-threading LAM and the lamd (at least 2 hours worth). Good stuff, but very confusing. Wow. He did a good writeup of it in his journal.

Talked with Dog for a long time about what he's going to be doing in LAM, too. Very cool stuff. He's going to modularize some of the stuff in LAM that we use for bootup and various system services on different kinds of systems (regular rsh, scyld, tm, globus/grid, condor/grid, condor, etc.). We more or less figured out how to do it such that it can be entirely self-contained in its own module directory (e.g., modules/rsh or modules/scyld).

The most obvious example of where such things would be useful is for lambooting -- each different kind of system has different ways to launch executables on remote nodes. But there are other things as well -- Scyld's whole "there's little or no filesystem on the nodes" concept really threw Brian for a loop when he did the Scyld stuff, for example.

Here's the loose plan:

The idea is to aggressively build as many of the modules as possible. Hence, it tries to configure all of the modules. If the configuration of a given module fails (e.g., libbproc can't be found -- so we must not be on a Scyld system), we don't build it.

Additionally, the overriding goal here is that a module is completely self-contained in its directory -- the addition of a new directory requires no changes to any other part of LAM.

LAM's top-level ./configure will traverse the directories in modules/ and look for a configure script. If a directory has one, the top-level ./configure will run it.

If the configure script in that directory succeeds, the top-level ./configure will add it to the "to be built" list. If the configure script in that directory fails, that directory will be ignored.
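The traversal in the last two steps can be sketched like this. The real thing would live in LAM's shell-based top-level ./configure; this Python version only illustrates the logic, and the function names and the injectable `probe` argument are my own hypothetical choices.

```python
import os
import subprocess


def run_configure(module_dir):
    """Run the module's configure script; True if it exits with status 0."""
    script = os.path.abspath(os.path.join(module_dir, "configure"))
    result = subprocess.run([script], cwd=module_dir)
    return result.returncode == 0


def find_buildable_modules(modules_root, probe=run_configure):
    """Return the sorted names of modules whose configure succeeds.

    Directories without a configure script are skipped; directories
    whose configure fails are simply ignored.
    """
    to_build = []
    for name in sorted(os.listdir(modules_root)):
        module_dir = os.path.join(modules_root, name)
        if not os.path.isfile(os.path.join(module_dir, "configure")):
            continue
        if probe(module_dir):
            to_build.append(name)
    return to_build
```

Note that nothing here knows the name of any particular module, which is the whole point: dropping a new directory into modules/ requires no changes anywhere else.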

For all modules listed in the "to be built" list, the top-level ./configure script creates a .c file (perhaps share/etc/modules_init.c) that is part of LAM (in liblam.a somewhere) that initializes the modules. This .c's only purpose is to call the "init" function of each module. So some standard header is written out, followed by a list of "lam_module_NAME_init()" calls, where NAME is replaced with rsh, scyld, etc. (i.e., whatever the name of the module's directory is). This is because the function names cannot be the same, or we'll get linker errors. So instead, there is a naming convention so that we can build the function call list on the fly.
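A rough sketch of what "building the function call list on the fly" might look like, again in Python just to show the idea. Only the lam_module_NAME_init naming convention comes from the plan; the wrapper name lam_modules_init_all, the void signatures, and the header comment are assumptions for illustration.

```python
def generate_modules_init(module_names):
    """Return C source text that calls lam_module_NAME_init() per module."""
    lines = ["/* Generated file (sketch); do not edit by hand. */", ""]
    # Prototypes first, so the generated file compiles on its own.
    for name in module_names:
        lines.append("void lam_module_%s_init(void);" % name)
    lines.append("")
    # lam_modules_init_all is a hypothetical name for the single entry point.
    lines.append("void lam_modules_init_all(void)")
    lines.append("{")
    for name in module_names:
        lines.append("    lam_module_%s_init();" % name)
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_modules_init(["rsh", "scyld"]))
```

Because each module's directory name is baked into its init function name, no two modules can collide at link time.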

Indeed, the API that these modules will need to support will also not have function names (for the same linker error issues) -- the init function of each module will need to supply a struct full of function pointers of all the module functions.

Assumedly, LAM will have one or more modules built at compile time. Later, at run time, LAM must determine which module to use. One of the module functions (perhaps it will be the init function itself) will be used to make this decision. That is, keep the decision for whether that module should be run or not in the module itself -- the module can do whatever test it wants to determine if it should be run. For example, if the tm module detects the environment variable PBS_ENVIRONMENT, then the tm module should be used.

However, one can imagine situations where multiple modules may report "yes, I'm the module to use". So each module should probably also have a command line flag that forces its use to resolve ambiguities. For example, say you're running in PBS, but also happen to be in a Globus environment. In this case, you'll probably want to use the Globus module, not the tm module. However, both modules would probably report that they could be used. So we'll have a flag such as "-Mglobus" to lamboot that would tell all the modules "if your name ain't 'globus', you ain't runnin'." But most of the time, there probably won't be an ambiguity, and the modules can just determine themselves which module to use (optimize the common case).
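Here is a minimal sketch of that run-time selection, under a few loud assumptions: the PBS_ENVIRONMENT check is from the plan, but the GLOBUS_LOCATION check, the `MODULES` list, and the decision to raise an error when zero or several modules volunteer are all made up for illustration.

```python
def tm_usable(env):
    # From the plan: the tm module applies when PBS_ENVIRONMENT is set.
    return "PBS_ENVIRONMENT" in env


def globus_usable(env):
    # Hypothetical test; this variable is only an assumption here.
    return "GLOBUS_LOCATION" in env


MODULES = [("tm", tm_usable), ("globus", globus_usable)]


def select_module(modules, env, forced=None):
    """Pick the module to use.

    `forced` models a "-Mglobus"-style flag: modules with any other
    name are not even asked. Raises LookupError (an assumed policy)
    if zero or several modules volunteer.
    """
    candidates = [name for name, usable in modules
                  if (forced is None or name == forced) and usable(env)]
    if len(candidates) != 1:
        raise LookupError("expected exactly one module, got %r" % candidates)
    return candidates[0]
```

In the common case (`select_module(MODULES, os.environ)` with only one environment present), no flag is needed; the flag only matters when two modules both say yes.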

This is actually quite a useful concept. There are a few other details that I didn't mention (e.g., for all the API functions, there will also be "default" versions such that if a given module supplies a NULL function pointer for a given API call, the default version will be used instead -- somewhat like C++ base/virtual functions).
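The "default version if the module supplies NULL" idea might look something like the following. In C this would be a struct of function pointers; the Python dict below is only an analogy, with None playing the role of NULL, and the API names (open_remote_node, close_remote_node) and the `resolve` helper are hypothetical.

```python
def default_open_remote_node(host):
    return "default open on %s" % host


def default_close_remote_node(host):
    return "default close on %s" % host


# One default per API call; a module may override any subset of them.
DEFAULTS = {
    "open_remote_node": default_open_remote_node,
    "close_remote_node": default_close_remote_node,
}


def resolve(module_table, defaults=DEFAULTS):
    """Return a complete table, filling missing/None entries from defaults."""
    resolved = {}
    for api_name, fallback in defaults.items():
        supplied = module_table.get(api_name)
        resolved[api_name] = supplied if supplied is not None else fallback
    return resolved
```

Doing the fill-in once at init time means that callers never have to check for NULL before calling through the table.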

I love interpreted languages that have eval functionality. This allows you to effectively have self-generating code. I'm guessing that this is the entire premise of Spielberg's new "A.I." movie -- self-modifying php and perl scripts that went bad and ended up going to war with each other to prove language supremacy once and for all.

Let us not forget the following quote from the Field Guide to Your Unix Sysadmin:

What do you get when you make an M P 3? Besides artifacts and patent royalties? It's not too late to open your mind. Use Ogg Vorbis Don't Fall Be Hind.

Don't you pay those Ger-er-mans.

You could live in Happiness too! Like the Ogg Vor Bis Programmers Do!

I took some friends to the airport today; they're leaving for a vacation. I took a very sub-optimal route home, though -- I really need to learn the roads around here better. Doh!

Rich Murphy made some groovy points about my last journal entry about having multiple "modules" for system services (including booting) in LAM. His main point was that we should just use dynamically loadable modules and avoid what I was talking about. If I don't post something about this, I'm sure that Darrell will say the same thing. :-)

Here's part of his e-mail:

Let me make a suggestion. Get to know and love dlopen() (or equivalents on other platforms... Solaris = dlopen(), linux = dlopen(), IRIX, AIX, etc., I have no idea). Basically, you make this part of the code modular by loading a shared object. Each shared object has a function, like lam_boot_module_init(), and maybe a lam_boot_module_finish(). Then, you set up a handle into your lam boot module's functions, say you want each module to implement a generic open_remote_node() and close_remote_node() function. You have a structure like:

[code snipped for brevity]

The module loader uses dlopen(), and can be driven from some init script. Then you can use dlsym() to find your lam_boot_module_init function. Call it and get the handle to everything else you care about. Then you're done.

Also, you can require that other modules dynamically load the libraries they need...

The best part is you don't have to rebuild lam every time, you don't have to futz with finding unique names every time, and your interface is perfectly well defined.

You probably want static linking in liblam.a, but why???

His last question is exactly right -- I do want static linking. I have three main reasons why:

1. When using dynamic libraries/objects, the user inevitably gets
burned by a) using a wrong/old version -- I'm reminded of DLL
Russian roulette in windoze -- b) paths changing and therefore
having to set LD_LIBRARY_PATH (or some equivalent), or c)
a new installation of LAM fucking up currently-running MPI
programs (e.g., MPI programs that run for weeks).

2. Differences in the dynamic linker functions on different OS's
create headaches for us with [potentially] lots of #if kinds of
statements. Ick.

3. Creating .so's on different architectures is, at best, a
nightmare. Libtool only partly solves the problem. Hence, we
make shared libraries an option, not a requirement. Portability, portability, portability! Unix != Unix.

My end goals are:

Maximum portability and reliability with minimum effort. If I
can have configure write out a single .c file instead of
changing my whole paradigm, I'll take it. Using dlopen(), adding
environment variables to potentially specify alternate locations /
version numbers of shared libraries, building shared objects,
etc. -- that's a large effort.

Minimum chance for the user to fuck it up, particularly after the
installation (see #1, above) -- put it this way: we got a
question on the LAM list the other day from a user asking how to
set $PATH. Do I really want to explain the nuances of shared
libraries to these kinds of users? No.

Consider the target audience for MPI: scientists and
engineers. NOT necessarily computer science folks. People who
still write in fortran. Why? Because it's simple and it works.
They can chunk in their formulas in really shitty coding styles
and rely on the compiler to spit out nice optimized code for
them. They just want it to work -- they don't care how.

This guy asking about $PATH is a typical example of that.
So while we privately laugh at him, we'd be pretty hard pressed
to explain the basics of how a particle beam accelerator works,
and/or how to make adjustments to it. So one can see his
viewpoint, at least.

Granted, we'll probably never have to use a particle beam
accelerator, but you get my point. :-)

MPI is just a tool. And it should be darn easy to use the
run-time environment that is required to run it. And by "darn
easy", I mean adding one entry to your $PATH, if any. If
you're very adventurous, you can also add something to your
$MANPATH. More than that, and the users' eyes glaze over, we
get bombarded with questions on the mailing list, and users think
"this LAM is a piece of crap -- why do I have to do all of this
just to run a job?" There might be damn good technical reasons to
do the 20 different things to your environment before running a
LAM job, but no "normal users" will do them. It's almost a PR
issue. Know your audience. Target them. Make things easier for
them so that they can concentrate on their real work, not the
intricacies of how MPI/LAM/whatever works.

Software needs to suck less, and unfortunately I can't make
LAM suck less if I use C++ or shared libraries. Yet. :-(

There were some other interesting side issues in that e-mail conversation, but that's the gist of it.

I'm getting to the end of Stranger in a Strange Land. It's actually getting disappointing. :-(

It started off well as typical SciFi with a human that had grown up with Martians and was returned to Earth. But towards the end of the book, it's just degenerated into discussions about sex and whatnot that seem somewhat frivolous. I understand the point that Heinlein is trying to make, but (IMHO) it could well have been made without descending into semi-porno.

But that's just my opinion...

Watched October Sky with Tracy last night. A good warm-fuzzy flick, with elements of "engineers rule!". I give it 15 minutes.

I also watched End of Days with Arnold Schwartezzenaggerama in it. I thought it was a good movie -- I've always enjoyed Christian-end-of-the-world / mysticism movies. However, I can see how it didn't do spectacularly well in the theaters 'cause Arnold portrays quite a different kind of character than his fans know and love. Even though he wins in the end, he's portrayed as a weak ex-cop. Plus he has none of the witty one-liner puns that he's famous for.

But I enjoyed it, and it had some really great special effects. 20 minutes.

If you're ever in an argument and you start losing, and perhaps realizing that your position is less than correct, you can abruptly win the argument by saying, "Yeah, that's just what Hitler said!".

Most everyone will recoil in horror at the thought of being compared to Hitler. Hence, by invoking a known abhorrent image that probably has absolutely nothing to do with the conversation, you win.

It works the other way, too. If you're arguing with a Neo-Nazi, just say, "Yeah, that's just what Jesus said!" The end result will be the same.

The Myrinet struggle goes on. I find bugs, I fix them. I got it to a point where all the tests that should pass on the Hydra did, and then took it out to LBL. There I found a few endian issues, and a minor seg fault in connect_all().

Now I've got some insidious problems in COMM_SPAWN that I think are actually symptoms of something else. <sigh>

July 4, 2001

The only question that remains is "which of them do I fire?"

I went to Sandia with Andy this week.

It was a neat trip; I've been to Albuquerque before, but not to Sandia itself. I didn't realize that it was physically located on an Air Force Base (Kirtland AFB).

Next time, however, I'll be flying in and out of Louisville, not Indianapolis. This time, I picked up Lummy in Bloomies and then drove to Indianapolis to fly out. We came home on Monday night, and after driving Lummy back to Bloomies, I didn't get back to Louisville until about 3:30am on Tuesday morning. Never again.

The moon and Mars were really bright the whole way home, though.

I also found out that you will get stuck behind someone slow on IN-46, regardless of the time of day.

The trip itself was cool. We went "behind the fence" at Sandia, into Classified County. We had to be escorted and within sight of Rich, our contact, the entire time. When we went into his office building, we had to sign in, and the secretaries put big magnetic "Caution -- Uncleared personnel in the area!" stickers on all the doors.

We were down there to see Brian and kick off our LAM/MPI collaboration with those folks. They actually can only generally tell us what they are using LAM for -- "simulating the nuclear stockpile". Any specifics beyond that are apparently on a I-could-tell-you, but-then-I'd-have-to-kill-you basis. Which is ok; I think I'm comfortable not knowing. :-)

Brian and I both gave talks; I gave a general "LAM is great" talk, and Brian gave a talk about specifically what we are doing with Sandia (we were up until 1-2am working on his slides, and then Brian came back to our hotel at 6:30am to practice his talk). Both talks went well. We met with Ron and the CPLANT folks as well; we'll probably be at least coordinating with those folks during this work, which is good.

In general, the trip was a success, and we have a better understanding of what they are trying to do, and what they would like us to do with LAM/MPI. This should be a very interesting project.

I notice the news reports that the European Union has blocked the GE / Honeywell merger. This was pretty much expected, I think -- there have been rumblings about this before. But it's still amazing. I am not familiar with the details, but I can't imagine that the US government is going to be happy about this (that's how much clout GE has). Even President Bush has apparently made some comments about how he is not pleased about this.

Woof.

Tracy and I went to Gina and Dan's 4th of July party last night. There were lots of GE folks there. We stayed until after midnight sometime; it was quite fun.

My conversation with Rich about shared library modules for LAM continues. He brings up good points about the efficiency of shared libraries and how it's not the fault of the concept of shared libraries that, etc. And he's right. There are lots of very good technical reasons that we should use shared libraries for the module design in LAM.

But we won't.

At least, not right now. Right now, the state of technology for shared libraries (IMHO) is too non-uniform. Probably the number one reason why is that it's a nightmare to create shared libraries on different platforms. Even GNU libtool isn't a complete solution (it doesn't work on all platforms). Hence, LAM/MPI is not in a position to require shared libraries. Sure, it can be an option, but not a requirement.

This reason is closely followed by the fact that it would be a somewhat large delta to change to explicitly use dynamic libraries (properly) with dlopen() and friends. Is this rocket science? No. It's not even particularly hard. But it's a nonzero change, and, at the moment, unnecessary. So good engineering dictates that we don't do it now. Get it working (with static linking), and then possibly move to shared library modules. With good modular design, the change is the same now as it would be to do it in the future, so not implementing it now reduces the number of variables that we have to debug.

One point that came out that I assumed was common knowledge was that LAM can be compiled as shared libraries (using GNU libtool). Hence, all this module stuff can end up in liblam.so (vs. liblam.a). This was actually one of Rich's points -- that we shouldn't close the door on shared libraries completely, because of the various performance benefits, particularly for large SMP boxen.

My only point is that I don't want each of these modules to be its own shared library. Whether or not liblam is a shared or static library, I don't care -- that's a user choice -- just as long as it's not a requirement. More to the point, if the compiler/linker/libtool can give me shared libraries for free, great. But I don't want to explicitly program for shared libraries (dlopen() and friends) for the reasons that I stated above and in prior journal entries. At least not yet.

And, for the record -- I was right: Darrell did feel the urge to say the same things that Rich did. :-)

On another LAM note, someone found a minor bug in LAM 6.5.2 such that compiling programs with MPI I/O won't work. Arrghh... This may trigger the release of LAM 6.5.3. But we're currently in the process of figuring out what the license for LAM/MPI will be down at IU. I'm trying to push BSD, but other forces are at work (including IU's lawyers). We'll see how this shakes out...

July 8, 2001

Keanu Reeves can't act

Did you ever notice how green looks absolutely nothing like red?

I got my 9 "free" CDs from BMG (had to pay shipping and handling). They suck. It was damn hard to find 9 CDs in their selection that I wanted. And some of them I didn't really want. Ugh. And I think I ended up with at least one Sarah McLachlan CD that Tracy already has. Double ugh.

The script that we've been using for CVS diff mails doesn't handle binaries nicely. I discovered that when I checked in a PowerPoint presentation the other day -- it sent the whole binary file in the e-mail.

I hacked it up a bit so that it doesn't do this anymore. Apparently, CVS won't tell you if a file is binary or not. So I had to add a list of filename extensions such that the script will check the incoming files to see if they match. If they do, no diff.
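The extension check boils down to something like the sketch below. This is not the actual script; it is a Python illustration, and the extension list and function names are invented (I am not reproducing the real list here).

```python
import os

# Assumed list for illustration; not the extensions the real script uses.
BINARY_EXTENSIONS = {".ppt", ".gif", ".jpg", ".png", ".gz", ".tar", ".pdf"}


def is_binary_by_name(filename, extensions=BINARY_EXTENSIONS):
    """Guess "binary" purely from the filename extension (case-insensitive)."""
    ext = os.path.splitext(filename)[1].lower()
    return ext in extensions


def files_to_diff(filenames):
    """Return only the files that should get a diff in the commit mail."""
    return [f for f in filenames if not is_binary_by_name(f)]
```

It's a heuristic, of course: a binary file with an unlisted extension will still slip through and get mailed in full.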

I sent the script back to the Vorbis folks (that's where we got it from in the first place), and they put it in use immediately.

Tracy and I went to a 4th of July picnic with the people on our street. Nice folks. Met a few of the kids, too. One guy (retired) worked in a tobacco factory for 35 years as a machinist. He had some interesting stories.

A fairly serious (but pretty small) bug was found in LAM's mpicc this past week such that it was necessary to release LAM 6.5.3. I rolled in a few more minor bug fixes as well -- most of them had to do with ROMIO (the bane of my existence).

I had a long and exhausting conversation with Trond from RedHat about building the RPMs. As a result, I slightly improved the spec file for LAM, and we finally violently agreed that having the LAM RPM built on a RH 6.2 machine means that the man pages and doc files will be installed in the wrong place when the RPM is installed on RH 7.x machines. If the RPM is built (or the SRPM is rebuilt) on a 7.0 machine, since we didn't hard-code any file locations, the man pages and doc files will magically end up in the Right Place.

I think that I finally have the gm (myrinet) RPI for LAM working. I found many, many bugs, and a bunch of things that just weren't implemented yet. It passes the entire test suite on the hydra, Babel, and Chiba City (finally). I didn't realize that the test suite was so thorough. Let's hope that it is really thorough, and it found all the bugs.

I had to patch up a lot of the "make dist" procedure to build the new tarball, especially the lamtests suite (because lamtests now uses its own automake build procedure).

I'll wait for a few more confirmations (mailed it to Brian, Joe, and Feldy); hopefully we'll release the beta early this week.

A /. article caught my eye this evening -- some guy was mad because he got cut off from Telocity DSL when Northpoint went out a few months ago, and Telocity just got around to asking for their gateway back (they send you a pre-paid airbill box, by the way). Admittedly, lousy accounting on Telocity's part -- Northpoint shut down quite a while ago. However, it might still be in the 6-8 week range, which is fairly normal for these kinds of things.

The /. article played up the fact that Telocity charges $500 if you don't return the modem.

Duh.

It's right in the contract. And it only makes sense -- if they give you a modem for free, it certainly makes sense that they want it back when you discontinue service with them. I know that I was either told that or I read it in the terms of service when I signed up with Telocity (both times). So I have no sympathy for this guy if he either doesn't have the modem any more or hacked it up to play with it.

A few people said as much. A few others disagreed. Idiots. How can you disagree with something that you agreed to, even implicitly?

So I posted pretty much what I said here, plus a link to the Telocity TOS web page where it specifically states that you have to return the modem or pay $500. I actually got moderated up to a 4 (insightful). Although I've really only posted on /. a handful of other times, this is the first time that I've been moderated up. Amazing. I posted fairly quickly, used bad grammar, and the posting wasn't entirely clear. But I guess I had the facts on my side.

Johnny Mnemonic is on TNT tonight. This movie sucks. I just noticed that Sandoval from Earth: Final Conflict is in this movie, and so is Dizzy from Starship Troopers. Sandoval's acting hasn't improved at all.

The movie still sucks.

But hey -- The Planet of the Apes w/ Marky Mark is coming! Looks pretty cool; I always enjoyed those movies. It makes me wanna watch The Big Hit.

We had a big (but short) storm here tonight -- rain was coming down in sheets. It's a good thing that I watered the lawn today.

July 12, 2001

Jeff's Journal

Strange things are afoot at the Circle K.

In the slashdot post that I mentioned in a prior jjc post, Pete actually magically got moderator privileges that day (all the moons must have been in alignment), and so he moderated my post up to a 5 (the highest possible value). LOL!! Like I said, my post used bad grammar and was slightly unclear. I find it extremely amusing that it got moderated all the way up to a 4 (without help from friends), and then finally pushed up to a 5 (with a little help :-).

I picked up Janna from the airport the other day; they had just returned from a holiday of hiking through Switzerland. They had a great time and did much walking throughout various out-of-the-way little Swiss towns.

They bought me a Swiss Army knife with my initials engraved on it, and got some chocolate for Tracy.

We ran into Aimee in the Louisville airport; she was on her way to some business meeting the next day. It's a small world.

Eric fixed the latency problems on the Babel cluster. It seems that the NICs were continually going into auto-negotiation mode (to swap between 10 and 100Mbps) for some reason, which caused all kinds of retransmission delays and errors. But now the cluster seems to be working well (it's amazing that it has been that way for at least a year) -- the NFS delays seem to be much more normal, meaning that the latency is about what you would expect from NFS.

I drove back from Bloomies yesterday. Had a good 2 days there; I'll typically be spending 1-2 days a week there.

It's very easy to get from Bloomies to my house -- it's essentially 3 highways: IN46 to I65 to I64.

While I was driving home yesterday along the forest-lined IN46, 3 C-130s (military transport planes) suddenly flew above the tree tops right in front of me. They were flying north. They looked to be at about the right altitude for parachute jumping. IN46 is nowhere near any military bases that I know of. Weird.

Later, I got on to I65 and headed towards Louisville. About 40 miles south, I saw 3 C-130s again (must have been the same ones, although they were much higher) flying east as they flew over me on I65. About 20 miles further south, I saw them again, this time on the West side of I65, but flying due south.

The Louisville airport was somewhat nearby, and they were heading in that general direction, but Ft. Knox is an additional 75-100 miles further south; they might have been heading there as well. Both airports can handle C-130s. Who knows.

The only obvious conclusion that I can draw from this experience is that the government is shadowing me, and/or bombarding me with electro-magneto kinetic rays in attempts to steal my brain. "This line is tapped, so I must be brief."

If I disappear and/or turn into a Microsoft neophyte, you all will know the reason why. Let the truth be known; trust no one.

I came across a great term yesterday: "war driving". It's a moniker off the old term "war dialing" from back in the 70s and 80s. War driving is taking a laptop with a wireless NIC and literally driving around and seeing what wireless networks you can tap into.

The SSI project in LAM is going quite well; I'm pretty excited about it. More and more ideas about what it can be used for keep occurring to me; SSI may solve a lot of issues and provide a really nice framework inside of LAM. If we do it Right, it may end up being the One True Way that we integrate LAM to all new kinds of systems (PBS/TM, Grid, Scyld, KLAT, etc.). Indeed, I'd really like to be able to use SSI to integrate new algorithms into LAM -- such as the tree-based lamboot. Very cool stuff.

I chatted with Kay at IU yesterday; she's a "pre-faculty" in the CS department (analogous to how I'm a pre-post-doc). Her advisor is Andrew C. (formerly of UIUC); she's part of the Fast Messages Group. We had an interesting chat about Myrinet, research, software engineering, Windoze development, etc. She's somehow connected to our IPCRES group, but indirectly. I don't understand the exact relationship (I don't think anyone does, yet :-). But she'll be at least somehow connected to our group.

It seems that one of the big things that Fast Messages did to make Myrinet message passing fast was to support essentially the same thing that writev() does (although they call it "scatter / gather", which confused me until I realized that she wasn't talking about collectives) -- take pointers to different chunks of memory and write them all out into a single transfer, rather than:

Force the user to copy everything into a single, contiguous buffer so that it can be sent in a single transfer, or

Force the user to use multiple transfers to send everything.

GM doesn't currently have such a vectored-write (or vectored-read) capability. So I pinged the folks at Myri about it and asked if they will ever support such a thing. Indeed, that would help us in LAM -- it would effectively eliminate the need for the tiny message protocol in the gm RPI.
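For the curious, here's roughly what a vectored write looks like with plain POSIX writev(). This is ordinary file-descriptor code, not GM or Fast Messages, and the function name send_two_chunks is made up for illustration. Two separate buffers go out in one call, with no copy into a contiguous buffer and no second transfer:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a header and a payload that live in two separate buffers using a
 * single vectored write, instead of copying them into one contiguous
 * buffer or issuing two write() calls.  Returns whatever writev()
 * returns: the number of bytes written, or -1 on error.
 * (send_two_chunks is a hypothetical name, used only for this sketch.) */
ssize_t send_two_chunks(int fd, const char *header, const char *payload)
{
    struct iovec iov[2];

    /* iov_base is not const-qualified in the API, hence the casts;
     * writev() only reads from these buffers. */
    iov[0].iov_base = (void *) header;
    iov[0].iov_len = strlen(header);
    iov[1].iov_base = (void *) payload;
    iov[1].iov_len = strlen(payload);

    return writev(fd, iov, 2);
}
```

A real message layer would also have to cope with short writes (writev() may write fewer bytes than requested on sockets), which this sketch ignores.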

July 14, 2001

I'm the original Nicks Superfan

My IP traffic to IU goes through Atlanta, then to New York, then to Bloomies.

My pilot fritzed out again the other day and reset itself for no reason. Last week, it fritzed out and lost all of its data (also apparently for no reason). This wasn't a tragedy, because I had done a backup in the recent past, but I did end up losing some data, which was annoying. This pilot is just getting kinda old; it was pretty hard to write on it (letter accuracy was pretty bad). Not that I'm an expert at palm writing, but when I write on Tracy's (new) pilot, my accuracy is much better. This leads me to believe that my (old) pilot is just getting tired. Indeed, the writing area is visibly worn.

So I went out and bought a new m100. There are lots of other more advanced models -- indeed, the m100 is the low-end offering -- but that's really all that I need. I had to visit three stores before I finally found the m100 in stock (at CompUSA), though; Best Buy and Staples were both sold out.

I transferred over all my data and everything appears to be fine with the new m100. My old pilot has officially been retired. It served me well.

So this brings the tally to five -- this is my fifth palm pilot. The record stands:

Tracy has broken a pilot as well (I think it was a short drop from her desk...?); she's on her second pilot. Now we both have identical m100s. Ugh...

I bought Depeche Mode's new "Exciter" CD while I was in Bloomies on Monday (stopped in Target to buy some Benadryl, and ended up wandering over to the CD section). It's not bad, but the music is slower than their previous stuff.

Had a really long SSI call today. Dog, Brian, and I hammered out a whole bunch of stuff -- we've had separate discussions up until this point; this call helped have all three of us agree on a whole bunch of points. There's still a ways to go, but we agreed on things like:

There will be different kinds of modules (e.g., a comm module for ipv4 vs. ipv6, a boot module for the lamboot kinds of things, etc.).

Some kinds of modules will be use-only-one-of-all-available-modules (e.g., comm), whereas some kinds of modules will be use-all-available-modules (e.g., the RPI).

The SSI glue will be general enough to not care where the modules are located (in liblam or libmpi).

The SSI glue will have three essential functions: init, finalize, and export_tables.

LAM's top-level configure will write out the tables that the init functions will use.

Each module will have its own top-level interface that the rest of LAM/MPI will call. These top-level interfaces will do their own dispatching.

Modules will emulate C++ inheritance by specifying two global variables in each module: a struct full of a bunch of function pointers, and a pointer to a "base" struct full of a bunch of function pointers (which can be NULL).

Modules with NULL values in their struct of function pointers will have those NULLs replaced with real function pointers from their "base" class during init, so that during run time, we only have to do one pointer lookup, not [potentially] many.

The SSI finalize glue may not be called, but only if the process doesn't call kexit() (e.g., the lamd dies by signal).

The init function of each module may fail if that module determines that it should not run.

There will be a command line interface that may be used to force the selection of a given module. It will likely have a three part nomenclature: a common option, the module kind name, and the specific module name.

Modules can call the SSI export_tables routine to get the final function pointer table for a given kind (probably with arg type (void*)) so that the SSI dispatch wrappers can be bypassed for performance reasons (this is important for the RPI).
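A minimal sketch of the "emulated inheritance" scheme from the list above. Every name in it (boot_actions, boot_module, module_resolve, and the toy start/stop functions) is hypothetical; none of this is actual LAM code. Each module exposes a table of function pointers plus a pointer to a "base" table, and an init-time pass fills in the NULL slots so that run-time dispatch is a single pointer lookup:

```c
#include <stddef.h>

/* Hypothetical function table for one kind of module. */
struct boot_actions {
    int (*start)(void);
    int (*stop)(void);
};

/* What each module would export: its own table, plus a pointer to a
 * "base" table to inherit from (which may be NULL). */
struct boot_module {
    struct boot_actions actions;
    const struct boot_actions *base;
};

/* Toy implementations; the return values only identify who was called. */
static int base_start(void) { return 1; }
static int base_stop(void) { return 2; }
static int tree_start(void) { return 10; }

/* The "base class" table: every entry is filled in. */
const struct boot_actions base_actions = { base_start, base_stop };

/* A "derived" module: overrides start, inherits stop by leaving it NULL. */
struct boot_module tree_module = {
    { tree_start, NULL },
    &base_actions
};

/* Run once at init: replace each NULL entry with the base's pointer.
 * Returns 0 if every entry is resolved afterwards, -1 otherwise. */
int module_resolve(struct boot_module *m)
{
    if (m->base != NULL) {
        if (m->actions.start == NULL)
            m->actions.start = m->base->start;
        if (m->actions.stop == NULL)
            m->actions.stop = m->base->stop;
    }
    return (m->actions.start != NULL && m->actions.stop != NULL) ? 0 : -1;
}
```

After module_resolve(&tree_module), calling tree_module.actions.stop() lands directly in base_stop() without walking a chain of bases. A fully resolved table like this is also the sort of thing an export_tables-style call could hand back to callers who want to bypass the dispatch wrappers.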

I think those were the majority of things decided, in addition to the function breakdown of several of the [proposed] kinds of modules. One notable thing that we haven't decided yet is how to handle arguments of different types. The comm module has a notable problem: argument types and sizes are different between ipv4 and ipv6. We may get away with using handles to the "real" datatypes in many places, but Brian thinks we're going to have many problems in the routerd in the lamd, because it stores tables of IP addresses. Ugh. We'll see how that shakes out...
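One way the "handle" idea could look, purely as a sketch: a tagged union, so that callers pass around one fixed-size type regardless of address family. The comm_addr type and both functions are invented for illustration and are not the design that was agreed on. Only inet_pton() and the in_addr / in6_addr types are real:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address handle: one type for both families, so that
 * function signatures and tables do not depend on ipv4 vs. ipv6 sizes. */
struct comm_addr {
    int family;                 /* AF_INET or AF_INET6 */
    union {
        struct in_addr v4;
        struct in6_addr v6;
    } u;
};

/* Parse a textual address.  Returns 0 on success, -1 if the text is
 * neither a valid ipv4 nor a valid ipv6 address. */
int comm_addr_parse(const char *text, struct comm_addr *out)
{
    memset(out, 0, sizeof(*out));
    if (inet_pton(AF_INET, text, &out->u.v4) == 1) {
        out->family = AF_INET;
        return 0;
    }
    if (inet_pton(AF_INET6, text, &out->u.v6) == 1) {
        out->family = AF_INET6;
        return 0;
    }
    return -1;
}

/* Compare two handles.  Returns 1 if they hold the same address in the
 * same family, 0 otherwise. */
int comm_addr_equal(const struct comm_addr *a, const struct comm_addr *b)
{
    if (a->family != b->family)
        return 0;
    if (a->family == AF_INET)
        return memcmp(&a->u.v4, &b->u.v4, sizeof(a->u.v4)) == 0;
    return memcmp(&a->u.v6, &b->u.v6, sizeof(a->u.v6)) == 0;
}
```

The cost is that every entry is as big as an ipv6 address, which matters for anything that stores whole tables of them.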

Connectivity to ND has sucked today. It keeps going away for 1-2 minutes at a time. Very frustrating...

I got /. moderator access this morning; I don't think that I've ever had that before. So I moderated 5 articles and did my civic duty.

I went to see Tomb Raider with Tracy last night. It was fairly good. It wasn't what I would call a great movie, but it was worth seeing on the big screen and whatnot. Good effects, but I found some of the action a little hard to follow because they kept switching the camera at a high frequency. I'm sure that they did this on purpose, but I didn't like it much.

All in all, though, it was a good flick. Rimmer (Chris Barrie) from Red Dwarf was the butler. I give the movie 15 minutes.

Got some LAM patches from the KLAT folks to make LAM work on their systems. Their networks have the property that each node has multiple NICs, and the switches are wired up in a non-uniform way. i.e., nodes A and B may have entirely different ideas of what the IP address of node C is.

So LAM has to do something a little different than what it normally does: pass around hostnames rather than host addresses. Tim from the KLAT project hacked this up in LAM and sent me a patch. I'll probably be applying it soon.

Tim also noticed a small bug in LAM, such that when /tmp is NFS shared across multiple nodes, when the haltd in the lamd dies, it leaves the kill file open. This causes NFS to keep a cache file open (of the form .nfsNNNNNNNN) in the LAM directory, and therefore doesn't let tkill remove the entire directory. Blech. Tim's fix for this doesn't work when one compiles the lamd as separate pseudodaemons, so I passed it on to Brian to see what he can do with it.

I renewed squyres.com today for another 2 years (thanks for reminding me, palm pilot!) I renewed it a few months early, but hey... I don't wanna lose it. :-)

BIG NEWS!!!

I have it on reliable sources that Nabisco is considering releasing a new animal for animal crackers since this year is the 100th anniversary of animal crackers. There are four animals under consideration; right now, Nabisco employees are being asked to vote for which they like the best. Supposedly, Nabisco will open the voting to the public later this year.

Here's the four animals under consideration:

Koala bear

Walrus

Cobra

Penguin

I think we all know what it should be. When Nabisco goes public with this, we'll have to post this to Slashdot and get all those Linux lovers to stuff the ballot box.

ND was supposed to switch to kerberos 5 this morning. By our syslogs (and the 6 bazillion cron e-mails that I got), AFS was out (on schedule) for several hours this morning, and came back a little after 6am. But checking several machines after they have been rebooted (they needed to be rebooted to get the new authentication scheme -- it's complicated), they didn't appear to be using kerberos 5.

What's the scoop?

Not surprisingly, the OIT has said nothing. Curt sent a "WTF?" kind of message to the AFS/Unix list, but AFAIK, there's been no response yet (it's about 2pm now). The OIT really sucks sometimes. Haven't they ever heard of communication? Haven't we been harping on their [lack of] communication skills for years now? Haven't they learned yet?

Apparently not. :-(

I just got paid by the military for my 2 weeks in Atlanta, but DFAS took out state taxes for IN even though I explicitly submitted a form saying KY. <sigh>

Some guy just posted a Solaris system administration question to the LAM list. Weird.

July 20, 2001

There is nothing more joyful than wallpapering behind a toilet

The other day, I almost lost queeg. More specifically, I almost lost queeg's partition table. I was installing VMware to do some OSCAR testing. Wait... let me back up.

A long time ago, when I first got the box that queeg currently lives on, I originally had visions of dual booting it with Windoze. So I left a several GB partition on the disk for windoze. I even formatted it as FAT32, and mounted it under /mnt/windows in Linux. But then again, I never got around to installing/dual-booting Windoze.

Since I was running short on disk space (it's amazing how fast 20GB can get used up...), I decided to install my VMware virtual disks in the FAT32 partition. VMware complained that it either had lousy performance or couldn't lock the virtual disks or something in FAT32 partitions.

"No problem," I thought, "I'll just whack that partition and replace it with an ext2 (native linux) partition. I'm never gonna install Windoze anyway."

After a series of UTFS errors, I had gotten to the point where diskdrake (Mandrake's nice GUI partitioning tool) claimed that my partition table was corrupt and it couldn't read it. DOH!! After a morning full of fretting, backing up all my data, and carefully poking around with fdisk, I was finally able to restore my partition table and convert the FAT32 partition to ext2.

Moral of the story: it was the ReiserFS stuff that caused the majority of my woes (long story, I won't bother explaining here). While having a journaling filesystem is great, this ReiserFS stuff in Linux 2.2 can really bite you in the butt. I hope that it's better integrated in Linux 2.4.

On the up side, VMware is actually pretty nice. This is the first time that I've ever used it. I think I'll probably be buying a real copy of it (I only have a 30 day trial license right now) so that I can run windoze in that -- much easier than dual booting.

I really like the feature of their "undoable" disks. You can install an OS, mutz around with it, get it up to a known good state, checkmark it back to the persistent store, and then start testing. If your disk goes wonky, you can just say "throw out those changes -- let's reboot with the last known good state". More to the point, when you shut down the virtual machine, VMware asks if you want to commit all the changes that you've made to the virtual disk. If you say yes, all the stuff you did on disk will be visible the next time you boot that virtual machine. If you say no, the disk will be in the same state as it was when you booted.

Needless to say, this is extremely handy. It would be cooler if I could checkpoint the disk at any time (vs. only when I shut down the virtual machine), but there are some obvious synchronization issues involved there. Still, it would be handy.

I have 256MB of RAM in my machine, and that's enough to run 2 copies of VMware comfortably. Trying to run a 3rd at the same time causes major swappage.

Had a long chat with Darrell and Dian last night. It was good to talk to them again. Darrell is doing some very cool stuff at Yahoo. He has much more low-level kernel knowledge than I do; I really need to get into that stuff.

He also made a good suggestion about the versioning that we are planning on doing for SSI. It's amazing how we have lived thousands of miles from each other over the past decade, but yet our careers have managed to take many parallel paths. It's cosmic, dude.

D also suggested that I should have a way to get the titles from my journal entries in a downloadable file that /bin/fortune can use. It's a funny enough idea that I'll probably have to find the time to do that someday. :-)

The Code Red worm is running rampant on the net. I just have to laugh. It's a clever worm -- the authors made a few mistakes (like hard coding the IP address of whitehouse.gov -- duh), but the ideas behind it are both insightful and scary.

squyres.com has been hit with Code Red probes at least 20 times or so. kresge.com has been hit many, many more times than that. As of last night, www.cse.nd.edu had been hit over 58,000 (!) times just yesterday alone (all the probes were ineffectual, because we all run Unix web servers, not MS IIS). D and I were wondering why my server had only been hit 20 times, yet others were seeing hundreds of thousands of hits. I wonder if the random IP address generator in the worm has a propensity for class A and B networks.

July 27, 2001

The great open source debate

I'm sitting here waiting for the MS vs. Open source panel debate to start (they're officially late starting, as of this point -- more than a few people have speculated that Microsoft hasn't showed up -- or "blue screened"). It's a big room and is slowly filling up (the center filled up immediately, of course). The Slashdot crew is sitting about 20 yards from me (Rob Malda, etc.), the Apache folks are right over there, and Tim O'Reilly is strutting around the room. I'm sure that Miguel is around here somewhere, as well as various other open source luminaries. Quite a collection of people.

The amusing thing is that right before they opened up the doors to the room, RedHat passed out dozens of red plastic hats, and Sun passed out dozens of "OpenOffice.org" t-shirts.

So there's a lot of people wearing those hats, and at least a few wearing the openoffice t-shirts.

The panel should be interesting. Craig Mundie, the MS Veep who has been taking pot shots at the GPL and open source is the MS representative here today. He is countered by Michael Tiemann, CTO of RedHat. So Mundie is walking into an openly hostile audience. The first question that pops into my mind is "why would MS agree to this?" Do they really feel that they are right? Do they just want to show that they're not afraid?

I'm sure that news accounts of this debate will be all over .net... er... the net within a few hours. Might prove to be interesting.

I typed much through the speeches and panel. They're not guaranteed to be right (definitely abbreviated, and lots of typos), but I thought you might be interested. Some of the audience questions were pretty damn stupid, I have to say...

Talk name: "Informed Choice" - Our goal: creating an environment about informed choice - My goal about speaking: not legislate, create a dialogue, and inform others about long term ramifications of their choices

MS has no beef w/ open source. We think it's an integral part of ecosystem that has fueled tremendous success. But there are aspects of this movement -- free software and open source. And the press is certainly confused.

Software as a business - MS choice: the commercial software model; built on a business model, licensing, investment in R&D, community and standards - Software industry: an integral part of US economy, 148000 commercial companies, 2 million jobs, resulting in $28.2 billion in taxes paid, export revenues of $121B

--> He's driving that countries should be worried about this 'cause free software doesn't provide jobs and income, and is therefore unhealthy for

--slide

Learning from open source

- expanded community programs - expanded source access: "shared" source - range of licenses for different customers, partners and the intellectual commons, still provided under a commercial model

--slide

Summary: MS believes that the commercial model is good for the nationwide and global economy.

Michael Tiemann RH CTO

"To be, rather than to seem" I claim: to build an arch of trust, it is better to be open, rather than to seem open. Same to be trustworthy, rather than to seem trustworthy.

<applause>

He believes that the (?free/open?) source results in economic opportunity. Cites fair/equitable competition.

Open source makes it much easier to be rather than to seem. Compares MS to alternative minimum tax -- which is neither alternative nor minimum.

Why would MS try this new high profile approach, when previous approach was working well?

Answer: Oct 31, 1998 -- the Halloween documents. There are a lot of smart people at MS. They see that OS model is a valid business model that can legitimately compete with MS.

GPL is the spine of OS. MS uses strong proprietary license. GPL is strong free license, like 1st Amendment. MS has benefitted (illegally -- his word) from application of its licenses. RH (and others) benefits from GPL protection. RH has always hit quarterly predictions, and went profitable a year early. Why is GPL bad?

Back to 1998. Revolution inside MS. Fueled by smart people in MS. Fueled by OS superiority. (cites purchase of Hotmail, tried to convert to windoze multiple times -- the light goes on and people realize "OS software is better"). "Do you think that the people who administrate those systems think 'Gosh, I wish that I could dump this BSD crud and replace it with windows?' I don't."

Shared source has nothing to do with building community outside of MS. It's not a license, but a treaty, crafted by execs trying to buy time to quell the internal civil war.

MS has done much innovation. We are thankful for things like XML, etc. But "winner take all" attitude has to go.

When MS is ready to accept the GPL, and ready to accept fair competition (and many other comparisons), then we will welcome you to this party as a first class citizen.

Craig Mundie: to respond. It's hard to know what to think by looking inside a company from the outside. There are many different views. We're not embarrassed that people come forward and ask questions, whether we do the right thing or not. We have a single focus, though. The leadership is single minded in going forward. We have consistent leadership, though. Those who disagree can go do something else. Many of the ways that Michael characterized MS as "civil war" does not exist (at the management level) in MS -- nor at the rank-n-file level.

Tim: Brian, you've come from the GPL side -- clearly, you've thought a lot about licenses and why. What are your thoughts?

Brian: Likes cycle slide -- research, gvmnt, industry, and users. But he thinks it's bidirectional. It was important to us that lots of people use it, but to get people to contribute back -- to build community. Even though the obligation is not there (for the companies who use apache) to contribute back, people do. They understand. The creation of licenses and regimes that are bidirectional is what is missing from this debate. I think we'll see different level of input in shared source vs. open source. But then again, there are millions of MS developers out there, so we'll see.

Craig: I agree (bi directional). Giving code back is only one way of giving back. Giving taxes is another way (institutionalized).

Tim: How much does MS give in taxes? <laughter>

Craig: First three days of WinCE shared source -- 10,000 people downloaded the source tree. We had a commercial kit for those who wanted it for a year now, we sold about 400 of those. We're happy about this. We give back financially, we give back in the stds world (XML, etc.). We will continue to seek ways to give back.

Tim: Dave: you're closer to the hacker level at MS, your thoughts?

Dave: No real war in MS. We're trying to learn good things from OS. There's a lot of people who have paid attention to OS. Sharing source is a whole different beast vs. sharing binaries (from a support perspective). We're trying to internalize that now. People do ask for more and more access to source code; it's become more central to people who develop on a daily basis. So now we're starting to develop these shared licenses -- it's a response to user requests. The standard that he's working on CLI/CIL is the same spec that Miguel is working on w/ Mono. Shows how short and simple the license is. I'd like to hear feedback, actually.

Tim: don't wait. Panelists, jump in.

Michael: I'd like to move from the nuance to substance. The efforts sound good. The logging companies are really nice as long as you let them cut down trees. Oil companies are very env friendly as long as they can drill for oil. But what about patents that prevent interoperability? So the substantial important difference is whether it is acceptable to -- where it is convenient -- to allow small parts of access?

Tim: But if the customers like it, who cares?

Michael: It's unfair; like civil liberties and civil rights, diff between those who make the rules and those who live by the rules.

Tim: ...missed... (general devil's advocate, siding w/ MS)

Mi: everyone needs full access to everything -- everyone can drink from the same water fountain.

Mitchell: the equilibrium that we have in the software industry today is flawed. The choice that is missing is the choice of leadership. Data and workflow is controlled by one entity. That is not healthy for society, and for development of software, and not for the future. There's lots of smart people at MS -- but they get filtered through the business vision of one company. We don't get to choose that from the lots of good ideas. OS should promote free software, leadership, etc. Characterizing OS as bad for policy is not healthy; let's not let it succeed.

Craig: There is nothing on our part to characterize OS as bad. <laughter> The ecosystem that we're working on is not just for computers. Other things as well. MS has very little sway with telecommunications, electronics, etc. We've had little success there. <applause and laughter> But think about long term ramifications -- evolution of computing and ramifications.

Clay: issues of src code alone is less important than was 5 years ago. Meta issue is interoperability, not just OS. I'm more concerned w/ open interfaces vs. the source code behind it. To Craig/Dave: In the Hailstorm documents, it says that there would be a way for linux/solaris for them to participate. Can I use a hailstorm schema to have a palm pilot contact a linux server w/o a MS component in the middle?

Dave: Interop is key. There are a number of industries that have not seen the light and use all MS software. <laughter> If the customers want it, we will make it possible. There is no question.

Audience: "Answer the question!"

Clay: This is not true in classes where MS does not have a monopoly. I'm gonna re-ask the question and try to get a yes or no. --same question--

<applause>

Dave: So... <laughter> Yes, but a caveat. As you know, in distrib systems, interesting things are done when parts you need are brought to the table. So pilot will want to authenticate and then talk to linux server.

Clay: Real question: is it a choice or a requirement? You're saying it's a choice.

Craig: It's historically been "the API". In a world we see coming, it's clear to us that you can't depend only within one machine. We don't believe conventional stuff of RPC and whatnot. So in that world, protocols, schemas, and message packets are akin to API. MS has always published the API. The OS community has borrowed those APIs and made complete implementations. So when we publish the protocols and whatnot, anyone can do anything they want with them.

Tim: But we're worried about patents. Even though published, MS still has control. Even if not by knowledge/source/protocols, by law. Will there be patent protection?

Craig: We're a business. We license IP. If it turns out that this business says that we should license the patents, then we'll do that. But we are a business.

Tim: But Apple was a business when you copied their interfaces. <laughter>

Brian: Still an issue of centralization. E.g., DNS. There's root servers. It's now privatized. The fact is right now, this is a critical point in the infrastructure. And we're concerned about it. Similarly, we're concerned that MS will control hailstorm, etc., etc. Worried that same type of centralization will occur in .net services. What draws people to OS is going away from centralization.

Craig: 2 things to think about. Right now, we're saying "This is what we're going to do". We've advertised what we're going to do. Downside: magic carpet/AOL. What is that? Hence, it isn't clear to me that we are granted automatic franchise in this area.

Tim: come back to point of health of overall ecosystem. Big concern about MS that you see yourselves as a small player in a big world. You think that you don't have power. But you do. In the ecosystem, finite resources. Makes it hard for new entrants to compete. OS says "we want in -- we want to have a chance". You guys have been so successful that it's hard for others to succeed. So is proprietary vs. open "what's good for MS is good for the industry?"

Dave: Is it hard to enter to the market 'cause the expectations of users have been raised?

Tim: My exp is that it's easy to enter, but then MS comes in and takes over that part of the market.

Dave: I've seen repeated failures inside MS. We are not automatically granted the franchise.

Mitchell: But MS has plenty of money and backing to fail. But MS has efficient system to take $$$ out of the system (giving away browser, making it part of the OS, etc.). So ability for MS to extract $$$ is dangerous, OS allows people to collaborate together and join together to be a larger whole.... missed...

Dave: We recognize that we are in a position in which we have a lot of resources, and people are sensitive. We have started to try to be very clear about what we want to do. We can carve out safe places -- a standard. We need to continue to develop ways to make businesses in free markets. To continue to exist, but to foster trust and innovation.

Tim: So is it fair analogy to MS is switching from hunter/gatherer to agrarian?

Dave: I'd like to think yes, but can't say for sure.

Craig: MS would be nothing if millions of people didn't write apps for Windoze. And OS by itself is nothing. <laughter> Need a symbiotic platform between apps and OS. Otherwise, OS won't sustain itself. So in a way, we were already a farming econ. In a way, we needed them. So people have diversified in number and type of platforms -- lots of diversity -- there's no direct transference, I can personally speak for our lack of ability to other non-computer systems.

Tim: Is this why the GPL bothers you?

Craig: No. Because the GPL makes its own closed community.

Tim: So does MS. <laughter>

Cr: If the GPL wants to explain how to stand on their shoulders

Br: The GPL tells me under which terms I can use software (as a business). The WinCE license does not tell me anything as a businessman -- it says "contact us" for business terms. It's only for non-commercial stuff. Vast majority of people write software for some commercial purpose.

Cr: Just call us. We'll figure it out.

Br: But that's different.

Dave: Not only is there the noncommercial license, there is a community-based shared source license.

Ron: perspective of a lawyer. The legal business has only recently entered this arena. Most of the commercial businesses have made licenses tailored to their best commercial business. This is not just an MS problem. Trouble w/ GPL to those doing commercial business is that expressed about 4 ways, none of which are very specific. Some are consistent w/ derivative copyright law, some are not. There is very little guidance from the courts as to definition of "derivative work". There is no useful test to know. So there's a huge uncertainty w/ the GPL.

Br: Stallman is working on version 3. I'm sure that he'd welcome MS.

Craig: We've posted 20 questions on the web.

Michael: MS shared source: it has stipulations about IP. If I look at MS software, I'm infected. <applause> It's the same language in both licenses (MS and GPL).

Ron: It's context sensitive. There are different language and different contexts.

Tim: Wrap up for audience. There are similarities between both licenses. Both are strong IP licenses, and probably both have ambiguities. So let's not go down the legal hole, 'cause we don't really know. Let's take questions.

FSF president Q: MS stated that GPL is unamerican cancer. Yet this ctry was founded on freedom. GPL is founded on freedom. This has inspired free software movement. We'd like to challenge Mundie/MS to a second debate w/ the authors of the GPL to debate the philosophy of the GPL.

..somedude...Q: To Craig: Ecosystems and choices. Then subject of patents came up and then dropped. There's little doubt that many patents are silly. But MS holds a lot of them. Do you think it's right when an OS infringes on a "silly" patent that they are persecuted?

Cr: Absolutely.

Q: Even if it's a stupid patent? It takes money to challenge a patent.

Cr: Fine. Get your money. Patents are one form of protection. Been debated intellectually, legally, etc. Should we have patents? Our society has said yes. Some are stupid. But they have legal weight. At the end of the day, if you have a patent, you enforce it, it has value. So MS and others who have patents will decide to enforce them.

Br: Are there any for .net?

Cr: I expect that there will be.

Dave: I think that patents will benefit OS as a structural thing. You want large corps involved, ...

Michael: We want our turn. <laughter>

Dave: To keep clear relationships, we need clear language and protection and way of litigation. Patent is outside of source. You need to help us to let us know what to exist.

Craig: Look in academia. Lots of them have patents and made $$$ off them.

Q: Steve Ballmer said Linux is the biggest danger. I've been member of MS community since DOS 3.x and MSDN when it first began. Been a member of OS since win95. I understand OS "open" and "free". We believe in open dialogue and open markets (and free beer!). When MS recently started using "community". CIA is part of "intelligence community". What does MS mean by "community"? You said earlier that there was freedom in the MS community by those who disagree they can leave. What does MS mean by "community"?

Cr: MS has lots of communities. Employees. Developers. Customers. My comment (BTW -- we've used "Community" for a long time) earlier was that we can address needs better this way.

Q: Software ecosystem. As Tim and Mitchell pointed out, in traditional market of cash cow, you have an overwhelming position. It's difficult to make an annual growth revenue of 10-20%. Exp growth has limits. Is it time for MS to declare 0 revenue growth?

Cr: I don't think so. We have shareholders. Our job is to provide return to shareholders.

Q: Do you also have responsibility to ecosystem?

Cr: Of course. We try to be a good corporate citizen. We try to provide good infrastructure for this country. We try to step up and deal w/ corporate responsibilities.

Mark Brickell EFF Q: I'm just a sysadmin. <cheers and laughing> One of the issues that concerns me is that we're in a monolithic society. Oil, media, power, etc. Only thing that seems free anymore is software. In that battle, your community has taken one of us hostage -- Dmitry. How does MS feel about enforcement of DMCA? Are you going to keep taking hostages for free expression and free speech?

Cr: I suggest you address your question to Adobe.

Q: Did MS lobby for the DMCA?

Cr: We talk to people all the time. The DMCA is what it is. It's the law of the land. Go change it if you want.

Tim: Does MS like it?

Cr: There's some we like and some we don't, like you.

Tim: There's very little that I like about it. <laughter> It goes too far.

Q: Ecosystems. As a biologist, the width has to do with its health. Narrow ecosystems (monocultures), you have to do things to keep them viable. Pesticides, etc. We've got a forest of all the same type of tree -- one parasite can destroy it. What's the vision of MS and other firms -- how do you protect them from this one parasite?

Dave: It's imp to our business to have a healthy eco.

Michael: Buy RH Linux.

Q: In the interest of brokering peace between parties. Michael said, "MS shouldn't be winner take all". But GPL is a "winner take all" strategy. Stallman says "eliminate competition". But GPL has potential to destroy (weaken?) ecosystem by creating a monoculture. Why not offer something to both camps rather than either extreme?

Tim: This is a loaded hot potato. I think that the univ licenses are the best balance between freedom and making $$$. At the same time, I respect and support the right of MS to put out and make $$$ under their license, and I respect GPL. It's "what works for you". If their customers don't like it, there will be choices. We are entering time of more choice -- because of new technology and OS. We can make the future what we want it to be.

Tim: We're out of time. If you want to ask Craig more questions, go to the Oct free software event. Thanks.

July 28, 2001

Open Source Convention summary

These are my thoughts -- not heavily edited, nor intended for publication in a "real" outlet (this is a pretty standard blog disclaimer). Heck, there's probably still lots of typos and grammar mistakes -- I read it once after I wrote it and made a bunch of changes, but I'm sure it's not perfect, nor would I expect them to hold up to rigorous scrutiny. But they're a summary of my thoughts, and perhaps you'll enjoy them, and/or have something to think about after reading them.

The conference was in San Diego, a city to which I've never been before. The airport is actually in the city -- it's weird to fly lower than some of the buildings while coming in for a landing. The meetings and whatnot were spread across the two buildings of the Sheraton right near the airport. Walking between the two buildings was a nice five minute walk along a marina. Great weather provided some nice scenery between sessions/meetings.

There's great wireless connectivity in just about all the sessions. All the keynotes and breakout meetings had good wireless coverage -- even walking between rooms worked nicely. It's been really handy to have internet connectivity during some of the talks. Indeed, I transcribed most of the open source debate w/ MS's Craig Mundie and immediately mailed it off to a few friends and internal mailing lists when it was done. I even managed to stay not too behind on my e-mail -- lots of LAM mail that I didn't even get to this week (including some from Brian; oops :-\ ), but I did manage to handle a bunch of other stuff.

I have to admit that I found this conference more interesting than I thought I would -- I've actually gotten something out of this conference other than "open source is great, we should all use it." If nothing else, it was refreshing to see the enthusiasm of all the young (and not so young!) coding punks out there, and talk with them about what they were doing. It was also good to see that there are some smart, Important People wondering about the future of open source, and how to keep it alive.

That being said, there was also a bunch of the predictable "Microsoft should die" kinds of things, as well as "all patents are bad; they should be abolished" and "all software should be free". While I personally have nothing against these kinds of zealots, and, indeed, I may not totally believe their positions (and sometimes not believe them at all), I do understand where they are coming from and am at least somewhat sympathetic to them.

However, (IMHO) such zealots tend to ignore certain realities in their ideological zest. Someone recently said, "Some things just are. The speed of light for example, while I personally think it is too slow, is unchangeable." Ok, so there are few things that are as totally unchangeable as the speed of light. An example: getting Congress to abolish patents is effectively trying to change the speed of light; there's no money in the open source movement to do so -- more importantly, there's too much money in the opposition that supports patents such that Congress would laugh at the very idea of abolishing patents. Heck, they wouldn't even laugh, they'd totally ignore the prospect. It's an impossible task.

I am not such a pessimist that I believe this for all aspects of things that I personally believe to be undesirable, but I do certainly believe that one needs to pick and choose their battles. Abolishing patents is not a battle that we should pick (nor do I really believe in that one -- I'm just continuing my above example).

Does this make me a cynic? I prefer the term "realist".

I do believe that there is much that we can do. I believe that the work that we do (writing free software) is important, and that we can accomplish some really Great Stuff and help people in the meantime. Especially as academics, we can produce Quality free software through the open source paradigm and still get funded (as is all-important in academia) while still in the university spirit of open intellectual research. We do what we do because a) we believe in it, and b) we enjoy it. That isn't going to change for the foreseeable future.

I arrived here Tuesday night. I registered at the conference and went to the Sun Grid Engine BOF (birds of a feather session -- typically an informal meeting to discuss a given topic). It was pretty much an ad for Sun's product, so I left and went to the CVS BOF. I specifically went to that BOF because we use CVS every day; it's a great tool. That being said, it sucks. There are some well-known problems with CVS that can be really annoying. CVS is kind of a "best of the worst", which is a fairly common label among free software. :-)

My purpose in going to the CVS BOF was that I wanted to see what the future of CVS was going to be. Indeed, Brian Behlendorf from Collab.net was there, as well as Derek ?Atkins? (the current man-in-charge of CVS, so to speak). My take from the BOF was that CVS is loosely supported by Collab.net and Derek, and there are a few new features that are planned / halfway written. Over time, these features will be added to the mainline CVS distribution. However, they really want others to help contribute to CVS; they're not spending huge amounts of money and/or time on developing CVS. A bunch of the Bad Things that we don't like about CVS won't really be fixed, both for the reason cited above (Collab.net not spending a lot of time/money on it) and because CVS is somewhat architecturally limited such that fixing them would entail a complete rewrite.

Loosely speaking, the future of CVS is SubVersion (I didn't previously realize that the SubVersion people were the same as the CVS people). SubVersion will basically do everything that CVS does and fix all the known Bad Things about CVS by starting over from scratch. Apparently, SubVersion will soon hit a milestone where they start using SubVersion for their own version control -- this should happen by the end of the year. At that point, SV will be usable, but probably not very feature-rich, and not have any GUIs, etc. They expect to have a full, feature-rich SV in about 2 years.

Should be interesting. I probably won't play around with SV now, but maybe in a few months, when it gets comfortably past the "usable" stage.

Then I met Rich for dinner. It was great; I hadn't seen Rich in probably at least a year. We went to an Irish pub, had a feast of food, and talked about All Things Geek. Then we went to a Hookah Bar where we ran into two of Rich's friends. Rich rented a water pipe with some apple-flavored tobacco. It was interesting. I can now say that I have smoked a water pipe -- check that off my "life experiences" list. :-)

The next morning, I went to the two keynotes. The first was from Fred Baker, entitled "Will the Next Internet Generation Still Depend on Open Source?" It was somewhat doomy and gloomy, full of caution and worry. While I did not agree with all of his points, he did make some very good ones (that others in the audience certainly did not agree with). A good example can be summarized as, "When will my mother use Linux?" This is very true. Sun's usability report on Gnome that was published a week or two ago showed that even though things like Linux are pointy and clicky, they are still heavily geared towards geeks and not the general public. This is a problem. It certainly can be fixed, but it needs to be recognized first -- the open source movement has to "grow up" and recognize that things are what they are, and to get widespread adoption, we have to cater to the public, not the other way around. Very true. Another example that he cited is that customers want software that is stable. Features are great, but stability is necessary if you want to use software with a business. This is something that I think open source programmers are starting to realize ("stable" and "unstable" trees are becoming more common).

The next keynote speaker was much more fun -- he was W. Philip Moore from Morgan Stanley Dean Witter, and gave a speech entitled "An Open Source Success Story on Wall Street". He is a programmer for MSDW, and uses open source products all the time. His main point was that OS is great, and Big Companies like MSDW love OS. They even fund OS projects. An example (that I'm forgetting some of the details on) is that they needed some extra features in MySQL for their particular needs. So they paid MySQL a few hundred thousand dollars to do it. MySQL and the MySQL community win, and MSDW wins (a few hundred thousand dollars is chump change to MSDW). Yay for everyone! :-)

Another of his points was that vendor support can suck. Just because you're paying someone support contracts doesn't mean that the support that you'll get is any good. Managers think that paying for support gives them a warm fuzzy fallback when things go wrong, but the reality of the situation is that this is not always so. The OS communities (in his experience) tend to give better support, and fix bugs faster than vendors. He was extremely happy with Perl and the Perl community over the years. Indeed, MSDW now uses large numbers of Perl scripts to run their enterprise.

So these were refreshing words to hear. And it brought to light a previously unknown open source champion -- the major corporations who don't necessarily write open source software, but use open source software. Perhaps these companies may be able to be persuaded to take up the banner, so to speak. If open source helps their bottom line, what would happen if it all went away? (this is a much longer conversation, but it's an interesting proposition)

After that, I went to the O'Reilly Summit on Open Source Strategies (a track within the overall conference). Tim O'Reilly himself spoke, as well as some other well-known folks. It was fairly predictable stuff, though. The big question was how to make money off open source and/or free software. Selling support has frequently been shown to be not enough (although not in all cases). Companies like RedHat are frequently held up as if to say, "See? Open Source and free software can make money!", when actually such companies are (at least today) more the exception than the norm.

Making money off open source / free software is a problematic issue. Starting from scratch with a new product that is both open source and free is a difficult position from which to make money. The ones who are making money, for the most part, were either established companies before they went open source (IBM, Sun, HP, etc.), or were deeply established products before they went corporate (Berkeley DB, Sendmail, etc.).

I don't know the answer to these kinds of questions; no one does yet.

I met some guys from the University of Arkansas at lunch; a sysadmin/instructor and two of his students. They were fun to talk to. After lunch, we all went to a talk from Dave from Microsoft about some of the infrastructure of .net and C#. Specifically, it was about CIL and CLI -- the meta language and run time environment for that meta language (I forget which is which) that will comprise the backbone of some of the .net stuff. It was actually fairly interesting; I think Jeremy/Todd/Ron/Jeremiah would have enjoyed the talk because it was about language design and compilers and the like. I got the impression that Dave was a manager, but still very much an engineer and coder, so he spoke the same language as most of the audience.

As with all things MS, it sounded great. It's totally vaporware at the moment (and potentially quite monopolistic), so I reserve judgment. But the technical side of it sounded pretty cool. Whether MS actually delivers or not is a different question. And more importantly, what is left out of what MS delivers? What were they not telling us? What's the catch? Historically, I have come to not trust Microsoft. It sounds great, but I'm not a believer. We'll see how it plays out.

After that was a talk from Miguel from Gnome about their implementation of the CIL/CLI/.net stuff -- Mono. He has a heavy accent (he's from South America... Brazil, IIRC?), and talks very fast. He's a funny guy, though, and pretty smart. His take on this stuff was that he thought it was very exciting, cool stuff. Their reason for doing it in Gnome is not to be compatible with MS (although that's a nice, desirable side effect) -- they're doing it because it actually provides them with good infrastructure for advancing Gnome. It gives cross platform, cross language connectivity, and a reliable and modular approach to the software engineering of a complex system.

Both talks were fairly interesting. It'll be interesting to see what Mono is not allowed to do -- I find it hard to believe that MS will allow them to be first class citizens in .net. We'll see how it plays out.

I went to the beginning of "The Challenge of Privacy and Security" back in the Summit part of the conference. I think the best quote was from a Ph.D. who is the head of a privacy watchdog, that went something along the lines of (ok, I'm totally paraphrasing. Cope), "However bad you thought it was in terms of privacy on the internet, it's actually 100 times worse."

She told horror stories of how companies are gathering and cross referencing enormous amounts of statistical data on web surfers, 99% of it without the knowledge or consent of the user. Such information gathering is typically not done out of malice or a desire to control users, but to help their marketing -- companies don't think that they are doing anything wrong.

Another assertion that I have privately believed for a while is that initiatives like TrustE have failed. There have been well-documented cases of companies that were certified by TrustE (or one of the other companies like TrustE) who then reversed their position and started (for example) disseminating private information that they had collected on their users. The long and the short of it is that paying for a seal of approval is just that -- it doesn't necessarily mean that you conform to the guidelines of what is implied by that seal of approval.

Then I left and went to a talk from LLNL entitled "Steering Massively Parallel Simulations under Python". I went specifically because these are the guys who wrote pyMPI -- Python bindings for MPI. I talked to Pat, the main author, for a little while afterwards. Their main purpose for doing it was to do rapid prototyping -- they could hack something up in Python and play with various algorithms before handing it off to a computer scientist to code up in C for real production runs, etc. He said that coding up in Python was much quicker for them than coding up in C, and so the net time saved was actually very large. Interesting perspective.

We also chatted about using pyMPI with LAM (it does compile and work with LAM, of course). He said that pyMPI was initially developed using MPICH, but then as more people wanted to use it, they added LAM to the "supported" list as well, and it was a good portability lesson for them. MPICH makes different assumptions than we do, so expanding their code to make it handle LAM as well was good for them (his words [paraphrased], not mine).

After that, I went to dinner in the big open-air tent between the two hotel buildings. I picked a random table and sat down with some people that were already sitting there. Most were business types of one flavor or another (programming consultants, independent small software companies, etc.), which provided some interesting conversation. The more interesting guy was Michael, who is finishing what could be best described as a "walkabout", in the true Australian sense of the word. He got married and three days later he and his wife started a world tour that went wherever their feet led them.

He was a programmer before he started his walkabout, and he and his wife are now winding down the grand tour, and thinking about stability somewhere with a real job, etc. So he decided to attend this conference to catch up on the current State of Things. Great quote from him about why it was time to end his walkabout, "We were in Nice a few weeks ago, and were thinking, '<Yawn> Another fucking beautiful cathedral. <yawn>' Yeah, that's a good sign that we're done."

He was interesting to talk to.

I went to the Slash BOF after that. More specifically, I was wandering by all the BOF rooms when I heard some voices that I recognized. It took a moment or two, but then I placed the voices: I had heard them in "Geeks from Space" installments on Slashdot. So I wandered in and sat down to listen to what they had to say.

Rob Malda and several of the other Slash and Slashdot crew were in there. They're an arrogant-yet-funny group, and have lots of inside jokes with each other. They reminded me of any good programming crew. They talked about some of the upcoming features in Slash (and therefore, someday in Slashdot). They also had a bunch to say about the optimizations that they code for (have to be able to handle an absurd number of page views every day), and the extensive infrastructure that they have behind the scenes to handle all the user traffic and thwart evil/asshole users. Pretty cool stuff, actually.

Andy came in really late that night; the plane that he was supposed to catch left 10 minutes early for no apparent reason. He and about 10 other passengers were left behind. So he had to catch a later flight. Weird.

We went to the Great Open Source Debate keynote the next morning -- "Shared Source vs. Open Source: Debate and Panel Discussion" with Craig Mundie from Microsoft (MS), and Michael Tiemann, CTO of RedHat (RH).

I've already posted my version of the transcription of the speeches and debate. I should have done it immediately instead of waiting a day (indeed, I only sent it to a few friends and internal mailing lists), and I could have gotten slashdotted. ;-)

Andy and I chatted about the debate afterwards. Here's my take, with some flavor/discussion from Andy as well.

Microsoft won. Craig Mundie calmly discussed his position, and was in an easily-defendable position. Michael Tiemann and [most of] the various people who asked questions from the audience came across as whiners, saying "It's not fair!"

Come on, folks -- Microsoft is in the business of making money. Why on earth would they change their model when their current model is working very well, and doesn't show many signs of abating? Their bottom line is to a) make money, b) increase shareholder value. That's it. Call it malicious, call it evil, call it whatever you want -- it's business. Yes, they happen to be the biggest software company in the world and have tremendous influence, but (whether we like to admit it or not), they earned that position. Yes, with flawed and crappy software, but people bought it anyway. Regardless, that's not what we're debating here.

<sidenote>

Don't get me wrong; I think that many open source products are vastly superior to MS products in many, many ways. But just because something is open source does not make it superior to proprietary/MS products. There's a lot of open source shit out there. Troll around on freshmeat, where every teenager who has ever written a shell script has "published" their "software package". 98% of them are total crap and only work on the machine that the author wrote them on, or only work on Linux (which, to me, is useless).

There's a helluva lotta software engineering and design that has to go into a successful, Quality product. Just because you're open source and/or GPL doesn't mean that you are Better. So when I talk about technical superiority of open source products, I'm talking about the big Quality products, like Apache, MySQL, Postgres, PHP, etc., etc. Not every tiny little open source project out there.

</sidenote>

MS is successful for many reasons. But the fact is that they are very successful. They have an enormous percentage of market cap. Why on earth would they want to give up even 1% of that to open source? Of course they're going to fight. Of course they're going to doublespeak and claim that they are better (technical superiority). Of course they're going to say that GPL is anti-business. Of course they're not going to play nicely with others; that would be giving away market cap. Of course we're going to feel like the underdogs (which, in many ways, we are), and feel that this is not fair. This is not under debate.

To defeat your enemy, you must first understand your enemy. So understanding MS's position is important. Call it greed, call it business, call it whatever you want -- they're making money and they're good at it. Understand that, and then calmly, rationally proceed. Bringing up religious arguments (software should be free, you guys suck, etc.) is not helpful, because like it or not, MS has the law of the land behind them right now. They are in a perfectly defendable position of saying "the law agrees with us" (I'm not talking about their potentially illegal monopolistic practices here -- I'm talking about their attitudes towards open source/free software and the fact that they don't feel that they need to be compatible with anyone else, and their "embrace and extend" attitude). You may not like that, but it's a fact (similar to RIAA issues with Napster and whatnot). Like I said above, it is what it is. Go read The Prince.

The next step is to figure out what we're going to do about it. How do we a) make money as well, b) eventually displace Microsoft and/or force them to make higher quality products and play nicely with others? Reasoning with them won't work, because the almighty dollar is always a more persuasive argument -- trying to reason w/ MS saying, "hey, give us some market cap and then we'll think you're good guys" is simply not a compelling argument.

Indeed, some audience member asked Mundie to sit down with the GPL authors at some upcoming FSF/open source convention on Oct. 10th of this year -- the audience member said that if MS's beef was with the GPL, they should be debating with the GPL authors, not open source luminaries (they are different things, for those of you who don't know). Mundie replied, "Richard wouldn't join our discussion" (referring to the fact that MS tried to initiate license discussions previously). Hmm. That's quite damning, actually. I can imagine why they didn't (because it wouldn't have been constructive, odds stacked up on MS's side, etc.), but it comes across as "the GPL folks will only join the conversation when it suits them". Indeed, Mundie came to the open source convention where the odds were stacked up against him, didn't he?

Some have criticized Mundie for the following exchange during the debate:

Audience member: ... Do you think it's right when an OS infringes on a "silly" patent that they are persecuted?

Mundie: Absolutely.

Audience member: Even if it's a stupid patent? It takes money to fight patents.

Mundie: Fine. Get your money.

Mundie's position here is perfectly defendable.

Why should MS be punished when they have anted up the money to defend patents and whatnot?

They abided by due process; it's too bad that others don't have the money or resources to do the same, but it's not their problem.

They paid their dues, built up their company from scratch, and now have the resources and ability to do such things.

It's the law.

Is it annoying and contentious? Yes. Is it fair? No. Is Mundie right? Yes, he is. Democracy isn't fair. Neither is business. Capitalism does not equal freedom.

Our country is built on change. That change has always had rules associated with it about how to acquire that change. Right now, those rules involve a lot of money (lobbying in D.C., etc.). I don't necessarily like that fact, but try living in an oppressive third world country where people daily have to fight for food and then tell me that the US sucks.

Michael Tiemann came across as a whiner. "It's not fair", "We just want our turn", "Buy RH software". Shut up. That's not the point. Indeed, what would happen if RH was in MS's place? Would RH still be so altruistic? Who knows. The fact of the matter is that I wouldn't want RH in MS's spot -- I don't want any one company to be the top (this is one of my big beefs with MS, incidentally). I contend that any firm that enjoys a monopoly like MS currently does would act exactly the same way that MS is currently acting. There's a reason that unscrupulous bastards are king: Money talks, bullshit idealism walks. Specifically, I believe that RH would be just as bad as MS if they were King right now.

So I think Tiemann's approach was entirely wrong. He should have addressed the monopolistic practices of MS. He should have cited the illegal and underhanded activities (he sort of did -- his "be and not seem" remarks were right on the money, actually). Cite the concrete legal issues about what MS is doing wrong. Not whine about "it's not fair", and try to whip up religious fervor by drawing comparisons to Rosa Parks and free speech.

Some of the panel's most insightful comments came from Brian Behlendorf and Mitchell Baker. They came across as calm and rational, and their arguments were well reasoned. Much of the other stuff was fluff and religious fervor.

Perhaps the most constructive comments that I have heard so far in reaction to this debate were along the lines of:

Just write excellent code. Keep writing it. Write the best damn code that you can. Even if technical superiority isn't a sufficient condition to win, it can certainly help.

If you don't know anything about the political process, let the appropriate people handle that. The politics of this beast are very complicated; let those who know how to play the game make the public movements. Stuff like "MS sucks!" doesn't help, and makes the rest of us look stupid.

There. That's my $0.02. Comments are welcome.

After the debate, Andy and I went down to registration to straighten some details out. Lo and behold we ran into Johnney (old timer in the LSC). Small world! We exchanged cell phone numbers and then ran off to the next session.

Andy and I watched a neat presentation from the Collab.net folks, and then a presentation from Sun's openoffice.org guy. Both were pretty good. The Sun guy had three important messages that he kept hammering throughout the speech (I hope I got these right!)

Open source is a lot of work

Open source is not free

Armadillos don't make good pets (arrgh... I have no idea what his third point was)

These are good and important messages. Sun spent millions of dollars converting the StarOffice tree to be open source (translating it to English, cleaning up the code, going through all the legal hassle of ensuring that they owned all of it, etc., etc.) which many people don't seem to realize. "Just publish the CVS tree!" isn't an easy thing to do.

Nothing ground breaking in these sessions, but they were good talks. We talked with the Collab.net guy after the sessions, and we might end up using some of their tools for our own work back at school.

Andy and I wandered to the open air tent for lunch. We grabbed some box lunches and were looking for a table to sit down at. We walked by a table where Tim O'Reilly was sitting, and I made the offhand remark, "Wanna have lunch with Tim O'Reilly?" We walked about 10 steps further and then Andy said, "Sure, why not. I didn't get to ask my question at the debate this morning."

So we sat down next to Tim. He was in the middle of a conversation with some other people, so we just sat and listened for a little while. Then I caught his eye and we introduced ourselves. We had a short chat about the role of universities in open source and the whole movement thing (Tim is a big believer in the participation of universities, incidentally). He had to run off to handle other things, but now I can say, "I had lunch with Tim O'Reilly".

We chatted with the other folks at the table too. It turns out that two of them were academics as well (Carnegie Mellon), and they had very similar views about software as we do, which is very rare in the academic community (i.e., software should "just work" and suck less, that it should work on all platforms not just Linux, etc., etc.). I didn't catch their names (more specifically, I don't remember them), but I think they were the primary people behind the Festival open source speech synthesis software. There was another guy from Cisco at the table who was having animated discussions with them as well, so I think he was either interested in, or involved with the speech synthesis/telephony over IP kinds of things.

Michael, the interesting guy whom I met at dinner the previous evening, joined us at the table as well.

Andy and I eventually wandered outside to the grass by the marina to discuss "stuff", including the morning's debate and whatnot.

We next went to an OSX talk. I'm not a big Mac fan, so I didn't pay much attention, and instead took advantage of the internet connectivity to catch up on some important pending e-mails, etc.

Andy and I tried to go to some extreme programming sessions after that, but the sessions were already jammed and overflowing with people, so we went to the exhibit floor instead. We ran into Johnney again and generally wandered around the floor. I only got 3 t-shirts. There was another free t-shirt from O'Reilly, but you had to buy the "Learning Perl" book, which I didn't really want to do (and therefore the t-shirt wasn't really free, was it? Hmm...).

Nothing too earth-shattering on the floor, but I did chat with the president of SAGE (System Administrators Guild), who's a sysadmin from the University of Wisconsin. He knows Curt (of course), and so we had a nice talk about Condor, LAM, and general sysadmin stuff (I amused him with one or two Army sysadmin stories from Atlanta). I might well sign up for SAGE; who knows how it might benefit me in the future (it's a tax break, too!).

He said that one of the things that they are working on is a standardized test for system administrators. I think that would be great. Indeed, when I first got to my army post in Atlanta, the whole network was a mess. And this was from a full-time sysadmin who they paid a good amount of money to. Here's one example: when my predecessor applied patches under Solaris, if he ran out of space in /var, he would NFS mount some other disk to finish applying the patches (if you don't know why that isn't a good idea, don't become a unix sysadmin).

We were a little late getting to a session back in the Summit track from Eric Schmitt (sp?), president (I think, or CTO, or CFO, or CIO, or...) of Google. He had some pretty interesting takes on open source and its ramifications. Well reasoned, well thought out, etc. A good speaker, and an intelligent man.

Fun facts about Google that he shared:

Google has 4 data centers. Each center has a huge number of computers (he said the numbers, I don't remember -- let's say 1,000 each). Each data center is hosted by a professional company, such as Exodus.

They stack their computers 40 or 50 in a rack, and have to put the racks as close together as possible "for speed of light reasons".

They use Pentium III/750 chips rather than Pentium IV chips because when they tried PIV chips in 40-50 computer racks, they would melt the ceiling tiles. And this is in a professional hosting site, where they have enormous air conditioners, etc. They had to bring in a cooling engineer to redesign the air flow, etc. So until the chips are cooler, they're limited to PIII/750.

I don't remember too many of his specific points (should have taken some notes...), but I do remember agreeing with most of them. We went up to chat with him afterwards. Andy asked him about the role of universities, etc., in open source. At some point I interjected saying something about how most academics only develop proof-of-concept quality code. He agreed, saying (paraphrased), "Most academics code in Java, C, or Perl for Linux."

"Oh no," I said, "We code for POSIX -- we think that our software should work under all flavors of unix."

He literally did a double take. "Really? That's very unusual in academia..."

So that's pretty cool. I made the president of Google do a double take. :-)

We were walking out of that session, on the way to meet Johnney for dinner when we heard someone say "cluster". We turned and it was a guy from Compaq. We butted into the conversation and learned about an interesting project at Compaq to make single system image systems for linux (also see here) -- but different than Scyld. Their idea is to take a cluster of machines and show the entire aggregate as one machine. i.e., all the processes, all the disks, all the RAM, etc. This is different than Scyld, who believe that the non-master nodes are just shells for computation. For example, if a process migrates around in one of these systems, the entire process migrates -- it doesn't leave behind proxies for system calls (sockets are still problematic, though). Indeed, we might have less trouble porting LAM/MPI to this kind of system than to Scyld systems (don't have the filesystem issues, for example). Could be interesting.

We joined Johnney, his cousin, and a friend of theirs who lives in San Diego for dinner at the Ruth Chris Steakhouse. Yum. Good food, good conversation. It was great to see Johnney again.

Andy and I went to an extreme programming BOF that night which was somewhat interesting. Much more interesting was a BOF entitled something like "When Politics and Open Source Mix". This was an extremely enlightening BOF.

It was hosted by some guy (I'm so horrible with names -- I wish that I could remember his) who used to be a white house staffer, then worked as a programmer for a defense contractor (Nichols Research -- a "beltway bandit"), and now does independent consulting. There were only a few random people at this BOF, which was a shame. It seems like there needs to be a lot of people involved.

His main point was that he has seen the political process from the inside. In general, it goes something like this:

Congressman Smith gets a complaint about X.

Congressman Smith knows nothing about X, so he turns to his staffer and says "Find out about X. Find out how we can a) resolve it, b) make me look good, and c) make our contributors happy" (not necessarily in that order)

So the staffer goes off and first calls all their contributors and says "How do you feel about X?"

The contributors weigh in on how they feel about X.

If the staff is feeling adventurous, the staffer will try to find out more about X outside of just the contributors' opinions (but not always). This may involve, say, Google searches, some more phone calls, etc.

The staffer will take all these opinions, and reconcile them with the three goals set out by Smith (resolve the issue, make me look good, and make our contributors happy).

The staffer will present their report to Congressman Smith, who will then act accordingly.

This is a big part of what happens -- making the contributors happy. For example, have you noticed how the Washington Senators and Congressmen are against the Microsoft lawsuit? Why? Because MS contributes lots and lots of money to them. It's all legal -- there's nothing nefarious happening here. We're not talking direct bribes (try not to be cynical), we're talking support for campaigns, creating jobs for constituents, etc. But the total dollar figures are staggering. Money talks.

MS publicly admitted that they goofed w.r.t. the initial lawsuit back in 97 or 98 -- they had completely ignored D.C. until that point, thinking that it was irrelevant to what they were doing. They admitted their mistake, and have since rectified it. MS now spends enormous amounts of money every year lobbying in D.C. and elsewhere. Have you noticed how New Mexico just settled the lawsuit with MS? There were many reasons (new DA who inherited the case, etc.), but at least some of it was due to the fact that MS "made it easy" for them to settle by offering incentives and attractive terms to settle.

Our BOF leader's point: most congressmen and senators don't have an opinion about open source or the GPL right now. But soon enough, they will be forced to have one. Some court case will occur, or MS will address it with their own lobbying, or one of a hundred other things will happen that will force the issue. The question is: if MS is doing all the talking to the lawmakers, who is talking for open source?

It's an easy enough word association for the senator or the staffer -- "Hmm... open source software. 'Software' -- I guess I should call Microsoft to get their opinion." But who do they call for Open Source to get the other side of the story? This is the big question.

Do we want them to call the FSF? Talk to Stallman? Personally, I don't think so. Their religious fervor might well scare off the staffer and/or reinforce MS's position. Perhaps they should call RedHat. Or O'Reilly. But a) I highly doubt that they've heard of RH, and b) why on earth would they call a book publisher about a software issue?

The problem is that there is no one person to call. This is kind of the nature of open source, actually -- a distributed group of people working together for some common goals. But that nature makes us politically vulnerable. And it may become a problem in the future. It's depressing, but true.

There are other ways to influence the process, but it all comes down to money. And MS has a whole lot of it. One way or another, there needs to be some central group(s) organized in the open source movement to start handling the politics of it. One suggestion that was floated was to get the Big Companies who use open source involved (Sun, IBM, Compaq, HP, etc.). Indeed, IBM is going to dump $1B into Open Source this upcoming year. So likely the NY senators and congressmen will be on our side. But that isn't nearly enough. What about all the countless other companies out there that use open source? This sounds crass, but forget all those millions of teenagers and spare-time developers out there who are writing open source, what about the companies who are using open source to increase their bottom line and increase shareholder value? I would think that they would be pretty peeved if open source went away.

MS made the mistake of ignoring the politics a few years ago, and has since taken steps to rectify the situation. We need to do the same, lest we get the equivalent of an anti-trust/monopoly lawsuit against open source. Our BOF leader said that this was the first time he had spoken out about these kinds of topics; we encouraged him to spread the word since he was so well informed.

This was very enlightening to Andy and me. Depressing, but forewarned is forearmed. I don't see either Andy or me getting involved on this front, but we'll see how it plays out.

It was late by then (after 10pm), so we went back to our hotel. We went to the keynote the next morning, but didn't pay much attention (it was mainly three people who stood up and said, "we use open source in our company, and it's been mostly great, but we did have to work around some problems."), and just answered e-mail and the like. Then we got on planes and headed home.

All in all, like I said at the very top, it was an interesting conference. If nothing else, the enthusiasm for code was quite contagious. We didn't attend many of the "technical" sessions (there were oodles of meetings on MySQL, PHP, Perl, Python, and all kinds of other open source projects), but the buzz was everywhere.

July 31, 2001

I ate, drank, and slept Tap.

I found an old journal entry that I never finished... Since Telocity is having connectivity problems (first it was nd.edu this morning. Now Telocity. I was having shaky connections this morning, and now I can't connect to anything. Ugh.), I might as well finish this off even though the events are from about a week or two ago.

Random factoid that I learned today -- the first number (octet) of an IP address indicates whether the address is a class A, B, or C.
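Here's a minimal sketch of that factoid in code. Under classful addressing the class falls out of the leading bits of the first octet, so looking at the first number of the dotted quad is enough. The function name ip_class is made up for this example, and classes D and E are included only for completeness.

```cpp
#include <string>

// Classful-addressing lookup: the class is decided by the leading bits of
// the first octet. ip_class is a hypothetical name chosen for this sketch.
char ip_class(const std::string& dotted_quad)
{
    // Parse only the first octet (everything before the first '.').
    const int first = std::stoi(dotted_quad.substr(0, dotted_quad.find('.')));

    if (first < 0 || first > 255) return '?';  // not a valid octet
    if (first < 128) return 'A';  // leading bit  0    (0-127)
    if (first < 192) return 'B';  // leading bits 10   (128-191)
    if (first < 224) return 'C';  // leading bits 110  (192-223)
    if (first < 240) return 'D';  // leading bits 1110 (224-239, multicast)
    return 'E';                   // leading bits 1111 (240-255, reserved)
}
```

So ip_class("10.0.0.1") comes back as 'A' and ip_class("192.168.1.1") as 'C'.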

ND football lottery: Won Pitt, lost all others. Oh well; I'll still be a student at ND this upcoming fall, so I'll still get married student tix.

I'll see Rich tomorrow at the Open Source Conference.

I forgot to plug D's absurdly cool house-o-meter. http://www.kresge.com/hstat.html.

Tracy and I spent the day at the lake w/ Janna and their new boat water skiing and tubing. Janna are good at skiing. I apparently had some spectacular "dismounts" from the tubing, including some uncontrolled cartwheels across the water. I somewhat hurt my left shoulder/bicep when I went over some huge bump (got airborne, too) on the tube. I also made the mistake of taking my shirt off to go swimming in the lake, and totally forgot to put sun tan lotion on my shoulders. Doh.

I fixed my car's stereo problem. The wiring was causing screeches -- whenever I went over a big bump, the exposed wires would brush up against the metal speaker casing, causing the screech. Funny, though, it didn't look like it had ever had insulation.

We got our tax rebate today -- I have to admit that I was a bit surprised. But hey -- cash is cash. (I've since deposited the check and it didn't bounce -- yay for the US government!)

Dave, did your mom change her phone number again?

I heard on the radio that England's Queen Mum's birthday is this Saturday.

She'll be 101.

While driving to Bloomies yesterday, I heard Kevin Bacon and his brother (Michael?) sing some songs from their current album. They're surprisingly good musicians. They had some funny stories about their pasts and Kevin's movie career, too.

Lummy has been very pleased with his Linksys router that he uses to gateway his DSL traffic to his FAN (family area network). It doesn't show up at all to an nmap scan, but he can directly ssh to one of his internal boxen. It seems to be a fairly powerful firewall, allowing a variety of configurations including IP forwarding to up to 10 ports. Check it out.

So I bought one myself; it should be here in about a week. They have models with embedded 4 and 8 port switches, but I already have an 8 port switch, so I just got the single port version of this router. It will certainly be easier than maintaining that dual-NIC box that I have right now, and be [hopefully] more secure, on multiple levels.

I've received the SirCam virus from the same idiot (someone at @optonline.com) about 10 times so far. I've replied each time saying, "Update your anti-virus!", and I have started CC'ing the postmaster. A bunch of my replies have come back "Over quota". Idiot.

There's still a number of bugs left in OSCAR 1.1; I don't think we're going to make a release by Wednesday. :-(

It looks like the router in the engineering building at Notre Dame is fried somehow (engr-e06.gw.nd.edu) -- no traffic is able to get through it. This means that I can't get to my e-mail.

August 5, 2001

Magna cum laude, summa cum laude, the radio's too laude

OSCAR 1.1 has been released. Woo hoo!

Here's an embarrassing note: during the OSCAR teleconference this past week, we were plagued with all kinds of audio troubles with Intel's teleconferencing system. People would drop in and out, echoes would abound, etc. But we still managed to have a reasonable conference.

The conference is normally scheduled for an hour. At the end of the hour, though, we weren't quite done. An automated announcement said, "To extend your conference for 15 more minutes, hit *9." Everyone agreed that we should continue to finish up the pending details, so I hit *9.

A split second later, I realized that that was my fax machine -- it's programmed to pick up if you hit *9 (handy when you only have one telephone line; you can answer from any phone in the house and make the fax machine pick up if it's an incoming fax, not a person).

I had to race down the hall and rip out the phone cord from my fax machine, and then come back and tell everyone what happened. How embarrassing. :-)

We also released LAM 6.6b1 that includes, among other things, a first cut at Myrinet support. It lacks some optimizations (doesn't pin user memory that is already pinned, doesn't use shmem for communications on the same node), but those will likely have to wait until post-dissertation.

My Linksys router box came. I got it setup nicely, such that it does selective IP forwarding to my back-end boxen. I found a handy feature in OpenSSH that allows it to listen on multiple ports for incoming connections. i.e., I don't have to muck around and have two different OpenSSH servers running, each sitting on a different port -- OpenSSH allows this behavior just by editing a single config file and listing multiple ports. How cool is that?

Why is this important? ssh normally accepts connections on port 22. With my DSL connection, I only have one fixed IP address. But I have two unix machines on my backend LAN that are generally on 24/7. I would like the ability to ssh directly to both of them from the greater internet. But there's only one port 22. So my linksys box forwards all incoming port 22 requests to one machine. But what about the other? This means that I have to pick some other port.

The bummer about my linksys router is that it will only IP forward on the same port -- so I can forward addr1:port to addr2:port. I cannot forward addr1:port1 to addr2:port2. Bummer. So if I have two incoming ssh ports on my router, the second (non-standard) port has to be forwarded to the same port on a backend machine. This is where OpenSSH's feature comes in handy -- not only does it listen on port 22 for normal ssh connections (e.g., for connections from my internal LAN), it also listens on port N for connections from the greater internet. Cool!
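For the record, here is roughly what that looks like on the second backend machine. This is a sketch, not a copy of my actual config: the file location varies by system, and 2222 is just a made-up stand-in for "port N". The only real requirement is that the router forwards the same port number that sshd is listening on.

```
# sshd_config excerpt (location varies; often /etc/ssh/sshd_config)
# sshd listens on every Port line that is listed.

# Normal ssh, e.g., connections from the internal LAN
Port 22

# Hypothetical "port N": the router forwards this same port to this machine
Port 2222
```

From the outside world, reaching the second machine is then a matter of pointing the ssh client at the router's address and that port (e.g., with the client's -p option).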

I was short a cat 5 cable, though -- had to run out to Best Buy to get one.

ARRGGHH!!! It seems that I deleted all my pine mail for July 2001. How the heck did I do that? It must have been in the monthly archive on August 1. Doh. :-(

So I initiated a report with suggest@darwin.helios.nd.edu, and they actually restored the last backup (from July 31) within a few hours, and salvaged it all. Amazing.

Windoze sucks.

Oh, I'm sorry -- have I said that before?

I'll say it again: Windoze sucks.

The following describes a windoze "gotcha" that bit me on Friday. I know that most of the jjc readers have nothing to do with windoze; I describe it here mainly because a) I'll remember it this way, and b) you never know when you (jjc reader) may need to have a few 'doze sysadmin tricks up your sleeve.

My church just bought two new computers for staff members to replace some really aging computers (the old ones were so bad that they would swap almost continually, making any amount of work extremely hard to do). They were Gateway PIII 1GHz machines (you really can't get much lower than that these days without going into Celeron country, which I highly recommend against!) with Windoze 2k. This now makes three w2k machines; the rest are all w98 and w95.

My church actually has a little LAN setup in their offices (I've described it in previous journal entries) with about a dozen machines on it. They do a few windoze shares to share some directories between machines for various databases and whatnot.

I had intended to spend 2-3 hours installing the two new computers, copying over the data from the old computers, installing the extra software that they needed, training the staff members in the differences with w2k, etc. I should have known better. <sigh>

Setting up the computers was easy enough; transferring the data, installing the extra software that they required, etc., wasn't too bad because the staff is actually fairly organized, and had all their data files in one place, etc. Yay for smart users! :-)

One weird thing, though: Printshop -- I think it was a fairly old version -- wouldn't work for regular users (i.e., not the "administrator") unless I installed it as the user. i.e., when I installed it as "administrator", it would give amorphous errors when a regular user tried to run it. I assume that this was because of permissions issues (I only had the CD case, not the original box, so I don't know whether it was supposed to support NT/2k or not -- I suspect not). Whatever. I temporarily bumped up the user's access level, installed the software, reduced the user's access level to its original state, and then all was well. <sigh>

But that wasn't too big of a deal; it only took an extra 15 minutes or so to figure out.

One of the two new machines was replacing a machine that previously shared one of its directories to the rest of the LAN. This is where my troubles really began.

There is no NT domain on the LAN -- those cost many thousands of $$$! (before you scream "use Linux/Samba!", read the rest of this entry) So instead they just share a Windoze workgroup. It works well enough; we're talking about a staff that mainly does word processing, some spreadsheets, and a few databases -- nothing really fancy.

I setup the sharing on the w2k machine and then went to a w98 machine to try to mount the share (you know, check that it actually works. Sometimes this is a novel concept to IT support staff :-).

It asked for a password. WTF? It never required a password before (i.e., when the w98 box was the sharer). Not understanding why it was asking for a password, I tried a couple of obvious passwords that I thought it might be, all with no joy. Weird.

I went back and double checked all the sharing settings (permissions, etc.) on the w2k box, but everything looked fine. I went back and forth for quite a while, but could never get the w98 box to mount the share properly. Weird.

I called Johnny to see if he could help (it was about 6pm by this time). I described what I had done to him and he said that it sounded essentially correct. He was actually in a bookstore at that point, so he went over and pulled out a w2k book and looked it up, and indeed, I had everything setup the way that I should. Johnny had to run, so I continued on by myself. Unfortunately, this machine was a rather business-critical machine (more specifically, the share that it provides is rather business-critical), and I had to get it working. Bonk.

After much trial and error, I finally figured out what was going on:

The first important factor was that there is no NT domain. As such, there is no global authentication across all the machines. Indeed, there are only two accounts on each of the three w2k boxen: administrator and the user who sits down at that machine. This is an important fact.

w98 and w95 machines have no real concept of a user, so this had never mattered before. i.e., w9x sharer permissions are not based on the concept of a user.

When a w9x box tries to mount the share from the w2k box, it uses the username that the user "logged in" with (you know, the "login" window that you can set w9x up with -- although you can hit ESC and skip it...). However, given that there is no global authentication on this LAN, that user will not exist on the sharing (w2k) machine.

In this situation, if the sharing machine is a w2k box, it will treat the share request as if it were coming from the "guest" account.

The "guest" account is initially set to "disabled" on w2k (which, although frustrating for me the other day, is actually a Good Idea). So I had to enable the guest account and assign a password to it. I then entered that password on the w98 box that was trying to mount the share, and it worked.

Woof. Stepping back, it all actually does make sense, but there were precious few clues for the uninitiated during the process to figure out what was going on. It would have helped immensely if the w98 box had shown the username that it was asking the password for. That would have tipped me off immediately. But it doesn't -- it just asks for a password.

Of course, there were at least two other alternatives that I could have used to solve this problem, but neither was attractive:

Setup a Linux box with samba as a primary NT domain controller, make all the windoze machines be clients in the domain, and then have all authentication centrally handled. The problem with this is that I'm not going to be in this parish forever, and I don't want to set them up with technology that they don't know how to maintain that they rely on for day-to-day business, and then leave them stranded when/if I move away from Louisville. Maybe someday, if it turns out that I'm going to be in Louisville for quite a long time. But not today.

I could have moved the share to a different machine (w98) and avoided all these problems, but a) someday all the machines in that office will be w2k and the problem will arise anyway, and b) there are actually political issues involved, so the share had to stay on that machine. :-)

So all in all, I'm not actually all that thrilled with the solution from a security standpoint. I had to enable the guest account to anonymously export the share. Granted, this is effectively no different than shares from a w9x box, so it is arguably no less secure than it was previously, but it still bugs me. And since there is no central authentication, I don't want to get in the business of maintaining separate accounts on all machines for every user -- that's an N^2 problem.

Grumble. Perhaps linux/samba is in their future someday, since there's no way that they could afford a real PDC license. Grumble.

Another question -- why does fast food taste so horrible when it's cold? That is, it tastes 37% worse than normal (non-fast food) does when it's cold. Why the disparity?

Does it always taste bad, but when it's hot, we're so concerned with not burning our mouths that it goes down so fast that we don't notice?

I tried switching to mozilla 0.9.3 'cause they claim it's more stable than netscape 4.77. Although it hasn't crashed on any of the things that mozilla has previously crashed on (SSL pages at USAA, LSC page, LAM/MPI page, etc.), it's still not 100% stable. I find myself switching back to Netscape periodically because mozilla won't load or render a page correctly. We'll see how it goes. I deleted oodles of cookies, and am denying cookies to advertising sites left and right, which is nice.

Here's a few bugs that I've noticed:

seems to have problems w/ typing in urls

doesn't bring up entering/exiting ssl notice until after page is fully loaded/displayed

sometimes it stalls resolving IP names, even though "dig" on the name responds immediately

It has a neat feature to block images from a given web server. My excite pages now have remarkably few advertising banners on them. Cool! (One has to be selective, though -- you can't just block all banners, because some of them actually come from servers where you do want to receive other images. Slashdot does this, for example).

Tracy and I watched the DVD of "The Emperor's New Groove" the other day. A fun little movie; it's all about Llamas. Being a Llama myself, I laughed a good deal at the llama jokes.

I got a new laptop, but then had to leave it at IU. Doh. :-( It seems that they hadn't gotten the paperwork for me to take the laptop home straightened out yet. Hopefully, it'll be worked out when I'm up there next week.

I configured it all up; it's great. 900MHz PIII, 14.1 inch SXGA TFT (1400x1050 -- wow), 128MB RAM, 20GB. It came w/ 'doze ME, which I promptly erased (am I eligible for the MS refund?). I loaded up Mandrake 8.0 on it, as well as VMware, in which I loaded Windows 2k and Office 2k (bought at the IU bookstore for $5 and $10, respectively).

Some random things learned:

sendmail needs the hostname in /etc/hosts, and mandrake didn't put it in there (weird)

I finally caught up w/ Eileen while I was at IU (she and I were both at ND in grad school together; I probably only met her a handful of times at ND, but we have a bunch of mutual friends and common history at ND). We had a great dinner and corresponding conversation. I helped her pick out a new Dell laptop for work (she's faculty at IU in the Latin Studies department), and inadvertently convinced her to buy a Palm m100.

Her new m100 arrived today (Friday), and she's been playing with it. I've already sent her the , and strongly recommended the parens calculator and DateBk4 apps.

August 12, 2001

Dave tells me that there's lots of company policies that apply just to me

Scenes from the upcoming blockbuster movie, "Pushing Bugs", a gripping story of the stress and tension in the lives of Bug Traffic Controllers trying to provide safe passage of insects across human automobile highways:

BTC:

"Junebug 357, you are clear to cross I-265 southbound lanes at mile marker 123.niner. Proceed immediately to cross at minimum safe altitude of niner feet to avoid approaching oncoming small Japanese vehicle, over."

I've gotten oodles of hits from Code Red on my web server -- 141 of the CRI variety, and 669 of the CRII-and-beyond variety. I think the funniest two CRII hits that I've gotten are from:

msgr-cs30.msgr.hotmail.com

msgr-cs31.msgr.hotmail.com

I kid you not; this is straight from my web logs. I had read articles that MS had some of their hotmail servers infected, but to see proof of it right in my own web logs is just too darn funny. :-)
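For the curious, telling the two varieties apart in a web log can be sketched like this. It assumes the commonly reported signatures: both worms request /default.ida, with the original Code Red padding the query string with N characters and Code Red II padding it with X characters. The enum and function names are made up for this example, and a real log line will have more fields around the request.

```cpp
#include <string>

enum class Worm { None, CodeRed, CodeRedII };

// Classify one request string from a web log by its padding character.
// Assumption: "/default.ida?" followed by N means Code Red, by X means
// Code Red II. classify_request is a hypothetical name for this sketch.
Worm classify_request(const std::string& request)
{
    const std::string marker = "/default.ida?";
    const std::string::size_type pos = request.find(marker);
    if (pos == std::string::npos) return Worm::None;

    // Index of the first character after the '?'.
    const std::string::size_type pad = pos + marker.size();
    if (pad >= request.size()) return Worm::None;

    if (request[pad] == 'N') return Worm::CodeRed;
    if (request[pad] == 'X') return Worm::CodeRedII;
    return Worm::None;
}
```

Run that over every request in the log, bump a counter per variety, and you get tallies like the ones above.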

Saw Gone in 60 Seconds tonight on DVD. I thoroughly enjoyed it; it was much better than I thought it would be. I wouldn't call it the greatest movie of all time, but I would say that it is definitely worth seeing. Good action, good humor, good psychos, and good character development. I've always liked Nicolas Cage. I was surprised to see Angelina Jolie in it, too (took me a little while to recognize her -- her hair is bleached, much longer than Tomb Raider, and styled entirely differently, sorta dreadlocky).

I give it 30 minutes.

It amazes me how other kinds of scientists and engineers use clearly sub-optimal methods when it comes to computers. This sounds like an unjust elitist view, but hear me out anyway.

In looking for a good demo application for my dissertation code, I heard about some DNA sequencing code that uses MPI in a manager/worker model. Sounds about perfect. I downloaded the latest copy and had a look at it. One thing that struck me right away (and this is definitely elitist) is that you have to manually edit the Makefile. Yuk.

So there's no configure script; I can deal with that. They have "MPI defaults" for you in the Makefile, but they are a) based on MPICH (that's like rubbing a cat the wrong way :-), and b) using the most difficult method rather than just the mpicc wrapper scripts that MPICH (and LAM!) provides.

Sidenote: I've seen lots of MPI projects that do this... why do they do that?

And they make other MPICH assumptions -- such as assuming that MPI will give the same command line arguments to all executables, even if it's an MPMD model. That one threw me for a while -- I wasn't expecting that (nor did I know that MPICH did that).

The overall design of the program itself is actually pretty clever, yet complicated -- it seems to be clearly written by some scientists/engineers who want to "get it working", and you gotta respect that. But it shows some naivete in its overcomplication (IMHO, mind you) -- I think that the overall model could be less complex. For example, there's a fairly elaborate scheme in place to get the data back and forth from the manager to the workers. But it involves four different binaries (three required binaries, and an extra optional "monitor" process).

Even though the overall program seems to work ok, it's not optimal. Yes, you get speedup running on multiple processors, but the speedup is less than linear. And it seems fairly complicated for a manager/worker setup. There's a separate "foreman" executable, for example, that relays work between the manager and the workers. More than that, though, the foreman spins on non-blocking probes. This eats up CPU cycles like nobody's business.

All that being said, this is actually quite advanced for non computer scientists (and for many computer scientists, as well!). It's probably on the forefront of its field in DNA sequencing codes. Note, however, that I make these observations about many computing projects that I have seen -- not just the one DNA code that I cited above; the DNA code is just the most recent example.

So what does this tell us? I guess that that's part of our jobs as computer scientists -- to make tools that other scientists/engineers can use to build complex systems. Tools that suck less than most current tools. Indeed, Lummy had a good observation recently:

One interesting insight I gathered while at the Livermore meeting last month was along these lines. There is a real reluctance in the scientific computing community to use C++ -- especially advanced C++ -- in scientific codes. The reason is not that the people are dim or lazy. Rather, the intellectual capacity taken up by (advanced) C++ leaves room for very little else. These guys also have to be experts in numerical analysis and their application area. Also being an expert in C++ is not really feasible.

The tools that are available are generally powerful, but most of them suck for various reasons (a typical example that many readers can probably associate with is how MS Windows periodically "freezes", or dies with the Blue Screen of Death). Indeed, using the current generation of tools to their fullest power requires significant expenditures in terms of time and learning -- something that most people just don't have the resources to spend. How do we build complex systems without either rolling individual solutions or spending huge amounts of time learning complex tools?

An idea that I have been telling fellow scientists and engineers about for quite some time is what I euphemistically call "C+". It's not C, and it's not C++ -- it's somewhere in the middle.

Most scientists/engineers know C but are scared of C++ for exactly the reasons that Lummy cited above. I'm a big fan of essentially writing C code, but using a small number of the advanced tools in C++ such as: std::string, passing by reference, the STL (maps, vectors, and lists), and basic object usage (to get guaranteed initialization/destruction). You don't need to go into full-blown object-oriented design or use all the wacky, bleeding edge features of C++; the simple tools that I listed above are extremely powerful and provide oodles more functionality than you get in vanilla C. They actually save time when programming, and allow for elegant solutions to programming problems. The learning curve on these tools is actually quite low. These are good examples of software that sucks a good deal less than most other tools.
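To make "C+" a little more concrete, here is a minimal sketch of what I mean. It's basically C-style procedural code that borrows only the four tools listed above. All of the names in it (ScopedCounter, tally, join, live_after_scope) are made up for this illustration; they don't come from any real project.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Basic object usage: the constructor guarantees initialization and the
// destructor guarantees cleanup, so there is no init()/free() pair to forget.
// (ScopedCounter is a hypothetical name invented for this sketch.)
class ScopedCounter {
public:
    explicit ScopedCounter(int& live) : live_(live) { ++live_; }
    ~ScopedCounter() { --live_; }
    ScopedCounter(const ScopedCounter&) = delete;  // copying would skew the count
private:
    int& live_;
};

// Pass by reference: no pointer syntax at the call site and no copy of the
// vector. std::map stands in for a hand-rolled lookup table.
void tally(const std::vector<std::string>& words,
           std::map<std::string, int>& counts) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        ++counts[words[i]];  // a missing key starts at 0
    }
}

// std::string manages its own memory: no buffer sizing, no strcat().
std::string join(const std::vector<std::string>& words,
                 const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) out += sep;
        out += words[i];
    }
    return out;
}

// Returns how many ScopedCounter objects are still alive after the inner
// scope exits; the destructor has already run by then.
int live_after_scope() {
    int live = 0;
    {
        ScopedCounter guard(live);
        assert(live == 1);
    }
    return live;
}
```

Nothing here needs inheritance, templates of your own, or operator overloading. You call tally() and join() exactly as you would call C functions; the library just does the memory management and the table lookups for you.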

As many have noted, the current state of technology in software is really in its infancy. Consider what used to run your computer 5 and 10 years ago. It was vastly different -- there are a few essential concepts that have remained constant, but software itself has changed dramatically over the last 10 years. What will it do in the next 10, 20, 50 years?

Indeed, computers are made for the kind of rote, menial tasks that software tools are supposed to provide for us. So why do I have to spend so much time writing configure.ac for my portable unix program? Why do I have to spend so much time making an iron-clad robust build system for my portable software? Why do I have to spend so much time configuring my computer before it's safe to be put on a network? All of these kinds of things should be able to be automated for me -- it's the software that needs to be able to handle these things.

So that's what I see my job as: to make software that sucks less. It's challenging and exciting -- to be able to give someone power to do things that they have never before been able to do. To actually be able to increase productivity of others just by providing competent tools; that's neat stuff.

----

Ok, I'm a geek. But I've always admitted it. Hell, I love being a geek. But even more than that -- I don't just thoroughly enjoy being a geek, I revel in it. :-)

August 16, 2001

Everything's kinda backed up in le-kitchen, Dave

I still didn't get to take my new laptop home from IU; paperwork is still progressing. :-(

Some quickies:

I love my DSL. I streamed some MP3s to my new laptop speakers in Bloomies from my home server.

I love the high speed ramp from I64 to I264 in Looieville. I don't even have to slow down from 75mph going from one highway to the next.

Tucson plows on. I've been doing some interesting development with it over the past 2 weeks. More details pending.

Here's an odd fact that surprised me: I actually have 13 Depeche Mode CDs (although 2 of those 13 are the second CD in a 2 disc set, and 1 is a single CD). This makes DM the artist that I have the most CDs of, by far. The next closest that I have is Peter Gabriel with 6, and Sarah McLachlan, Tori Amos, and U2 with 5 (I think U2's got a new one out... hmmm...). Why the heck do I have so much SM and TA? However, I have a whopping 47 "various" compilation and soundtrack records; I think that that is what I have been spending the majority of my CD money on in recent times.

I hear through the grapevine that Louii is getting married; the rumor is that she is marrying someone she met off the internet.

I converted my coderedwarn.pl script to send the vigilante script. How funny is that? I disabled it after a little while, just because it seems like a risky thing to do. But it's funny as hell...

I discovered by accident that pine has a threading mechanism. Who knew? Certainly not me. It's not quite nice (although perhaps this is just the nature of threaded indexes) in that a new message may appear anywhere in your index, possibly even in a previous screen. So you might not see new messages if they get threaded above the current screenful of messages in your index. Additionally, there's no indenting (I think I've seen mutt and wanderlust do indenting), so other than looking at the messages above/below the current message, there's no good way to know where in the thread the message is.

Much consultation on the organization and design of OSCAR 2.0. We finally decided to have binaries not reside in CVS, but reside somewhere via HTTP instead (OSCAR's sourceforge page, for the moment). Stealing an idea from one of the other OSCAR developers, I set up an automake configuration that will do a wget for any files that you are missing after you do a cvs update. Nifty. So we still have version control -- sort of. The Makefile.am's are version controlled under CVS, and they list the binary files that were associated with each version. Not perfect, but it's better than nothing. And it's better than putting binary files in CVS.

August 18, 2001

Jeff's Journal

I still get oodles of phone calls for a Mr. Rogers. I think he must have owned the phone number before us. Very annoying.

If you ever see him, please tell him to stop having people call me.

They tell me that I'll be able to take home my new laptop and monitor from IU this week. Woo hoo!!

This is a good birthday present for me. :-)

To end further controversy in the US and abroad, I have decided to donate some of my stem cells for scientific research. All those scientists are saying that they need infant stem cells; people are constantly telling me that I have the mentality of an infant. Hence, I must have what those scientists need.

Brian has none of his MP3s down at IU yet, so he's streaming from my DSL. Too funny! I love my DSL. :-)

I found an old book of "Lord of the Rings" recently. I couldn't find any copies of any of the other books, and I didn't really want to read that old book because it's kinda fragile. So I bought the entire Hobbit/Lord of the Rings series yesterday, and started reading The Hobbit. I remember the basic story, but it's been years since I've read these books.

I only realized after the fact that the movies are coming out in the not-too-distant future.

It turns out that my long-held suspicions were correct -- the dual NIC/IP forwarding setup on my former router linux box was killing the network periodically. Periodically, when I flooded too much traffic through the machine, it would just freeze up and be completely dead. This usually happened when, for example, I would buy a new CD, rip it into MP3s, and scp them to my server.

If I scp'ed 2-3 songs at a time, with a spacing of at least 15-30 minutes between each scp, I was fine. Anything more than that would risk a freezeup.

Well, now my server is no longer my router (got my cool linksys router for that now), and my server only has one NIC set up and functioning. I just bought 3 CDs yesterday and ripped them to MP3s. I tried an experiment and scp'ed them to my server all at once -- worked without a problem. So I don't know exactly what it was in the dual NIC/IP forwarding that caused it to barf. Heck, it has a pretty old kernel (2.2.something), so it's quite possible that whatever the problem is has been fixed/replaced in the 2.4 series (they re-wrote much of that stuff in 2.4.x).

Here's the CDs that I bought:

The Crystal Method, The Crystal Method: Pretty cool stuff. I had several Crystal Method songs on various mix CDs that I already owned, and bought this CD on the fact that I liked those songs. Their songs are pretty good: not total spazoid, but with good energy and a techno beat. If I had to give it a genre name, I'd have to call it "mellow techno". Cool stuff. It's good coding music.

Fatboy Slim, A Break from the Norm: A little disappointing. It's samples from older songs, and generally slower stuff. Not really what I was expecting, but I guess it's still ok. It won't make it onto my high-rotation list, but I'll probably still listen to it now and again.

U2, All That You Can't Leave Behind: U2 is just getting more and more pop-ish. It's ok, but it won't get played much.

A friend of mine recently unexpectedly became unemployed. Please keep this person in your thoughts and prayers.

August 25, 2001

Chock Full of Notes

Some longer ones, some shorter ones, and some ones in the middle.

I saw Planet of the Apes tonight. Good flick, a little lacking in character development. Only thing in common with the original was the fact that it was a planet ruled by apes and gorillas; most of the story was different. They didn't even make much of the class difference between the warrior class (gorillas) and the ruling class (apes); it was a much stronger theme in the original movies. I give it 10 minutes.

I also saw a preview of The Lord of the Rings. It's a three year project! They're doing all three books -- The Fellowship of the Ring, The Two Towers, and The Return of the King. FotR is scheduled to be out this December, TTT is supposed to be out December of 2002, and RotK is supposed to be out December of 2003.

Random note: the subtitle for The Hobbit is There and Back Again.

I got my PalmGlove today. I'm a little disappointed in that it has a snap cover instead of a velcro cover. It's also not quite as sleek as I thought it would be; it probably makes the pilot about twice as thick. But it's padded, and it should protect the pilot (which, as has been pointed out to me by several people, is something that I need). We'll see how it goes.

I had a busy 2 days at IU earlier this week. But I was able to take my laptop home. Woo hoo!

On Monday, I had dinner w/ everyone, including Katie and Todd's friend, at a local brew pub. It was good company and conversation -- much fun was had by all.

We're setting up milliways to be our main server at IU. We came to good agreements w/ Rob in the IU/CS admin department about what they will do for us and what we will do for ourselves. It's very helpful to be able to speak sysadmin; we should be able to get along well with those guys.

We have to have a good portion of our web pages (read: everything except lam-mpi.org) down on milliways before the IPCRES rollout ceremony next week. Will likely have IMAP, SMTP, CVS, and other things setup by then as well.

I saw a roadmap presentation from Compaq about the alphas while I was there, mostly dealing with the future of alpha as it is intertwined with the itanium. It was NDA, so I can't say much, but it was interesting. We'll see what happens.

I bought an "A" parking permit for IU. Not as expensive as I thought, but it still wasn't cheap. And I'm not guaranteed a parking spot, either. Not like ND (plenty of parking, just perhaps farther away). City campuses are different, I guess.

Brian and Jeremiah labored to bring down, convert, and generally update the web pages from nd.edu to iu.edu. This was especially painful because some of the web pages are stored in the CVSROOTs of actual projects, not directly in the CVSROOT of the web pages. So we ended up bringing the entire CVSROOT down from nd.edu as well (which is probably a good thing).

I'm going to spend more time today fighting mailman and mhonarc setups on milliways. We're going to CVS our local setup so that things should be a bit more centralized this time.

Here's some quickies:

TNT is running a witchblade marathon tomorrow -- all 12 episodes, back to back. I missed the first 15 minutes of the first episode or so; I'll probably catch them then.

I committed a patch to xiph.org today. It's my first in a long time. It allows us Sun users to compile from CVS without being forced to use gcc/gmake. It was an automake thing, actually.

I'm working on the OSCAR 1.1 press release. It's like herding cats.

The power management settings still aren't right on my laptop; it doesn't quite suspend / restore properly yet. I finally found the script where all the magic happens; I may dwiddle around with it and figure it out over time.

I finally got and submitted paperwork to get paid for my Army time in December of 1999.

My dad noted that, as of my birthday a few days ago, we are in a brief span of time where he is exactly (well, close enough) twice as old as I am: 30 and 60.

Ookiness with "#! /usr/bin/env python" and "#! /usr/bin/python" on RH 7.1 (i.e., milliways). It turns out that the former seg faults sometimes (took quite a while to figure that out). WTF?!?!?

JeffJournal has moved to lists.squyres.com in preparation for moving everything out of nd.edu.

Perk mailed asking about setting up his own DSL CAN. I think I sent him more information than he expected. :-)

Saw the DVD version of Apollo 13 last night w/ Tracy (she gave it to me for my birthday). It's a great "engineers rock!" film.

squyres.com continues to get hit by Code Red attempts. <sigh>

Don found a neat feature of OpenSSH that's quite helpful for squyres.com's setup. In the "Host" section of $HOME/.ssh/config you can not only list a different username, but a non-standard port as well. This even applies to scp; how cool is that? So I can add the following to my config file:

Host MY.SERVER.NAME
    Port 2222

gcc 3.0.1 came out this week. I can get it to build and install, but anything I compile with g++ seems to complain about missing symbols, and I can't quite figure out the problem. Oh well.

August 29, 2001

It's like a pastel black

Janna came over for dinner last weekend; that was fun.

Annoying Mozilla bug: accidentally added a server to the "block images from this server" list (that option is right below "view this image" on the right mouse click popup). Oops... didn't mean to do that. Let's go remove it. Took a little looking around to find the menu to remove servers from the "block images from this server" list, but I finally found it. I found the server in the list, selected it, and clicked on "Remove server". The server disappeared from the list, so I "Ok"ed out of the window and reloaded the web page. The image was still gone. Hmm. Go back to the "block images from this server" window -- yep, the server was still listed there. Repeat the whole procedure. Nope, the server is still on the list. Quit Mozilla and repeat the whole procedure. Nope, the server is still on the list. This is clearly a bug.

Luckily, this list is in a text file that I could go edit to remove the server from the list. But that was annoying. Mozilla is also fairly inconsistent in its handling of various MIME types. You enter in an action and it doesn't seem to stick. I'm not gonna go into detail, but suffice it to say that it's fairly annoying. I find myself gravitating back to Netscape 4.77.

Found a bug in LAM's configure script that showed up when using autoconf 2.52 and the --without-romio switch. Suffice it to say that it was a combination of autoconf being "too smart" and us having an early test in the script inside of an "if" block. autoconf 2.52 moved up a bunch of setup kinds of tests to be inside the if block. So if you did --without-romio, all these setup tests would get skipped, and Much Badness would occur from there.

Took a while to figure that out, though...

The Pervasive Technology Labs rollout ceremony this past Tuesday seemed to go well. Lots of important people were there, including press, etc. I met and chatted with the folks from the other two labs to find out what they were about. Sounds like they are doing some interesting things; we'll probably be collaborating with them in the future. Should be interesting and fun.

We spent a good deal of time the week before moving a lot of stuff down from nd.edu to iu.edu. Our new home is http://www.osl.iu.edu -- the Open Systems Lab. It was actually fairly complicated and took a fair amount of coordination. Almost everything has been moved, with two notable exceptions:

It seems that python is unhappy on milliways (our main server). Rob thinks it may have something to do with the fact that milliways is a linux SMP box. The following program sometimes core dumps:

#! /usr/bin/env python
print 'Hello, world'

It even [sometimes] core dumps if you use /usr/bin/python instead of the env stuff. Weird.

lam-mpi.org hasn't been moved down yet. Since that's not obviously in nd.edu, we let that one stay on ND servers for the time being, and concentrated on moving everything else before the rollout ceremony this past Tuesday.

During the last two weeks, I've gotten really stalled in my drive across Louisville to get to Bloomies. Perhaps it's because school has started again...? For example, this past week, it took me about 45 minutes to cross Louisville, where it usually only takes about 10. It's all highway, but the traffic has been slowed down to stop-and-go. Might have to leave a little earlier next time to see if that helps any.

George from UITS suggested an alternate route from Bloomies to I64. It takes about the same time as taking IN 46 to I64 (about an hour), but exits on I64 about 20 miles south of where IN 46 meets I64. So it's a net savings in distance. Yay George!

September 2, 2001

Dave, what does "sagacious" mean?

Turning off L2-cache is of course an effective way of throttling performance, if that was what you were thinking of.

I have found a potential security issue with the US Postal Service. It involves the asynchronous delivery of addressed messages.

Let me start by saying that I love the US Postal Service. They provide a great service that works pretty well. Their operating procedures are quite sound; I think they are modeled off UDP.

Let me explain: you drop off a message at any random point in the network, and as long as it's properly addressed and paid for, the US Postal Service will do a pretty darn good job of getting your message to the destination. If the USPS can't get the message delivered properly, it will be returned to you. Failing that, it goes to the bitbucket for a year or so (the dead letter post office). This is very much like UDP (although dropped packets certainly don't stick around for a year in any present implementation -- I think the US Postal Service just has chosen to have a Very High Quality Implementation of UDP).

I put outgoing mail in my mailbox at home all the time, and put the flag up to indicate that there is mail to be delivered. The mailman comes along at some point in the day (admittedly, the exact time of which has proven to be fairly random -- they probably use Microsoft schedulers). He takes my outgoing mail and inserts it into the UDP^H^H^Hpostal network. My mailbox works as a termination point as well -- the mailman puts messages addressed to me in it.

Overall, the system works pretty well.

Until now.

This morning, I put a bunch of outgoing messages in my mailbox, and raised the semaphore ("put the flag up"). Later this afternoon, I did a sema_trywait() and noticed that I had, indeed, failed to decrement the semaphore (because the mailman already did), so my new messages must have arrived.

I went out to my mailbox and dequeued my messages. I was flipping through them when to my surprise I discovered that the last message in the pile was actually one of my outgoing letters! Clearly, I had caused a buffer overflow in the mailman, and as a safeguard he just transferred the outgoing message back over to the incoming queue so that it wouldn't be lost. So you gotta admire that -- even in a catastrophic failure, no data was lost. Pretty cool.

But it causes me to wonder -- could I execute arbitrary code on the mailman? Don't be a pervert; just think -- what if I could write a 1-3 line code snippet that would allow me to view other users' mail? The potential damage could be quite severe.

September 8, 2001

Jeff's Journal

Saturday

We flew out of Louisville at oh-dark-hundred. Somehow we got bumped up to 1st class on our flight to St. Louis, which was nice. Too bad it was only the flight to St. Louis. The flight to San Francisco was uneventful enough -- it was on a 757. I don't think that I had ever been on a 757 before; it had a surprising amount of space and leg room. So I'd actually have to say that it was the most comfortable continental coach flight that I've ever had -- I actually sat in the middle seat and had plenty of room. Tracy sat in the window seat and was amazed by the Rockies and whatnot (she's never been further west than Iowa).

The movie on the flight was Dr. Dolittle 2 (Eddie Murphy), which I found to be quite amusing. It was a cute little movie, and I had several big laughs, so I recommend it -- I'll give it 7.5 minutes. I finished The Fellowship of the Ring on the plane. Although I did bring the next book in the series, The Two Towers, with me this week (always pre-plan your reading schedule when traveling!), there wasn't much time left in the air after I had finished the Ring.

On a whim, we upgraded our rental car to a Volvo S80. Pretty nice car, actually -- lots more buttons and toys than our Honda Civics. It had a Magellan GPS device thingy built in as well, and it worked surprisingly well. We punched in Darrell's address and it started giving us directions (audio and visual, on its little monitor screen -- "Turn left in 2 miles", "Turn left in 1 mile", "Be ready to turn left", "Turn left now"). It was surprisingly accurate. Halfway to Darrell's, we decided to get some lunch, so we got off US 101 at Palo Alto and looked up "restaurants" on the GPS device and found a nice pub-like place. Then we punched Darrell's address again, got directions back to the highway, and it literally took us to Darrell's front door. Pretty cool.

We hung out with Darrell and Dian all day and had some wine. Darrell made chicken cordon bleu for dinner; he insisted that it sucked, but it was great and quite yummy. It was a lot of fun hanging out with D&D all day.

Sunday

Did more hanging out with Darrell and Dian (haven't seen them in so long...). Lots of good conversation, laughs, etc. Darrell took us to see the new Yahoo! complex... sorry, campus... which was very cool. They just moved into this campus about 4-5 months ago. Their corporate setup is extraordinary. Everyone has their own cubicle (which is not extraordinary), but there are millions of little random conference rooms, each of which has a full computerized A/V setup including a teleconference phone, white board, conference room and comfy chairs, etc., etc. Lots of little break areas (coffee and sodas and whatnot are free) scattered around the buildings (there are 4-5 buildings in this campus, BTW). They even have lots of Yahoo!-style furniture, meaning that it's purple and yellow, somewhat ornate or fairly modern-looking in design, yet pretty comfortable to sit in.

There's a building that has an enormous fitness/health center, a huge restaurant, complete with outdoor grills and private dining rooms for official functions, a conference/learning center with at least a half dozen or so classrooms (each of which has a full computerized A/V setup, of course). Incredible setup.

Whoever their designer is, they did a very tasteful job of incorporating purple into everything. Even the sprinkler heads on the grounds were purple. We didn't get in to see the development group's server farm (Darrell is in the Yahoo! Infrastructure Development group -- his official title is "Technical Yahoo") because Darrell hadn't yet activated his proxy key to work on the door to the room in the new complex. Bummer. It was the weekend, so there were no sysadmins around to let us in, either. We might stop at Yahoo! on the way back from Carmel later this week; we'll see.

We moved Darrell's old Atari Star Wars video game (full size) upstairs, and not without a good bit of effort. He'd been intending to move it there for quite some time, but never had the manpower to do so (in the end, it took me, Darrell, and Tracy to get it up the stairs). I'm no judge of weight, but this thing must have weighed well over 100 pounds. There was much rejoicing when we got it up there, and we all played a few games (it's rigged to not need quarters, of course). A little while later, we were all downstairs again when we smelled an electrical-burning odor. We went upstairs and found that the game's video monitor had fried itself. So sad. :-( It's unique in that it's an X-Y monitor, not a raster monitor. Darrell has no idea how to even begin to fix it since he knows nothing about television repair. Doh. :-(

We went to Outback for dinner because it was Darrell's birthday (he loves the steaks there). Always good food and fun at the Outback.

Monday

Woke up early for some reason and did some work. Played with the new automake with my dissertation code. It certainly seems to fix a bunch of bugs that existed in automake 1.4. Here's what I found:

Don't need to include a bogus PROGRAMS line in a top-level directory when making convenience libraries.

Don't need to have a bogus noinst_HEADERS line in a top-level directory to make the tags target work properly.

Seems to be better about making the various *clean targets, even in the directories that are not conditionally selected.

The new automatic-dependency-generation scheme seems to work, but I don't have anything other than gcc to test it with at the moment; hopefully it will work with KCC and the Solaris Forte compilers as well.

The ar replacement stuff doesn't seem to work with libtool; it only seems to work with the static linking that is built into automake. And unfortunately, you still need libtool to make the multiple-directory-library thing work. :-(

We went to Carmel-by-the-sea today. The Magellan took us there pretty much without fail. We stopped at a random pub-like place for lunch somewhere on the way down (again, with the help of the Magellan, but it was slightly off in the final destination). We checked into our hotel in mid/late-afternoon. It's mounted very high on a hill overlooking the Pacific and has a great view of the sea, the waves, and the rocky coast. It's quite breathtaking. Since it was so late in the afternoon, we just lounged by the pool and walked around on the hill trails and whatnot before going to dinner.

Amusingly enough, my cell phone doesn't get any service at the hotel. Not that anyone has called me, but it makes the battery drain at a high rate, as if it is in a cellular area. We must be just so high above Carmel proper that we're out of range of the cell/digital towers.

We went to the hotel restaurant for dinner, and I had some excellent local fish for dinner (don't remember what it was) and a local wheat beer; Tracy had a local vintage wine with her dinner. It was all yummy. We even had a working wood fireplace in our room, so we felt compelled to try it out. Fire, fire! (Beavis voice) :-)

Tuesday

We drove up to Monterey today to see the Monterey Aquarium and generally bum around the town. The Aquarium was neat; I got a fuzzy penguin for my desk. We walked around the town a bit and saw all the local sights and whatnot before heading back to Carmel for dinner at a local grill in the middle of town. More local food and vintage again, but some of it didn't agree with Tracy. :-(

Allison called Tracy back today and they decided that we'll go see her and her husband on Thursday evening before going out to dinner somewhere.

We did the 17 mile drive throughout the Monterey peninsula that winds around the coast and through the famous Pebble Beach golf course (and some others whose names I don't remember). Quite beautiful scenery, and some really big/expensive houses. We saw the official Pebble Beach tree -- it's over 240 years old.

It seems that I'm tromping through The Two Towers at an alarming speed and I might actually finish it before the week is out, and I didn't bring the last book in the Ring series with me this week. Doh! So we stopped in a bookstore and I bought the new Clancy book, The Bear and the Dragon since it's now out in paperback. This will definitely last me through the rest of the week and the flight back.

Wednesday

Some random notes about the Magellan GPS receiver:

The + and - buttons, which are for zoom out and zoom in, respectively (yes, you read that order correctly), are opposite of what one would expect. + means zoom out; I guess that means "show more map". - means zoom in; I guess that means "show less map".

We would periodically lose the satellite signal when in the foothills in various areas, and it would think that we weren't on any roads. Hence, it would tell us, "please proceed to the indicated route", even though we were already on the route.

When you look up something by category (say, a restaurant), you can't directly tell it to take you there. Instead, you have to remember the street address, exit out of the search functionality, and then go enter that address in the "plan a route" section. Pretty lame.

You can't insert waypoints at all -- so you can't say, "take me to A, then to B, then to C." You can only say "take me to A". Waypoints are a very useful feature; it would be quite a good feature to add.

It seems that roads/ramps/etc. that have been surveyed for GPS are marked in green. Other roads are marked in grey. Took a while to figure that out.

You can switch from the "map view" to the "turn by turn" view, but the directions / street names are clipped (vs. wrapped) in the "turn by turn" view. That's somewhat difficult, especially when you're not familiar with the area, and half the name of the street is missing.

They really need "page down" and "page up" buttons. Currently, you can only scroll by individual items.

All in all, the GPS was somewhat off in some cases, but it was remarkably accurate (all things considered), and it ended up saving us a lot of time and hassle trying to figure out how to get from point A to point B.

We left Carmel this morning, and drove up the coast on CA route 1 to take in the scenery. Very beautiful stuff; some areas of the CA coast are actually cliffs and quite dramatic. We took a hard right at one point and headed back to Yahoo! so that Darrell could give us a tour of the development server room (geeky, yes, but I really wanted to see it :-).

Whew -- that hard right took us across large hills with lots of windy, turny roads with harrowing 10 mph, hairpin 270 degree turns. Somewhat stressful to drive across. But we finally got there. Darrell met us at Yahoo!, and escorted us into the server room. It was pretty cool. A big room, just full of racks and racks of machines. Probably about only half of the racks were full, but it was still a lot of machines. This is the server farm of just the development group of Yahoo! -- the production clusters are distributed around the world, and are much, much larger (Darrell says that the one located in Sunnyvale is about the size of a football field). It was pretty impressive.

We got to see the Yahoo! campus with a few people in it (there was nobody there over the holiday weekend). We also visited the Yahoo! store because it was closed over the weekend, and I bought a few things.

Discovered another cool feature of our car -- the rear-view mirror has no knobby thingy to adjust for night-time driving when headlights are bright. Instead, the mirror itself detects the bright headlights and tints the mirror to dim them. Handy.

Tracy and I continued on to San Francisco and checked into our hotel. The carpool lane on US 101 is a Good Thing.

We were pretty beat from all the driving and whatnot, so we just had a quiet dinner at a local diner and planned out Thursday's activities.

Thursday

We woke up early and took a bus tour of San Francisco. We saw all the major sights, and got to walk around at some of the higher outlooks over San Francisco. Great scenery. In the afternoon, we hung out on Fisherman's Wharf, had lunch, did a little shopping, and waited for our ferry to Alcatraz.

Alcatraz was more interesting than I thought it would be. It was about a 10 minute ferry ride to the island. We watched a short video named The Secrets of Alcatraz that gave a brief history of the island and the things that have occurred there. Alcatraz was a military encampment, a defensive fort, and a prison. The island itself is only 12 acres. You can't walk too much around the island -- most of it is closed off. You can walk around some of the old Army barracks/guard apartments, up the hill to the cell block, and around in the cell block itself. Outside, only some of the walkways are open, including the recreation yard.

We took a walking audio tour by renting little audio devices (probably an MP3 player) that did a pretty good job of giving a history of Alcatraz (concentrating mostly on the history of the cellblock), had surprisingly good sound, and had you walk through various parts of the cell block while it talked about them.

After the audio tour, Tracy and I wandered around the island a bit. We strolled around in the recreation yard off the cell block building, and we randomly ran into Steve D., a previous grad student at Notre Dame in Computer Science, with whom I shared an office for a year or two. He currently lives in the SF area somewhere, and was taking his visiting brother to see Alcatraz. How random is that?

We wandered around on Fisherman's Wharf a bit more when we came back from the island, and then took a cable car back to our hotel. The cable cars are kinda neat -- there is actually a cable running under the streets that the cars grab onto in order to move, and let go of in order to stop. There are always pairs of cables running in parallel to each other -- one for a track in each direction. Not sexy technology, but it seems to work well enough.

We picked up the cable car at the end of its line. Each car has a distinct front and back, so when it reaches the end of the line, it has to be turned around. Since it's a cable, you can't really have a circular track. So what they do is run the car off the end of the track onto a big rotating wooden disk (that has no cable underneath it). Workers then spin the disk by physically pushing the car around so that it's pointing up the track in the opposite direction. They then push the car onto the new track, and start it up going in that direction. Again, not sexy technology, but it's worked well for quite a long time.

We met Tracy's college roommate Allison and her husband Jack for dinner. It took us a while to find a parking space and an open restaurant, but we finally did. Dinner was good, and the conversation was fun; I know that Tracy and Allison were glad to see each other again. Jack's a funny guy, so we all had a good time. The conversation drifted from what-are-we-doing-these-days to the DOJ's surprise announcement of not wanting to break up Microsoft anymore (don't get me started...). We left Jack and Allison at their apartment and headed back to our hotel.

The grade of some of the hills in SF is just unbelievable. Wow.

Friday

We went up to the Napa Valley today and had lunch in Napa itself at some random restaurant (good club sandwich). We wandered around Napa and found a store that sold the wineglass trinkets that Darrell had (handy little sets of trinkets that you attach to a wineglass so that you can uniquely identify your glass from among a set -- say at a dinner or something). We continued up into the Napa valley and took a tour at the Mondavi winery (Tracy had made reservations in the morning). It was pretty interesting. They had a pretty modern setup (they renovated their winery within the last 2-3 years); hearing about their process and whatnot was pretty cool. The woman who gave the tour had obviously been with the winery for a long, long time, and her performance was flawless (very polished).

We tasted some wines at the end of the tour -- a chardonnay, a cabernet sauvignon, and some dessert wine whose name I don't remember. Then we went to the wine store, and after a bunch more thinking and a little more tasting, we bought four bottles to take home with us. Yum.

Traffic was pretty heavy on the way back to San Francisco, so it actually took quite a while to get back to the hotel. On the way, we drove down "the curviest street in the world". I'm quite sure that I've seen this street in a movie somewhere, but I don't remember which one. It's an amazing street -- I have no numerical stats on it, but it's a one-lane, one-way street that consists of a series of 270 degree turns going down an extremely steep grade. Woof -- I can't even imagine living on that street.

We went to a random Irish pub for dinner. Yum.

Saturday

We got up at oh-dark-hundred for our flight. The GPS device had a "Return to Hertz" feature. Even though it came up with a whacky way to get to the airport, it was still cool. :-)

We took an early flight back to St. Louis (it was all that was left when we booked our tickets). It was on the same kind of big plane that we flew out on, so there was plenty of room, plus the flight wasn't nearly full.

The in-flight movie was Bridget Jones's Diary, a romantic comedy. It was cute, and had Hugh Grant in it, as well as a bunch of other random British actors. The second movie was Shrek. Three words: funny as hell. But we only got to see about half of it before we landed in St. Louis. Doh!! So I'm sure that we'll end up renting it sometime in the near future to see the whole movie. Plus, it will be good to see all the details -- the screens on the plane were pretty small and a little distance away. So the DVD clarity will be nice to see.

We actually landed early, and the flight to Louisville was uneventful. So we're home again, home again. Whew! I'll submit this entry now; I've got some other random notes, but I'll keep this entry specific to the California trip.

Only sagacious heads light on these observations, and reduce them into general propositions. --Locke.

Syn: See Shrewd. -- Sa*ga"cious*ly, adv. -- Sa*ga"cious*ness, n.

Tracy and I bought 3 CDs while in CA for our driving pleasure since we were driving all over the place and couldn't get a consistent radio station for more than 30-40 miles.

Enya: The Memory of Trees. More of the same classic Enya style. Good trance-like music for those mellow coding days.

George Winston: Summer. Piano stuff. I kinda tuned it out in the "background classical elevator muzak" category when we played it in the car, so I can't really comment on it intelligently.

Various: Friends again, songs from the show Friends. A few good songs, and some funny clips from the show. I can see a small number of those songs making it into a regular listening schedule.

After ripping these three CDs into MP3s, I'm still only using 38% of my disk (a 45GB disk) that I bought specifically for music. A rough count shows that I have about 266 CDs encoded as MP3s. This still leaves me plenty of space to re-encode all my music in .ogg format when the ogg/vorbis encoder goes stable. Yummy!
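As a rough back-of-the-envelope check of those numbers, here is a small sketch. It assumes that the 38% figure is entirely music and that 1GB = 1000MB; the variable names are made up for illustration.

```python
# Rough disk arithmetic from the figures quoted above.
# Assumptions: all of the used space is MP3s, and 1 GB = 1000 MB.
DISK_GB = 45.0
USED_FRACTION = 0.38
CD_COUNT = 266

used_gb = DISK_GB * USED_FRACTION        # space taken by the MP3s
free_gb = DISK_GB - used_gb              # space left for re-encoding
per_cd_mb = used_gb * 1000.0 / CD_COUNT  # average size of one ripped CD

print(f"used: {used_gb:.1f} GB, free: {free_gb:.1f} GB")
print(f"average per CD: {per_cd_mb:.0f} MB")
```

If a re-encoded copy of a CD takes about as much room as the MP3 version (an assumption, not a measurement), the free space is enough to hold a second copy of the whole collection.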

My server went catatonic at some point while I was gone (possibly Wednesday). I found it this way when I returned Saturday evening. According to the logs, the server was still functioning, but it was extremely slow in responding from the net (such that a "GET / HTTP/1.0" effectively never returned anything, and ssh would hang while connecting). So I power cycled it.
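For what it's worth, the kind of check described above (send "GET / HTTP/1.0" and see whether anything ever comes back) can be sketched in a few lines. This is only an illustration: the function name `probe_http` and the 10 second default timeout are assumptions, not something the server or its logs provide.

```python
import socket


def probe_http(host, port=80, timeout=10.0):
    """Send "GET / HTTP/1.0" and return the status line, or None.

    None means the connection failed, timed out, or closed without
    sending a complete first line -- i.e., the "catatonic" case.
    """
    request = b"GET / HTTP/1.0\r\nHost: " + host.encode("ascii") + b"\r\n\r\n"
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(request)
            # Read until the first line of the response has arrived.
            while b"\r\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
    except OSError:  # covers refused connections and timeouts
        return None
    if b"\r\n" not in data:
        return None
    return data.split(b"\r\n", 1)[0].decode("latin-1")
```

Running something like this from a machine outside the house every few minutes would at least show when the server stopped answering, even if it can't say why.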

I don't quite know what happened -- I suppose that it has done this before when I was running the dual NIC configuration, but according to Don, it went really, really, r e a l l y sloooowwww before it died. This unfortunately happened on the day that fhffl.com was doing their draft, and it screwed up the process royally. Doh. :-(

Apparently, Don and Ed realized part way through the process that they had a CVS checked-out copy of slightly older files on Don's laptop, and so he fired up apache and ran the rest of the draft from there. Still a bummer, though.

This also screwed up John's access to his e-mail. Double doh. :-(

While I'm certainly happy to provide these services to my friends, it does say something that I don't use these services myself -- I pay someone to host squyres.com's e-mail and important web sites (my friends all know this -- Don and John were just unfortunately burned by this). I don't run anything like a production environment, nor do I want to (at least not out of my home). For example, I don't want the rest of my family's e-mail to be affected if/when I need to take my server down for maintenance. Indeed, in the (literally) years that I've had the squyres.com e-mail hosted with Pennyhost, I've really only had one problem -- my dad's mail went into the void for about 2-3 days when Pennyhost moved over to new servers (but no one else's mail was affected). The rest has been essentially 24/7 continued service.

Indeed, my DSL modem has been crashing a bit recently (don't know why that is happening, either...). Although this is the first problem that I've had with the server itself, I'm not around all the time to make quick fixes when something goes wrong.

outpost.com is quickly falling out of favor with me. I ordered 2 things for Darrell's birthday last Monday. I had them directly shipped to Darrell instead of to me.

I came home last night to find a call from outpost.com calling to verify the order because I had it shipped to someone else. That was initially annoying because it had delayed the shipment, but then I thought about it, and was actually pleased with outpost.com for doing this -- they were looking after my interests, after all. So that was ok.

The message included a phone number, which I called. I gave my order number and said that I wanted to confirm the order. "Oh, that's already been approved," the woman told me. "We checked with your credit card company."

WTF?!

Why would they check with my credit card company for verification of my order?

What exactly did my credit card company tell them? They certainly know nothing of Darrell's address.

Of course, I didn't think of these questions until after I had hung up. But it's still somewhat annoying.

Even worse, only one of the items that I ordered was shipped. The other was "out of stock", even though it was clearly "in stock" when I ordered it on Monday (at least, that's what their web site said!). And then to add insult to injury, outpost.com doesn't know when they will get any more of this second item -- "The manufacturer hasn't informed us of when we'll get more."

Grrr...

So Darrell has a ReplayTV unit and was expounding on its greatness while we were there last week. So I went on the web while watching the ND/Nebraska game (and why not? The game was barely worth watching...) and checked out their web site. It seems that just this past week they announced a new model -- the 7000.

It certainly seems impressive, and the fact that it carries no monthly fee is compelling. Especially since TiVo's future is uncertain. Then I noticed the price tag -- starting at $700. Yow.

I'm also unclear how it works with my cable receiver box. Darrell says that it works via RS232 or IR relaying to the receiver box. I don't quite know how the RS232 would work (there's so many different cable receivers out there; I can't imagine that they interface to all of them, unless there's an industry standard interface, but I kinda doubt it), and D says that the IR relaying is somewhat shaky. Hmm.

There's no reason to act on this impulse right away. Indeed, seeing Darrell's impressive setup gives a lot of food for thought about how I want to do my home entertainment center anyway. Heck, Tracy and I spent an extended lunch talking about possibilities for a whole-home digital entertainment system at some random Irish pub last week. This will take time and a lot of thought/planning.

I should mention the ND/Nebraska game. The first quarter was dismal. Dismal, dismal, dismal. It really showed that this was our first game of the season, and Nebraska's third.

Our defense woke up in the second half and generally did a pretty good job. But our offense continued to suck throughout the whole game (pretty much the story of last year...). I'm not sure that I agree with this whole 2-QB strategy; it can be tough for a team to sync up with 2 different QBs. You can tell that we have young QBs; they don't have a lot of experience and need to calm their game down a bit.

Granted, Nebraska is a very good team. As a rule, I don't bash on the ND football team, especially since I know just how hard it is to be an NCAA athlete at ND (read: extremely difficult, both academically and athletically). But some refinement is definitely necessary, and darn quick. It would have been nice to be able to score one more time in the fourth quarter.

We'll see what happens with the rest of the season.

This past week, I was traveling and had no access to the internet. I only had my laptop with me. I was playing around with jam (the make-redux thingy) during various down-times throughout the week. Since I had a small screen, I was using lynx (a text-based web browser) to read their docs (which are in HTML). There are no images in the docs, and even if there were, they wouldn't have mattered much to what I was trying to do (read and re-read textual information).

Later in the week, I accidentally had Netscape (or Konqueror -- can't remember which) up, so I brought up the jam docs in that instead. I was amazed at the visual difference. The difference in presentation of the same text with the same HTML in lynx vs. Konq/Nets was enormous. I found the text much easier to read in Konq/Nets. I liken the difference to coding in plain vi vs. coding in colorized emacs or vim. Even though one intellectually knows that colorizing keywords and the like will "help", you can't truly appreciate the difference that proper syntax highlighting makes for code until you start using it regularly (i.e., it's an enormous help, no matter how good of a programmer you are).

Here's some of the things that I noticed:

lynx uses the same colors for multiple things (I'm sure that you can change this, but I was going with the defaults -- i.e., what most users see)

lynx does not visually separate between <LI> items (and possibly some other things; I didn't compare closely)

lynx does not show "." or numbers to start each <LI> item (<UL> vs. <OL>)

K/N pretty much does the opposite of what I mention above. As a result, I immediately switched to K/N for reading the docs because it made the job of reading much easier.

I fired off an e-mail to Jeremiah (who uses lynx almost religiously). We frequently tease him about using lynx, and he's almost always had good answers to our teasing along the lines of "lynx doesn't crash", and "lynx is pretty solid and reliable", and "lynx is a lot faster" (although that last one is less important with generally faster machines these days). I asked him the following questions:

Have you experienced this? (the loss of textual information due to presentation)

Have you ever compared the output of the same HTML between lynx and a gui browser? (since he generally browses text-heavy web sites when he surfs, such as news sites and the like)

I'm genuinely curious. Even though I have only traditionally used lynx for quick-read kinds of things (like docs on a local hard drive), I will probably not continue this practice, and use something like K/N in the future because of this experience.

September 11, 2001

September 11, 2001

What a horrible day.

There isn't much that I can say that can sum up what we all are feeling; many others who are much more eloquent than I have said better words than I can muster. Along the same lines, there's a bunch of idiots saying just clearly stupid things. The media has clearly been sensationalizing this tragedy. I had to stop to get gas to make it back to Louisville; it was amazing what some of the locals were saying inside the gas station (e.g., I actually heard someone say "Buy guns. No one is safe.").

Driving out of Bloomington today, there were tremendously long lines for gas at every gas station. Some of the stations had clearly artificially raised prices. The Louisville news showed a gas sign at $2.50 for unleaded (the normal price around here 2 days ago was about $1.60-1.70). Is there no end to greed?

But then again, I was truly inspired hearing about all the stories of random New York people helping others. Thousands of volunteers just showing up asking for what they can do to help. Blood banks filling with hundreds of volunteers -- all over the country.

This is a horrendous tragedy. Like most others, I'm still finding it hard to believe that this has happened. "It's just like a movie," I've heard many say.

But this is just what the terrorists want. The first part of that word is "terror" -- that's the real win for these people. Not just a horrific event, but the long-term effects of creating terror in our lives.

I refuse to submit.

I will continue to live my life. I will grieve with the rest of this country, and try to comprehend what has happened. But I will not live in terror.

And so must everyone else. Will there be heightened security? Probably so. But I will still be me, and you will still be you. Remember that, and remember that we Americans are a resilient people... even if it doesn't feel that way right now. We will live on. We are resilient people.

Remember that, and be that.

What about retribution?

This is actually a complicated issue; my feelings on this are varied and conflicting. The outraged American male in me wants immediate public, brutal retribution on all who were responsible. Seeing the TV shots of Palestinians rejoicing at our loss is just incomprehensible. And I do not doubt that the US government will activate the military in the next 24 hours for some kind of retribution. However, swift retribution by the US will undoubtedly spawn additional anti-American sentiment, and potentially create a downward spiral of cause and effect. But doing nothing would create the perception that the US is weak, and potentially invite more terrorist activity in the US.

I can't do anything about this, so I can only hope and pray that the US's response (as I believe that there inevitably will be one) is a directed attack and only the guilty are affected.

As with the rest of America and American-friendly parts of the world, my thoughts and prayers go out to all who were affected today.

September 12, 2001

More aftermath

Just a quick journal entry to answer a question that many have asked me over the past 24 hours...

Many friends and family have anxiously asked if I am being activated in response to the terrorist attacks on the US yesterday.

The short answer: no. I am a computer geek; there's little that I could do to help in a situation like this. I highly doubt that I will be activated because of what happened yesterday. Never say "never", of course, but I strongly doubt that it will happen.

I do really appreciate the concern, however. It's touching to know that many people know and care about Tracy and me; thank you so much for asking.

Clan Squyres was fortunate enough to be unscathed by yesterday's events [at least directly]. Cousin Maggie and Uncle Jim are both safe and sound. Deyun, an old-time LSC'er, works right near the towers, but is also safe.

Remember: refuse to submit to the terror. Today is a new day. In many ways, the world is different than it was yesterday, but it is still a new day. And that is something.

Jeff's Journal

For the memories.

It has been said that the shuttle disaster was the JFK of my generation. Indeed, I still remember where I was and what I was doing when I first heard about the shuttle blowing up (freshman in high school, standing at my locker when the guy with the locker next to mine told me about it). Why do I need two of these kinds of events in my life?

I was at IU/Bloomington. We were having a working meeting of members of the OSCAR core group. September 11th was the second day of the working meeting. We had worked until about 10pm the night before, and I had gotten in to the conference room a little early that morning to write some notes up on the board, outlining what we had decided the night before, etc.

A few minutes after 8am (Central time), the first of the others walked in the room and said, "Did you hear the news? A few minutes ago, a plane slammed into the World Trade Center in New York."

Over the course of the next 30 minutes, the rest of the OSCAR group came in and each had a new bit of information: a second plane slammed into the second tower. Rumors of a plane hitting the Pentagon. And so on.

Brian came in during this time with the router so that we could all hook up our laptops and start surfing the web to find some concrete information. Unfortunately, as it was happening, there was little information on the web. News sites were hard to reach; cnn.com was especially difficult to get to. It took quite a while to get hard details from the web, and rumors abounded (sensationalistic journalists didn't help, either).

Since we had limited time together, we tried to work through the day, and tried to not surf for news except during our breaks. Reports of the towers falling down were met with initial disbelief (comments like "I can't believe that that's true"); it took quite a long time before concrete confirmations were available on the internet.

Several of the OSCAR group members were federal employees (they're researchers for the federal labs), so they got updates from their home offices via cell phone and e-mail fairly regularly. Most of them had closed for the day, and/or gone to higher alert statuses. Notre Dame closed. IU didn't. Brian mentioned that he was glad that he wasn't at Sandia; Sandia is physically located on an Air Force base -- the security there (particularly since they do Secret stuff at Sandia) must be super-tight.

Some of the OSCAR folks had driven to IU, so they had no problems getting home. Others had flown in from far away; they had to drive back home because all the flights were canceled. Luckily, they all had rental cars already, so they at least didn't have to fight to get a car.

There were oodles of IU students hanging around in lobbies and hallways watching TVs with shocked looks on their faces. Most were spellbound. I have to admit that I kinda envied them the ability to just sit back for the entire day and soak it in -- we had to work through the day, and it was quite distracting to think of the tragedy unfolding; work seemed rather trite at some points during the day.

Heading home after the meeting, I was walking off the edge of the IU campus and saw a lot of film cameras and TV gear. I saw some guy doing promos for a special episode of "America's Most Wanted" for this Saturday evening. Why the AMW crew was in Bloomington, I have no idea. But apparently there's going to be some special episode (about the WTC attacks, of course) this Saturday evening on FOX. There you go...

I had only about a half tank of gas; it takes roughly a half tank of gas to get from Louisville to Bloomington with the AC on. So I figured that I'd have to stop at a gas station on the way out of Bloomies. I turned on the radio (it was still tuned to Bob-n-Tom's native station in Indianapolis) and heard more updates on what had been happening during the day (remember: I had pretty much been limited to internet information all day). I heard Bob (from the Bob-n-Tom show) talking about it, and reporting on various effects across the country.

He mentioned about how there was already some price gouging on gas going on in Indianapolis and how there were really long lines in some places. He strongly said (paraphrasing), "Look folks, there's no need for this. Getting gas is not going to help you. The prices are not going to go up, and if they are, it's temporary greed."

Sure enough, all throughout Bloomies, there were lines at all the gas stations. The prices seemed normal, though.

"No problem," I thought. "I have to drive through a lot of nowheresville, IN. Surely there will be a gas station in one of these small towns that doesn't have super long lines." So I headed out of Bloomies and started on my normal back roads way home.

Amazing. Even in Nowheresville, IN, there were lines. At the first 3 gas stations that I passed in random small towns in Indiana, big lines. One of them had prices over $2.

I finally found one with no lines. So I pulled up behind someone, waited for them to finish, and then drove up to the pump. The station had some kind of problem with their pay-at-the-pump system because my credit card failed to work twice in a row (even though it worked for the woman in front of me). By this time, lines were starting to grow. Amazing.

So I went inside to pay and saw that, indeed, their credit card system was down. So there was a long line to pay as well. I heard some interesting conversation while in that gas station in good old, rural Indiana. Suffice it to say that I consider most of what I heard to be uninformed, racially- and anger-motivated, and generally completely ignorant. I won't even dignify what I heard by repeating it here.

I finally drove out of there (fortunately, I had cash) and drove the rest of the way home without incident.

I found out more information from the radio than I had gotten all day from the internet (granted, much more time had now elapsed, and there was generally more information available anyway). Tracy's day had been similar to mine -- she didn't find out until sometime after 10am (Eastern time), but they pretty much tried to keep working throughout the day. TVs were on, but people tried to keep working.

Lots of companies (including GE) have made today an "optional" day, and GE has organized a blood drive here in Louisville. I'll probably wait for the lines to die down a bit and head over (might actually be tomorrow) because I have O- blood, which is the universal donor.

September 15, 2001

You got fired from Lucky Burger? How humiliating!

The previous owner of my phone number -- Shelby R. -- is the bane of my existence. I keep getting calls for him.

Bah!

Got several new CDs:

The Crystal Method: Tweekend

The Chemical Brothers: Dig Your Own Hole

Juno Reactor: Bible of Dreams

ATB / George Acosta: Trance Nation / America Two

I haven't had a chance to listen to them all thoroughly yet, but they all sound pretty promising. Tweekend, in particular, is pretty cool.

I still get one to two dozen hits of Code Red each day.

Gotta seed my lawn this weekend. Ugh.

I talked to Kevin Barker this week, which was cool. We talked about all kinds of things, like his upcoming Ph.D. proposal, C++ Friday lunch, linksys DSL routers, and lots of random Notre Dame and William and Mary gossip to include lots of laughs. It was good to talk to Kevin again.

I had a long chat w/ Brian about our plans for fault tolerance in LAM/MPI. It seems that I never filled him in on some of the discussions that Lummy and I have had, as well as some of the thoughts that I've had on the subject. Whoops; my [big] bad. I think that this happened because of the whole distance thing -- imagine three corners of a triangle. Communications along one leg does not imply communications along the other two legs.

When each of the corners is physically separated from the others, it makes it difficult to keep track of communications that occurred between any pair of corners. We'll have to work on that.

We have to spend the rest of the weekend working on a document about Brian's work at Sandia; gotta get it to Brian's boss very early next week.

The OSCAR meeting at IU went very well. While much remains to be decided, many topics were hashed out, thrown out, new ideas introduced, etc. It was a good working meeting. It very much reminded me of the MPI forum meetings, but with fewer people.

We might have another meeting at IU in about 2 weeks, but it's not completely clear yet.

An unexpected effect from the WTC attack this past week...

My church here in Louisville has DSL; I helped them set it up a few months ago. On Thursday evening, they dropped off the air; I sent something to one of their mailing lists and wondered why it hadn't shown up 30 minutes later. I then checked the web site, and saw that the name wasn't even resolvable. Doh!

After a while, it dawned on me that the DSL company, Intercom Online, is located somewhere in New York City. A whois confirmed this, which I cross-referenced with mapquest. Sure enough, they're only a few blocks away from the WTC.

Even worse, on Friday morning, I got the LSC mail server nightly report that said that it rejected a bunch of mails from intercom.com because the domainname didn't resolve (a common anti-spam tactic). Doh! I assume that this (or something similar) had happened because they sent mails after their DNS servers had failed.
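The anti-spam tactic mentioned above (reject mail whose sender domain doesn't resolve) can be sketched roughly as follows. This is not the LSC mail server's actual configuration; the function names are hypothetical, and the sketch only checks that the name resolves to an address, whereas a real mail server would typically also look at MX records.

```python
import socket


def domain_resolves(domain):
    """Return True if the name resolves to at least one address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def should_reject(sender, resolver=domain_resolves):
    """Reject a message if the sender address has no resolvable domain.

    `resolver` is injectable so that the policy can be exercised
    without touching the network.
    """
    _local, sep, domain = sender.rpartition("@")
    if not sep or not domain:
        return True  # malformed address: no domain part at all
    return not resolver(domain)
```

The downside is exactly what happened here: when a legitimate sender's DNS servers are down, `resolver` fails for them just as it would for a forged domain, so their mail bounces until the check is relaxed or their DNS comes back.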

I had heard stories of ISPs in lower Manhattan who were dropping off the air on Thursday because their backup generators ran out of gas, and the lack of civilian traffic in lower Manhattan prohibited them from getting any more. I assumed that this is what had happened to Intercom Online.

A few panicked calls to Dog and Curt later, I had that anti-spam tactic temporarily disabled on the LSC mail server so that if the mails eventually tried to be resent again (e.g., if they were already delivered halfway to me to some intermediary server that was not down, and would eventually be retransmitted), they wouldn't be rejected.

This morning (Saturday), I got all the back e-mails in addition to one more status message from Intercom Online. It seems that a water pump failed in their backup generator on Thursday evening, and they had to have a new one shipped in. Given the state of disarray in NYC, it took about 36 hours to get a new one shipped in, setup, and functioning. They restored services very early on Saturday morning.

On the one hand, they're a business. My church paid for services, and we should expect nothing less than exemplary service from them (I'll tell you -- those services aren't cheap!). Indeed, none of the church staff had internet access on Thursday evening or Friday, and it was an impediment to their normal business. So going to extraordinary measures to keep service running is what we should expect.

But then again, given all that is going on in NYC right now, I think one day of inconvenience and lack of internet is absolutely nothing compared to what those folks must be going through. Indeed, with all the tragedy and strife in NYC at the moment, these people are professional enough to go to extraordinary measures and spend even more time away from their families in order to restore services to their customers.

And that's amazing.

So major props to Intercom Online. Thanks. While we certainly greatly appreciate your efforts, we're more happy that you're all apparently safe.

I went to give blood yesterday at the GE blood drive. Here's the results:

Time elapsed: 5 hours

Needle holes in my right arm: 2

Needle holes in my left arm: 1

Bruises on my arms: 2 (around the needle holes)

Blood given: much less than 1 pint

Free dinner: consumed

So I tried to do my citizen-ly duty, but my blood somehow wouldn't flow into the darned bag. It apparently flowed out into the rest of my arm (i.e., the bruises). Doh. :-(

Go give blood.

Pine 4.40 is out. It built fairly easily, in contrast to previous attempts. I credit this to the fact that SSL is built into the source code itself -- for Linux, I just did "./build slx SSLTYPE=unix". But I did have to edit imap/c-client/OSCFLAGS to have the right -I flags for where the OpenSSL include files live.

September 18, 2001

Once you envision Lord Vader as McNamara, it all falls into place.

Checking my logs since May 4th, Telocity has provided me with a total uptime of 88% (this is with 214,056 data points).

This means that I was able to ping my DNS server (a Telocity machine), Excite, and Notre Dame 88% of the time.

Adding in the times where only 2 of those conditions were met, it's closer to 95% (for example, there have been many times when ND wasn't reachable, but that was through no fault of Telocity).

95% is pretty darn good for a home-subscriber DSL, actually. 95% is actually a total of 18.25 days down out of every year, and that's not really all that good, but for a home-based service, that's a heckuva lot better than what most other companies provide!
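
For the curious, the arithmetic behind those numbers can be sketched roughly like this. This is not my actual logging script, just a minimal Python sketch; the `PingSample` record and both function names are made up for illustration, and it assumes each data point records whether each of the three targets answered a ping.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PingSample:
    """One data point: did each of the three targets answer a ping?"""
    dns: bool     # the Telocity DNS server
    excite: bool  # Excite
    nd: bool      # Notre Dame


def uptime_fraction(samples: Iterable[PingSample], min_reachable: int = 3) -> float:
    """Fraction of samples in which at least `min_reachable` targets answered."""
    samples = list(samples)
    if not samples:
        return 0.0
    # bools add up as 0/1, so the sum is the number of reachable targets
    up = sum(1 for s in samples if (s.dns + s.excite + s.nd) >= min_reachable)
    return up / len(samples)


def downtime_days_per_year(uptime: float, days_per_year: float = 365.0) -> float:
    """Convert an uptime fraction into expected days of downtime per year."""
    return (1.0 - uptime) * days_per_year
```

Calling `uptime_fraction(samples)` gives the strict "all three reachable" figure, and `uptime_fraction(samples, min_reachable=2)` gives the more forgiving "2 of 3" figure. `downtime_days_per_year(0.95)` comes out to roughly 18.25.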

Spent most of the weekend writing the Sandia LAM doc. Woof. Had a big pow-wow w/ Brian and Andy to hash out more of the plan on Monday afternoon.

Found a possible culprit for my web server going bonkers -- power failures in Louisville. It happened again yesterday -- DSL worked fine in the morning, but when I got to Bloomies, I got no love when trying to connect to squyres.com. Doh!

When Tracy got home, queeg had been rebooted, and the server was hung. So I think it was a "quickie" power blip that caused queeg to reboot properly, but only caused the server to go catatonic and not fully reboot properly. Hmm. Might need to buy a cheapie UPS or something.

As a result, I'm downloading a bunch of mp3's to my laptop from my server. This will allow me to listen to my mp3s:

When squyres.com is down

When I'm not connected to a network

Tunes for life!

Had a pow-wow for the OSL today about what the vision for the lab was, etc. It was a good synchronization point for all of us. For any of you outsiders reading this, go read our web page: http://www.osl.iu.edu/. :-)

September 25, 2001

Who is Nixon? Yoda. Yoda was a muppet.

I had a fairly lengthy journal entry about Terry's wedding this past weekend, and it got lost. :-(

It was a great weekend; Tracy and I flew to Baltimore last Thursday (no problem), and then drove to Philly. We met with friends and family all weekend. If I tried to explain all the inside jokes and family one-liners, it wouldn't work at all.

Suffice it to say that the ceremony was great, Terry and Alan are now married, and a new family legend was born: my teenage cousin Patrick was bawling his eyes out on the altar.

Tracy and I caught Terry and Alan on Sunday morning before they left for their honeymoon, and then drove back to Baltimore and flew back to Louisville (again, no real problems).

A good weekend. Now I have another brother.

Brian and I tied up a bunch of loose ends in LAM, including the infamous "set_stdio" problem that has been plaguing us for weeks -- we've just never had time to look into it properly. A helpful LAM user supplied us with a key hint; I don't know how long it would have taken us to figure it out if he hadn't told us. The problem was a faulty socket.h in RedHat 7.1, such that the setup for a STREAMS control message was broken for non-gcc compilers because it was missing a key preprocessor protection block. Blech.

September 26, 2001

Reactions to WTC

Nimda! Nimda! Nimda!

My web server is getting hit with it all the time. I've modified my code red script to also look for Nimda signatures (all 16), and to mail the owner back (if they're running an SMTP server). It will also mail abuse@telocity.com if the source is in telocity.com. I could probably expand it to mail "abuse" (and others) at any two-level domain, but I'm not that ambitious. :-)

10854 hits of Nimda and counting.

This dwarfs the total number of Code Red (and variant) hits that I've gotten: 3283.
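
The general idea of the script can be sketched as follows. This is a simplified Python illustration, not the real script: the two regular expressions are assumed stand-ins (Nimda probes typically request `cmd.exe` or `root.exe`, and Code Red requests `default.ida`) rather than the full list of 16 signatures, and the function names are hypothetical. The actual mailing step is left out.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Assumed, simplified signatures; the real script checks 16 Nimda patterns.
NIMDA_RE = re.compile(r"(cmd\.exe|root\.exe)", re.IGNORECASE)
CODE_RED_RE = re.compile(r"default\.ida", re.IGNORECASE)


def classify(request_line: str) -> Optional[str]:
    """Return "nimda", "code_red", or None for one logged HTTP request line."""
    if NIMDA_RE.search(request_line):
        return "nimda"
    if CODE_RED_RE.search(request_line):
        return "code_red"
    return None


def count_hits(request_lines: Iterable[str]) -> Counter:
    """Tally worm hits by type across an iterable of request lines."""
    hits = Counter()
    for line in request_lines:
        kind = classify(line)
        if kind is not None:
            hits[kind] += 1
    return hits


def abuse_contact(hostname: str, isp_domain: str = "telocity.com") -> Optional[str]:
    """Return the ISP abuse address if the source host is inside isp_domain."""
    host = hostname.lower().rstrip(".")
    if host == isp_domain or host.endswith("." + isp_domain):
        return "abuse@" + isp_domain
    return None
```

Feeding the request field of each access log line through `count_hits` gives per-worm totals like the ones above, and `abuse_contact` decides whether a given source also earns a note to the ISP.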

Some of my new CDs contain beeps that sound identical to the beep from my mail program (pine) indicating that there is new mail.

It causes many false positive indications of mail.

I am quite concerned about some of the proposed legislation in Congress in response to the WTC attack. This is my particular field of expertise, so I'll keep it to the facts.

The administration has taken the opportunity of the WTC attacks to bring back proposed legislation for mandatory "backdoor" access to all cryptography products. Essentially, this means that the FBI will have a "backdoor" to be able to instantly decrypt anything that is encrypted with cryptography software that is made in the US. From a law-enforcement standpoint, this is not a bad thing. You can always see what the Bad Guys are saying -- right now, there are most likely all kinds of Bad Guys using encryption to hide their nefarious plans. Even if the FBI (legally) intercepts their communications, they can't decrypt them, so they can't get a lead on what the Bad Guys are planning. This is obviously Bad.