Wednesday, August 30, 2006

When purchasing computer hardware, unless you can get your hands on the equipment before you buy, you have to rely on past experience, the experience of others, or benchmarks to judge whether it has enough capacity for your users' needs.

There is a wealth of information you can use to judge server capacity, and plenty more for storage capacity. One area that is almost always overlooked is network devices. I have seen many people in the past blindly use the vendor's advertised specifications to determine their needs. Just because a switch has Gig ports and the vendor claims the backplane is "HUGE" does not mean you can assume the actual performance will be "wire" speed.

If you are in the market for network hardware at the moment, before you go too far you should have a sober look at Simon Bullen's blog on Cisco's 3750. I have always had my doubts about Cisco equipment in the past, and about the reasons why people choose it over other vendors. It would be nice if we could have a somewhat neutral benchmark standards body like www.spec.org to produce a set of network equipment benchmarks. My searching so far only finds vendors (or their proxies) competitively benchmarking their opposition, and of course whipping their ass. You would not expect less.

While independent results from people like Simon are fantastic, it would be nice to see network vendors put their crown jewels on the benchmarking chopping block for all to see. The nice thing about this is that if they don't post results for certain equipment, they may be selling you a brumby.

Thursday, August 17, 2006

Is there an outbreak of foot-in-mouth disease at IBM lately, or do they just have a drug problem? I know the Solar System is currently being redefined, but what planet are they on?

Their recent comments on Sun's open source credentials do make you wonder just what they are thinking. Thanks to their work on the Linux kernel, Apache, Eclipse etc, IBM have been able to enjoy the admiration of the open source community, while at the same time keeping their own products very closed. Until now it has been very clever marketing. The problem is that some commentators are starting to see through this, and are asking why IBM does not open source its software and hardware like Sun Microsystems has been doing.

The obvious reply, to avoid the focus on IBM's closed products, would have been to acknowledge Sun's effort and to focus on how long it takes to move a product the size of Solaris or AIX from multiple closed source licenses to an open source license. Instead they have decided to attack the OpenSolaris and OpenSPARC projects, stating that they are not truly open. This is just total bullshit!

Rather than hosing down the question and giving the commentators little to write about, they have decided to give the press a juicy story. This would be fine if they had a large swag of products based on open source. The fact is that they have not, and the press are now alerted to a renewed battle between IBM and Sun, with IBM standing on very shaky ground, surrounded by mountains of closed source and hardware.

IBM's current statements are probably an indication that they think Sun's push to open source not only its entire software stack but also its hardware is a medium to long term threat to IBM's position. From what they are saying, it looks like they are currently scratching around for a solution. Verbal attacks on Sun are just plain counterproductive. I think the future may be tough for companies that dabble in open source while at the same time keeping a very closed product line. Hypocrisy does not further their cause.

What can IBM do to get themselves out of this mess? I think a good start would be to avoid opening their mouths, and to start work on a timetable for opening up their products. They state that their customers are not interested enough in AIX, but I find this very hard to believe. When Solaris was re-released onto x86 hardware, they stated that there was not enough interest from their customers to release software for the platform. Their stance changed very quickly when they got a tap on the shoulder from some of their largest customers. If they really are seeing little interest from their customers, then that is more a worrying sign for the future of AIX. I hope this is not the case.

Now, that is really nice. The C program is listed with the assembly code for the corresponding C code. It makes a very nice assembly language tutorial. Now, let's do the same, but compile using the "-fast" flag. (-fast is actually a macro for several other flags. It is known to generally give the best optimized code for your system with the least effort.)

As you can read from the comments, both of the functions were inlined, so they are now part of the 'main' function. The 'er_src' program is a really neat app. Let's see the comment change when we tell the compiler not to inline.

Many Solaris sysadmins and developers would know that Solaris has some very good debugging tools. Most sysadmins would know there is a command called mdb. Sadly, most will have either never used it, or were scared off when they scanned through the documentation. While using mdb does require a good knowledge of Solaris internals and some assembly language skills, there are times when it is probably the only (or best) tool for the job.

Consider the case where you have an application that your company has been using for a long time. Something has changed on the system, and now it crashes when it is run. Since the person who wrote the application no longer works for your company and nobody knows where the source code is, you have a problem. To make things worse, when you do a pstack on the core file, you find that the binary has been "stripped" of its symbol table to save a few bytes. Your options for useful debugging are now really limited. Enter 'mdb'....

To simulate this I have created a small C program with a null pointer buried a couple of functions deep. I compile the program (without any optimizations, as the compiler would inline all of the functions since they are very small), and then run the strip command on it. Running the program, we get a not very useful error message and a core dump. Argggg!

Running pstack on the core file, because the binary was stripped, returns an address with "????????" as the function name. Ah, it has now turned into a challenge.

Since there is nothing in human readable form, at this point most people would look elsewhere or throw it in the too-hard basket. If you know a little assembly language (32-bit x86 in this case), you should probably continue on. A good starting point would be the assembly listing of the function where it bombs out. The first address, "0x80506d5", is the instruction where we bombed out. Disassembling backwards from this address is tedious, especially if the instruction is a long way from the beginning of the function. The address on the next line, "0x80506f4", is actually more useful. It is the return address of the function, which is the next instruction after the function call, so the calling code should be immediately before it. Let's attack it with the disassembler built into 'mdb', byte by byte.

Bingo! We have a winner - 0x80506c0. You will probably notice that the "call" op-code (1 byte) is followed by a 4 byte address, so we could have first tried the return address minus 5. In my case the command inside mdb would have been "80506f4-5::dis".
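The session hunting for that call instruction would go roughly like this (the addresses follow the ones quoted above; your output will differ):

```
$ mdb a.out core
> 80506f4-1::dis        <- not on an instruction boundary, garbage
> 80506f4-2::dis        <- still garbage
> 80506f4-3::dis
> 80506f4-4::dis
> 80506f4-5::dis        <- a call! 1-byte opcode + 4-byte address
0x80506ef:  call  -0x34  <0x80506c0>
```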

Now that we have an address, we can easily list the function from the start.

From a quick look at my disassembled code, it is clear that some idiot created a null pointer, and then tried to copy a byte to it. Not very bright, eh! In a real world example you would probably need to run the command under mdb and set a breakpoint at the start of the function. From there you could step through the code to see what it does. It would go something like this -
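The original session output is missing here, but the dcmds are the standard mdb ones and the address is the one found above:

```
$ mdb ./a.out
> 80506c0::bp           set a breakpoint at the start of our function
> ::run
mdb: stop at 0x80506c0
> ::step                single-step an instruction at a time
> $r                    dump the registers to watch the pointer go NULL
> :c                    continue - straight into the SIGSEGV
```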

Thursday, August 10, 2006

If you have been reading zfs-discuss on OpenSolaris.org recently, you would have seen that Robert Milkowski has been doing some benchmarks using Sun's StorageTek 3510 FC disk arrays. He has been getting some interesting results that suggest that using ZFS on the 3510 without the hardware RAID controllers is faster than using it with them. This is very interesting because hardware RAID controllers can be expensive. If it suits your needs, you can save some cash by using Solaris 10 and ZFS, as both are free!

Since I don't have a 3510 sitting around to test on, I decided to do a quick benchmark on a spare partition of my laptop to compare ZFS and UFS. We have all been told that ZFS is faster than UFS, but by how much, and when, is an interesting question.

Using filebench as Robert did, I started with the varmail workload, using the average of three runs to produce the graph below. For each filesystem I created the pool (ZFS) and filesystem, did three benchmark runs of 60 seconds each, and then destroyed it ready for the next test.
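For anyone wanting to repeat this, each run went roughly like the following (the slice name is from my laptop, and the filebench syntax is from memory - check your version):

```
# ZFS run
zpool create tank c0d0s7
filebench> load varmail
filebench> set $dir=/tank
filebench> run 60            three times, then average
zpool destroy tank

# UFS run on the same slice
newfs /dev/rdsk/c0d0s7
mount /dev/dsk/c0d0s7 /mnt
filebench> set $dir=/mnt
filebench> run 60            three times, then average
umount /mnt
```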

As you can see, ZFS is indeed faster than UFS for this benchmark. To be fair, and to compare apples to apples, I should have combined UFS with the Solaris Volume Manager (SVM). That would most likely have shown an even greater gap between ZFS and UFS. One thing it does show is that an Acer Ferrari 4005 may be a nice laptop, but it makes a horrible mailserver :(