November 2005 Archives

Considering that I went to a session at BS05 on this topic, you'd think I'd remember it. But no, I didn't. I've carped before about ncpfs not being cluster aware, and I'm right. It isn't. I had just spaced the fact that Novell has fixed this problem with novfs, which released (or went public-beta, not sure which at the moment) in July.

I have it on an OES-Linux server right now, and it is working pretty well. I haven't had a chance to test out cluster failovers, but I strongly suspect that it works better than ncpfs did.

So far, things are looking really spiffy. I've already proven that I can move NetStorage (a.k.a. 'myfiles') over to an OES-Linux box. The next trick is getting myweb over. I'm almost there. My Linux hacking skills are decidedly lacking, but I have a path of attack now. It keys on Apache running in a specific user-space, but I understand that's possible.

I also recently learned that UserDir is multi-valued. So something like this...

would work. It goes down the list until it hits the directory that returns 'found'. Nifty, and saves a lot of grief. The trick has always been how to get NCP shares mounted on a Linux box, and novfs, by way of the NCL, appears to be the answer to that.
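That fall-through can be sketched in a few lines of Python. This is only an illustration of the "first directory found wins" order, not Apache's actual code; the `resolve_userdir` name, the `*` placeholder handling, and the directory layout are all made up for the example, and the real mod_userdir has more rules than this.

```python
import os
import tempfile


def resolve_userdir(username, patterns):
    """Return the first candidate directory that exists, or None.

    Each pattern holds a '*' placeholder for the username. Patterns are
    tried strictly in order, so earlier entries win over later ones.
    """
    for pattern in patterns:
        candidate = pattern.replace("*", username)
        if os.path.isdir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Throwaway tree: only the second location has a directory for 'alice'.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "ncp", "alice", "public_html"))
    patterns = [
        os.path.join(root, "home", "*", "public_html"),  # hypothetical local homes
        os.path.join(root, "ncp", "*", "public_html"),   # hypothetical NCP-mounted homes
    ]
    print(resolve_userdir("alice", patterns))  # falls through to the second entry
    print(resolve_userdir("bob", patterns))    # nothing found anywhere
```

The point is just that a user whose directory is missing from the first location is not a failure as long as a later entry has it.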

11:20pm on the Friday of Thanksgiving break, and we have about six students actively working on documents. So no starting early for me. Part of me wonders what the heck is going on, but then I remember that professors have a nasty habit of assigning lots of work before large weekends like this.

We've had some curious events happen on the cluster lately. The situation described here is pretty close to what we've had happen. The MyFiles service for students (a.k.a. NetStorage for you non-WWU people) has been crashing lately, forcing abends. Apparently our nodes have learned a new trick related to this, in that they're flushing a bit of I/O to the mounted volumes when we leave the abend screen. This is bad, since by the time we leave the abend screen the volumes have already been rehoused on other servers, and the error gets thrown.

So we have a chance of random file-system corruption! Whee! So we need to fsck/PoolRebuild the things, and that takes time. We did some last night during our normal Tuesday night maintenance window (I be tired), but didn't get all of it. As ATUS just mailed out, we'll be finishing it off Friday night. Last night's fun didn't discover anything significant, just the normal file-system entropy of a few corrupted file-names and some files missing their parent links[1].

Friday night starting at midnight, the U: drives for two thirds of our students will go away. This being a Holiday weekend, I expect police-dispatch (our off-hours 'helpdesk') to only get a couple calls as a result. We're also doing the big shared volume on the Fac/Staff side (our largest single volume), and that'll be down probably from Midnight to pretty close to the 10am mentioned in the mail.

[1] NSS is different than POSIX file-systems in that each node has both child and parent links. This is nifty in that it allows inherited permissions to work more easily. The files in question could still be accessed since their parent node, a directory, had a child-link to them. If that child-link were also missing, they'd be truly orphaned files.
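To make the difference concrete, here is a toy model in Python. It is not how NSS stores anything on disk; the `Node` class, the `classify` function, and the labels are invented purely to show why a file with a lost parent link is still reachable while one that has also lost its child-link is not.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    parent: Optional[str] = None                        # link up to the containing directory
    children: List[str] = field(default_factory=list)   # links down (directories only)


def classify(nodes: Dict[str, Node], root: str) -> Dict[str, str]:
    """Label every non-root node by which of its two links survive."""
    # Every name that some directory still lists as a child.
    listed = {child for n in nodes.values() for child in n.children}
    result = {}
    for name, node in nodes.items():
        if name == root:
            continue
        has_parent = node.parent is not None
        is_listed = name in listed
        if has_parent and is_listed:
            result[name] = "ok"
        elif is_listed:
            result[name] = "missing-parent-link"  # still reachable via the directory
        elif has_parent:
            result[name] = "missing-child-link"
        else:
            result[name] = "orphaned"             # no link in either direction
    return result


if __name__ == "__main__":
    nodes = {
        "/": Node("/", children=["docs"]),
        "docs": Node("docs", parent="/", children=["a.txt", "b.txt"]),
        "a.txt": Node("a.txt", parent="docs"),
        "b.txt": Node("b.txt"),  # parent link lost, directory still lists it
        "c.txt": Node("c.txt"),  # both links lost
    }
    for name, state in classify(nodes, "/").items():
        print(name, state)
```

In this toy, `b.txt` is the kind of thing the rebuild turned up, and `c.txt` is the truly orphaned case.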

The Infocon level has been raised to Yellow. A critical exploit in IE has been discovered that as yet has no patch. The details are emerging now, but at the very least it is possible to use it to launch an executable locally, and it is unclear if it is possible to pass parameters to it. If so, this is a very major problem, as it is possible to script ftp to draw files down and then a later run of the exploit can run them. The most likely vector for this will be ad-buys, which adds another weapon to the spyware toolkit.

Not a whole lot going on that I can talk about, and not a lot that I can't. We're in the process of completing the migration of one of our web servers to new hardware, and it is running into a few snags. This is one of those projects that has been in the state of, "deploy in two weeks," for coming on two months now. We're a go for next Monday. We'da done it today, but we had a snag late Friday and pushed it back (but didn't need to).

Free space on one of our Exchange servers is at 4.93% for the data-store volume, so we're handling that one.

Our NetWare cluster has started throwing NSS errors when migrating volumes, and that is concerning. We'll be dealing with that one over the break, since it'll require multi-hour downtime to fix, and that ain't happening this time of quarter without either smoke or clear indications of imminent smoke. It only happens on failover, and that's not something we do a lot.

I learned that Apache 2.2 is probably releasing in the next couple weeks. I won't go to it since all of my NetWare stuff keys off of Novell-supplied modules and they'll need to recompile and distribute those first.

When used in conjunction with their VMware Player it allows a person to browse the internet in a way that greatly shields them from the ravages of spy/malware. The VM may get infested, but rebuilding it takes a matter of clicks, not a half day of system reinstalls. Not that it's at high risk of getting infested with an OS and browser listed as, "Mozilla Firefox 1.0.7; Ubuntu Linux 5.04". A great time-saver for those who lurk the seamy side of the internet.

In something of a stealth upgrade, Titan was migrated to new hardware this weekend. By all reports, no significant problems were encountered and things are ticking along just peachy. Things are running somewhat faster on it now.

I came across an interesting theory of what the future of NetWare could look like. What if NW7 or NW7.1 were designed to run in a Xen virtual-machine on SLES?

What the heck? I hear you say?

There is a good reason for doing so. By running in a VM you greatly reduce the number of drivers you have to have to run NetWare. Or more directly, if Novell provides the drivers for the VM interfaces, the rest of us won't have to worry about when Adaptec will stop producing .HAM drivers, or Dell will stop producing PERC drivers.

It is an interesting concept, and a way to support NetWare in a world where vendors stop providing NetWare drivers. This is key for some systems that even today still have NetWare dependencies and whose replacements are either too expensive or don't quite meet the business need.

I do have qualms about the idea since it by definition requires a different operating system to run NetWare. Unlike DOS, this different operating system is still accessible when NetWare is running. Patching of said might provide for increased downtimes. As you can imagine, I'm still a little dubious on the topic of virtual machines in general.

An item on our project list for some time is to see if we can convert our Solaris logins (titan) to use our eDir information. Through the use of PAM modules, this is quite doable. The trick would be to map current usage to new usage, which would require somehow transferring the UID/GID info from the existing NIS database into eDir. Among other things.

It looks like anyone with Update access to the "uidNumber" attribute can set it to an arbitrary integer. Such as, oh, zero. As you can well imagine such awesome power needs to be used wisely. I can tell that this information will likely scuttle the project, since the Solaris admins and the Novell/MS admins are different folks who each guard their respective administrative interfaces zealously. The ability of the Novell/MS admins to gain root on the Solaris boxes at the flick of a checkbox will not be met gladly.

On the other hand, when we start setting up OES-Linux boxes managed by the Novell/MS guys, having such information will be peachy. In fact, it'll be nifty. With a bit of work we can use eDir groups to manage access on these OES-Linux application servers. The functionality on Titan will stay there; the folks that'd end up working on OES-Linux are most probably developers working on databases and web-servers. Regular old end-users wouldn't get involved in this.

So while the original intent of the project probably won't happen, there is some good that'll come out of it anyways.

I received word that two of the developers on the LibC team have gotten the axe. According to the guy on my support ticket, the total carnage is half the team so there may be more than just the two I know about. Either that or the whole team was four people. Hard to say. But still, this does not bode terribly well for getting my problem fixed in a reasonable amount of time.

From my own experiments, NTP clients quickly determine that Timesync machines are bad sources of time. The convergence algorithm used by Timesync is piss-poor as far as NTP is concerned. Historically NDS timestamps on the second, so it was only important to have time accurate to the second. Timesync does that. NTP provides time accurate to about 0.010 seconds in most cases, which is a very different animal. A downside to using NTP on NetWare is that NTP takes a lot longer to get 'in sync' than Timesync does.

After reading an AnandTech article today, I'm very happy with how things are moving in the PC world these days. One of my key obsessions in college was parallel computing, which was pretty neat. It was my candidate for a Masters if I went there, but unfortunately at the time parallel computing was extremely math-heavy and I was a solid C student in my math classes; hardly Masters material. So when the PentiumPro came out and could do dual CPU setups I was intrigued and very happy to see it.

These days parallelism is very much alive and kicking, though not in the form we were working on back in college. Back then the assumption was that you'd have n identical CPUs working on a problem. These days in gaming-oriented computing you have a specialized CPU doing video computations, a specialized CPU doing physics computations, and a general purpose CPU (or two) doing traffic directing and other unspecific computations. Some systems even have a specialized CPU doing network I/O computations on the LAN card itself.

I know some rendering programs are figuring out how to use the GPU on high-end graphics cards to perform the renders at speeds that the Intel/AMD general purpose CPU can't dream of yet. I have to expect that the same is true of the new Physics Processing Units that are going to come out in the not too distant future, though my imagination balks at what things besides games such a nifty toy can be repurposed for. Sound-cards were an early processor off-load, with the digital sound processors they bring into the mix, and that continues to this day. This is all nifty stuff.

And yet, in budget machines it is ALL handled by the one CPU. Perhaps there is a slim GPU on the graphics card, but on a budget machine that's more likely to be built into the chipset than an add-in card. Sound? Chipset. Graphics? Chipset. LAN? Chipset. And if you own a budget machine, chances are real good that you also own a budget printer that also has no internal number crunch capacity and relies on the over-worked CPU to do it all. The difference in the time it takes for a dumb ink-jet and a smart laser to start printing is night and day.

However, in the Enterprise server market, parallelism comes in a different flavor than it does in the Gaming PC market. Here the focus is on I/O performance to and from peripherals, mostly the LAN and disk channels. The disk-controllers themselves have significant amounts of intelligence in order to improve efficiency in the I/O channel. The LAN chipsets are intelligent in order to handle such complex things as adapter teaming or failover. Here things are designed so that the Gig-E LAN card can run full tilt at the same time as the Fibre Channel card and not worry about I/O contention on the internal busses. This is one reason why gaming motherboards don't always make good high-performance server motherboards. Memory latency is important in both models, though in the Enterprise Server you're much more likely to run into boards with enough memory slots to get up to and well past the 4GB line.

So yes, parallelism has come! But it isn't the vision of an 8-way box on everyone's desk like it was in 1996. More like a couple of general purpose CPUs and a handful of special purpose CPUs for offloading. Still works!

The problems I've been having with mod_edir since last June apparently are not unique to me. I've learned, definitively, that there are others out there. Not only that, other educational institutions. And an institution that is a bit larger than us, no less! It is nice to know you're not alone.