Re: Celebrating!!!

Willem,

we are trying to do the session for PinkRoccade internally before going to Muenchen as a try-out. If and when we succeed in doing that, I will ask if we can invite you there (argument: have someone that can ask the relevant questions). And maybe the VMS-SIG can be interested? That's all I can promise for now.

Re: Celebrating!!!

Jan,

VMS-SIG: Ok for the technical details, but the actual implications ("we had to convince management that the applications did NOT have to be stopped") are of higher importance OUTSIDE this SIG. Contact chairman and coordinator!

Re: Celebrating!!!

Hello Jan,

Congratulations from here too! One of the obvious questions here is how much of the infrastructure has been exchanged here over time. OS upgrades will be the trivial part, as well as adding/removing single nodes. How about the cluster interconnect, is this still the same?

Re: Celebrating!!!

> "we had to convince management that the applications did NOT have to be stopped"

He He - one of my former managers asked me to reboot the systems because they had been running for so long that even processes like ERRFMT had accumulated a noticeable amount of CPU time. Unfortunately the whole cluster locked up after 497.5 days of uptime because some 32-bit counter inside the MicroVAX-3400 wrapped around and VMS was not able to deal with this at that time. :-(

Re: Celebrating!!!

Uwe,

I happen to know Jan and the environment. He made this remark one day (in a VMS-SIG meeting). Jan published a description of the environment on OpenVMS.org: http://www.openvms.org/stories.php?story=03/11/28/7758863 (dated Nov. 28th 2003). In that description, you'll see that EVERYTHING has been upgraded during that period and the application was NOT brought down (except for fail-over, I presume).

Re: Celebrating!!!

For your entertainment...

---
VMS internal timekeeping is generally 64 bits. However, elapsed time for a process is kept () as a longword. These times are in 10 ms intervals, not 100 ns like a lot of timekeeping, so 31 bits lasts maybe 10ish months. Then it wraps into bit 31 and becomes an abs time, as you see.

A workaround/fix has been submitted to at least use 32 bits, not 31, expanding the range to 500 days or so. Yeah... Yeah... not enough for some, but how many other OSes do you know that would come close to this problem, huh? (Plus... you can sort of back-track to see what the real value is based on the current + wrap-around :-).
---
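For anyone who wants to check the numbers quoted above, here is a small sketch of the arithmetic. It assumes an unsigned counter of 10 ms ticks, as described in the quoted text; the function names are made up for illustration and are not VMS routines. By this arithmetic 32 bits gives roughly 497 days and 31 bits roughly 248 days, so the 31-bit range looks closer to 8 months than 10.

```python
# Assumption: the counter holds 10 ms ticks, per the quoted description.
TICKS_PER_SECOND = 100
TICKS_PER_DAY = TICKS_PER_SECOND * 86_400  # 8,640,000 ticks in a day


def wrap_days(bits: int) -> float:
    """Days until a counter of `bits` bits, counting 10 ms ticks, wraps."""
    return (2 ** bits) / TICKS_PER_DAY


def recover_ticks(observed: int, wraps: int, bits: int = 32) -> int:
    """Back-track the real tick count from the wrapped value.

    `wraps` (how many times the counter rolled over) is hypothetical input:
    it has to come from somewhere else, e.g. the known boot time.
    """
    return wraps * (2 ** bits) + observed


if __name__ == "__main__":
    print(f"31 bits: {wrap_days(31):.1f} days")
    print(f"32 bits: {wrap_days(32):.1f} days")
    # A counter that shows 100 ticks after wrapping once really means:
    print(recover_ticks(observed=100, wraps=1))
```

The back-tracking step only works if the number of wraps is known from another source; the wrapped value by itself cannot tell one wrap from two.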

Re: Celebrating!!!

So now,

time to try and give some answers.

Ian: no plans yet, but: Que sera, Sera.

Willem: as you know, my co-presenter is also SIG-coordinator; let's see how things develop.

Martin: yes, EVERYTHING was replaced. The oldest piece of hardware now present is aged just over half the cluster uptime.
Has the interconnect changed? From 10 Mb Ethernet early on, quite soon to multi-mode fiber FDDI, to single-mode FDDI (both with 10 Mb Ethernet fallback), to 100 Mb Ethernet with FDDI as fallback.
Location: from 3 meters separation to 7 km.
OS: from 6.2 to 7.1-2 to 7.2-1 to 7.3-1.
All DBMSs were upgraded (most of them more than once), as were all applics.
External communication changed from X25/DECnetPlus to TCP/IP (the most regretted change of all).
Enough?

Uwe: never met that one, although I seem to remember that we DID have some VAX with node-uptime approaching 2 years (5.5-2).

Hein: in view of your story: is that possible, or is my memory exaggerating things?

Terry: my answer to Martin should show that for a "mere" patch we don't go down. It's called "rolling upgrade".

The issue of uptime in the future will be a political one: big reorganisations are being prepared, the current viewpoint being to throw it all away and start from scratch. Obviously not everyone concurs. :-)

All: cluster uptime is a nice statistic, but of course the real importance is application uptime AND reachability.
Those statistics vary widely, although for the total running time we never lost ALL applics at the same time.
Most users nowadays have a (ugh) MS desktop, usually via Citrix, with terminal emulation for VMS (and *IX) access. If that breaks down once again, obviously those users lose their VMS apps also. That's why some department insisted on staying on VTs. The call-room and the car-to-callroom communication also don't use Billyware.
The app with the best statistics is an RMS app (~4 M records in the various 'tables') of the VT-using department, which was NEVER down... except for the month of January 2002. The app has a large financial aspect, and the supplier succeeded in NOT having the Euro version available until one month late.
Second best is a DBMS app which has been out for a day three times during upgrade conversions.

Re: Celebrating!!!

Hello Jan,

I think I speak also for the rest of us here when I thank you for sharing your story with us. I actually did expect that you changed all the bits and pieces in your environment, but as FUD spreaders tend to belittle such achievements by claiming "it's just a static box sitting in a corner", it is very nice to have this on record.

Re: Celebrating!!!

Antoniov,

Well. -- I -- definitely am one of those guys who insist on using a VT!! It's much less tiring for my eyes, for one thing.

But you have to admit Billy G has achieved a lot, and not only financially:
- "Everybody" now knows that from using computers you get RSI.
- "Everybody" is used to 'computers' failing regularly, and then "you just hope you have not lost much".
- "Everybody" knows that a computer or an operating system older than last year "is from the Stone Age", and not fit for today's use.

Except of course those crazy people who keep saying that that's NOT true if you stay away from M$s*t...

Re: Celebrating!!!

Jan,

> Uwe: never met that one, although I seem to remember that we DID have some VAX with node-uptime approaching 2 years (5.5-2)

It was a problem in a specific version of OpenVMS that only happened on the MicroVAX-3400 as I recall.

I have now read your story and join the club: "well done". It is a bit over 7.5 years now that I have worked as a full-time system manager, but at the last position I was responsible for a few small clusters scattered over Germany, so I think I have an idea what your job is about.

Re: Celebrating!!!

Ian,

I'm not fully sure, but I --think-- they are measuring NODE uptimes, i.e. time since last reboot. THERE we obviously don't score that high: the Alpha ES series didn't even EXIST 7 years ago! (Neither did VMS 7.3-1.) They even state you are considered bogus if you report boot times for OSes or hardware that didn't exist.

To all who will be in Muenchen: we don't plan a formal party, but wouldn't it be great to organise a "Now at least I know your faces" get-together (maybe in a Biergarten)?

Re: Celebrating!!!

Keith,

sorry to miss you in Muenchen. And the HP World Chicago question was asked before, but that is impossible without a sponsor for travel & accommodation expenses. We have already needed special permissions to go over the allocated budget as it is now!