Re: Intermittent RWSCS state - VMS


Re: Intermittent RWSCS state

Marty Kuhrt wrote on 09/09/2008 12:38:20 PM:
> Richard Brodie wrote:
> > wrote in message news:OF1D8E67E9.
> 7DD1DEB5-ON852574BF.00508FDB-
> >> Okay, thanks so now we know what they are, but not if things are
> >> good or bad.
> >
> > I don't think the CR_WAITS in the VMS$VAXcluster sysap are good news.
> > That's sort of locky rather than bulk traffic. LOCKDIRWT too high on
> > an old VAX? Beyond that I would grab one of Keith Parris' presentations
> > on cluster/lock manager performance.
> >
> >
>
> I ran into something similar once that was triggered by my cat slightly
> pulling the cat-5 cable (cats and cat5 don't mix). Not so much that the
> network connection was completely lost, but this VAX, in a cluster of
> Alphas, was joining and leaving the cluster as fast as it could.
> Everybody else in the cluster was stalling waiting for the transition to
> finish, which would restart immediately. When I finally got logged into
> a machine console via a VT and did a reply/enable to see OPCOM stuff, I
> saw the endless stream of joining/leaving messages from the culprit
> VAX. Pulled the network cable, reseated it, and all was well when the
> storm subsided.
>
> I've also seen something like this when a hub/switch went "kinda bad".
>
> Since the VAX in OP's question is probably only talking to the cluster
> via its 10Mb network cable, it might be as simple as that.

No such luck. Talking on FDDI for SCS traffic. 10Mb network for everything else.
> Bad card,
> bad cable, bad hub/switch, etc. Babble, babble, babble, and the next
> thing you know you're knee deep in RWSCS.
>
> Since this is a home cluster I use for porting and development work, I
> don't normally have OPCOM enabled, or do much logging. VMS machines
> just run right, right? ;^)

Re: Intermittent RWSCS state

norm.raphael@metso.com wrote:
>
> Marty Kuhrt wrote on 09/09/2008 12:38:20 PM:
>
> > Richard Brodie wrote:
> > > wrote in message news:OF1D8E67E9.
> > 7DD1DEB5-ON852574BF.00508FDB-
> > >> Okay, thanks so now we know what they are, but not if things are
> > >> good or bad.
> > >
> > > I don't think the CR_WAITS in the VMS$VAXcluster sysap are good news.
> > > That's sort of locky rather than bulk traffic. LOCKDIRWT too high on
> > > an old VAX? Beyond that I would grab one of Keith Parris' presentations
> > > on cluster/lock manager performance.
> > >
> > >
> >
> > I ran into something similar once that was triggered by my cat slightly
> > pulling the cat-5 cable (cats and cat5 don't mix). Not so much that the
> > network connection was completely lost, but this VAX, in a cluster of
> > Alphas, was joining and leaving the cluster as fast as it could.
> > Everybody else in the cluster was stalling waiting for the transition to
> > finish, which would restart immediately. When I finally got logged into
> > a machine console via a VT and did a reply/enable to see OPCOM stuff, I
> > saw the endless stream of joining/leaving messages from the culprit
> > VAX. Pulled the network cable, reseated it, and all was well when the
> > storm subsided.
> >
> > I've also seen something like this when a hub/switch went "kinda bad".
> >
> > Since the VAX in OP's question is probably only talking to the cluster
> > via its 10Mb network cable, it might be as simple as that.
>
> No such luck. Talking on FDDI for SCS traffic. 10Mb network for everything else.

Are you certain? IIRC, the default cluster configuration is to enable all
SCS-capable circuits, and normally all the traffic would end up on the
fastest one (FDDI), but if there was a momentary failure or excessive
congestion on the FDDI, it might have failed over to the Ethernet, thus
hitting the VAX's 10Mb bottleneck, and then never failed back. I think
the SHOW CLUSTER circuit counters should reveal whether this has
happened. (I think the 2nd example shows circuit counters by circuit,
but not circuit names, so I can't tell which is which, though possibly a
cluster expert could.)
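For what it's worth, a sketch of checking that from DCL; the class and
field names below are from memory, so verify them against the built-in
help in SHOW CLUSTER before relying on this:

```
$ SHOW CLUSTER/CONTINUOUS
Command > ADD CIRCUITS       ! add the circuit class to the display
Command > ADD LPORT_NAME     ! local port name shows which adapter each circuit uses
Command > EXIT
```

On 7.3 and later there is also, I think, the SCA control program
(SYS$SYSTEM:SCACP), whose SHOW CIRCUIT display names the LAN device on
each circuit, which would avoid the which-is-which problem.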

There is a way to force it to use *only* the FDDI, and I think there's
a way to force it to fail back to FDDI if for some reason it has failed
over to the Ethernet.

HTH.
>
> > Bad card,
> > bad cable, bad hub/switch, etc. Babble, babble, babble, and the next
> > thing you know you're knee deep in RWSCS.
> >
> > Since this is a home cluster I use for porting and development work, I
> > don't normally have OPCOM enabled, or do much logging. VMS machines
> > just run right, right? ;^)

--
John Santos
Evans Griffiths & Hart, Inc.
781-861-0670 ext 539

Re: Intermittent RWSCS state

"John Santos" wrote in message
news:shYxk.16$ia.0@nwrddc02.gnilink.net...
> There is a way to force it to use *only* the FDDI, and I think there's
> a way to force it to fail back to FDDI if for some reason it has failed
> over to the Ethernet.
> --
> John Santos
> Evans Griffiths & Hart, Inc.
> 781-861-0670 ext 539

Look for SYS$EXAMPLES:LAVC$STOP_BUS
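For anyone searching later, roughly how the LAVC$* examples get built
and run; this is from memory, so check the comments at the top of the
.MAR file for how the LAN device to stop is actually specified:

```
$ SET DEFAULT SYS$EXAMPLES
$ MACRO LAVC$STOP_BUS        ! assemble the example program
$ LINK LAVC$STOP_BUS
$ RUN LAVC$STOP_BUS          ! stops PEDRIVER's use of the chosen LAN device for SCS
```

After stopping SCS on the Ethernet adapter, SHOW CLUSTER should show
the remaining circuit(s) on the FDDI only.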

Re: Intermittent RWSCS state

John Santos wrote:
> norm.raphael@metso.com wrote:
>>
>> Marty Kuhrt wrote on 09/09/2008 12:38:20 PM:
>>
>> > Since the VAX in OP's question is probably only talking to the cluster
>> > via its 10Mb network cable, it might be as simple as that.
>>
>> No such luck. Talking on FDDI for SCS traffic. 10Mb network for everything else.
>
> Are you certain? IIRC, the default cluster configuration is to enable all
> SCS-capable circuits, and normally all the traffic would end up on the
> fastest one (FDDI), but if there was a momentary failure or excessive
> congestion on the FDDI, it might have failed over to the ethernet, thus
> hitting the VAX's 10Mb bottleneck, and then never failed back. I
> think the show cluster circuit counters should reveal if this has
> happened. (I think the 2nd example shows circuit counters by circuit,
> but not circuit names, so I can't tell which is which, though possibly a
> cluster expert could.)
>
> There is a way to force it to use *only* the FDDI, and I think there's
> a way to force it to fail back to FDDI if for some reason it has failed
> over to the Ethernet.
>
> HTH.

Now that I think on it, was there a FDDI interconnect for VAXen? I
vaguely remember that Nemonix was making an after market one, but I
don't remember a "native" one. Of course, that doesn't mean too much,
since I occasionally forget I have my glasses on my head. ;^)

Re: Intermittent RWSCS state

"Marty Kuhrt" wrote in message
news:r8Gdnf9Z3fma2FTVnZ2dnUVZ_q_inZ2d@speakeasy.net...
> Now that I think on it, was there a FDDI interconnect for VAXen?

There certainly were; there was the DEFQA, for one. It was 100Mb
Ethernet that Digital didn't sell for VAXen. Admittedly, running
lock traffic over the Q-bus with disk I/O over Fibre Channel seems
a bit of a mismatch.

Assuming it's not big iron, that is. It could be a BI or XMI FDDI
adapter.

Re: Intermittent RWSCS state

On Sep 11, 5:19 pm, Marty Kuhrt wrote:
> John Santos wrote:
> > norm.raph...@metso.com wrote:
>
> >> Marty Kuhrt wrote on 09/09/2008 12:38:20 PM:
>
> >> > Since the VAX in OP's question is probably only talking to the cluster
> >> > via its 10Mb network cable, it might be as simple as that.
>
> >> No such luck. Talking on FDDI for SCS traffic. 10Mb network for everything else.
>
> > Are you certain? IIRC, the default cluster configuration is to enable all
> > SCS-capable circuits, and normally all the traffic would end up on the
> > fastest one (FDDI), but if there was a momentary failure or excessive
> > congestion on the FDDI, it might have failed over to the ethernet, thus
> > hitting the VAX's 10Mb bottleneck, and then never failed back. I
> > think the show cluster circuit counters should reveal if this has
> > happened. (I think the 2nd example shows circuit counters by circuit,
> > but not circuit names, so I can't tell which is which, though possibly a
> > cluster expert could.)
>
> > There is a way to force it to use *only* the FDDI, and I think there's
> > a way to force it to fail back to FDDI if for some reason it has failed
> > over to the Ethernet.
>
> > HTH.
>
> Now that I think on it, was there a FDDI interconnect for VAXen? I
> vaguely remember that Nemonix was making an after market one, but I
> don't remember a "native" one. Of course, that doesn't mean too much,
> since I occasionally forget I have my glasses on my head. ;^)

There was FDDI from DEC for TURBOchannel (the DEFTA). And there were
VAXes (e.g. VAXstation 4000s?) with TURBOchannel. I'm pretty sure there
was DEFTA support, on VAX, in at least some versions of VMS, though I
don't recall actually ever seeing that combination (whereas I knew of
lots of DEC 3000s with FDDI, especially where resilience was/is of
interest). A more definitive answer would be the VAX/VMS SPDs
themselves.

Re: Intermittent RWSCS state

Marty Kuhrt writes:
>Now that I think on it, was there a FDDI interconnect for VAXen? I
>vaguely remember that Nemonix was making an after market one, but I
>don't remember a "native" one. Of course, that doesn't mean too much,
>since I occasionally forget I have my glasses on my head. ;^)

Here's an oddball.

Around 1990, there was actually an FDDI interconnect being developed that
sat on a SCSI bus, intended for VAXstation 3100 series systems. My
baptism by fire in VMS drivers was to write a driver for this thing! I
got it to run in a LAN Cluster, and shortly thereafter the project was
cancelled.

Re: Intermittent RWSCS state

Marty Kuhrt skrev:
> John Santos wrote:
>> norm.raphael@metso.com wrote:
>>>
>>> Marty Kuhrt wrote on 09/09/2008 12:38:20 PM:
>>>
>>> > Since the VAX in OP's question is probably only talking to the
>>> > cluster via its 10Mb network cable, it might be as simple as that.
>>> No such luck. Talking on FDDI for SCS traffic. 10Mb network for everything else.
>>
>> Are you certain? IIRC, the default cluster configuration is to enable
>> all
>> SCS-capable circuits, and normally all the traffic would end up on the
>> fastest one (FDDI), but if there was a momentary failure or excessive
>> congestion on the FDDI, it might have failed over to the ethernet, thus
>> hitting the VAX's 10Mb bottleneck, and then never failed back. I
>> think the show cluster circuit counters should reveal if this has
>> happened. (I think the 2nd example shows circuit counters by circuit,
>> but not circuit names, so I can't tell which is which, though possibly a
>> cluster expert could.)
>>
>> There is a way to force it to use *only* the FDDI, and I think there's
>> a way to force it to fail back to FDDI if for some reason it has failed
>> over to the Ethernet.
>>
>> HTH.
>
> Now that I think on it, was there a FDDI interconnect for VAXen? I
> vaguely remember that Nemonix was making an after market one, but I
> don't remember a "native" one. Of course, that doesn't mean too much,
> since I occasionally forget I have my glasses on my head. ;^)

There was at least one FDDI controller for the Q-bus from DEC. Can't
remember the name of it.
(I wonder if anyone ever tried using that on a PDP-11... :-) )

Re: Intermittent RWSCS state

Michael Moroney wrote:
> Marty Kuhrt writes:
>
>
>>Now that I think on it, was there a FDDI interconnect for VAXen? I
>>vaguely remember that Nemonix was making an after market one, but I
>>don't remember a "native" one. Of course, that doesn't mean too much,
>>since I occasionally forget I have my glasses on my head. ;^)
>
>
> Here's an oddball.
>
> Around 1990, there was actually an FDDI interconnect being developed that
> sat on a SCSI bus, intended for VAXstation 3100 series systems. My
> baptism by fire in VMS drivers was to write a driver for this thing! I
> got it to run in a LAN Cluster, and shortly thereafter the project was
> cancelled.

One of my customers had some VAX 6000-series systems with FDDI (XMI-based,
I think, not BI), so it was definitely real and supported.

--
John Santos
Evans Griffiths & Hart, Inc.
781-861-0670 ext 539

Re: Intermittent RWSCS state

Am having a problem with an IMAP server.

This runs on a node that has direct access to all the necessary disks.
So no clustering features should be needed, right?

SHOW SYS reveals it is in RWSCS state (every time I do SHOW SYS).

But SHOW PROC/CONT never shows it in RWSCS. It does show it in MWAIT as
well as the normal COM/HIB/LEF states.

This is Alpha VMS 8.3.

The process serves about 200 message headers, then dies. The client then
restarts at message 1. Perhaps there is some message near the end which
causes the IMAP server to crash, and the client then restarts downloading
the message database from scratch.

However, it is interesting to see the discrepancy between SHOW SYS and
SHOW PROC/CONT in terms of process status.

Re: Intermittent RWSCS state

"JF Mezei" wrote in message
news:48cad299$0$12384$c3e8da3@news.astraweb.com...
> This runs on a node that has direct access to all the necessary disks.
> So no clustering features should be needed, right?

You still need to co-ordinate lock operations for file open/close etc.
> But SHOW PROC/CONT never shows it in RWSCS. It does show it in MWAIT as
> well as normal COM/HIB/LEF states.

RWSCS is a subtype of MWAIT.
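To expand on that slightly: MWAIT is the generic miscellaneous-wait
scheduler state, and the resource waits (RWSCS, RWAST, RWMBX, ...) are
subtypes of it, so which name you see depends on how far a given display
expands the state. A couple of ways to look from DCL, sketched from
memory (the PID below is hypothetical, and the /STATE qualifier should
be checked against HELP SHOW SYSTEM on 8.3):

```
$ SHOW SYSTEM/STATE=MWAIT            ! list only processes in a miscellaneous wait
$ pid = "2040011A"                   ! hypothetical PID of the IMAP server process
$ WRITE SYS$OUTPUT F$GETJPI(pid, "STATE")
```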

Re: Intermittent RWSCS state

In article , "Richard Brodie" writes:
>
> "JF Mezei" wrote in message
> news:48cad299$0$12384$c3e8da3@news.astraweb.com...
>
>> This runs on a node that has direct access to all the necessary disks.
>> So no clustering features should be needed, right?
>
> You still need to co-ordinate lock operations for file open/close etc.
>

I had a fellow set up two HP-UX systems with their sendmail accessing
a shared data store via NFS mount (i.e. no locking). Lasted about
20 minutes until someone broadcast an email that both those systems
saw.