University of California at Berkeley
Department of Electrical Engineering & Computer Science
Instructional Systems Support Group
/share/b/pub/reports/manager/Spring_1994
Report on EECS Instructional Computing Facilities
-------------------------------------------------
Spring Semester 1994
by: Kevin Mullally, Manager of EECS Instructional Systems
week of March 28-April 3, 1994:
This was the week of Spring Break. Systems work was done to create more
user disk space, and to reduce the loads on Ara and Cory. A new file
server, Bard, was added. Bard is a Sun4 that was given to Instruction
by the Robotics group (it was formerly Zeus and Zephyr).
The /share/a, /share/b and /cad partitions were moved from Ara to Bard,
and a second file server (Congo) was set up for the Ara clients. The
central mail service for the Instructional system in Cory Hall was moved
off of Cory to a dedicated mail server machine (Pasteur).
A new 670 MB filesystem was provided for sole use by CS162. The /cad
filesystem was increased by 300 MB for use by CS250.
Danube crashed on 4/3 at 11pm and was back in service on 4/4 at about 8am.
Cory crashed and rebooted on 4/3 at 5:30pm.
week of April 4-10, 1994:
Danube crashed and rebooted 4/7 at 2pm, 4/10 at 8pm.
Cory crashed and rebooted on 4/9 at 6pm.
A number of users complained about email forgeries.
We received a complaint from outside the university about an offensive
posting to net news. Legal action was threatened. We disabled the
account of the student who was responsible.
week of April 11-17, 1994:
Bard was down from 11am-7pm on 4/11 due to a hardware failure. These
symptoms resulted:
- X11 not available for Ara clients (fixed around 3pm)
- Workview not available (fixed about 6pm)
- scm not available (fixed about 6pm)
- some users' logins would fail (fixed about 6pm)
- HSPICE not available (fixed about 7pm)
Volga crashed and rebooted itself on the afternoon of 4/12.
Danube crashed and rebooted itself on 4/14 at 4am.
The Snake clients have been experiencing reboots (from crashes or from
users?) frequently at night. This was observed after the fact to have
happened on 4/6 at about 10:05am, 4/9 at about 11:10am and 2:05pm, and
4/16 between 12am and 11pm.
The Internet address for Moby was changed, and that broke the license for
"codecenter" for a couple of days.
There were a number of stolen mouse balls; most were returned. This is
happening with increasing frequency.
5-6 users complained about people hogging the workstations, either by
playing games or running screen locks. These behaviors are prohibited,
but it is difficult to police against them.
week of April 18-24, 1994:
Danube crashed and rebooted:
4/17 at 1am, 4/19 at 1am, 4/20 at 7pm, 4/22 at 3pm
Volga crashed and rebooted 4/22 at 11am.
Po crashed and rebooted 4/23 at 4pm.
There were a number of "borrowed" mouse balls and one stolen mouse.
The /home/e filesystem (contains the CS9E accounts) was full for about 24
hours between 4/20 and 4/21. The sys admins cleared some space.
week of April 25-May 1, 1994:
Danube crashed and rebooted:
4/25 at 2am, 4/25 at 12:30pm, 4/27 at 1am, 4/29 at 2pm
Snake crashed and rebooted 4/26 at 4pm (caused by a full system disk).
Parker started denying much of its NFS service on 4/29 at about 3pm.
At 4:30pm it was rebooted to clear the problem. Users with home dirs
on Parker were affected: their attempts to compile (gcc) and run
HSPICE were thwarted by "NFS time out" and "can't start a new shell"
error messages. The problem was cleared by 4:45pm.
Po crashed and rebooted: 4/30 at 1pm, 5/1 at 8:30am.
Users are complaining increasingly about other users who leave screen
lock programs running on idle workstations. There is no safe way for
the sys admins to detect and prevent this, so we must rely upon
peer pressure and user reports to discourage it.
week of May 2-May 8, 1994:
This is the final week of the semester, before exams, and the workstation
labs are extremely crowded. There have been dozens of complaints to
"root" about other students who leave screen locks running on our
workstations for hours. We have temporarily turned off the accounts of
some offenders when we catch them.
Volga crashed and rebooted: 5/3 at 5pm, 5/4 at 10:30pm.
Danube crashed and rebooted: 5/1 at 8:30am and 4pm, 5/2 at 10am.
On May 2, we disabled further logins to Danube, which should reduce the
crashes (to 0, we hope). The users will get an explanation when they try
to log in. They can still access their directories on Danube over the net
from other systems. Danube is running an older version of the operating
system, and until we upgrade that, the vendor is unable to diagnose the
cause of the crashes.
Po crashed and rebooted: 5/3 at 3pm, 5/4 at 11pm.
We are pushing DEC to look closely at the diagnostic kernel that we have
installed on Po, in the hopes that they will identify and fix the cause
of these crashes. We suspect that they are triggered by socket-related
coding in the final CS162 project, and we have asked those students to
avoid executing their "nachos" code on Po.
Torus started denying much of its NFS service on the evening of 5/1. It
failed to reboot properly the next morning; it seems to have a bad system
disk. We replaced it with another disk (taken from a workstation) and
the CS184 and CS284 home directories were available again by about 4pm
on 5/2. The Iris workstations were available again by about noon on
5/3.
week of May 9-May 15, 1994:
Po crashed and rebooted: 5/11 at 8am.
Volga crashed and rebooted: 5/09 at 4pm, 5/14 at 8:30pm.
Danube crashed and rebooted: 5/09 at 10pm. Users had been allowed to
log in there again, by accident. While logins were denied, Danube
did not crash. Logins are now denied again.
We disciplined perhaps a dozen students for leaving screen locks on the
workstations. We have posted larger signs saying that screen locks are
prohibited.
week of May 16-May 22, 1994:
Parker started denying much of its NFS service again and was rebooted
on 5/17 at 11:30am. Users with home dirs on Parker were affected.
The problem was cleared by 11:45am.
We disciplined perhaps a dozen students for leaving screen locks and
playing games on the workstations. We have posted larger signs saying
that these behaviors are prohibited.
Improvements, Spring '94:
- converted Parker HPs to dataful (improved performance)
- freed 2 GB for home directory disk space
- freed 670MB for CS162
- freed 400MB for CS250
- improved password file distribution routines
- installed a Sun4 file server
- decreased load on Ara file server; added second cluster server (Congo)
- moved mail server off of Cory onto a dedicated system
- worked with individual instructors to customize setups (CS61A, CS162)
- decreased NFS dependency: duplicated some critical software, moved
  some disks to extra servers
- decreased NFS network load: installed automounters ("amd")
- set up X11 (xdm) default user interface on HPs; Vue is an option
- set up keymappings for CS classes on HPs; provided help files for
  users about keyboard and windowing on each architecture
- obtained software for new CS2 course
- obtained HSPICE
- ported Berkeley scm, SPIM, etc to the HPs
- ported Berkeley scm to PCs and Macs (Gambit)
Problems, Spring '94:
- user disks filling up; downtime to fix that
- system crashes (mostly Danube, once Bard)
- performance delays (mostly from server and network bottlenecks)
- failed to survey the faculty in December about Spring computing needs
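The "system crashes" item can be checked against the weekly entries with
a short tally. The sketch below is illustrative only and is not part of
the original report. The event list is transcribed by hand from the
crash-and-reboot lines above, and the names CRASHES and crashes_per_host
are hypothetical. It counts Bard's 4/11 hardware outage as one event. It
leaves out the Snake client reboots and the Parker and Torus NFS
incidents, which the report does not describe as crashes.

```python
from collections import Counter

# (host, date) pairs transcribed from the weekly entries of this report.
# A pair is repeated when the host crashed more than once on that date.
CRASHES = [
    ("Danube", "4/3"), ("Cory", "4/3"),
    ("Danube", "4/7"), ("Danube", "4/10"), ("Cory", "4/9"),
    ("Bard", "4/11"), ("Volga", "4/12"), ("Danube", "4/14"),
    ("Danube", "4/17"), ("Danube", "4/19"), ("Danube", "4/20"), ("Danube", "4/22"),
    ("Volga", "4/22"), ("Po", "4/23"),
    ("Danube", "4/25"), ("Danube", "4/25"), ("Danube", "4/27"), ("Danube", "4/29"),
    ("Snake", "4/26"), ("Po", "4/30"), ("Po", "5/1"),
    ("Danube", "5/1"), ("Danube", "5/1"), ("Danube", "5/2"),
    ("Volga", "5/3"), ("Volga", "5/4"), ("Po", "5/3"), ("Po", "5/4"),
    ("Po", "5/11"), ("Volga", "5/9"), ("Volga", "5/14"), ("Danube", "5/9"),
]


def crashes_per_host(events):
    """Return a Counter mapping each host name to its number of crash events."""
    return Counter(host for host, _date in events)


if __name__ == "__main__":
    # Print hosts from most to least crash-prone.
    for host, count in crashes_per_host(CRASHES).most_common():
        print(f"{host:<8}{count:>3}")
```

By this count Danube accounts for 16 of the 32 listed events, with Volga
and Po at 6 each, which is consistent with "mostly Danube".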
Improvements (pending), Fall '94:
- expanding to 4 nets (2 in Soda, 2 in Cory)
- adding 100+ HP workstations
- moving CS61B and CS61C from WEB to Soda labs
- CAP improvements to 199 Cory, adding networking for wkstns
- adding "Instructional Reports" documentation of events and performance
- porting nachos to HPs
University of California at Berkeley
Department of Electrical Engineering & Computer Science
Instructional Systems Support Group
Report on EECS Instructional Computing Facilities
-------------------------------------------------
April 1994
by: Kevin Mullally, Manager of EECS Instructional Systems
week of March 28-April 3, 1994:
This was the week of Spring Break. Systems work was done to create more
user disk space, and to reduce the loads on Ara and Cory. A new file
server, Bard, was added. Bard is a Sun4 that was given to Instruction
by the Robotics group (it was formerly Zeus and Zephyr).
The /share/a, /share/b and /cad partitions were moved from Ara to Bard,
and a second file server (Congo) was set up for the Ara clients. The
central mail service for the Instructional system in Cory Hall was moved
off of Cory to a dedicated mail server machine (Pasteur).
A new 670 MB filesystem was provided for sole use by CS162. The /cad
filesystem was increased by 300 MB for use by CS250.
Danube crashed on 4/3 at 11pm and was back in service on 4/4 at about 8am.
Cory crashed and rebooted on 4/3 at 5:30pm.
week of April 4-10, 1994:
Danube crashed and rebooted 4/7 at 2pm, 4/10 at 8pm.
Cory crashed and rebooted on 4/9 at 6pm.
A number of users complained about email forgeries.
We received a complaint from outside the university about an offensive
posting to net news. Legal action was threatened. We disabled the
account of the student who was responsible.
week of April 11-17, 1994:
Bard was down from 11am-7pm on 4/11 due to a hardware failure. These
symptoms resulted:
- X11 not available for Ara clients (fixed around 3pm)
- Workview not available (fixed about 6pm)
- scm not available (fixed about 6pm)
- some users' logins would fail (fixed about 6pm)
- HSPICE not available (fixed about 7pm)
Volga crashed and rebooted itself on the afternoon of 4/12.
Danube crashed and rebooted itself on 4/14 at 4am.
The Snake clients have been experiencing reboots (from crashes or from
users?) frequently at night. This was observed after the fact to have
happened on 4/6 at about 10:05am, 4/9 at about 11:10am and 2:05pm, and
4/16 between 12am and 11pm.
The Internet address for Moby was changed, and that broke the license for
"codecenter" for a couple of days.
There were a number of stolen mouse balls; most were returned. This is
happening with increasing frequency.
5-6 users complained about people hogging the workstations, either by
playing games or running screen locks. These behaviors are prohibited,
but it is difficult to police against them.
week of April 18-24, 1994:
Danube crashed and rebooted:
4/17 at 1am, 4/19 at 1am, 4/20 at 7pm, 4/22 at 3pm
Volga crashed and rebooted 4/22 at 11am.
Po crashed and rebooted 4/23 at 4pm.
There were a number of "borrowed" mouse balls and one stolen mouse.
The /home/e filesystem (contains the CS9E accounts) was full for about 24
hours between 4/20 and 4/21. The sys admins cleared some space.
week of April 25-May 1, 1994:
Danube crashed and rebooted:
4/25 at 2am, 4/25 at 12:30pm, 4/27 at 1am, 4/29 at 2pm
Snake crashed and rebooted 4/26 at 4pm (caused by a full system disk).
Parker started denying much of its NFS service on 4/29 at about 3pm.
At 4:30pm it was rebooted to clear the problem. Users with home dirs
on Parker were affected: their attempts to compile (gcc) and run
HSPICE were thwarted by "NFS time out" and "can't start a new shell"
error messages. The problem was cleared by 4:45pm.
Po crashed and rebooted: 4/30 at 1pm, 5/1 at 8:30am.
Users are complaining increasingly about other users who leave screen
lock programs running on idle workstations. There is no safe way for
the sys admins to detect and prevent this, so we must rely upon
peer pressure and user reports to discourage it.
University of California at Berkeley
Department of Electrical Engineering & Computer Science
Instructional Systems Support Group
Report on EECS Instructional Computing Facilities
-------------------------------------------------
May 1994
by: Kevin Mullally, Manager of EECS Instructional Systems
week of May 2-May 8, 1994:
This is the final week of the semester, before exams, and the workstation
labs are extremely crowded. There have been dozens of complaints to
"root" about other students who leave screen locks running on our
workstations for hours. We have temporarily turned off the accounts of
some offenders when we catch them.
Volga crashed and rebooted: 5/3 at 5pm, 5/4 at 10:30pm.
Danube crashed and rebooted: 5/1 at 8:30am and 4pm, 5/2 at 10am.
On May 2, we disabled further logins to Danube, which should reduce the
crashes (to 0, we hope). The users will get an explanation when they try
to log in. They can still access their directories on Danube over the net
from other systems. Danube is running an older version of the operating
system, and until we upgrade that, the vendor is unable to diagnose the
cause of the crashes.
Po crashed and rebooted: 5/3 at 3pm, 5/4 at 11pm.
We are pushing DEC to look closely at the diagnostic kernel that we have
installed on Po, in the hopes that they will identify and fix the cause
of these crashes. We suspect that they are triggered by socket-related
coding in the final CS162 project, and we have asked those students to
avoid executing their "nachos" code on Po.
Torus started denying much of its NFS service on the evening of 5/1. It
failed to reboot properly the next morning; it seems to have a bad system
disk. We replaced it with another disk (taken from a workstation) and
the CS184 and CS284 home directories were available again by about 4pm
on 5/2. The Iris workstations were available again by about noon on
5/3.
week of May 9-May 15, 1994:
Po crashed and rebooted: 5/11 at 8am.
Volga crashed and rebooted: 5/09 at 4pm, 5/14 at 8:30pm.
Danube crashed and rebooted: 5/09 at 10pm. Users had been allowed to
log in there again, by accident. While logins were denied, Danube
did not crash. Logins are now denied again.
We disciplined perhaps a dozen students for leaving screen locks on the
workstations. We have posted larger signs saying that screen locks are
prohibited.
week of May 16-May 22, 1994:
Parker started denying much of its NFS service again and was rebooted
on 5/17 at 11:30am. Users with home dirs on Parker were affected.
The problem was cleared by 11:45am.
We disciplined perhaps a dozen students for leaving screen locks and
playing games on the workstations. We have posted larger signs saying
that these behaviors are prohibited.