Nuclear Computer Safety Fears

KPMG - Antony Upward,IVC
14 Oct 91 11:03 GMT

From the London Independent on Sunday, October 13, 1991
Computer Watch On Nuclear Plant Raises Safety Fears
by Susan Watts, Technology Correspondent
Fears are growing that computer software designed to protect the Sizewell B
nuclear reactor from serious accidents is too complex to check.
Sizewell B will be the first nuclear power station in the UK to rely so heavily
on computers in its primary protection system. A computer-controlled safety
system was seen as superior to one operated by people because of the risk of
human error.
But Nuclear Electric has told the Independent on Sunday that the system for
Sizewell B, based around some 300-400 micro-processors, is made up of modules
which in total constitute more than 100,000 lines of code.
A number of software engineers close to the project are known to have suggested
that the software is now so unmanageable that it should be scrapped, and the
whole system built again.
The Nuclear Installations Inspectorate (NII), the public's watchdog on the
nuclear industry, has taken the unusual step of publishing the safety
requirements it is asking of Nuclear Electric, the company that will operate
Sizewell B, before the utility can expect a licence to go ahead with the power
station.
This is an attempt to calm the mounting anxieties of specialists in
"safety-critical" programs such as the protection system at Sizewell - where
lives are at risk should the software fail.
Two senior inspectors describe the watchdog's requirements in a paper in the
latest issue of the trade journal Nuclear Engineering International. The paper
is unusual because the NII traditionally keeps its options open when deciding
whether to grant a nuclear power station a licence. The onus is on the
operator to prove that its system is safe. Publishing this description of its
requirements gives a clear idea of what the NII expects of Nuclear Electric.
Independent experts in safety-critical software are not happy with the NII's
safety requirements. They say the paper shows the inspectorate is not asking
Nuclear Electric to use the most stringent testing procedures currently
available to prove that the software will work as specified.
They also criticise the inspectorate for not insisting on the most up-to-date
mathematical analysis that could give an indication of the software's
reliability.
These critics want Nuclear Electric to publish the results of its own internal
assessments and those of independent consultants, whose data would give the rest
of the industry a chance to see just how reliable the protection software is
meant to be.
The British Computer Society says it would welcome the chance to comment on the
safety case for the software. It is concerned about what it sees as "the
secrecy which surrounds the safety-critical software in the Sizewell B control
and protection systems".
David Parnas, an advisor on a similar project at a nuclear reactor in
Darlington in Canada, agrees. "If somebody is introducing a technology with as
bad a reputation as software, then they are obliged to show that they have done
a really thorough analysis." Given the public interest, he says, the results
should be published.
The main worry is that the software, being produced by Westinghouse, an
American company, is so large and complex that it is impossible to verify that
it would react as it should if the reactor behaved dangerously.
A simpler system would be easier to verify and to maintain. But it would be
difficult, though not impossible, to find a politically acceptable route
whereby the existing system could be scrapped and development started again
from scratch.
The software is thought to have reached its size because it has many extra
features which, although desirable, have complicated its structure and blurred
the distinction between the software which controls the nuclear reactor and
that which protects it.
David Hunns, superintending inspector at the NII, and one of the authors of the
recent paper, says that this distinction is "fundamental". Nuclear Electric
insists that independence between the two systems "is fully maintained".
Mr Hunns adds: "We don't think it [the system] should be scrapped. We believe
a safety case can be made. But it has to be proven. We have made a judgement.
It's an honest and it's heart-searched judgement. We've looked hard at the
technology - all the aspects that we possibly can. We've developed a rationale
that is expressed [in the paper]. Providing the elements of that framework
are fulfilled then [Nuclear Electric] will make it. If they are not fulfilled,
they won't. Then maybe they will wish they had started in another direction."
[Subsequent to the above, the following appeared — giving what I thought to
be a very good introduction to the complex problem of testing, and how it
relates to the Sizewell B Nuclear Power Station.]
From the London Independent on Sunday, October 13, 1991
A Complex Problem of Tests, by Susan Watts, Technology Correspondent
Software engineers agree that it is not safe to assume that software will
always operate correctly - especially in systems such as nuclear power stations
where people's lives are at risk if the software fails.
But computer programs are notoriously difficult to verify. The chief anxiety
over the protection system for Sizewell B is that it is very large and complex.
This bucks the trend in safety-critical software, which is to build small,
simple systems.
It is usually impossible to test software for all combinations of the inputs it
may receive - say from the reactor in a nuclear power station. It would take
thousands of years to test the most simple system.
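As a rough illustration of the scale involved (a sketch, not part of the
original article), the Python below counts the input combinations for a purely
combinational system and converts that count to testing time. The input widths
and the rate of one million tests per second are assumptions chosen for
illustration; real protection software also has internal state, which makes
the count larger still.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(input_bits: int, tests_per_second: float) -> float:
    """Years needed to try every combination of `input_bits` binary inputs."""
    combinations = 2 ** input_bits
    return combinations / tests_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed widths and test rate, for illustration only.
    for bits in (32, 64, 128):
        print(f"{bits} input bits: {years_to_exhaust(bits, 1e6):.3g} years")
```

Under these assumptions, 32 binary inputs can be covered in a little over an
hour, while 64 would take more than half a million years.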
An alternative approach is to rely on statistical analysis of the system to
give probabilities of its reliability. This would involve running the system
for long enough to get an idea of the chances of it failing. But even this
could take many years for a large system.
A more recent technique is to use so-called "formal methods". This involves
converting the specification for a piece of software into a precise
mathematical model of the requirements to "prove" that it works. But formal
methods are a new idea; the software for Sizewell B began life some eight
years ago, when the concept of formal methods was embryonic.
David Hunns, superintending inspector at the Nuclear Installations Inspectorate
and co-author of a recent paper on protection systems for Sizewell B, says the
case for insisting on formal methods is not clear cut. In his paper Mr Hunns
says "the NII has accepted that it is not reasonably practicable to achieve their
incorporation for Sizewell B".
He adds that the viability of a second best approach, which involves
"backfitting" the mathematical techniques of formal methods to a completed
piece of software, also "remains unresolved".
Independent software engineers disagree. This method of backfitting is being
used to verify software at a nuclear reactor in Darlington in Canada with a
similar, although far smaller, computer-based protection system.
Nuclear Electric has designed its system such that it should fail no more than
once in every 10,000 "demands" (a demand is when the reactor is in a state
where the software should shut it down).
It is just about feasible that a system with such a failure rate could be
simulated by building another computer program to replicate the reactor,
hooking that up to the protection software and physically testing whether the
protection system behaves as it is designed to. This approach would be
time-consuming, and expensive.
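The arithmetic behind statistical testing against such a target can be
sketched as follows. This is a standard textbook argument, not a description
of Nuclear Electric's or the NII's method, and it assumes that test demands
are independent, are representative of real operation, and that no failure is
observed during the test. If the true probability of failure on demand is p,
the chance of surviving n demands is (1 - p)^n, so claiming p with confidence
C needs n >= ln(1 - C) / ln(1 - p).

```python
import math


def demands_needed(target_pfd: float, confidence: float) -> int:
    """Failure-free demands needed to claim `target_pfd` at `confidence`.

    Assumes independent, representative demands and zero observed failures.
    """
    if not (0.0 < target_pfd < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("target_pfd and confidence must lie strictly in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - target_pfd))


if __name__ == "__main__":
    # One failure per 10,000 demands is the figure quoted in the article;
    # the confidence levels are assumptions.
    for conf in (0.90, 0.99):
        print(conf, demands_needed(1e-4, conf))
```

For a target of one failure in 10,000 demands and 99% confidence this gives
roughly 46,000 failure-free demands, consistent with the article's description
of such a test as just about feasible.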
Nuclear Electric is building a scaled down version of such a test system, and
expects results next summer. The test involves a prototype of one channel of
the protection software, but will not run for long enough to be described as
exhaustive. Mr Hunns insists that if the simulation shows the software
contains an unacceptable number of errors, NII will ask Nuclear Electric to
extend the test.
[Antony Upward, KPMG Management Consulting, Software Development Group,
8 Salisbury Square, London, UK EC4Y 8BB Phone: +44 71 236 8000]

Computer Error by Policeman

KPMG - Antony Upward,IVC
14 Oct 91 11:03 GMT

From the London Guardian Friday October 11, 1991
Computer Error by Superintendent
A police superintendent discovered his former wife had a new man after checking
a car through the Police National Computer, Bow Street Magistrates court,
London, heard yesterday. Leslie Bennett, aged 44, based at Chelsea, west
London, asked another officer to access the computer about the car which his
former wife said belonged to a friend of their daughter's. But the vehicle's
details related to the wife's firm. His daughter, Jane, then told him the man
was her mother's "new friend". Mr Bennett was found guilty of an offence under
the Computer Misuse Act 1990 and fined £150 with £250 costs. The case
followed a complaint by Mrs Bennett.
[ADDED NOTE: We learned on 13 Jul 2009 that Mr. Bennett's case was later
appealed and the appeal upheld; Mr Bennett received his fine and costs back. PGN]

Thermostat failure mode

<bukys@cs.rochester.edu>
Mon, 14 Oct 91 12:05:48 EDT

I have a typical electronic setback thermostat installed. A couple of nights
ago it failed "on", causing my furnace to run and run, until my three-year-old
woke up and came to tell me that she was hot. The temperature had reached 92
degrees(F).
The thermostat itself had decided that it was still 68 degrees(F). Rebooting
the thermostat by removing and re-inserting the batteries made it get back in
touch with reality. I replaced the batteries too, but, considering it had
enough power to run the LCDs, that's probably not it.
There are electronic setback thermostats that mount over existing mechanical
(mercury switch) thermostats. I always thought it was silly to have a little
motor move the arm up and down to cycle the furnace. But now I have to wonder
about what temperature my electronically-controlled furnace would have driven my
house to before either reaching thermal equilibrium or igniting or melting
something, especially if left to itself over a vacation. (I don't know yet
whether the furnace itself has its own thermal shutdown, but I doubt it.) At
least with a mercury switch in there, the most extreme setting is probably
still below where my good old all-electronic device would have taken my house.
Liudvikas Bukys <bukys@cs.rochester.edu>

I really like banks — world wide!

Boyd Roberts
<boyd@prl.dec.com>
Mon, 14 Oct 91 11:09:33 +0100

I'm Australian, but I now live in France. In the past few months I have had
some really gnarly problems with banks in Sydney, Paris, London and the West
Coast.
My `new' bank refuses to give me a cheque book, but they don't bother to tell
me why. I got this missive from one of the administrators here today:
Hi Boyd,
My telephone discussion with your bank this morning was another
piece of surprise !
Your checkbook request was refused because records showed Mr.
Roberts was under national bank interdict !!! I certainly
refused such a statement and discussed your personal data
with them. What actually happened is that a confusion
was made between two different Mr.Roberts (I do not understand
why, as nationality, birth date, address, are totally different ...)
Mrs.[deleted] is requesting today interdict removal on your name
and getting a checkbook for you. Checkbook should be ready
by next Monday. I'll keep you posted, as usual.
Boyd Roberts boyd@prl.dec.com

I'm sorry, the computer says your credit is bad

David Bremner
Mon, 14 Oct 91 14:29:03 PDT

From an article on financial software in DEC Professional:
"Expert systems is [sic] another rapidly growing area in financial software,
notably in real-time applications found in banking, insurance and accounting
venues. Inference, for example, has developed expert systems credit approval
software for Dun & Bradstreet and Swiss Bank"
Ah yes, I can see it being a major competitive advantage to be able to
propagate data-entry errors in "real-time" :-)
Reference: p. 54, Oct. 1991 DEC Professional ubc-cs!fornax!bremner

"Who Flies the Plane?"

<ken@minster.york.ac.uk>
14 Oct 1991 16:23:59 GMT

I recently caught a television programme on Channel 4 in the UK called "Who
flies the plane?" (part of a consumer affairs series of programmes). The
programme dealt with some of the issues of software in fly-by-wire commercial
aircraft, particularly the `human factors' problem. A number of notable people
in the aviation field were interviewed.
I wrote to Channel 4 and asked if I could have a transcript of the programme,
and permission to post excerpts to RISKS ("RISKS is a highly regarded
international forum on software safety .. blah blah .. welcome comment .. blah
blah .. "). The terse reply was along the lines of "The transcripts are too
costly, and Channel 4 owns the copyright, so no". And the RISK of this story?
I believe these programmes to be sensationalist, interested in viewing figures
rather than rational discussion of the issues.
Ken Tindell, Computer Science Dept., York University, YO1 5DD UK
+44-904-433244

Risks of Enterprise-Wide Phone Systems

David Fiedler
<david@infopro.UUCP>
Sat, 12 Oct 91 15:19:14 PDT

The other day, my wife called our local bank to discuss refinancing a
loan. Her call was transferred to a loan officer, and she made an
appointment for us to meet at the bank to discuss matters. When we got
there, nobody at the bank had ever heard of the loan officer. It finally
developed that the loan officer was based at another branch 35 miles away.
When transferring phone calls within a company's phone system is so easy,
customers have no way of knowing that "the bank" they were talking to was
somewhere else. Perhaps phone systems could be designed for the office
personnel to be notified when a call has been transferred from another
location, by a special ring or tone on the line.
David Fiedler UUCP:{ames,bytepb,mrspoc}!infopro!david
USMail:InfoPro Systems, PO Box 220 Rescue CA 95672 Phone:916/677-5870

AT&T Outage

Jerry Schwarz
<jss%summit@lucid.com>
Mon, 14 Oct 91 16:06:01 PDT

It is easy to focus on the proximate causes of accidents and even on the
general system level causes. But the recent AT&T outage in NY is a perfect
opportunity to ask a question about very high level causes that has bothered
me for a while. Can the recent rash of failures in the phone system be traced
to divestiture and price competition in the phone business? When new
technology (such as software or fiber optic cables) fails it is hard to
address this question. But here we had a failure in one of the oldest
technologies in the business. So we can ask the specific questions: Have
procedures or staffing levels in "power" changed since divestiture? Would
pre-divestiture procedures or staffing levels have prevented the recent
outage?
I have no particular knowledge of this area. Perhaps someone who does would
care to address my question.
Jerry Schwarz jss@lucid.com

Why have an all-or-nothing user interface? If you're at 105% of design
tolerance for G force, it isn't the same thing as exceeding the design by
200%. Tell the pilot by how far the design is being pushed, and let them
decide if the risk is warranted by the situation. The machine's job is to
make sure all the information necessary to produce a good decision is
available to the pilot, not to suddenly at some arbitrary cut-off point start
making the decisions for them.
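A graded display of this kind might be sketched as follows. The thresholds,
level names and function name are hypothetical, chosen only to illustrate
reporting how far a limit is exceeded instead of applying a single cut-off;
the decision itself stays with the pilot.

```python
def load_advisory(measured_g: float, design_limit_g: float) -> str:
    """Describe the current load as a percentage of the design limit.

    The 110% and 150% thresholds are illustrative assumptions.
    """
    if design_limit_g <= 0:
        raise ValueError("design limit must be positive")
    ratio = measured_g / design_limit_g
    if ratio <= 1.0:
        level = "WITHIN LIMIT"
    elif ratio <= 1.1:
        level = "CAUTION"
    elif ratio <= 1.5:
        level = "WARNING"
    else:
        level = "DANGER"
    return f"{level}: {ratio * 100:.0f}% of design load"


if __name__ == "__main__":
    # Assumed design limit of 2.5 g, for illustration only.
    for g in (2.0, 2.625, 5.0):
        print(load_advisory(g, 2.5))
```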
Flint Pellett, Global Information Systems Technology, Inc.,
1800 Woodfield Drive, Savoy, IL 61874 (217) 352-1165 uunet!gistdev!flint

> If you don't want catastrophic failures, you need to arrange things so that
> the inevitable failures aren't catastrophic. (P.Rose in RISKS-12.47)
Despite the possible consequences of system design (including software) faults,
it is still the case that fire, flood, and other "natural" disasters can be
far more disastrous for a computer system, particularly a centralised one.
Earlier this week, the London Evening Standard carried a very small paragraph
reporting a fire at Hitchin College of Further Education. This drew my
attention, since it is a few miles from my home, and my ex-wife works there as
a secretary.
The fire started at 2 a.m., and was not discovered for an hour. By then the
building complex housing the total computing facilities of the college, both
educational and administrative, had been gutted. The back-up tapes, needless to
say, were stored on-site, close to the computers, in a non-fireproof cupboard.
Total damage to computer equipment was quoted in the paper as 10 million
pounds. This seemed a bit high to me (for a moderately-sized college with
no large mainframe), and the figure probably includes other damage (library,
dance studios, video equipment, etc., etc.) and a finger-in-the-air guess at
consequential loss. The last item seems to be incalculable, however, given
that the college's financial and student records have all been wiped out.
It is on the cards that Hitchin College will cease to exist as a separate
institution.
My ex-wife was surprised that the college had not been regularly exchanging
back-up media with its associated institutions in Letchworth and elsewhere.
I replied that this is exactly what I recalled ICL doing: 3-level back-up
cycled every day between sites, and all tapes stored in locked fire-proof
safes. I then recalled why ICL adopted this admirable policy. In the early
70s they lost an entire installation, including all back-up, when the night
operator dropped a cigarette into a waste-paper basket.
Do we always have to learn the hard way?
Peter Mellor, Centre for Software Reliability, City University, Northampton Sq.,
London EC1V 0HB +44(0)71-253-4399 Ext. 4162/3/1 p.mellor@uk.ac.city (JANET)

>We could make the humans the prime operators, and use the computers as a
>back-up...
This is exactly how the monorail system at Walt Disney World is
operated. There is a human driver who controls the speed of the train. There
are speed zones along the lines where, if the operator fails to keep the speed
under a prescribed value, the train will be shut off automatically. However,
the human is the primary operator.
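The arrangement described, with a human driving and the computer acting only
as a back-up, can be sketched as a simple overspeed supervisor. The zone
layout, units and names below are invented for illustration and are not taken
from the Disney system; treating an unknown position as a reason to cut power
is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SpeedZone:
    start_m: float    # zone start, metres along the track (hypothetical)
    end_m: float      # zone end (exclusive)
    limit_kmh: float  # maximum permitted speed inside the zone


def zone_limit(zones: Sequence[SpeedZone], position_m: float) -> Optional[float]:
    """Return the speed limit at `position_m`, or None if no zone covers it."""
    for zone in zones:
        if zone.start_m <= position_m < zone.end_m:
            return zone.limit_kmh
    return None


def must_cut_power(zones: Sequence[SpeedZone],
                   position_m: float, speed_kmh: float) -> bool:
    """Back-up check: True only when the human driver has exceeded the limit.

    An unknown position is treated as unsafe (an assumed fail-safe choice).
    """
    limit = zone_limit(zones, position_m)
    if limit is None:
        return True
    return speed_kmh > limit
```

The supervisor never sets the speed; it only vetoes, so the driver remains the
primary operator for as long as the train stays inside the prescribed limits.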
George W. Leach, AT&T Paradyne, Largo, FL 34649-2826 USA 1-813-530-2376

> Most people reading this use a machine running Unix. Somewhere in its file
> system (usually /usr/spool/cron or /var/spool/cron) there is a directory
> `crontabs' containing files which describe actions to be executed regularly
> without explicit user action.
Granted --- but the system administrator should check these actions both when
installing the system and periodically thereafter. (I do so once a month; a
more secure environment would require more frequent checks.)
That many novice Unix sysadmins do not check their crontabs (by the way, you
neglected to mention the V7 and BSD /usr/lib/crontab) is a RISK, but is in
essence their fault. That modern "plug-and-play" Unixes do not provide an easy
means to do this for the novice sysadmin is a contributory factor; this can be
likened to the off-and-on "fly by wire" discussion on this list ("system
administration by wire"?). Modern Unix systems do not take this into
consideration, mainly because the systems of the past did not --- but those
systems required competent system administrators anyway, so it was not a
problem then.
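A periodic check of this sort could be as simple as the following Python
sketch, which collects the active (non-comment) lines of each crontab for
review. The paths are the ones mentioned in this thread; the function names
are hypothetical, and reading other users' crontabs normally requires root.

```python
from pathlib import Path

# Locations named in the discussion; any given system has at most some of them.
CANDIDATES = [
    Path("/usr/spool/cron/crontabs"),
    Path("/var/spool/cron/crontabs"),
    Path("/usr/lib/crontab"),
]


def active_entries(text):
    """Return the non-blank, non-comment lines of a crontab's text."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def audit(paths=CANDIDATES):
    """Map each crontab file found under `paths` to its active entries."""
    report = {}
    for path in paths:
        try:
            files = sorted(path.iterdir()) if path.is_dir() else [path]
            for f in files:
                if f.is_file():
                    report[str(f)] = active_entries(f.read_text(errors="replace"))
        except OSError as exc:  # e.g. permission denied when not root
            report[str(path)] = [f"<unreadable: {exc}>"]
    return report
```

Printing the result of audit() once a month and comparing it with the previous
month's output is one way to notice entries nobody remembers adding.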
Brandon S. Allbery allbery@NCoast.ORG uunet!usenet.ins.cwru.edu!ncoast!allbery

I think you are taking me a bit TOO literally. True, there are things which
run on 'my' systems which I don't explicitly know about each instance of.
There are also things (user programs) which I generally don't know about
at all, but which are necessary to our service.
The point is that (as an admin), for every 'proper' thing running on the
system I can, if necessary, point a finger at who is responsible for it,
who DOES know about it (if I don't), and who is responsible for making
sure it behaves sensibly (if that's not me). It's not that I know everything
that is happening, but I can, without needing to use heroic or unusual
procedures, find out what I need to know about it, and where I can get
help if required, in order to ensure continued service.
>It also seems that an automatic or semi-automatic bug correction service,
>working somewhat in the style of mail and news (that is to say, updating remote
>files in controlled conditions) wouldn't be such an absurdity as he suggests.
The salient points here are your 'in controlled conditions', and (in a bit
I've cut) providing that the machine owner/operator/administrator has
`subscribed' to such a service. (And that the group providing the service can
control it to ensure that only those sites which want it get it. We don't
even install official manufacturer-provided upgrades without first evaluating
them under test conditions, to make sure they don't interact unfortunately
with other things we run. It's surprising how often something can't be put up
exactly as supplied, without requiring other work. I've long been in favor of
automagic DISTRIBUTION of bugfixes, rather than having to wait for the
semi-annual release tape. But, with the present state of the art, I want to
look at them before I put them in.)

So far as I know no one is required by law to buy the products of Mr.
Mitchell's company. If mature adults wish to buy buggy software I do not see
why this should be any concern of Mr. Parnas.
A real risk is that laws will be passed requiring people to use
certain crackpot programming methodologies which purport to be better than
existing practice but which for some strange reason people refuse to adopt
voluntarily.
James B. Shearer

> What, exactly, is the fundamental difference between a
> time-sharing system and (say) a heterogeneous network?
> Answer: there isn't one...
In a closed, time-shared system, if the kernel is secure, all of the kernel
mode communications are secure. In a network environment, kernel messages
must travel over communication paths which are not guaranteed to be secure.
How long would a time-shared OS remain secure if user programs could monitor
queries made by kernel mode system calls to kernel databases?
By this line of reasoning, public key encryption is essential to the
development of reliable network based computing systems. If you can't rely on
secure communications to distribute a key, there is no alternative.
Distributed networks are almost by definition insecure.
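The key-distribution problem raised here is the one public-key techniques
address. As a toy illustration, the sketch below uses Diffie-Hellman key
agreement (a public-key technique, swapped in for encryption proper because it
fits in a few lines) to let two hosts derive a shared secret over a channel an
eavesdropper can read. The modulus and generator are illustrative assumptions
and not vetted for real use, and the exchange is unauthenticated, so it does
nothing against an active attacker who can alter messages.

```python
import secrets

# Illustrative parameters only: 2**127 - 1 is prime, but this group is an
# assumption for the demo and is NOT suitable for real security.
P = 2**127 - 1
G = 3


def make_keypair():
    """Return (private, public); only `public` is ever sent over the network."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(my_private, their_public):
    """Both sides arrive at G**(a*b) mod P without transmitting a or b."""
    return pow(their_public, my_private, P)
```

Each host calls make_keypair(), sends its public value in the clear, and
passes the other host's public value to shared_secret(); both calls return the
same number, which can then seed a conventional cipher.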
David States

Re: Software migration at Johnson Space Center

Richard H. Miller
Sun, 13 Oct 1991 20:18:07 CDT

This is starting to get off the strict area and into computer religion wars,
but I do need to make the following points: [We run both 2200 and VAX systems.]
1) The high end VAX does not even begin to compare to the high end Unisys
systems. A 2200/644 is a large mainframe system and the new 2200/900 is even
more powerful. A large VAX 9000 will not provide the same level of performance
as a fully configured 2200/600 or 2200/900.
2) A Unisys machine will run absolutes created 15 years ago. There is no
requirement for recompiling or relinking. The same application code, and most
of the same system control software, will run across all processors.
Richard H. Miller, Asst. Dir. for Technical Support, Baylor College of Medicine
One Baylor Plaza, 302H Houston, Texas 77030 Voice: (713)798-3532

Where are the Silicon Graphics advocates??? From my recent research, SGI has
a line of equipment much better suited to simulation than anything in the
VAX line (more horsepower, better graphics) - at a small fraction of the price.
Tim Parker - Independent Consultant

Informatik journal available

Duane
Fri, 11 Oct 91 10:23:52 CDT

Announcing the first issue of 'Informatik,' a journal of free information.
Currently available by FTP from:
    uunet.uu.net /tmp/inform1.Z
    ftp.cs.widener.edu /pub/cud/misc/inform-1.1.Z
Here is an excerpt from the introduction:
/* Introduction */ By the Informatik staff
Welcome to the inaugural issue of Informatik, an electronic periodical
devoted to the distribution of information not readily available to the public,
with a particular emphasis on technology and the computing world. First and
foremost, this publication is dedicated to the freedom of information. This
journal is made possible by The First Amendment of the U.S. Constitution which
states:
Congress shall make no law respecting an establishment of religion,
or prohibiting the free exercise thereof; OR ABRIDGING THE FREEDOM
OF SPEECH OR OF THE PRESS; or the right of the people peaceably to
assemble, and to petition the Government for redress of grievances.
In this and coming issues, we plan to exercise our First Amendment rights to
the best of our ability. We will print feature articles on hacking, phreaking,
and various other illicit activities. We also plan on bringing you recent news
and gossip from the underground, any news of interest to hackers,
phreakers, grifters, cyber-punks, and the like. Informatik will also provide a
plethora of information on the inner workings of corporate America and the U.S.
Government.
DO distribute this freely! Remember this is not illegal, this is information.
*Please send submissions and comments to duane@shake.tamu.edu. (for now)*
Mack Hammer & Sterling [Editors]