Building Expert Systems

This module explores the development of expert systems (ES).
Much of the material contained in this module is summarized
from:
Jones, D.D. and J.R. Barrett. 1989. Building expert
systems. In J.R. Barrett and D.D. Jones (eds).
Knowledge Engineering in Agriculture.
ASAE Monograph No. 8, ASAE, St. Joseph, MI.

When to Use Expert Systems

ES are not suited to all types of problems.
Initially, many developers actively sought problems
amenable to ES solution or tried to solve all problems
encountered using ES. As experience has been gained,
attention has become more properly focused on the
problems to be solved rather than on the solution technique.
Note that in this course, we are focusing on systems engineering
techniques and tools and thus have been and will be quite
concerned about the solution technique.

Some problems can be described using existing algorithms,
or by using a statistical evaluation method.
Other problems that are not as well defined, that are
ill-structured and that currently require the help of a
human expert may appropriately be solved
using an ES. In effect, ES techniques are rapidly
becoming, along with simulation and other conventional
programming, important tools available to solve a wide
range of problems. Incompleteness of information
is characteristic of problems suitable for solution with ES.

The "telephone test" can often be used to help determine
if a problem that cannot readily be solved using traditional
methods is amenable to ES solution.
If the domain expert can solve the problem via a telephone
exchange with the end-user, an ES program can probably
be developed to solve the problem. On the other hand,
if the user is unable to describe the problem verbally,
or if the expert cannot consistently reach a reasonable
solution from the telephone interview alone,
then ES development will likely be unsatisfactory.
The telephone test ensures that the expert is not
gaining additional information about a problem
from other senses and that the user is able
to describe the problem adequately in words (important
since the user of an ES will be required to describe
the problem adequately).

ES Development Stages

If an ES solution is appropriate, one should
approach the development in a systematic fashion
much like the systems methodology steps
and the model development steps examined
earlier in the semester.
The process is largely one of refinement and expansion
of a prototype.
The knowledge base grows in both depth and breadth,
gaining organizational and representational improvements,
and helps guide successive stages of development.
The prototype becomes the basis for further development,
whether it is refined or discarded and the process restarted.
It helps identify approaches that have the most merit and
others that should be discarded.
These decisions can be made early, minimizing the cost of
development.

Rapid prototyping provides a glimpse of what the completed
product will be like. It is important to communicate
and follow progress in any project, not only for funding
agencies and supervisors, but also for the domain expert
who is interested in making best use of valuable time.
Prototypes should be documentable indicators of progress.
This is a primary strength of ES in comparison to
conventional programming approaches.

Several general approaches for developing ES have been proposed.
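Waterman (1986) has provided the most widely accepted approach,
which consists of five stages: identification, conceptualization,
formalization, implementation and testing.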

These stages are highly interrelated and interdependent.
An iterative process continues until the software
consistently performs at an acceptable level.
Note that the above steps are essentially those
of model development and of the
methodology of systems analysis.

Identification corresponds to the requirements analysis step
carried out in traditional software development. It involves
a formal task analysis to determine the external requirements,
the form of the input and output, the setting where the program
will be used and, very importantly, who the user will be.
The participants, the problems, the objectives,
the resources, the costs and the time frame need to
be clearly identified at this stage.

The participants are the group sponsoring the effort, the domain
expert and the knowledge engineer. Choosing an appropriate
domain expert is essential to the success of the project.
The domain expert should be a legitimate authority
in the subject matter area, as the software must possess
high quality knowledge, and this person must have the time
and interest to commit to the project. Not only is a
personal commitment by the expert required, but the
administrative support of the employer is needed to relieve
the person of some existing duties. Development is not trivial,
especially when attempted on top of an already full-time
job assignment.

Although use of human domain experts is the typical
method of development, it should be noted that several
successful programs have been produced using reference
materials only, or with minimal involvement of a human
domain expert. This may seem to contradict the more or less
accepted ES definition; however, these programs do make
use of programming techniques such as backward chaining
to find values for program parameters, explanation of program
logic, etc. It is subjective whether a first-hand expert
per se must exist, or whether an interpreter of knowledge
can suffice. Programs that do not rely on an expert
are commonly referred to as knowledge-based systems, knowledge
systems or rule-based systems.
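
The backward chaining mentioned above can be sketched in a few
lines of Python. This is only a minimal illustration, not the
method of any particular ES tool: the rule table, the parameter
names (leaf_shape, weed_class, herbicide_type) and the find
function are all hypothetical. To find a value for a parameter,
the program looks for rules that conclude it and recursively tries
to establish their premises, asking the user only for parameters
that no rule can supply.

```python
# Hypothetical rules: (parameter, concluded value, required premises).
RULES = [
    ("weed_class", "broadleaf", {"leaf_shape": "wide"}),
    ("weed_class", "grass", {"leaf_shape": "narrow"}),
    ("herbicide_type", "broadleaf herbicide", {"weed_class": "broadleaf"}),
    ("herbicide_type", "grass herbicide", {"weed_class": "grass"}),
]


def find(parameter, known, ask):
    """Backward chain to a value for `parameter`.

    `known` caches values already established; `ask` is called only
    for parameters that no rule concludes.
    """
    if parameter in known:
        return known[parameter]
    candidates = [rule for rule in RULES if rule[0] == parameter]
    if not candidates:
        # No rule concludes this parameter, so it must come from the user.
        known[parameter] = ask(parameter)
        return known[parameter]
    for _, value, premises in candidates:
        # Each premise becomes a sub-goal that is chained on in turn.
        if all(find(p, known, ask) == v for p, v in premises.items()):
            known[parameter] = value
            return value
    known[parameter] = None  # no rule could be satisfied
    return None


# Demonstration with a scripted "user" who reports narrow leaves.
asked = []


def scripted_user(parameter):
    asked.append(parameter)
    return "narrow"


result = find("herbicide_type", {}, scripted_user)
print(result, asked)
```

Note that the user is asked a single question (leaf_shape) even
though two intermediate goals were pursued; the chaining, not the
user, supplies the value of weed_class.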

Most such programs function as database query aids,
used to help find relevant information in a
thick reference manual, such as one of weed or chemical information,
or perhaps to locate specific information in a large diagnostics
manual. It can be argued that ample human expertise was
involved, not only in preparing the initial reference material,
but also by a programmer knowledgeable in the subject matter
area in converting the information into the
format and sequence needed to solve the problem or answer
the question. This blurring of definition of the
traditional ES development process will continue
as knowledge engineering techniques pioneered by AI
researchers are incorporated alongside conventional
programming languages into programs and database management
software.

To justify the time and cost of development,
the problem must be important to a funding organization
and be clearly defined. Although a developer can't ignore
interactions between the problem and the rest of the
subject matter domain, efforts should be made to limit
the problem domain so that the recommendations
of the program will be specific and valuable instead
of generally educational. Choosing depth
over breadth not only makes the program more
powerful and useful, but also more efficient
by minimizing the amount of information that must
be obtained from the user before a recommendation can be made.
For example, it would be more efficient for a user
who has a problem with soybean pests to run a program
dealing with pests in soybeans, rather than a program
dealing with soybean production in general,
which would force the user through a lengthy series of
questions or menus before finally arriving at the subset
of the program that deals with pests.

Specific goals or purposes of the software must
be accepted by all parties. Objectives need to be
more than problem solving. It is essential to carefully
consider the background and needs of the end-user.

As important and as obvious as a properly designed user
interface may seem to be, it is often neglected.
Often the struggle to complete the knowledge base
is so difficult and time consuming that developers have
little energy left for the user interface.

Funding and time are major resources to be considered.
Additional resources to be identified include the
knowledge sources, computer hardware and development software.
As with all programming projects, these estimates are difficult,
but they must be realistic.
Budgeted costs should include the cost of lost
productivity by the expert and the programmer who will
be devoting time to the effort and the ongoing cost
of maintaining the knowledge base.
By the same token, the expected benefits must
include an estimate of the savings of valuable
time in future years.

Some estimate of the useful life
of the program should be made.
Additional questions include how frequently the
expertise will be needed, the cost and availability
of alternate methods of solving the problem and the likely
acceptability in the workplace.
A realistic appraisal of the costs and benefits
can help establish the level of program detail that can be
justified.

The hardware available for delivery can greatly affect
the choice of computer used for development, since the
developer must determine the extent of help messages,
graphics, the form of question asked, the extent
and format of output and the need to interact with other
programs and databases.
Many troubleshooting and classification problems
require input based on results of sensory examination
(visual, smell, feel, etc.) of an environment.

High resolution color graphics should be especially
useful in agriculture troubleshooting or classification
applications. High quality, inexpensive PC graphics
as well as high resolution color scanners and
video capture devices should be used where advantageous
to reduce potential confusion on the user's part
in answering questions posed
by the program or in interpreting program output.
The less experienced the end-user is with computer
hardware and software, the more care must be taken
in the design of the user-machine interface.
ES have the added advantage of being more transparent
(program flow can be presented to the user on demand)
than conventional programs, an ability that should
be exploited if the user is likely to be
skeptical of "black box" computer output.

The second stage of ES development, conceptualization, involves
designing the proposed program to ensure that specific interactions
and relationships in the problem domain are understood and defined.
The key concepts, relationships between objects and processes
and control mechanisms are determined.
This is the initial stage of knowledge acquisition.
It involves the specific characterization of the situation and
determines the expertise needed for the solution of the problem.

The following questions may be used by the knowledge
engineer to help understand what the expert does:

At what point after exposure to influential inputs is a decision made?

Given the particulars of a specific case, will the outcome predictions
of the knowledge engineering team be consistent with those of the expert?

One or a combination of several knowledge acquisition methods
is used. Additional details are provided
in the Knowledge Acquisition module.

A typical approach would be to characterize the questions the
end-user might pose to the domain expert and the range of possible
solutions. One method of getting started is to begin with a range
of final recommendations, and then build pathways to these.
For example, in ES development to troubleshoot environmental
problems in animal production facilities (simplified for the example),
the top level of programming might involve the following typical
symptoms and recommendations:

The development process beyond this point is mainly
one of refinement and addition of detail once this top level
is in place. For instance, in number one above, additional
information would be added to help determine whether
the hypothesis "animals too cold" is true. This is not as simple
as it might seem on the surface, since the temperature of the
building alone is not an accurate index of animal comfort.
Other considerations include whether the floor is dry and well
bedded,
the flooring material in use, whether the building is drafty,
where in the pen the animals tend to stay, whether all
animals in the building have similar symptoms or if the
problem is an isolated occurrence,
whether animals are stretched out or huddled next to one another,
if their hair is laid back or on end,
or if they are noticeably shivering.

Additionally, greater detail is needed to determine a specific
remedy. The final recommendation in item number one will depend
on the answers to questions that prove or disprove the hypothesis
that the animals are too cold and, if they are cold, identify the
cause.
For example, if it is established that there are low insulation
levels in the building, final recommendations will depend on the
type and age of animal housed, climatic conditions in summer and
winter for the building location, whether the animals will be in
physical contact with the wall containing the insulation material,
and on state and local building regulations and fire codes.
Similarly, the type of heater recommended depends on the type and
age of animals housed, the type and condition of building, local
regulations, type and cost of fuel available, climatic conditions,
type of ventilation system used, etc.
As can be seen, the knowledge base evolves during this
refining process to provide a recommendation as accurate
as that made by the human expert.

The job of the knowledge engineer is to identify the knowledge
sources required by the domain expert when making a specific
recommendation, i.e., to determine the reference books to be
consulted, the calculations to be made (or other computer programs
executed) and the rules-of-thumb (heuristics) that come into play.
Information the user will likely not know should
be determined and represented by additional rules
or other knowledge structures.
Additional information needed to apply these rules can then
be obtained from the user or additional rules created.
This structure is typically created through frequent
and intensive interview sessions with the domain expert.

Opportunities to group, rank and order knowledge should be sought.
In the ventilation problem, for example, once the expert knows
that the housed animal is farrowing or nursing, he automatically
discards large portions of the knowledge base dealing with larger
animals, thus narrowing the search space.
Often, the expert is presented with 3-5 potential problem
scenarios at each interview session with the knowledge
engineer, who poses as the end-user,
perhaps as an inquisitive user who continually asks the expert
the purpose of each question and a detailed justification
of each answer.
This is somewhat like a persistent child asking "Why?"

The information that is collected and analyzed forms the basis
of the scenarios to be presented in the next session with the
expert.
Correctly and completely describing the expert's problem
solving logic is difficult because true experts usually
do not know exactly how they reach a decision and are therefore
often unable to verbalize their own problem solving
process effectively.
The careful study of detailed cases often reveals consistent
patterns in the solution process that would otherwise remain
obscure. Needed refinements
to the concepts and relationships will become apparent
during in-depth analyses.
In addition, tape recordings of interviews between the
expert and clients can be useful, if all parties agree to the
taping.
This can identify points that might normally be overlooked by a
controlled session between the expert and the knowledge engineer
acting out the role of the user.
It can also help prevent the process from becoming an academic
exercise and ensure that the needs of the end-user are met.

It is easy to document everything that is known about a
subject and in the process lose sight of the original problem
intent.
For example, to develop a system to make recommendations on weed
control, it is tempting to create a program that specifically
identifies the genus of all possible weeds found in a region.
This would require extensive amounts of input from the user
that may not be necessary.
Perhaps the only relevant information is whether the weed
is a broadleaf or a grass, in which case one of two herbicide
types approved for the specific crop would be recommended.

Several ES development tools have inductive features that
allow the creation of rules based on examples created
by the expert. Such approaches to development are often
useful for classification problems. Neural networks
also function in a somewhat similar manner and will
be explored in future assignments.
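
As a rough illustration of inducing rules from examples, the
sketch below uses the OneR method: for each attribute it builds a
rule mapping every observed value to its most common class, then
keeps the attribute whose rule makes the fewest errors on the
examples. The source does not say which induction algorithm any
particular tool uses, so OneR is a stand-in chosen for brevity,
and the attribute names and example data are invented.

```python
from collections import Counter, defaultdict


def one_r(examples):
    """Return (attribute, rule, errors) for the best single-attribute rule.

    `examples` is a list of (features, label) pairs, where `features`
    is a dict of attribute -> value supplied by the expert.
    """
    best = None
    for attribute in examples[0][0]:
        # Count how often each class occurs for each attribute value.
        counts = defaultdict(Counter)
        for features, label in examples:
            counts[features[attribute]][label] += 1
        # The rule predicts the majority class for each value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the examples that are not in the majority class.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# Hypothetical examples prepared by an expert.
examples = [
    ({"leaf": "wide", "stem": "round"}, "broadleaf"),
    ({"leaf": "wide", "stem": "flat"}, "broadleaf"),
    ({"leaf": "narrow", "stem": "round"}, "grass"),
    ({"leaf": "narrow", "stem": "flat"}, "grass"),
]
attribute, rule, errors = one_r(examples)
print(attribute, rule, errors)
```

Here the leaf attribute classifies every example correctly, so it
is selected and the stem attribute is ignored. Commercial tools
may induce multi-level decision trees rather than a single rule.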

Formalization involves organizing the key concepts,
subproblems and information flow into formal representations.
In effect, the program logic is designed at this stage.
It is often useful to group or modularize the knowledge
collected, perhaps even attempting to display the problem
solving steps graphically.

In effect, it is the job of the knowledge engineers to build a set
of interrelated tree structures for representing the knowledge base.
They must decide the attributes to be determined to solve the
problem and then which of these attributes should be asked
of the user or represented by an internal set of decision trees.
While decision trees are appealing in their simplicity and are a good
way to begin formalizing knowledge into a knowledge
representation scheme that can be visualized, things are rarely
this simple in practice and rigid adherence to a tree structure
is seldom satisfactory.
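
A set of interrelated trees can be sketched as follows. This is a
hypothetical illustration only: the tree contents, attribute names
and recommendations are invented, loosely following the animal
housing example. An attribute that has its own internal tree is
derived by the program; any other attribute is taken from the
user's answers.

```python
# Each tree node is (attribute, {value: subtree or leaf}).
# Attributes listed in TREES are derived internally; all others
# are asked of the user. All contents here are hypothetical.
TREES = {
    "recommendation": ("animals_too_cold", {
        True: ("insulation_low", {
            True: "improve insulation",
            False: "add supplemental heat",
        }),
        False: "look for another cause",
    }),
    "animals_too_cold": ("animals_huddled", {
        True: True,
        False: ("animals_shivering", {True: True, False: False}),
    }),
}


def resolve(attribute, answers):
    """Derive an attribute from its tree, or take it from the user."""
    if attribute in TREES:
        return walk(TREES[attribute], answers)
    return answers[attribute]


def walk(node, answers):
    """Descend a tree until a leaf (any non-tuple value) is reached."""
    while isinstance(node, tuple):
        attribute, branches = node
        node = branches[resolve(attribute, answers)]
    return node


answers = {
    "animals_huddled": False,
    "animals_shivering": True,
    "insulation_low": True,
}
advice = resolve("recommendation", answers)
print(advice)
```

The recommendation tree never asks the user whether the animals
are too cold; it obtains that value from a second tree. Moving an
attribute between "asked" and "derived" only requires adding or
removing an entry in TREES.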

The representation of knowledge is important for credibility
and acceptance by the user. The questions asked and the rules
examined should be in the same sequence as used by the human
expert.
The questions and their order are determined
by presenting the expert with several detailed scenarios.
The granularity and structure of the concepts, including how
the concepts relate into a logical flow and how uncertainties
are involved, are coordinated in making recommendations.

The problem domain is analyzed to uncover obscure behavioral
and mathematical models that may exist within the decision
making process. The characteristics of the information needed
are recognized. It follows that as the uncertainties are defined
and explained, the relationships involved become better
understood and ultimately may be explained using conventional
programming techniques in a more expedient manner.
Correspondingly, the program development process functions
as a knowledge gatherer that can be used to explore poorly
understood relationships.

It is difficult to separate the conceptualization phase
from the formalization phase and, in reality, knowledge-base
design proceeds almost in parallel with knowledge acquisition.
The two items that are the most important in the formalization
stage are: (1) refinement of the knowledge pieces into
their specific relationships and hierarchy and (2) more
accurate determination of the expected user interaction with the
system.

During the next stage, implementation, the formalized
knowledge is mapped or coded into the framework
of the development tool to build a working prototype.
The contents of knowledge structures, inference rules
and control strategies established in the previous stages
are organized into a suitable format.
Often, knowledge engineers will have been using
the program development tool to build a working prototype
to document and organize information collected during the
formalization stage, so that implementation is completed
at this point. If not, the notes from the earlier phases
are coded at this time.

Consideration must be given to long-term maintenance.
Modifications to the knowledge base over time must be
anticipated. The knowledge base should be extensively
documented as it is coded.
The potential for later misunderstanding and confusion should
be minimized wherever possible. Furthermore,
extensive justifications and explanations should be
included to help the end-user
fully understand the questions posed by the program
and use the program output effectively,
and to show the user, on demand, how the recommendation was
logically derived.
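
One way to make the derivation available on demand is to record
each rule as it fires. The sketch below is a hypothetical
forward-chaining loop (rules are applied to known facts until
nothing new can be concluded); the rule names, facts and the
infer and explain functions are invented for illustration and do
not reflect any specific development tool.

```python
# Hypothetical rules: (name, premises, conclusion).
RULES = [
    ("R1", ["animals huddled", "animals shivering"], "animals too cold"),
    ("R2", ["animals too cold", "building drafty"], "reduce drafts"),
]


def infer(initial_facts):
    """Apply rules until no new fact is added; record each firing."""
    facts = set(initial_facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for name, premises, conclusion in RULES:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)
                trace.append((name, premises, conclusion))
                changed = True
    return facts, trace


def explain(trace):
    """Render the recorded firings as lines a user can read."""
    return [f"{name}: {' and '.join(premises)} -> {conclusion}"
            for name, premises, conclusion in trace]


facts, trace = infer(["animals huddled", "animals shivering",
                      "building drafty"])
for line in explain(trace):
    print(line)
```

Because the trace is kept separate from the rules, the same record
can also support documentation and later maintenance of the
knowledge base.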

The amount of help to be incorporated will depend on the
ability of the anticipated user. While a consultant
may be interested in quickly obtaining an answer to a question,
an ES intended for those who must carry out the
recommendation has different requirements. Typically, to believe
the recommendation, the end-user needs access to the assumptions
underlying it and desires a credible justification
for program recommendations.

This is also the point where the developer must decide how the
program will interact with other computer programs and databases.
The first generation of ES were stand-alone programs.
Many had no facilities to communicate with
the operating system or to read from, or write to, databases.

The last stage, testing, involves considerably more than finding
and fixing syntax errors. It covers the verification of individual
relationships, validation of program performance and evaluation
of the utility of the software package.
Testing guides reformulation of concepts, redesign of representations
and other refinements. Verification and validation must occur
during the entire development process. Verification establishes that
the models within the program represent true relationships.
It ensures that the knowledge is accurately mimicked by having the
domain expert operate the program for all possible contingencies.

Perhaps the most difficult aspect of testing is accurately
handling the uncertainty that is incorporated in most
ES in one way or another. Certainty factors are one
of the most common methods for handling uncertainty.
Verification of the certainty factors assigned
to the knowledge base is largely a process of trial and error,
refining the initial estimates by the domain expert
until the program consistently provides recommendations
at a level of certainty that satisfies the expert.
To ensure program accuracy, all possible solution
paths must be painstakingly evaluated.
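
The text above does not say how certainty factors are combined.
One widely used scheme is the MYCIN-style combination of two
certainty factors, each in the range -1 to 1, that bear on the
same conclusion. The sketch below assumes that scheme; the
function name combine_cf and the sample values are illustrative.

```python
def combine_cf(cf1, cf2):
    """Combine two certainty factors for the same conclusion.

    Assumes MYCIN-style factors in [-1, 1]. The mixed-sign case is
    undefined when one factor is exactly 1 and the other exactly -1.
    """
    if cf1 >= 0 and cf2 >= 0:
        # Two supporting pieces of evidence reinforce each other.
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        # Two opposing pieces of evidence reinforce each other.
        return cf1 + cf2 * (1 + cf1)
    # Conflicting evidence partially cancels.
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))


# Two rules support "animals too cold" with CF 0.6 and 0.5.
supporting = combine_cf(0.6, 0.5)    # about 0.8
# One rule supports (0.8) and one opposes (-0.4) the conclusion.
conflicting = combine_cf(0.8, -0.4)  # about 0.67
print(round(supporting, 2), round(conflicting, 2))
```

With a function of this kind, the trial and error described above
amounts to adjusting the factors attached to individual rules and
rerunning the test cases until the combined values satisfy the
expert.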

An effective validation procedure is critical to the success
and acceptance of the program. During validation the following
areas are of concern:
(1) correctness, consistency and completeness of the rules;
(2) ability of the control strategy to consider information in the
order that corresponds to the problem solving process;
(3) appropriateness of information about how conclusions
are reached and why certain information is required; and, most
critical,
(4) agreement of the computer program output with the
domain expert's corresponding solutions.
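
Item (4) can be checked mechanically once the expert has supplied
solutions for a set of test cases. The sketch below is one
possible harness, not a prescribed procedure: the agreement
function, the toy_program stand-in for the ES and the cases are
all hypothetical.

```python
def agreement(program, cases):
    """Compare program output with the expert's answer for each case.

    Returns the fraction of cases that agree and a list of
    (inputs, expert_answer, program_output) for the rest.
    """
    disagreements = []
    for inputs, expert_answer in cases:
        output = program(inputs)
        if output != expert_answer:
            disagreements.append((inputs, expert_answer, output))
    rate = 1 - len(disagreements) / len(cases)
    return rate, disagreements


def toy_program(inputs):
    """Stand-in for the ES under test (hypothetical)."""
    if inputs["weed_class"] == "grass":
        return "grass herbicide"
    return "broadleaf herbicide"


# Hypothetical cases paired with the expert's own solutions.
cases = [
    ({"weed_class": "grass"}, "grass herbicide"),
    ({"weed_class": "broadleaf"}, "broadleaf herbicide"),
    ({"weed_class": "sedge"}, "refer to specialist"),
    ({"weed_class": "grass"}, "grass herbicide"),
]
rate, disagreements = agreement(toy_program, cases)
print(rate, disagreements)
```

The list of disagreements is usually more useful than the rate
itself, since each entry points to a case the knowledge base does
not yet handle as the expert would.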

How the sequence of questions and output are presented
to the end-user may have as much to do with acceptance and use
as does the accuracy of the recommendations. The lessons
learned from human engineering cannot be ignored
if the program is to be successful.

Validation is an ongoing process requiring that the output
recommendations be accurate for each specific user's case.
Validation is enhanced by allowing others to review the program
critically and recommend improvements.
A formal project evaluation is helpful to establish whether
the system meets the intended original goal.
The evaluation process focuses on uncovering problems
with the credibility, acceptability and utility of the program.
These can be judged from the program's accuracy,
which is determined from comparisons with the real-world
environment. Also considered are the understandability and
flexibility of the program, ease of use, adaptability of the
design and the correctness of solutions.