Apologies if I get a bit misty-eyed in the middle; this is a subject dear
to me.

My first job involved writing an expert system in VAX Pascal.

It used list-oriented code/data structures that could be easily serialised
for fast store/load.

My main role was writing the search engine part. Initially, we
worked hard to build 'proper' depth-first and breadth-first
search engines, but found over time that often only one rule
could fire, or a few independent rules could fire. In these cases,
a lot of time and effort was wasted in the 'proper' search
mechanisms, and the best (=fastest) way to the result was a combined
approach which executed as many independent rules as possible at each stage.
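
For anyone who wants to see the "fire everything that can fire at each stage" idea in concrete form, here is a minimal sketch. It is written in Python purely for illustration and is not the original VAX Pascal system. The rule format (a set of premises plus one conclusion) and the name `run_stages` are assumptions of mine, and "independent" is taken to mean that rules only ever add facts, so firing one cannot invalidate another in the same stage.

```python
def run_stages(facts, rules):
    """Fire every applicable rule at each stage until nothing new is derived.

    rules is a list of (premises, conclusion) pairs, where premises is a
    frozenset of facts. This layout is hypothetical, chosen for brevity.
    """
    facts = set(facts)
    stages = []
    while True:
        # A rule is ready when all its premises are known and its
        # conclusion has not been derived yet.
        ready = [r for r in rules if r[0] <= facts and r[1] not in facts]
        if not ready:
            return facts, stages
        # No search tree here: every ready rule fires in the same stage.
        stages.append([conclusion for _, conclusion in ready])
        facts |= {conclusion for _, conclusion in ready}


rules = [
    (frozenset({"a"}), "b"),
    (frozenset({"a"}), "c"),
    (frozenset({"b", "c"}), "d"),
]
final_facts, stages = run_stages({"a"}, rules)
print(stages)
```

In the example, the two rules that depend only on "a" fire together in the first stage, and the rule needing both of their results fires in the second.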

The main problem area was the usual one with these sorts of languages,
namely input. You want to minimise the number of questions asked, but what if
you backtrack over a question already asked? Do you ask it again,
because the context (to the user) might be different in a
different search order? The system also allowed backward-chaining
rules if that resulted (as it did sometimes) in a faster result.

P.S. The result was IMHO pretty good. It was used for designing
telephone exchanges, which had a certain number of constraints, e.g.
"given that we want 100 trunk lines and 1000 subscriber lines,
how do we lay out the exchange in this room?" Constraints/rules included
trying to make sure units with lots of interconnections were placed
close together.

Summary of key issues:

You can write a rules-based system in some surprising languages.

Input/Output of code/results needs careful design.

Handling of input of data/questions needs care.

The 'proper' searches are often less efficient than more intuitive
searches for many real-world operations.

Multiple control schemes (search engines) are needed for different problems.

Develop the basic Rules/Fact classes and the like. This will start by using simple code blocks as the criteria units, but eventually will use things like Array::PatternMatcher that princepawn posted recently. (I've been talking to him about the details.) The rules system would currently be explicitly defined in perl and would require perl calls to start it going.

Develop a way to test the rules system by simply iterating over all possible rules/facts until a rule is fired.

Develop a way to cache facts vs rules such that the above step is more efficient.

Build up a standalone text format that can be used to simply have a perl-based rules system but without using any perl code to initiate it.

Modify the system from there.
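
To make the first two steps of that plan concrete, here is a rough sketch of a brute-force engine where each rule's criterion is just a code block. It is in Python for illustration only, not the perl the plan calls for, and every name in it (`Rule`, `run`, `max_cycles`) is hypothetical. Facts are tuples rather than arrays so they can be stored in a set, and each rule is assumed to test one fact at a time.

```python
class Rule:
    def __init__(self, name, criterion, body):
        self.name = name
        self.criterion = criterion  # callable(fact) -> bool
        self.body = body            # callable(fact) -> list of new facts


def run(rules, facts, max_cycles=100):
    """Each cycle, scan all rule/fact pairs and fire the first new match."""
    facts = [tuple(f) for f in facts]
    fired = set()  # (rule name, fact) pairs that have already fired
    log = []       # backend record of which rule fired on which fact
    for _ in range(max_cycles):
        match = next(
            ((r, f) for r in rules for f in facts
             if (r.name, f) not in fired and r.criterion(f)),
            None,
        )
        if match is None:
            break  # nothing left to fire
        rule, fact = match
        fired.add((rule.name, fact))
        log.append((rule.name, fact))
        # Data only survives the rule body if it is stored as a fact.
        for new in rule.body(fact):
            new = tuple(new)
            if new not in facts:
                facts.append(new)
    return facts, log


rules = [
    Rule("overheat",
         lambda f: f[0] == "temp" and f[1] > 100,
         lambda f: [("alarm", "on")]),
    Rule("notify",
         lambda f: f == ("alarm", "on"),
         lambda f: [("notified", "ops")]),
]
facts, log = run(rules, [("temp", 105)])
print(log)
```

The full scan on every cycle is exactly the inefficiency the caching step is meant to remove; the sketch makes no attempt at that.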

My current thinking is that the key data items, facts, are only arrays; they may be arrays of mixed data, or just scalars, but they are arrays. Of course, from a practical understanding of a rules system, you shouldn't be using very complex data structures in any case, as you can always replace them with multiple facts, but I want to leave that open in case someone wants to take the concept to advanced levels. The only way data is introduced to the system is through facts; while perl input can be used to generate data, the data would go out of scope at the end of the rule unless specifically put into a fact and stored away. Standard perl output can be used for any other generation, while backend stuff can be directed to another file (i.e. which rules have fired, with which facts, etc.).

The trickiest part, for me, is making the rule triggering more efficient. Rules will have two values associated with them: priority (higher priority rules will trigger first) and probability (when several rules of the same priority can all be triggered at the current step, probability is used to select one of them at random). These could change as the rules are fired, though this can lead to bad rule-based systems. In addition, some criteria for rules are dynamic, for example in my pseudo code...:

That is, the first criterion takes a rule that matches the format of "value <id> <number>". The second is a negative criterion; that is, we don't want any rule where a different ID has a larger number. Now, to check this, one must take all N facts and do N^2 checks with the criteria (though it can be assumed that this might be closer to 0.5 * N^2, since once a fact denies the second criterion, the first rule obviously cannot match). In small cases (few rules & facts) this isn't bad, but it won't scale well to larger systems. I'd like to have a side structure that notes when facts match the criteria of rules in order to improve this efficiency; this is easily done for the first criterion of the example above, but the second step may still require O(N^2) checks. In general, for N rules with an average of K criteria per rule, and M facts, the system scales as N*(M^K), and any possible reduction of this would help.
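
As a rough illustration of the difference between the pairwise scan and a side structure, here is a sketch in Python (for illustration only). It is not the pseudo code referred to above, which is not reproduced here. It assumes the criteria are tested against facts of the form ("value", id, number), and the names `naive_matches` and `ValueIndex` are hypothetical.

```python
def naive_matches(facts):
    """Pairwise check: roughly N^2 comparisons over the 'value' facts."""
    values = [f for f in facts if len(f) == 3 and f[0] == "value"]
    return [f for f in values
            if not any(g[1] != f[1] and g[2] > f[2] for g in values)]


class ValueIndex:
    """Side structure that is updated as each fact is added."""

    def __init__(self):
        self.values = []  # facts that passed the first criterion
        self.best = {}    # id -> largest number seen for that id

    def add(self, fact):
        if len(fact) == 3 and fact[0] == "value":
            self.values.append(fact)
            _, ident, number = fact
            if number > self.best.get(ident, float("-inf")):
                self.best[ident] = number

    def matches(self):
        # The strongest rival of any fact is one of the two largest
        # per-id maxima, so only those two need to be compared against.
        top = sorted(self.best.items(), key=lambda kv: kv[1], reverse=True)[:2]
        result = []
        for f in self.values:
            rival = next((n for i, n in top if i != f[1]), None)
            if rival is None or rival <= f[2]:
                result.append(f)
        return result


facts = [("value", "a", 3), ("value", "b", 7), ("value", "b", 5),
         ("value", "c", 7), ("other", "x")]
index = ValueIndex()
for fact in facts:
    index.add(fact)
print(index.matches() == naive_matches(facts))
```

Tracking per-id maxima only avoids the pairwise scan because this particular negative criterion is a "larger number" comparison; it does not generalise to arbitrary negative criteria, and the sketch does not handle facts being retracted.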

At this point, I've mostly got ideas, and it's a matter of implementing things in a perl-ish way (eg transferring variables from matched rules to the 'body' of the rule).
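
One possible reading of the priority/probability selection described above, sketched in Python for illustration only: keep the highest-priority rules that can fire, then pick one at random. Treating probability as a relative weight is my assumption, as are the tuple layout and the name `choose_rule`.

```python
import random


def choose_rule(candidates, rng=random):
    """Pick one rule name from (name, priority, probability) tuples.

    candidates holds only rules that can be triggered at the current step.
    Probability is treated as a relative weight among rules that share the
    highest priority (an assumption, not a settled design).
    """
    if not candidates:
        return None
    top = max(priority for _, priority, _ in candidates)
    tied = [c for c in candidates if c[1] == top]
    weights = [c[2] for c in tied]
    return rng.choices(tied, weights=weights, k=1)[0][0]


candidates = [("cleanup", 1, 1.0), ("alert", 5, 0.75), ("escalate", 5, 0.25)]
print(choose_rule(candidates, random.Random(42)))
```

Passing in a seeded `random.Random` keeps a run reproducible, which may help when trying to work out why a particular rule fired.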

Re. the rule efficiency: you are right, this is the hard part.
To some extent the responsibility lies with the knowledge engineer
(read: ruleset creator), since you can write bad (==slow) programs/algorithms
in any language.

Giving the engineer tools (such as the priority / weighting mechanism you
mention) can help, as can other options such as choice of
search mechanism and direction.

In my experience, a lot of knowledge engineer time is spent tuning the
knowledge base, and the system should be proactive in helping
that process, providing timing / coverage analysis.
--Brovnik

At this point, it's merely for the purposes of "because it (the lack of perl&rules) is there". Mind you, yes, I'm thinking down the road about making this professional quality, which means that I have to consider the audience at that point. But the initial proof-of-concept would be a working rules engine, in which flow control is simply directed by the facts; anything beyond that, such as how facts are formatted, how rules are defined, etc., is mostly the 'user interface' part of the problem. And by providing a working engine, weaknesses and strengths of the interface can be figured out faster.

I guess my overall goal is to provide a way to run a rules-like system (which in my previous experience has had weak interaction with the OS outside of input and output to terminal) using perl (which has strong OS interaction including networking, etc). An ultimate use would be to place a perl script with the rules engine at the backend of a network socket, such that new facts can be introduced remotely and actions on local and remote systems taken as such (a so-called business logic server); so the language and depth would have to be simple enough to write the rules in but still allow for taking full advantage of using perl. There are obvious approaches to doing this, but determining how to draw the line between perl-isms and language elements I can introduce is way down the road.

Once I get moving on this, I'll be posting the code as I've done with other modules of mine here for feedback and such, at least until it's stable and usable enough for CPAN inclusion.
