INFORMAL NOTES ON PROGRAMMING LANGUAGES

Note: This article was scanned using OCR from the Fall 1992 CCCE Newsletter. Please contact us if you identify any OCR errors.

This is a follow-on from Part One (Comp. Chem. Educ. Newsletter, Fall, 1991), which covered the 'classical' languages Fortran, Cobol, Pascal, etc. See Part One also for an introduction to the topic overall. The emphasis is on languages that can be used on personal computers, especially under DOS.

PART TWO: NEO-CLASSICAL LANGUAGES

This deals with well-defined languages that are not yet widely used by chemists. Each shows some interesting features and some useful capabilities not easily available in the classical languages.

Prolog:

The language was defined by A. Colmerauer (et al.) in the early 70's at the University of Marseilles. It grew at the University of Edinburgh and became widely known in the 'Edinburgh syntax' from Clocksin and Mellish's text. Prolog came to world notice in 1981 when it was selected by the Japanese government for their 'Fifth Generation Computer Systems Project'. It is often used in artificial intelligence work, as an alternative to Lisp. Prolog exists in many dialects and is not yet standardized. (This discussion uses mainly the Edinburgh syntax.)

'Prolog' is short for 'programming in logic'. Prolog programs define data relationships, but specify few procedural details. Instead, Prolog uses a generalized and powerful strategy of recursive backtracking, to discover all allowable combinations of database facts that fit the declared relationships.

Suppose we have an aliphatic organic chemistry database, e.g.:

ketone(acetone).
ketone('2-propanone').
primary_alcohol(ethanol).
sec_alcohol('2-propanol').

Each line of the above is a typical Prolog program predicate (or database fact), ending in a period and asserting a relationship for one or more arguments. Predicate names and argument values require lower-case letters (this can be over-ruled by use of singly-quoted values, e.g. 'Br' or '2-propanol'). A predicate is not a function and does not have a value; e.g. to provide the boiling point of acetone, we would have to assert:

bp(acetone, '56.5 Celsius').

(These examples are simplistic; in a useful system such predicates could be more informative, and could perhaps be generated automatically from a standard structural and reaction dataset.)

The database could then be queried with predicates containing variables (beginning with upper-case letters), e.g.

?- ketone(X).   ... inquiry
X = acetone   ... response
X = 2-propanone   ... another response
... and so on

The database is scanned exhaustively for all values of X such that the inquiry is true. (If no match could be found, the system would respond: "NO".)

Reactions could be added to the database as assertions of general rules, e.g.

reduce(X, 'NaBH4', Y) :- ketone(X), sec_alcohol(Y).

meaning, X can be reduced by NaBH4 to Y if X is a ketone and Y is a secondary alcohol. This rule would then allow queries such as:

?- reduce(acetone, 'NaBH4', X).   ... inquiry
X = 2-propanol   ... response

meaning, to what product would acetone be reduced by NaBH4? (Of course, the above reduce() predicate is a necessary but not sufficient condition; structural rules would have to be added to preclude e.g. the reduction of acetone to 2-butanol, but that is beyond the scope of this brief introduction.) The above rule could be applied in other ways, e.g.

?- reduce(X, 'NaBH4', '2-propanol').
X = acetone

?- reduce(ketone(_), X, sec_alcohol(_)).
X = NaBH4

and so on. In the last query above, the underline arguments (_) are generalized Prolog place-holders for arguments whose values are currently unimportant.

In general, Prolog rules can be written in the form:

p1(A1, ...) :- p2(...), p3(...), ...

meaning the relation p1 is asserted for matching values of its arguments if predicates p2, ... can be verified for their arguments. Of course, verification of p2, ... may require further backtracking through the database. A predicate becomes an inquiry by giving one or more of its arguments as variables, instead of values. The important thing is that recursive backtracking will find every combination of argument values (explicit and implied) for which an inquiry can be verified.

Recursive backtracking can be very powerful, but can also lead to very long run-times and very large numbers of solutions (the 'combinatorial explosion') for a large database. Matching can be made more efficient by use of a 'cut' predicate, which can preclude search down some logical paths, if such further search is not wanted or unlikely to be useful.

A Prolog database is a file of assertions and rules, in predicate form. The database is dynamic, because assertions and rules can be added, deleted and altered freely during the run of a program. Like Lisp and Snobol, Prolog programs are capable of generating and executing new code during the run, so the programs can be self-modifying and adaptive.

The above gives only a very brief taste of Prolog capabilities. Other operations are possible. Early Prologs could do only 16-bit integer arithmetic (in the range ±32768), but newer versions can do rational, complex and double-precision floating-point arithmetic. Standard predicates exist to process data lists, e.g. ['F','Cl','Br','I'], trees (as lists of lists), arrays (as lists), strings (as lists of symbols) and graphs (as lists of nodes, with connections). Molecules could be coded and processed as atomic or functional-group graphs. There are also predicates to handle input, output and file manipulations, including Prolog database editing. Values (e.g. '2-propanol', 39.06) can be converted to and created from strings of characters. Prolog libraries are available to supply pre-written predicates for most common operations.

Prolog is particularly suited to the expression and application of logical relationships (including fuzzy logic). In chemistry, it would be useful for exploring structure/property relationships, deducing structures from spectra, and generating chemical structures. Because it can handle logical inference, it would be usable in computer-assisted learning.

AWK:

AWK was never intended as a full language, but became one anyway. It was invented by Aho, Weinberger and Kernighan (hence the acronymic name) as a utility for the Unix system. They wanted a 'tiny language' that could be used to write one-line programs to be used as text-filters and data-file scanners. All three of them being superb programmers, they created better than they knew. They were surprised when AWK began to be used as a serious programming language. DOS and Mac versions of AWK are now available. AWK has inspired a descendant called Perl (a sort of super-AWK). AWK is an interpreted language. It resembles C, but without the elaborate declarations and data structures, and with a simpler syntax. AWK has a number of special features that make it very versatile and powerful in scanning and manipulating text files.

The simplest AWK program has the structure:

{ action }

where 'action' is one or more AWK commands in free-form layout. An input file is read, line by line, and the action clause is executed repeatedly using each line in turn as input, until end-of-file.
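As a minimal sketch of such a one-action program (the file name and data below are invented for illustration):

```shell
# Invented sample input: two numbers per line.
printf '3 4\n10 20\n' > pairs.txt

# A single action, applied to every input line:
# print the sum of the two fields on the line.
awk '{ print $1 + $2 }' pairs.txt
```

There is no explicit loop and no format statement; the line-by-line cycle and the default field splitting are supplied by AWK itself.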

A more general program structure is:

pattern1 { action1 }
pattern2 { action2 }
...

This would process the input line by line. The first line would be scanned in turn for each of pattern1, pattern2, and so on. Whenever a pattern was matched, then the corresponding action would be executed, with the current line as input. When all patterns had been applied, the next line would be read and the whole pattern-matching cycle repeated. This would continue, until all the file had been read, and all patterns had been applied to every line. It is possible to match patterns to chosen words or phrases, instead of the whole line. Patterns can contain arithmetic expressions and can test numeric values. If any pattern has no action, then the matched line is printed by default.
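The pattern/action cycle described above can be sketched with a small chemical data file (the file name, compound data and threshold are invented for illustration):

```shell
# Invented sample data: "name  class  bp_in_celsius" per line.
cat > compounds.txt <<'EOF'
acetone ketone 56.5
ethanol alcohol 78.4
2-propanol alcohol 82.6
EOF

# Two pattern/action pairs. Each input line is tested against
# both patterns in turn; a line can trigger more than one action.
awk '$2 == "ketone" { print $1, "is a ketone" }
     $3 < 80        { print $1, "boils below 80 C" }' compounds.txt
```

Here the acetone line matches both patterns, so both actions fire for it, while the ethanol line matches only the numeric test.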

AWK patterns can contain 'regular expressions', e.g. /hex[aey]ne/ would match 'hexane' or 'hexene' or 'hexyne', /[A-Z][a-z]+/ would match any word with an initial capital letter, and so on. Very complex patterns can be created, if desired.
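Regular-expression patterns of this kind can be tried directly from the command line; a small sketch (the file name and word list are invented for illustration):

```shell
# Invented sample input: one word per line.
printf 'hexane\nhexene\nhexyne\nheptane\nBromine\n' > names.txt

# /hex[aey]ne/ matches hexane, hexene and hexyne, but not heptane.
awk '/hex[aey]ne/ { print $0, "matches" }' names.txt

# /^[A-Z][a-z]+/ matches a word with an initial capital letter.
awk '/^[A-Z][a-z]+/ { print $0, "is capitalized" }' names.txt
```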

AWK input and output can be very easy, because of the I/O defaults. There is often no need for loops or formats. Pattern matching allows selective processing of input lines. It is possible to alter the defaults so that the unit of processing is a word, file, paragraph or page, rather than a line.

AWK data structures are very simple and flexible. There are no data declarations. Any variable can contain either numeric or text values. A numeric value is treated either as a number or as a string of numeric characters according to context. All arithmetic is floating point. Variables are automatically initialized to zero and/or an empty string before first use. Arrays are one-dimensional and dynamic (i.e. of varying length) and can contain mixed numeric and string values. An array can be indexed with either numeric or text values, e.g. bp["acetone"] to obtain a boiling point from a bp data array. It is possible to test for use of a subscript, e.g.

if ("acetone" in bp) print bp["acetone"]

AWK should not be considered an 'only language', but is a very valuable 'supplementary language'. It is useful for writing short programs, for transforming files, and for preparing file excerpts and data-base reports. Many experienced programmers now use it for 'program prototyping'. They try out algorithms in AWK, then translate the tested code into a more efficient, strongly typed language for production use. In some colleges, students are taught AWK as their first computer language, because of its simplicity and power.
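The text-indexed array and subscript test described above can be sketched as a self-contained program (the compound names and boiling points are invented for illustration):

```shell
awk 'BEGIN {
    # Associative array indexed by compound name (invented values).
    bp["acetone"] = 56.5
    bp["ethanol"] = 78.4

    # The "in" test checks whether a subscript has been used,
    # without accidentally creating a new empty entry.
    if ("acetone" in bp)     print "acetone boils at", bp["acetone"]
    if (!("benzene" in bp))  print "no entry for benzene"
}'
```

Because the whole program lives in a BEGIN block, no input file is read at all; AWK here is acting as a small general-purpose interpreter rather than a file scanner.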

Forth:

The Forth language was created in the early 60's by C. H. Moore. It had its first major application in the early 70's as a data-control language for a radio-telescope. It is often used as a real-time instrument control language, but has spread to other applications. The Forth community is small but enthusiastic, with something of the 'true-believer' attitude once common among APL users. There are many dialects and two commonly used standards (Forth-79 and Forth-83). Forth is a readily extensible language, so dialects often differ greatly in the details of their floating-point arithmetic, string handling, file manipulation, numeric I/O and system interfaces.

At its heart, Forth is a postfix (or reverse polish notation) calculator language, based on the manipulation of an abstract stack. Numbers, strings and memory addresses can be pushed onto the stack. The value currently at the stack-head can be inspected and removed. Data can also be moved between the stack and addresses in memory, and this in turn allows the definition of named variables, arrays, matrices and strings. All such data structures are manipulated through the stack. This insistence on a stack-based operation gives Forth its characteristic style, very different from most other languages.

In the following one-line examples, entered numbers are pushed onto the stack top when encountered. Each operator acts on the one or two numbers at the stack top; the operands are popped from the stack and the result is pushed back onto the stack, e.g.

.   ... pop and print stack-top.
3 5 + .   ... print (3+5).
3 5 + 2 / .   ... print ((3+5)/2).
DUP   ... duplicate stack-top.
3 5 + DUP * .   ... print (3+5)^2.

Named variables are created by reservation of an addressed memory segment; values can then be assigned to and read from the named address.

Arrays can be handled in the same way, but by reserving multi-byte memory space, e.g.

CREATE AC 20 ALLOT   ... AC = 10-integer array.

Manipulation of array elements requires explicit array-element address arithmetic, but usually there are library commands designed for this.

Forth can very readily be extended to supply capabilities lacking in the primitive language. Each of the operators we have seen above (such as . + DUP and ALLOT) is called a 'word'. A word is the name of a pointer (i.e. a memory address) to a brief segment of machine-code. The machine-code segments use stack-top and memory-addresses as operands. New words can readily be defined, e.g.

: SQUARE DUP * ;   ... define 'square'.
: CUBE DUP SQUARE * ;   ... define 'cube'.
12 CUBE .   ... print 12^3.

Both of these new words operate on the stack-top.

As a result of the 'word'-based structure, a Forth program is a sequence of pointers (or jumps) to brief primitive segments of machine-code. This is called 'threaded code'. Forth programs do not need to be compiled, and yet are much faster in execution than interpreted languages. Because the primitive word-segments are accessed by address, they do not need to be copied each time they are supplied to a program sequence, so Forth programs are usually very much more compact than the equivalent machine-code created by a compiler. The word-based organization lends itself very well to program development by a bottom-up approach (rather than the more widely used top-down approach of functional programming).

Forth resembles assembly language in many of its operating details and its efficiency of execution, and yet it is high-level and transferable. At first sight, Forth source-code seems dense and cryptic, but it becomes intelligible on acquaintance. Its compactness and efficiency make Forth a valuable language for numerical real-time instrument control with small computers. Forth has string and text processing capabilities and has been used as an implementation language for interpreters, compilers and text-editors.

If you consider acquiring a Forth system, pay great attention to the systems-library of defined words, to ensure that it has the range of capabilities that you need. The Kelly and Spies text covers most of the principal Forth dialects and has a good bibliography of Forth texts and sources of software and information.