How do I use it?

dendrol provides an interface for parsing STIX2 Pattern Expressions, much like cti-pattern-validator, through the dendrol.Pattern class. Its to_dict_tree() method converts the ANTLR parse tree into a dict-based tree structure, the PatternTree.

Brief

A PatternTree begins with a 'pattern' key. Below it is an observation expression, with an 'observation' or 'expression' key (which may contain more observation expressions joined by AND/OR/FOLLOWEDBY). Below 'observation' keys are comparison expressions, marked by a 'comparison' or 'expression' key (which may contain more comparison expressions joined by AND/OR). 'comparison' keys denote a single comparison between an object property and a literal value.

An Observation Expression is a dict with a single key of either 'expression' or 'observation'. An 'expression' SHALL contain two or more observation expressions joined by AND/OR/FOLLOWEDBY, whereas an 'observation' SHALL contain only comparison expressions.

An 'observation' is analogous to square brackets in STIX2 Pattern Expressions, e.g.: [ipv4-addr:value = '1.2.3.4']. Children of an observation (in the 'expressions' key) SHALL only be comparisons or comparison expressions.
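To make the shape concrete, here is a hand-written sketch of the tree these rules describe for [ipv4-addr:value = '1.2.3.4']. It is assembled from the prose in this section rather than captured from dendrol, and the 'qualifiers' key name in particular is an assumption.

```python
# Hand-written PatternTree sketch for: [ipv4-addr:value = '1.2.3.4']
# NOTE: built from the prose description, not captured from dendrol output.
tree = {
    'pattern': {
        'observation': {
            'objects': {'ipv4-addr'},
            'join': None,          # a single child needs no join method
            'qualifiers': None,    # key name is an assumption
            'expressions': [
                {'comparison': {
                    'object': 'ipv4-addr',
                    'path': ['value'],
                    'negated': None,
                    'operator': '=',
                    'value': '1.2.3.4',
                }},
            ],
        },
    },
}

# Walk from the root down to the single comparison.
observation = tree['pattern']['observation']
comparison = observation['expressions'][0]['comparison']
print(comparison['object'], comparison['path'], comparison['operator'], comparison['value'])
```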

An 'observation' MAY have qualifiers, but its children MUST NOT.

An 'observation' MAY have a join method, which denotes how its child comparison expressions are to be joined. This method MAY be AND or OR, but MUST NOT be FOLLOWEDBY, because the join method applies to comparison expressions, not observation expressions. If there is only a single child comparison expression, 'join' MAY be None.

An 'observation' SHALL contain a set of all the object types of its child comparison expressions, in the 'objects' key. This is mainly for human consumption. A STIX2 observation is allowed to contain comparisons on disparate object types, provided they're joined by OR; this is why 'objects' is a set, not a single string.
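The relationship between 'objects' and the child comparisons can be sketched as a small recursive walk. The helper name collect_object_types is hypothetical and the tree is hand-written; this illustrates the rule and is not dendrol's own implementation.

```python
def collect_object_types(node):
    """Recursively gather the 'object' of every comparison beneath a node."""
    if 'comparison' in node:
        return {node['comparison']['object']}
    # 'observation' and 'expression' nodes both keep children in 'expressions'.
    (body,) = node.values()
    found = set()
    for child in body['expressions']:
        found |= collect_object_types(child)
    return found


# Hand-written sketch for:
#   [ipv4-addr:value = '1.2.3.4' OR domain-name:value = 'example.com']
observation = {
    'observation': {
        'objects': {'ipv4-addr', 'domain-name'},
        'join': 'OR',
        'expressions': [
            {'comparison': {'object': 'ipv4-addr', 'path': ['value'],
                            'negated': None, 'operator': '=',
                            'value': '1.2.3.4'}},
            {'comparison': {'object': 'domain-name', 'path': ['value'],
                            'negated': None, 'operator': '=',
                            'value': 'example.com'}},
        ],
    },
}

# The stored set should agree with what the children actually reference.
assert collect_object_types(observation) == observation['observation']['objects']
```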

If 'objects' contains only a single object type, it MAY be compacted into set literal form.

The 'start_stop' qualifier constrains the timeframe within which its associated observation expressions MUST occur to evaluate true. Unlike WITHIN, START ... STOP ... denotes absolute points in time, using datetime literals.

Example STIX2 expression:

[a:b = 12] START t'2018-10-07T00:00:00Z' STOP t'2018-10-08T23:59:00Z'

In STIX2 Pattern Expressions, all datetimes MUST be in RFC 3339 format and MUST be in the UTC timezone. Datetime literals resemble Python strings with t as their prefix character (like an f-string or a bytestring). Because they must be in UTC, datetime literals MUST end with the Z character.

When parsed into Python, they SHALL have a tzinfo object with a UTC offset of 0.
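As an illustration of these rules, the following standard-library sketch turns a datetime literal into a timezone-aware datetime with a zero UTC offset. The function name parse_datetime_literal is hypothetical, and this is not how dendrol itself parses literals.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches the t'...' wrapper around the RFC 3339 timestamp.
_DATETIME_LITERAL = re.compile(r"^t'(?P<stamp>[^']+)'$")


def parse_datetime_literal(literal):
    """Parse a STIX2 datetime literal such as t'2018-10-07T00:00:00Z'."""
    match = _DATETIME_LITERAL.match(literal)
    if match is None:
        raise ValueError(f"not a datetime literal: {literal!r}")
    stamp = match.group('stamp')
    if not stamp.endswith('Z'):
        raise ValueError("datetime literals must be in UTC and end with 'Z'")
    # Accept both whole and fractional seconds.
    fmt = '%Y-%m-%dT%H:%M:%S.%fZ' if '.' in stamp else '%Y-%m-%dT%H:%M:%SZ'
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


start = parse_datetime_literal("t'2018-10-07T00:00:00Z'")
stop = parse_datetime_literal("t'2018-10-08T23:59:00Z'")

# Both values are timezone-aware and sit exactly on UTC.
assert start.utcoffset() == timedelta(0)
assert start < stop
```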

The 'within' qualifier constrains the timeframe within which its associated observation expressions MUST occur to evaluate true. Unlike START ... STOP ..., WITHIN denotes a relative timeframe, where the latest observation expression MUST occur within the specified number of seconds from the earliest observation expression.

Example STIX2 expression:

[a:b = 12] WITHIN 600 SECONDS

SECONDS is hard-coded into the STIX2 Pattern Expression grammar, and MUST be included in pattern expressions. However, to avoid ambiguity for the reader, and to allow for future STIX2 spec changes, the unit is also included in the Pattern Tree.
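The WITHIN rule can be sketched as a check on the span between the earliest and latest observation. The qualifier dict below is hand-written, and its inner key names ('value' and 'unit') are assumptions about the tree layout, as is the helper name within_satisfied.

```python
from datetime import datetime, timedelta, timezone

# Hand-written sketch for: WITHIN 600 SECONDS
# NOTE: the 'value' and 'unit' key names are assumptions.
within = {'within': {'value': 600, 'unit': 'SECONDS'}}


def within_satisfied(observed_at, qualifier):
    """True if the latest observation is within the window of the earliest."""
    body = qualifier['within']
    if body['unit'] != 'SECONDS':
        raise ValueError(f"unsupported unit: {body['unit']!r}")
    window = timedelta(seconds=body['value'])
    return max(observed_at) - min(observed_at) <= window


t0 = datetime(2018, 10, 7, 0, 0, 0, tzinfo=timezone.utc)
inside = [t0, t0 + timedelta(seconds=600)]
outside = [t0, t0 + timedelta(seconds=601)]

print(within_satisfied(inside, within), within_satisfied(outside, within))
```

Keeping the unit in the dict lets a consumer reject an unexpected unit explicitly instead of silently assuming seconds.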

The 'repeats' qualifier REQUIRES that its associated observation expressions evaluate true at different occasions, for a specified number of times.

Example STIX2 expression:

[a:b = 12] REPEATS 9000 TIMES

TIMES is hard-coded into the STIX2 Pattern Expression grammar, and MUST be included in pattern expressions. However, since there are no obvious units of multiplicity other than "X times", it is omitted from the Pattern Tree output, unlike the SECONDS of WITHIN.
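A minimal sketch of the REPEATS rule follows. The qualifier dict is hand-written, its 'value' key is an assumed name, and the "specified number of times" is read here as "at least that many distinct occasions", which is also an assumption.

```python
# Hand-written sketch for: REPEATS 3 TIMES
# NOTE: the 'value' key name is an assumption; no unit is stored.
repeats = {'repeats': {'value': 3}}


def repeats_satisfied(match_occasions, qualifier):
    """True if matches happened on enough distinct occasions."""
    required = qualifier['repeats']['value']
    # Duplicated occasions count once, since matches must be distinct.
    return len(set(match_occasions)) >= required


print(repeats_satisfied([1, 2, 3], repeats))      # three distinct occasions
print(repeats_satisfied([1, 1, 2], repeats))      # only two distinct occasions
```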

A Comparison Expression is a dict with a single key of either 'expression' or 'comparison'. An 'expression' SHALL contain two or more comparison expressions joined by AND/OR, whereas a 'comparison' contains no children, and only marks a comparison of one variable to one literal value.

An 'expression' is a container for other comparison expressions, joined by either AND or OR in 'join' — comparison expressions do not have FOLLOWEDBY, as they are intended to reference a single object at a single point in time.

An 'expression' MUST NOT have qualifiers.

Its children are in 'expressions', whose values SHALL be dicts with single keys (of either 'comparison' or 'expression').
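The rules for comparison expressions can be sketched as a recursive validator. The function name is hypothetical, the tree is hand-written, and the 'qualifiers' key name is an assumption; this is an illustration of the constraints, not part of dendrol.

```python
def validate_comparison_expression(node):
    """Check a comparison expression; return the number of leaf comparisons."""
    if not isinstance(node, dict) or len(node) != 1:
        raise ValueError('a comparison expression must be a single-key dict')
    ((kind, body),) = node.items()
    if kind == 'comparison':
        return 1  # leaf: no children to inspect
    if kind != 'expression':
        raise ValueError(f'unexpected key: {kind!r}')
    if body.get('qualifiers'):  # key name is an assumption
        raise ValueError("an 'expression' must not have qualifiers")
    if body.get('join') not in ('AND', 'OR'):
        raise ValueError("'join' must be AND or OR for comparison expressions")
    children = body.get('expressions', [])
    if len(children) < 2:
        raise ValueError("an 'expression' needs two or more children")
    return sum(validate_comparison_expression(child) for child in children)


def leaf(path, value):
    """Build a minimal hand-written comparison on a hypothetical object 'a'."""
    return {'comparison': {'object': 'a', 'path': [path], 'negated': None,
                           'operator': '=', 'value': value}}


# Sketch for: a:b = 1 AND (a:c = 2 OR a:d = 3)
nested = {'expression': {
    'join': 'AND',
    'expressions': [
        leaf('b', 1),
        {'expression': {'join': 'OR',
                        'expressions': [leaf('c', 2), leaf('d', 3)]}},
    ],
}}

print(validate_comparison_expression(nested))
```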

A 'comparison' represents a single comparison between a STIX2 object property and a literal value. A single string object type SHALL be placed in the 'object' key.

'path' SHALL be a list beginning with a top-level property of the object type denoted in 'object', as a string. Following this MAY be any number of child properties, as strings, or list index components/dereferences, denoted as Python slice() objects, where [1] is equivalent to slice(None, 1, None), i.e. start=None, stop=1, step=None (slice() itself accepts only positional arguments). The special match-any list index from STIX2 (e.g. file:sections[*]) is equivalent to slice(None, '*', None).
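To show how such a path list maps back onto STIX2 syntax, here is a small sketch that renders 'object' and 'path' as an object path string. The helper name render_path is hypothetical, and quoting of property names that contain special characters is deliberately left out.

```python
def render_path(object_type, path):
    """Render an object type and a path list as a STIX2-style object path."""
    parts = []
    for component in path:
        if isinstance(component, slice):
            # List indexes are carried in the slice's stop attribute.
            parts.append(f"[{component.stop}]")
        elif parts:
            parts.append(f".{component}")
        else:
            parts.append(component)  # top-level property, no leading dot
    return f"{object_type}:" + "".join(parts)


# slice() takes positional arguments: (start, stop, step).
any_section = ['sections', slice(None, '*', None), 'entropy']
first_index = ['sections', slice(None, 1, None)]

print(render_path('file', any_section))
print(render_path('file', first_index))
```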

'negated' SHALL be a bool denoting whether the operator SHALL be negated during evaluation. STIX2 allows a NOT keyword before the operator: file:name NOT MATCHES 'james.*'. If the operator is not negated, 'negated' MAY be None. (This allows for a more compact YAML representation — where the value may simply be omitted.)

'operator' SHALL be a string representing the operator, e.g. '>', 'LIKE', or '='.

'value' MAY be any static Python value. Currently, only strings, bools, ints, floats, datetimes, and bytes are output, but this could change in the future (e.g. if compiled regular expressions are deemed useful).
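Putting the keys together, the sketch below evaluates a hand-written 'comparison' against a plain dict of observed properties, showing where 'negated' is applied. The evaluate helper and the operator table are hypothetical and cover only a few operators; MATCHES is approximated with Python's re module, and list indexes in 'path' are not handled.

```python
import operator
import re

# A deliberately small operator table; not the full STIX2 operator set.
_OPERATORS = {
    '=': operator.eq,
    '!=': operator.ne,
    '>': operator.gt,
    '<': operator.lt,
    # Approximation: Python's re stands in for STIX2 regex matching.
    'MATCHES': lambda actual, pattern: re.search(pattern, actual) is not None,
}


def evaluate(comparison, observed):
    """Evaluate one comparison against a dict of observed properties."""
    body = comparison['comparison']
    actual = observed
    for key in body['path']:  # only plain string properties are handled
        actual = actual[key]
    result = _OPERATORS[body['operator']](actual, body['value'])
    # 'negated' may be None when the operator is not negated.
    return not result if body['negated'] else result


# Hand-written sketch for: file:name NOT MATCHES 'james.*'
not_james = {'comparison': {
    'object': 'file',
    'path': ['name'],
    'negated': True,
    'operator': 'MATCHES',
    'value': 'james.*',
}}

print(evaluate(not_james, {'name': 'james.txt'}))
print(evaluate(not_james, {'name': 'report.pdf'}))
```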

If 'path' contains only a single property, it MAY be compacted into list literal form.