.. _operators:
Operators
=========
.. index:: unsupported operand type, operators, OperatorMatcher, SyntaxError
.. _caveatsandlimitations:
Caveats and Limitations
-----------------------
It is unfortunate, but realistic, that the chapter on operators should start
with some warnings to the user.
Operators --- things like ``&`` and ``|``, used to join matchers --- can help
produce grammars that are easier to read, easier to understand, and so less
likely to contain errors. But their implementation pushes Python's
boundaries, giving problems with precedence and applicability. This is
exacerbated by the automatic coercion of strings to ``Literal()`` matchers wherever possible.
For example, because operators are effectively methods on *neighbouring
objects*, the following will fail::

  >>> name = ('Mr' | 'Ms') // Word()
  [...]
  TypeError: unsupported operand type(s) for |: 'str' and 'str'

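This is because both ``'Mr'`` and ``'Ms'`` are plain strings, not instances of
``OperatorMatcher()`` (which is where ``|`` is defined, via ``__or__`` and
``__ror__``). Wrapping either operand in a matcher, for example
``Literal('Mr') | 'Ms'``, avoids the problem.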
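
The mechanism can be reproduced without Lepl. The following is a minimal
sketch in plain Python; ``ToyMatcher`` and ``ToyOr`` are hypothetical stand-ins
invented for this illustration and are not Lepl classes::

  class ToyMatcher:
      """Hypothetical stand-in for a matcher that defines ``|``."""

      def __init__(self, text):
          self.text = text

      @staticmethod
      def _coerce(value):
          # Wrap plain strings, loosely imitating string coercion.
          return value if isinstance(value, ToyMatcher) else ToyMatcher(value)

      def __or__(self, other):
          # Called for: matcher | anything
          return ToyOr(self, self._coerce(other))

      def __ror__(self, other):
          # Called for: string | matcher, after str declines the operation
          return ToyOr(self._coerce(other), self)


  class ToyOr(ToyMatcher):
      """Hypothetical node that records the two alternatives."""

      def __init__(self, left, right):
          self.left, self.right = left, right


  ok_left = ToyMatcher('Mr') | 'Ms'    # handled by __or__
  ok_right = 'Mr' | ToyMatcher('Ms')   # handled by __ror__

  message = None
  try:
      'Mr' | 'Ms'                      # neither operand defines the operator
  except TypeError as error:
      message = str(error)
  print(message)

As long as one operand is a matcher the expression works; with two strings
Python never reaches any matcher code.
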
Another example, where precedence is not as we might hope::

  >>> name = ('Mr' // Word() > 'man' | 'Ms' // Word() > 'woman')
  [...]
  SyntaxError: The operator > for And('Mr', Transform, Transform) was applied to a matcher (Or('man', And)). Check syntax and parentheses.

because the expression is parsed by Python as::

  >>> name = ('Mr' // Word()) > ('man' | ('Ms' // Word())) > 'woman'

and the SyntaxError was generated by Lepl, in an attempt to detect this kind
of error before the parser is called.
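
The grouping can be checked with Python's standard ``ast`` module, without
Lepl installed. In this sketch the strings and ``Word()`` calls are replaced
by placeholder names (``mr``, ``w`` and so on, chosen only for this example),
since only the operators affect how the expression is grouped::

  import ast

  # Same operators, in the same order, as the failing grammar line.
  tree = ast.parse("mr // w > man | ms // w > woman", mode="eval").body

  # The whole expression is one chained comparison: left > middle > right.
  top_node = type(tree).__name__
  top_ops = [type(op).__name__ for op in tree.ops]
  left = tree.left
  middle, right = tree.comparators

  print(top_node, top_ops)               # Compare ['Gt', 'Gt']
  print(type(left.op).__name__)          # FloorDiv
  print(type(middle.op).__name__)        # BitOr
  print(type(middle.right.op).__name__)  # FloorDiv
  print(right.id)                        # woman

Both ``//`` and ``|`` bind more tightly than ``>``, so the first ``>`` receives
``'man' | ...`` as its right-hand argument, which is what Lepl reports.
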
In short, then: use operators with care. Many of the guidelines in the
:ref:`style` chapter are intended to help manage these problems.
.. index:: &, +, /, //, |, %
Binary Operators Between Matchers
---------------------------------

======== ===========
Operator Description
======== ===========
``&``    Joins matchers in sequence. The result is a single list containing the results from all functions. Identical (without separators) to ``And()``.
-------- -----------
``+``    As ``&``, but the results are then joined together with the standard Python ``+`` operator.
-------- -----------
``/``    As ``&``, but with optional spaces (0 or more) between functions. If no space is found, no result is added, otherwise any found spaces are joined together into a single result.
-------- -----------
``//``   As ``&``, but with required spaces (1 or more) between functions. The spaces are joined together into a single result.
-------- -----------
``|``    Matches one matcher from a list. The result is the result of the chosen matcher. Identical to ``Or()``.
-------- -----------
``%``    As ``|``, but without backtracking between functions. Identical to ``First()``.
======== ===========

For a discussion of backtracking see :ref:`backtracking`.
.. index:: ~, []
Prefix And Postfix Operators On Matchers
----------------------------------------

======== ===========
Operator Description
======== ===========
``~``    Discards the result from the matcher.
         Identical to ``Drop()``.
-------- -----------
``[]``   Repeats the matcher, with optional concatenation and separator.
         Identical to (without separators)
         ``Repeat()``
         (see :ref:`previous section `).
======== ===========

.. note::

  ``Lookahead()`` is an exception
  for ``~`` (see :ref:`lookahead`).

.. index:: >=, >, >>, **, ^, args()
.. _ge:
Operators That Apply Functions To Results
-----------------------------------------

======== ===========
Operator Description
======== ===========
``>=``   Pass the results of the matcher (left) to the given function (right) and use the result as the new result. Identical to ``Apply(raw=True)``.
-------- -----------
``>``    Pass the results of the matcher (left) to the given function (right) and use the result, *within a new list*, as the result. If the function is a string a ``(string, result)`` pair is generated instead. Identical to ``Apply()``.
-------- -----------
``args`` Not an operator, but used with ``>`` to expand the list of results to be arguments (like Python's ``*args`` convention). For example ``> args(myFunc)`` invokes ``myFunc(*results)``.
-------- -----------
``>>``   As ``>``, but the function is applied to each result in turn (instead of all results being supplied in a single list argument). Identical to ``Map()``.
-------- -----------
``**``   As ``>``, but the results are passed as the named parameter *results*. Additional keyword arguments are *stream_in* (the stream passed to the matcher), *stream_out* (the stream returned from the matcher) and *core* (see :ref:`resources`). Identical to ``KApply()``.
-------- -----------
``^``    Raise a Syntax error. The argument to the right is a string that is treated as a format template for the same named arguments as ``**``.
======== ===========

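
The differences between ``>=``, ``>``, ``>>`` and ``args`` are easiest to see
on a plain list of results. The functions below are a rough model of the
descriptions in the table, written in plain Python; the names
(``apply_raw``, ``apply_wrapped``, ``apply_each``, ``expand_args``) are made
up for this sketch, and the string case of ``>`` is not modelled::

  def apply_raw(results, function):
      # Models ``>=``: the function's return value is the new result.
      return function(results)

  def apply_wrapped(results, function):
      # Models ``>``: the return value is placed inside a new list.
      return [function(results)]

  def apply_each(results, function):
      # Models ``>>``: the function sees one result at a time.
      return [function(result) for result in results]

  def expand_args(function):
      # Models ``args``: the results list becomes positional arguments.
      return lambda results: function(*results)

  results = ['b', 'a']
  raw = apply_raw(results, sorted)
  wrapped = apply_wrapped(results, ''.join)
  each = apply_each(results, str.upper)
  expanded = apply_wrapped(results, expand_args(lambda x, y: y + x))

  print(raw)       # ['a', 'b']
  print(wrapped)   # ['ba']
  print(each)      # ['B', 'A']
  print(expanded)  # ['ab']
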
.. _replacement:
Replacement
-----------
Operators can be replaced inside a ``with`` context using ``Override()``::

  >>> with Override(or_=And, and_=Or):
  ...     abcd = (Literal('a') & Literal('b')) | (Literal('c') & Literal('d'))
  >>> print(abcd.parse('ac'))
  ['a', 'c']
  >>> print(abcd.parse('ab'))
  [...]
  lepl.stream.maxdepth.FullFirstMatchException: The match failed in at '' (line 1, character 3).

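(think about it: with the operators swapped, ``&`` builds an ``Or()`` and ``|``
builds an ``And()``, so the parser matches ``a`` or ``b`` followed by ``c`` or
``d``).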
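
One way such replacement could work is for the operator methods to look up
their implementation in a table that a context manager swaps temporarily. The
sketch below is a toy model in plain Python, not Lepl's implementation; every
name in it (``Lit``, ``And``, ``Or``, ``override``, ``parse``) is defined here
for illustration only::

  from contextlib import contextmanager

  class Matcher:
      # Operators consult the table at call time, so it can be swapped.
      def __and__(self, other):
          return _OPERATORS['and_'](self, other)

      def __or__(self, other):
          return _OPERATORS['or_'](self, other)

  class Lit(Matcher):
      def __init__(self, text):
          self.text = text

      def match(self, stream):
          # Yield (results, remaining input) if the literal is a prefix.
          if stream.startswith(self.text):
              yield [self.text], stream[len(self.text):]

  class And(Matcher):
      def __init__(self, left, right):
          self.left, self.right = left, right

      def match(self, stream):
          # Match left, then right on whatever left did not consume.
          for first, rest in self.left.match(stream):
              for second, remaining in self.right.match(rest):
                  yield first + second, remaining

  class Or(Matcher):
      def __init__(self, left, right):
          self.left, self.right = left, right

      def match(self, stream):
          # Try both alternatives on the same input.
          yield from self.left.match(stream)
          yield from self.right.match(stream)

  _OPERATORS = {'and_': And, 'or_': Or}

  @contextmanager
  def override(**replacements):
      saved = dict(_OPERATORS)
      _OPERATORS.update(replacements)
      try:
          yield
      finally:
          # Restore the previous operators on exit, even after an error.
          _OPERATORS.clear()
          _OPERATORS.update(saved)

  def parse(matcher, text):
      # Return the first result that consumes the whole input, else None.
      for results, remaining in matcher.match(text):
          if remaining == '':
              return results
      return None

  with override(or_=And, and_=Or):
      swapped = (Lit('a') & Lit('b')) | (Lit('c') & Lit('d'))
  normal = (Lit('a') & Lit('b')) | (Lit('c') & Lit('d'))

  print(parse(swapped, 'ac'))  # ['a', 'c']
  print(parse(swapped, 'ab'))  # None
  print(parse(normal, 'ab'))   # ['a', 'b']

Only grammar built inside the ``with`` block is affected; once the block
exits, ``&`` and ``|`` regain their usual meaning.
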
It is also possible to provide a separator that is used for ``&`` and ``[]``.
With a little care (define matchers for characters before, and matchers for
sentences after, the *with* statement) this can handle the common case of
space--separated words in a transparent manner::

  >>> word = Letter()[:,...]
  >>> with Separator(r'\s+'):
  ...     sentence = word[1:]
  >>> sentence.parse('hello world')
  ['hello', ' ', 'world']

Note that there was no need to specify a separator in ``word[1:]``, and that
the argument of ``Separator()`` is a rare example of a string being coerced to
something other than a ``Literal()`` (here ``Regexp()`` is used).
The use of separators to handle spaces is discussed in more detail below.
.. index:: Separator(), SmartSeparator1(), SmartSeparator2(), DroppedSpace()
.. _spaces:
Spaces
------
There's a wide variety of ways to handle spaces in Lepl. A large part of the
:ref:`Tutorial ` is spent discussing this, and it's probably the
best place to look for a basic understanding.
The main conclusion of the :ref:`Tutorial ` is that the :ref:`lexer`
(i.e. using ``Token()``) is the
best approach in most circumstances. It usually hits the sweet spot between
flexibility and simplicity.
Alternatively, to handle optional spaces (zero or more), without tokens, use
``DroppedSpace()``::

  with DroppedSpace():
      addition = value & "+" & value

But sometimes these are not the right solution. One case is
:ref:`table_example`, when the ``Columns()`` matcher is a good fit.
Another is when spaces are *required*.
It is something of a "beginner's mistake" to enforce the use of spaces in the
grammar --- it makes the parsing more complex (and more fragile, even to
"good" input), and typically doesn't help the end user much. But even so, it
is sometimes necessary.
In such cases, the only real solution is to specify all the spaces by hand.
One option is to use the ``/`` and ``//`` operators (which match zero--or--more
and one--or--more spaces respectively). Alternatively, to save typing, Lepl
includes various *separators* (``DroppedSpace()``, above, is a
separator). The :ref:`Tutorial ` introduced the basic
``Separator()`` (as
described in the previous section, above), which requires a user--specified
space wherever ``&`` is used (and also in ``[]`` repetition).
But even this is often not sufficient when optional matchers are used, because
the separator is still required even when the optional matcher matches nothing.
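
The effect can be shown with Python's ``re`` module. This is only an analogy
(a regular expression, not a Lepl grammar): ``a`` and ``b`` are optional, ``c``
is required, and one space is demanded at every join, as a plain separator
would demand for every ``&``::

  import re

  # A space is required at each join, whether or not "a" or "b" matched.
  always_separated = re.compile(r"(a)? (b)? c")

  inputs = ["a b c", " b c", "b c", "c"]
  matched = {text: bool(always_separated.fullmatch(text)) for text in inputs}
  print(matched)
  # {'a b c': True, ' b c': True, 'b c': False, 'c': False}

Leaving out ``a`` still leaves its space behind, so ``' b c'`` is accepted
while the more natural ``'b c'`` and ``'c'`` are rejected.
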
So, to help automate the (rare) case that combines *required* spaces,
*automatic* addition of spaces for each ``&``, and *optional* matchers, two
"smart" separators are also available. The first, ``SmartSeparator1()``, checks
whether a matcher is used by seeing whether it consumes input; spaces are only
added when ``&`` is between two matchers that both "move along" the input
stream. The second, ``SmartSeparator2()``, takes a more proactive approach and
examines the matchers to see whether they inherit from the base class used in
Lepl to implement "optionality".
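
The first idea, requiring a separator only between items that actually
consumed input, can be sketched in a few lines of plain Python. This is an
illustration of the principle only, not Lepl's implementation;
``smart_match`` and its ``(literal, optional)`` item format are invented for
this example::

  def smart_match(items, separator, text):
      """Return True if text matches the items in order, with the separator
      required only between items that consumed input."""

      def step(index, position, consumed_before):
          if index == len(items):
              return position == len(text)
          literal, optional = items[index]
          # Option 1: match this item, after a separator if an earlier
          # item has already consumed input.
          start = position
          if consumed_before:
              if text.startswith(separator, start):
                  start += len(separator)
              else:
                  start = None
          if start is not None and text.startswith(literal, start):
              if step(index + 1, start + len(literal), True):
                  return True
          # Option 2: skip an optional item, consuming nothing.
          return optional and step(index + 1, position, consumed_before)

      return step(0, 0, False)

  grammar = [('a', True), ('b', True), ('c', False)]
  outcome = {text: smart_match(grammar, ' ', text)
             for text in ['a b c', 'b c', 'a c', 'c', 'ab c', ' b c']}
  print(outcome)
  # {'a b c': True, 'b c': True, 'a c': True, 'c': True,
  #  'ab c': False, ' b c': False}

Skipped optional items leave no stray space behind, but a space is still
required between any two items that are present.
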
All separators are implemented using :ref:`operator replacement
`, described above.
If you really, really need such functionality, the best thing to do is try
these various separators and see which has the behaviour you require (but
please first consider whether you absolutely need to check that spaces are
present, or whether you can do what you want more simply and reliably with the
:ref:`lexer`).
The following tables show the results of some simple tests for different
separators, spaces, and functions. They also illustrate two separate, but
related, issues: the difference between ``And()`` and ``&`` when separators are
present; and the behaviour of matchers like ``Eos()`` (which is not optional,
but consumes no input).
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Optional('a') & Optional('b') |
+----------+-----------------------------------------------------------+-----------------------------------------------------------+-----------------------------------------------------------+
| |Separator |SmartSeparator1 |SmartSeparator2 |
| +-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| |And(..., Eos()) |... & Eos() |And(..., Eos()) |... & Eos() |And(..., Eos()) |... & Eos() |
| +--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |
+==========+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+
|'a b ' | | |yes |yes | | | | | | |yes |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a b' |yes |yes | |yes |yes |yes |yes |yes |yes |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'ab' | |yes | |yes | |yes | |yes | |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' b' |yes |yes | |yes | | | | | | | | |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'b' | |yes | |yes |yes |yes |yes |yes |yes |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a ' |yes |yes | |yes | | | | | | |yes |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a' | |yes | |yes |yes |yes |yes |yes |yes |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'' | |yes | |yes |yes |yes |yes |yes |yes |yes |yes |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' ' |yes |yes | |yes | | | | | | | | |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
Each table has a "yes" when the parser (at the top of the table) matches the
input stream (on the left). Pay careful attention to spaces in the input.
Different columns of results correspond to the different separators, whether
they are matching a single space or "zero or more" spaces, and whether the
final ``Eos()`` matcher is added with ``&`` (which will include the spaces
from the separator) or ``And()`` (which won't).
So, for example, the final column on the right, below, has results for this
parser::

  with SmartSeparator2(Literal(' ')[:]):
      parser = Optional('a') & Optional('b') & 'c' & Eos()

(where ``Literal()`` is missing from the column headings to save space).
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Optional('a') & Optional('b') & 'c' |
+----------+-----------------------------------------------------------+-----------------------------------------------------------+-----------------------------------------------------------+
| |Separator |SmartSeparator1 |SmartSeparator2 |
| +-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
| |And(..., Eos()) |... & Eos() |And(..., Eos()) |... & Eos() |And(..., Eos()) |... & Eos() |
| +--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
| |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |' ' |' '[:] |
+==========+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+==============+
|'a b c ' | | |yes |yes | | | | | | |yes |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a b c' |yes |yes | |yes |yes |yes |yes |yes |yes |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' b c' |yes |yes | |yes | | | | | | | | |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'b c' | |yes | |yes |yes |yes |yes |yes |yes |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'ab c' | |yes | |yes | |yes | |yes | |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a c' | |yes | |yes |yes |yes |yes |yes |yes |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'a c' |yes |yes | |yes | |yes | |yes | |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|'c' | |yes | |yes |yes |yes |yes |yes |yes |yes | |yes |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
|' c' | |yes | |yes | | | | | | | | |
+----------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+--------------+
Finally, note that offside (significant whitespace) parsing is only supported
with tokens. If you want to do it without, you need to somehow work out how
to track the level and match the spaces yourself.