# The contents of this file are subject to the Mozilla Public License
# (MPL) Version 1.1 (the "License"); you may not use this file except
# in compliance with the License. You may obtain a copy of the License
# at http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS"
# basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
# the License for the specific language governing rights and
# limitations under the License.
#
# The Original Code is LEPL (http://www.acooke.org/lepl)
# The Initial Developer of the Original Code is Andrew Cooke.
# Portions created by the Initial Developer are Copyright (C) 2009-2010
# Andrew Cooke (andrew@acooke.org). All Rights Reserved.
#
# Alternatively, the contents of this file may be used under the terms
# of the LGPL license (the GNU Lesser General Public License,
# http://www.gnu.org/licenses/lgpl.html), in which case the provisions
# of the LGPL License are applicable instead of those above.
#
# If you wish to allow use of your version of this file only under the
# terms of the LGPL License and not to allow others to use your version
# of this file under the MPL, indicate your decision by deleting the
# provisions above and replace them with the notice and other provisions
# required by the LGPL License. If you do not delete the provisions
# above, a recipient may use your version of this file under either the
# MPL or the LGPL License.

'''
Rewrite the tree of matchers from the bottom up (as far as possible)
using regular expressions. This is complicated by a number of things.

First, intermediate parts of regular expressions are not matchers, so we need
to keep them inside a special container type that we can detect and convert to
a regular expression when needed (since at some point we cannot continue with
regular expressions).

Second, we sometimes do not know if our regular expression can be used until we
have moved higher up the matcher tree. For example, And() might convert all
sub-expressions to a sequence if its parent is an Apply(add). So we may
need to store several alternatives, along with a way of selecting the correct
alternative.

So cloning a node may give either a matcher or a container. The container
will provide both a matcher and an intermediate regular expression. The logic
for handling odd dependencies is more difficult to implement in a general
way, because it is not clear what all cases may be. For now, therefore,
we use a simple state machine approach using a tag (which is almost always
None).
'''

from logging import getLogger
from operator import __add__

from lepl.matchers.core import Regexp
from lepl.matchers.matcher import Matcher, matcher_map
from lepl.matchers.support import FunctionWrapper, SequenceWrapper, \
    TrampolineWrapper, TransformableTrampolineWrapper
from lepl.regexp.core import Choice, Sequence, Repeat, Empty, Option
from lepl.regexp.matchers import NfaRegexp, DfaRegexp
from lepl.regexp.interval import Character
from lepl.regexp.unicode import UnicodeAlphabet
from lepl.core.rewriters import clone, Rewriter, clone_matcher
from lepl.support.lib import fmt, str, basestring
from lepl.matchers.combine import DepthNoTrampoline, AndNoTrampoline
from lepl.matchers.error import Error
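
# The bottom-up rewrite described in the module docstring can be illustrated
# with a minimal stand-alone sketch. It is not the LEPL implementation: the
# tuple-based tree, the names Container, rewrite and finish, and the use of
# the standard library's re syntax in place of lepl.regexp are all
# assumptions made for illustration. Leaves become containers holding a
# partial pattern; a parent either merges its children's patterns or, when it
# cannot continue with regular expressions, freezes them into matcher nodes.

```python
import re
from collections import namedtuple

# Hypothetical container: the original node plus the partial regexp for it.
Container = namedtuple('Container', 'node pattern')


def finish(item):
    '''Convert a container to a final "regexp" node; leave other nodes alone.'''
    if isinstance(item, Container):
        return ('regexp', item.pattern)
    return item


def rewrite(node):
    '''Rewrite a (kind, arg) tree from the bottom up.'''
    kind, arg = node
    if kind == 'lit':
        # a literal can always be expressed as a regexp
        return Container(node, re.escape(arg))
    children = [rewrite(child) for child in arg]
    if kind in ('and', 'or') and all(isinstance(c, Container) for c in children):
        # every child is still a regexp, so keep going up the tree
        joiner = '' if kind == 'and' else '|'
        pattern = '(?:' + joiner.join(c.pattern for c in children) + ')'
        return Container(node, pattern)
    # we cannot continue with regexps here, so freeze what we have
    return (kind, [finish(child) for child in children])


tree = ('call', [('and', [('lit', 'a'),
                          ('or', [('lit', 'b'), ('lit', 'c')])]),
                 ('call', [])])
rewritten = rewrite(tree)
```

# In this sketch the whole 'and' subtree collapses into a single regexp node,
# while the 'call' nodes, which have no regexp equivalent, are kept as they
# are.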

        self.matcher = matcher  # current best matcher (regexp or not)
        self.regexp = regexp  # the current regexp
        self.use = use  # is the regexp a win?
        self.add_reqd = add_reqd  # we need "add" to combine values (from And)?

        '''
        Construct a container or matcher.
        '''
        if use and not add_reqd:
            matcher = single(alphabet, node, regexp, regexp_type, wrapper)
            # if matcher is a Transformable with a Transformation other than
            # the standard empty_adapter then we must stop
            if len(matcher.wrapper.functions) > 1:
                cls.log.debug(fmt('Force matcher: {0}', matcher.wrapper))
                return matcher
        else:
            # not good enough to have a regexp as default, so either force
            # the original matcher if it has transforms, or keep going in the
            # hope we can get more complex later
            matcher = node
            if hasattr(matcher, 'wrapper') and matcher.wrapper:
                return matcher
        return RegexpContainer(matcher, regexp, use, add_reqd)

    '''
    There is a fundamental mismatch between regular expressions and the
    recursive descent parser on how empty matchers are handled. The main
    parser uses empty lists; regexp uses an empty string. This is a hack
    that converts from one to the other. I do not see a better solution.
    '''
    (results, stream_out) = matcher()
    if results == ['']:
        results = []
    return (results, stream_out)
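
# The conversion above can be exercised in isolation. The following is a
# minimal sketch, assuming stub matchers that simply return a
# (results, stream_out) pair; adapt_empty, regexp_matched_nothing and
# regexp_matched_text are hypothetical names and are not part of LEPL.

```python
def adapt_empty(matcher):
    '''Same conversion as the adapter above, without the stream argument.'''
    (results, stream_out) = matcher()
    if results == ['']:
        # a regexp that matched the empty string; the parser expects []
        results = []
    return (results, stream_out)


def regexp_matched_nothing():
    # stub: an empty regexp match, with the stream left unchanged
    return ([''], 'abc')


def regexp_matched_text():
    # stub: a non-empty match, which must pass through untouched
    return (['ab'], 'c')


empty_case = adapt_empty(regexp_matched_nothing)
text_case = adapt_empty(regexp_matched_text)
```

# Only the exact result [''] is rewritten; any other result list, including
# one that contains an empty string alongside other values, is left as it is.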

    '''
    A rewriter that uses the given alphabet and matcher to compile simple
    matchers.

    The "use" parameter controls when regular expressions are substituted.
    If true, they are always used. If false, they are used only if they
    are part of a tree that includes repetition. The latter case generally
    gives more efficient parsers because it avoids converting already
    efficient literal matchers to regular expressions.
    '''
