Trying to answer this question, I created this Python regular expression to match any substring egg followed by a digit, as long as it is not part of a URL starting with http://:

>>> import re
>>> r = re.compile(r'(?:\s(?!http://\S*))egg\d')

Then I applied it to the following string:

>>> a = "a egg1 http://egg2.com egg3 http://www.egg4.org egg5"

The result is:

>>> r.findall(a)
[' egg1', ' egg3', ' egg5']

The regular expression is wrong in a lot of other ways, but one thing bugged me more: why does the whitespace appear in the result? Since I used a lookahead assertion like (?:\s...), shouldn't it be taken out of the resulting strings?

2 Answers

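(?:...) isn't a lookahead assertion; it's simply a non-capturing pair of parentheses (i.e. what is matched by the sub-regex inside doesn't go into its own group; the parentheses exist only for precedence). Whatever it matches is still consumed and is part of the overall match, which is why the whitespace shows up in the result. (?=...) is a lookahead assertion.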
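To make the difference concrete, here is a minimal sketch using the string from the question. The variable names (`consuming`, `lookbehind`, `captured`) and the two alternative patterns are my own illustrations, not something from the original post, and they only address the whitespace issue, not the other problems with the regex.

```python
import re

a = "a egg1 http://egg2.com egg3 http://www.egg4.org egg5"

# Pattern from the question: the non-capturing group still CONSUMES
# the whitespace, so it is part of each overall match.
consuming = re.findall(r'(?:\s(?!http://\S*))egg\d', a)
print(consuming)   # [' egg1', ' egg3', ' egg5']

# Illustrative alternative 1: a lookbehind assertion only checks that
# whitespace precedes "egg"; it does not consume it.
lookbehind = re.findall(r'(?<=\s)egg\d', a)
print(lookbehind)  # ['egg1', 'egg3', 'egg5']

# Illustrative alternative 2: consume the whitespace, but put the part
# you want in a capturing group; findall then returns only that group.
captured = re.findall(r'\s(egg\d)', a)
print(captured)    # ['egg1', 'egg3', 'egg5']
```

Both alternatives skip egg2 and egg4 here only because those happen to be preceded by `/` and `.` rather than whitespace; neither is a general URL filter.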