Matching a number in a file with Python

I have about 15,000 files I need to parse which could contain one or more strings/numbers from a list I have. I need to separate the files with matching strings.

Given a string such as 3423423987, it could appear on its own as "3423423987", or with a suffix as "3423423987_1", "3423423987_1a" or "3423423987-1a", but it could also appear inside a longer number such as "2133423423987". I only want to detect the sequence when it stands alone or is followed by a suffix of some sort, not when it is part of another number.

So 3423423987_1 is acceptable, but 13423423987 is not.

I'm having trouble with the regex; I haven't used regular expressions much, to be honest.

Put simply, if I test this against a list of possible positives and negatives, I should get 7 hits for the given list. I would like to extract the text up to the end of the word so that I can record it later.

It seems you just want to make sure the number is not matched as part of, say, a float number. For that you need lookarounds: a lookbehind that disallows a digit plus a dot right before the number, and a lookahead that disallows a dot plus a digit right after it.

To also match the "prefixes" (better called "suffixes" here), add something like \S* (zero or more non-whitespace characters) or (?:[_-]\w+)? (an optional sequence of a - or _ followed by one or more word characters) at the end of the pattern.
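To see how the two suffix options differ, here is a small sketch. It deliberately leaves out the lookarounds and boundaries and only appends each suffix piece to the bare number; the sample sentence is made up for illustration and is not from the question.

```python
import re

number = "3423423987"  # the search string from the question

# Hypothetical sample text: note the comma right after the first hit.
text = "see 3423423987_1a, then 3423423987-2 and 3423423987"

# Option 1: \S* takes everything up to the next whitespace,
# so trailing punctuation is captured too.
up_to_whitespace = re.findall(number + r"\S*", text)

# Option 2: (?:[_-]\w+)? takes only an optional - or _ followed by
# word characters, so the comma is left out.
word_suffix_only = re.findall(number + r"(?:[_-]\w+)?", text)

print(up_to_whitespace)   # ['3423423987_1a,', '3423423987-2', '3423423987']
print(word_suffix_only)   # ['3423423987_1a', '3423423987-2', '3423423987']
```

Which one fits probably depends on whether the numbers in your files can be followed directly by punctuation.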

Details:

(?<!\d\.) - fail the match if we have a digit and a dot before the current position

(?:\b|_) - either a word boundary or a _ (the _ alternative is needed because _ is a word char, so there is no word boundary between _ and a digit)

3423423987 - the search string

(?:\b|_) - same as above: either a word boundary or a _

(?!\.\d) - fail the match if a dot + digit is right after the current position.
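Putting the pieces listed above together, a minimal sketch could look like this. The helper name build_pattern and the sample strings are my own assumptions for illustration (this is not the list from the question), and \S* is appended so that the match runs to the end of the word.

```python
import re

def build_pattern(number):
    """Hypothetical helper: compile the pattern described above for one number."""
    return re.compile(
        r"(?<!\d\.)"          # no digit + dot right before
        r"(?:\b|_)"           # word boundary or a _
        + re.escape(number)   # the search string, escaped to be safe
        + r"(?:\b|_)"         # word boundary or a _
        r"(?!\.\d)"           # no dot + digit right after
        r"\S*"                # the rest of the word (suffix), if any
    )

pattern = build_pattern("3423423987")

# Made-up positives and negatives, not the original list.
samples = [
    "3423423987",      # standalone: hit
    "3423423987_1",    # suffix: hit
    "3423423987_1a",   # suffix: hit
    "3423423987-1a",   # suffix: hit
    "2133423423987",   # part of a longer number: no hit
    "13423423987",     # part of a longer number: no hit
    "1.3423423987",    # digit + dot before: no hit
    "3423423987.5",    # dot + digit after: no hit
]

hits = [m.group() for s in samples for m in pattern.finditer(s)]
print(hits)
# ['3423423987', '3423423987_1', '3423423987_1a', '3423423987-1a']
```

For the real task you would compile one pattern per number in your list and run it over each file's contents; m.group() gives you the text up to the end of the word to record.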