Day 90 - Goodbye Capture Groups, Hello Word Boundaries

There was an interesting Regex problem to solve today: insert commas in a number
according to the Indian numbering system. There’s an inherent asymmetry in the
Indian system: the last three digits form one group, and every group before
that has only two digits.

The number: 1 million

Indian: Rs. 10,00,000
American: Rs. 1,000,000

The first approach that I took was to use capture groups. Capture the last three
digits, put a comma in front of them, then the next two and so on. This would
have worked if every number that needed the replacement had the same number of
digits. That kind of thing never happens in real life.
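To make the limitation concrete, here is a rough sketch of what a capture-group attempt might look like. The pattern and the `formatSevenDigits` name are made up for illustration and are not the code I actually wrote; the point is only that a fixed set of groups fits exactly one length of number.

```javascript
// Hypothetical fixed-width pattern: 2 digits, 2 digits, then the last 3.
// It can only ever match a number that has exactly 7 digits.
const sevenDigits = /^(\d{2})(\d{2})(\d{3})$/;

function formatSevenDigits(str) {
  // $1, $2 and $3 refer to the three capture groups above.
  return str.replace(sevenDigits, "$1,$2,$3");
}

console.log(formatSevenDigits("1000000")); // "10,00,000"
console.log(formatSevenDigits("100000"));  // "100000" (6 digits, no match, unchanged)
```

Every other length would need its own pattern, which is why this approach falls apart on real data.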

Scouring the internet, I eventually landed on the Regex used to insert commas
according to the American system. The regex looked like this:

str.replace(/\B(?=(\d{3})+(?!\d))/g, ",")

I know a fair amount of Regex from having used sed inside and outside vim
regularly to replace text. This regex had a lot of things that I didn’t
recognize: \B, ?=, ?!, some parentheses which weren’t being used as capture
groups, and a comma being inserted between two characters without replacing
either of them. I had never seen that before.
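For reference, this is how I would now read that regex piece by piece. The `addWesternCommas` wrapper is just a name I picked for this sketch.

```javascript
// \B          a position that is NOT a word boundary, so never before the first digit
// (?= ... )   lookahead: the text that follows must match, but nothing is consumed
// (\d{3})+    one or more groups of exactly three digits
// (?!\d)      negative lookahead: no further digit may follow those groups
const western = /\B(?=(\d{3})+(?!\d))/g;

function addWesternCommas(str) {
  // Every match is an empty position, so "," is inserted rather than substituted.
  return str.replace(western, ",");
}

console.log(addWesternCommas("1000000"));     // "1,000,000"
console.log(addWesternCommas("999"));         // "999"
console.log(addWesternCommas("Rs. 1234567")); // "Rs. 1,234,567"
```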

Then, I found this website which has
incredibly detailed articles about all the operators that Regex supports and the
things that can be accomplished with it.

Armed with the knowledge of what “Not a word boundary \B” and a Lookahead
assertion using ?= mean, I set about finding a Regex that would solve the
problem.

The first regex finds the one position that is NOT a word boundary and is
followed by exactly 3 digits and then the end of the string.

The second regex looks for positions that are NOT word boundaries and have
exactly two digits after them, and it stops matching when a comma is encountered.
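The two regexes themselves are not reproduced in this post, so what follows is only a reconstruction from the two descriptions above, not necessarily the exact patterns. The names `lastThree`, `pairsBeforeComma` and `addIndianCommas` are invented for this sketch, and it assumes the number sits at the very end of the string.

```javascript
// Step 1 (assumed pattern): the non-boundary position followed by
// exactly three digits and then the end of the string.
const lastThree = /\B(?=\d{3}$)/;

// Step 2 (assumed pattern): non-boundary positions followed by one or
// more pairs of digits that run right up to the comma added in step 1.
const pairsBeforeComma = /\B(?=(\d{2})+,)/g;

function addIndianCommas(str) {
  return str.replace(lastThree, ",").replace(pairsBeforeComma, ",");
}

console.log(addIndianCommas("1000"));     // "1,000"
console.log(addIndianCommas("100000"));   // "1,00,000"
console.log(addIndianCommas("1000000"));  // "10,00,000"
console.log(addIndianCommas("12345678")); // "1,23,45,678"
```

Running the two replacements in order matters here: the second pattern relies on the comma that the first one inserts.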

All of this magic (not really) is possible because Lookahead assertions don’t
consume characters. Consuming a character basically means that the regex
pointer moves forward past that character after looking at it. A Lookahead
assertion keeps the pointer at the position being considered for a match and
simply looks ahead to check whether the text that follows does, or does not,
match what it has been asked to assert.
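A small experiment shows the difference between consuming and merely looking ahead. The variable names are arbitrary; the methods used are the standard `String.prototype.match` and `String.prototype.matchAll`.

```javascript
// Without a lookahead, the three digits become part of the match.
const consuming = "1000".match(/1\d{3}/);
// With a lookahead, the three digits are only checked, not consumed.
const lookahead = "1000".match(/1(?=\d{3})/);

console.log(consuming[0]); // "1000"
console.log(lookahead[0]); // "1"

// A pattern made only of assertions matches a position, i.e. an empty
// string, which is why replace() can insert a comma without eating a digit.
const positions = [..."1000000".matchAll(/\B(?=(\d{3})+(?!\d))/g)];
console.log(positions.map((m) => m.index));       // [ 1, 4 ]
console.log(positions.every((m) => m[0] === "")); // true
```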

I would love to have a look at how this is implemented in sed or
even C++ for Node, but I am unreasonably sure I wouldn’t understand the
implementation. This “looking at the implementation and not understanding it”
will become a post inspiration in the future.