A challenge: remove tags from a string.

Posted on September 15, 2015

Recently I have implemented a small piece of functionality, which is suitable to be a small challenge.

Description

Given a string with opening and closing tags (e.g., <mark></mark>), return a string without tags and indices of opening and closing tags.
For example, given a string: "we eat <mark>healthy</mark> and <mark>tasty</mark> food."
We expect to receive a string: "we eat healthy and tasty food.", and an array with pairs of indices: [[7, 14], [19, 24]]. There are two pairs of indices, because there were two pairs of <mark></mark> tags.

Ruby

First, I tried to solve this problem imperative way, which did not work very well. Solving the problem with recursion was easier:

The trick here is to compute position of each subsequent tag after the previous one has been removed. It is needed to account for the shifting of the marked part of the string after removal of tags.
Ruby provides a convenient method to retrieve index of the _n_th match - begin(n) (where n is index of an element in the match array). There is also end(n) method.

Elixir (Ruby-ish)

Then I tried to solve the same problem with Elixir. My first attempt was to use only pattern matching and recursion, but that didn’t look as pretty as expected (next solution), so I went on to try to repeat my ruby solution.

With Elixir I had to write a small function to extract beginning of a match - get_match_index/2.
It was unclear for me what exactly Regex.run/2 returns in [{integer, integer}]. At first I assumed it was beginning and end of a match. Later, when I saw [{12, 7}] it became clear that it probably returns beginning of the match and it’s length.

Elixir

Since Elixir is a functional language I expected to write code without any if\unless statements easily. Even though I managed to do this, I have a feeling that it does look beautiful: