Use regular expressions right

Regular expressions can be amazingly powerful tools for solving the right set of problems. In particular, they are very useful for matching regular languages. And there is the crux of the problem: few people know how to describe a regular language (it's part of computer science theory and linguistics that uses funny symbols; you can read about it under the Chomsky hierarchy).

If you use regular expressions wrong, it is unlikely that you've actually solved your original problem. Using a regular expression to match HTML (a far too common occurrence) means that you will miss edge cases. Now you've still got the original problem that you didn't solve, plus another subtle bug introduced by using the wrong solution.
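To make the HTML edge case concrete, here is a minimal Python sketch. The sample markup, the pattern, and the `DivText` class are all made up for illustration; the point is only that a non-greedy match stops at the first closing tag, while a parser from the standard library (`html.parser`) can keep track of nesting.

```python
import re
from html.parser import HTMLParser

# Hypothetical input: a <div> nested inside another <div>.
HTML = '<div class="outer"><div class="inner">hello</div> world</div>'

# Naive attempt: capture "the contents of the div".
# The non-greedy group stops at the FIRST </div>, which belongs to the inner div.
naive = re.search(r"<div[^>]*>(.*?)</div>", HTML).group(1)


class DivText(HTMLParser):
    """Collects text found inside any <div>, tracking nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


parser = DivText()
parser.feed(HTML)
parser.close()
text = "".join(parser.parts)

print(repr(naive))  # '<div class="inner">hello'
print(repr(text))   # 'hello world'
```

The regex version runs without complaint and returns something plausible, which is exactly why this kind of bug tends to go unnoticed.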

This is not to say that regular expressions shouldn't be used, but rather that one should work to understand which problems they can and can't solve, and use them judiciously.

The key to maintaining software is writing maintainable code. Using regular expressions can be counter to that goal. When working with regular expressions, you've written a mini computer (specifically, a non-deterministic finite state automaton) in a special domain-specific language. It's easy to write the 'Hello world' equivalent in this language and gain rudimentary confidence in it, but going further needs to be tempered with an understanding of regular languages, to avoid writing additional bugs that can be very hard to identify and fix (because they aren't written in the language of the program that the regular expression sits in).
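As a rough illustration of the "mini computer" idea, here is a hand-written state machine in Python that accepts the same strings as the pattern `ab*c`. The pattern and the function name `matches_by_hand` are invented for this sketch, and the machine is written in deterministic form for readability. Note that Python's `re` module is a backtracking engine rather than a textbook automaton, so this shows the concept, not how the library works internally.

```python
import re

# Hypothetical pattern, chosen only for illustration: an 'a', any number of 'b's, then a 'c'.
PATTERN = re.compile(r"ab*c")


def matches_by_hand(text: str) -> bool:
    """State machine that accepts exactly the strings matched by ab*c."""
    state = "start"
    for ch in text:
        if state == "start" and ch == "a":
            state = "seen_a"
        elif state == "seen_a" and ch == "b":
            state = "seen_a"  # loop on 'b'
        elif state == "seen_a" and ch == "c":
            state = "done"
        else:
            return False  # no valid transition from this state
    return state == "done"


# The four-character pattern and the fifteen-line function agree on every sample.
for sample in ["ac", "abbbc", "abca", "bc", ""]:
    assert matches_by_hand(sample) == bool(PATTERN.fullmatch(sample))
```

All of that control flow is packed into four characters of pattern, which is where both the power and the debugging difficulty come from.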

So now you've got a new problem: you chose a regular expression to solve the original one (when it was inappropriate), and you've now got two bugs, both of which are harder to find because they're hidden in another layer of abstraction.

Problem solving and supporting problem solving

Regular expressions, particularly non-trivial ones, are difficult to code, understand, and maintain. You only have to look at the number of questions on Stack Overflow tagged regex where the questioner has assumed that the answer is a regex and has then got stuck. In a lot of cases the problem can (and perhaps should) be solved a different way.

This means that, if you decide to use a regex you now have two problems:

The original problem you wanted to solve.

The support of a regex.

Basically, I think he means you should only use a regex if there's no other way of solving your problem. Any other solution is going to be easier to code, maintain, and support.

We’ll be here all night folks

There are some tasks for which regular expressions are an excellent fit. I once replaced 500 lines of manually written recursive descent parser code with one regular expression that took around 10 minutes to fully debug. People say regexes are hard to understand and debug, but appropriately-applied ones are not nearly as hard to debug as a huge, hand-designed parser. In my example, it took two weeks to debug all the edge cases of the non-regex solution.

However, to paraphrase Uncle Ben, "With great expressivity comes great responsibility."

In other words, regexes add expressivity to your language, but that puts more responsibility on the programmer to choose the most readable mode of expression for a given task.

Some things initially look like a good task for regular expressions, but aren't. For example, anything with nested tokens, like HTML. Sometimes people use a regular expression when a simpler method is clearer. For example, string.endsWith("ing") is easier to understand than the equivalent regex. Sometimes people try to cram a large problem into a single regex, where breaking it into pieces is more appropriate. Sometimes people fail to create appropriate abstractions, repeating a regex over and over instead of creating a well-named function to do the same job (perhaps implemented internally with a regex).
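Two of those points are easy to sketch in Python, where `str.endswith` plays the role of `endsWith`. The sample word, the date pattern, and the helper name `looks_like_iso_date` are assumptions made up for this example, not anything from a particular codebase.

```python
import re

word = "running"

# The plain string method says exactly what it means.
ends_plain = word.endswith("ing")

# The equivalent regex works, but the reader has to decode the anchor first.
ends_regex = re.search(r"ing\Z", word) is not None

# When a regex is the right tool, compile it once and hide it behind
# a well-named function instead of repeating the pattern at every call site.
_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def looks_like_iso_date(text: str) -> bool:
    """True if text has the shape YYYY-MM-DD (shape only, no calendar check)."""
    return _ISO_DATE.fullmatch(text) is not None


print(ends_plain, ends_regex)             # True True
print(looks_like_iso_date("2024-01-31"))  # True
print(looks_like_iso_date("2024-1-31"))   # False
```

Callers read `looks_like_iso_date(value)` and never see the pattern, so if it has to change, it changes in one place.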

For some reason, regexes have a weird tendency to create a blind spot to normal software engineering principles like single responsibility and DRY. That's why even people who love them find them problematic at times.
