This is probably not the greatest example ever, but the idea is that the pattern is built with some sort of semantically meaningful language that can be broken down into traditional regular expressions. The above would be a lot easier for a developer to modify. I sort of envision it being like SQL/Linq to some degree.

It would make regex a lot more semantic and maintainable. Has this been tried before, and is it a bad/good idea to try this? Could it work?

Edit

Perhaps this is a better example (I know URLs are notoriously difficult to parse and this is oversimplified):


You don't need regex to parse URLs; that can be done simply with a regular old scanning parser. That is a contrived example as well.
–
Jarrod Roberson, May 12 '11 at 18:14

You can use Linq, or you can write macros in Clojure. Functional languages naturally lend themselves to this problem. There is no need to reinvent the wheel. You can document regex, learn it really well, and perhaps not use it all the time but combine it with other filter functions. Either way, I do not see a silver bullet ...
–
Job, May 12 '11 at 18:26


THAT would be easier to understand or maintain for you??
–
Kilian Foth, May 13 '11 at 11:24

@Kilian Well normal code is easier to understand than machine code, SQL/Linq are easier to understand than whatever is beneath them, so yeah I think there's a lot of room for improvement!
–
Tom, May 13 '11 at 11:33

We use operators instead of words because they are more easily comprehensible after just a little experience. Do you also prefer "ADD deposit TO current_balance GIVING new_balance" ?
–
kevin cline, Jul 2 '11 at 6:38

10 Answers

It hasn't really taken the world by storm, but it might be a good starting point if you want to write a similar mechanism for another language.

It's not that hard to get the basics of RegExes down; I find switching dialects (Emacs vs Perl Compatible Regular Expressions vs that weird variant in the Visual Studio Find dialog, for example) the biggest issue. I wouldn't be motivated to learn a "plain English" version. It's almost easier to accept the abstraction, because the natural-language translation of the commonly used symbols is imperfect, too.

The main purpose of regular expressions is to provide a terse notation for forming statements that describe regular languages (IOW matching a string based on rules). A verbose notation that generates such a terse notation is basically a Rube Goldberg device.

Also, given the following:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. — Jamie Zawinski

You seem to be doing the following:

Some people, when confronted with using regular expressions, think "I know, I'll write a language that generates regular expressions."

What you propose is incredibly verbose. Regex can be hard to digest if done wrong, and it takes some getting used to, though only a little, I'd claim: I rarely read or write regexes, yet I still remember the syntax for the most important features (repetition, character classes, lookahead) and can read regexes using those features relatively fluently. Even so, I'd prefer them to something like this, which requires me to type out a full pseudo-English sentence for something that can be expressed perfectly well with a few characters. Also consider the complexity (and error-proneness) of an implementation of such a language!

Another issue I have to raise: the checks you use as examples include some things that are completely unreasonable to do with regex. Ending with an integer is easy enough, but comparing numbers is a no-go with regex. Also, many of these tests are written more easily with the programming language's native string processing tools: checking the length, for instance, or substring checks when the string gets longer or is dynamic. The fact that regexes exist and are useful sometimes doesn't mean you have to use them for all string processing. Use them with care and everything's fine.
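As a rough illustration of that split, here is a minimal Python sketch. The rule itself (a string of limited length that ends in an integer within some range) and the function name are made up for the example; they are not taken from the question. The regex only finds the trailing digits, and the length check and numeric comparison are left to ordinary code.

```python
import re

# The regex does the one job it is good at: locating trailing digits.
TRAILING_INT = re.compile(r"([0-9]+)\Z")

def ends_with_int_in_range(text, low, high, max_len=20):
    """Hypothetical rule: text is short enough and ends in an integer in [low, high]."""
    # Length check: plain string tooling, no regex needed.
    if len(text) > max_len:
        return False
    match = TRAILING_INT.search(text)
    if match is None:
        return False
    # Numeric comparison: done on the parsed integer, not in the pattern.
    return low <= int(match.group(1)) <= high
```

Calling `ends_with_int_in_range("day 17", 1, 31)` accepts the string, while `"day 42"` fails the range check and `"no digits"` fails the regex step.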

I will probably go to programmers' hell for posting this, because it is such an evil, hideous anti-pattern, but it shows a terrible practice that you shouldn't try to use. I came across this the other day: RegEx Range Generator
–
Jarrod Roberson, May 12 '11 at 18:08

Sure it could work, but it would be tremendously difficult to implement (imho), when you take into account anything but the most basic of expressions.

Regex is a language unto itself. Once you have an understanding of how it works, you don't forget it (you might need a refresher on the syntax, but that's the same for all languages), and a wrapper becomes unneeded (and the additional overhead would be unwanted).

What you are describing is almost AppleScript-like in its syntax, and AppleScript is universally loathed, even by people who know it well. The syntax may look easy and readable, but its verbosity is its downfall: unless you use it every day, you forget all the grammar and keyword rules, and it becomes just as opaque as regex syntax. It is hard for beginners to understand because of its verbosity, and hard for experts for the same reason.

Your contrived straw man example:

Rule followedby string endson "."

So how do I remember to use followedby instead of follows or after or before or precedes or preceding or any one of the dozen or so English alternatives for that concept of "coming after" something else? You can apply the same logic to endson, which could be endswith or endingwith or ending. You would still need a cheat sheet or book to use your proposed syntax.

It's only a proposed solution, more of an illustration. Don't take what I wrote to be the literal solution! It can be different.
–
Tom, May 13 '11 at 11:35

Different != better; that is my point. What appears "simpler" is actually more complex.
–
Jarrod Roberson, May 13 '11 at 12:57

+1 A-bleeping-men to loathing AppleScript. I had to provide what little end-user support Apple provided for AppleScript back in the day, and I learned to hate it with a passion. Its pseudo-English code, combined with its reliance on implementation details in each individual app, makes it a bloody nightmare to program. I have to use it, but I hate it. Regex is a walk in the park compared to AppleScript.
–
TechZen, Jul 24 '13 at 12:23

What you want is a DSL for creating regexes. This is not all too hard. It only gets complex and verbose once you add flags for capturing groups, special codes for frequently used character classes, anchoring and so on.

The basics are:

A single character is a regex.

If r is a regex, then begin r end is also a regex.

If r1 and r2 are regexes, then r1 then r2 is a regex.

If r1 is a regex, n is an integer, and m is either an integer or many, then r1 for n to m times is a regex.

If r1 and r2 are regexes, then r1 or r2 is a regex.

Of course, one would want to abbreviate:

'h' then 'e' then 'l' then 'l' then 'o'

with

"hello"

Likewise, after using this DSL for some time, one would want to write

\s*

instead of

begin ' ' or '\t' or '\r' or '\n' end for 0 to many times
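To make the five rules concrete, here is a minimal sketch of them as Python functions that build an ordinary regex string. All of the function names (char, group, then, times, either, literal) and the MANY marker are invented for this illustration, and mapping "begin ... end" to a non-capturing group is an assumption about what that rule is meant to do.

```python
import re

MANY = None  # assumed stand-in for the DSL's "many" (no upper bound)

def char(c):
    """Rule 1: a single character is a regex (escaped so it matches literally)."""
    return re.escape(c)

def group(r):
    """Rule 2: 'begin r end', rendered here as a non-capturing group."""
    return "(?:" + r + ")"

def then(*rs):
    """Rule 3: 'r1 then r2' is plain concatenation."""
    return "".join(rs)

def times(r, n, m):
    """Rule 4: 'r for n to m times', where m may be MANY."""
    upper = "" if m is MANY else str(m)
    return group(r) + "{" + str(n) + "," + upper + "}"

def either(*rs):
    """Rule 5: 'r1 or r2', grouped so the alternation stays self-contained."""
    return group("|".join(rs))

def literal(s):
    """The "hello" abbreviation: one char rule per character, chained with then."""
    return then(*(char(c) for c in s))

# begin ' ' or '\t' or '\r' or '\n' end for 0 to many times
whitespace = times(either(char(" "), char("\t"), char("\r"), char("\n")), 0, MANY)
```

For instance, `re.fullmatch(then(literal("hello"), whitespace), "hello \t")` succeeds. The generated pattern is longer than a hand-written one but accepts the same strings, which is rather the point being made here.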

The following is interesting: there may be a parallel universe exactly like ours, except that regular expressions were introduced there in a verbose DSL like the one above, and that form became common. On the stackexchange.com of that universe, someone could be asking why regular expressions have to be so clumsy, with a good idea for making work with regexes far easier: a concise but equally powerful notation ....

I found that Ruby Regexp Generator http://www.rubyregexp.sf.net is helpful, because you can give it a verbose definition, but it will output a regexp for your code. I use it like a crutch, to help me build my regexp, instead of as a standalone solution.

An alternative to regular expressions is Backus-Naur Form, along with some more human-readable variations like EBNF or ABNF. Roughly, each part of the grammar is broken into a 'production rule', with a nonterminal definition on the left and a sequence of terminals and nonterminals describing the rule on the right. Your example in BNF would look something like this:

The best way to do regular expressions is to learn and understand them, or not to use them at all. Using some other tool as an excuse to not learn regular expressions means you have to re-"learn" them every time you encounter them.

Spend a day, just one (full, undistracted) day to deeply study regular expressions and you'll be rewarded with a new tool you can use your whole career. You'll also have a much greater understanding of when they are appropriate and -- more importantly -- when they are not.

Yup, regexes are a powerful yet dangerous tool - but if we can create a tool that improves readability while keeping the same expressiveness... well, I don't see the bad part of it :)
–
ArtoAle, Oct 16 '13 at 15:20