Automatically generate regular expressions with Txt2re

A programmer who hates coming up with regular expressions has produced a visual tool for automatically generating them. You paste in a block of text, identify the parts of it you'd like your regexp to catch, and it produces an (admittedly inelegant but absolutely functional) regexp ready for your use. The programmer explains, "It's free because I have been helped in my career so much by the programmers who generated free systems like linux, apache, php and mysql - this is the only free labour that I have ever given back to the community."

So what does txt2re do?
This system acts as a regular expression generator. Instead of trying to build the regular expression, you start off with the string that you want to search. You paste this into the site, click submit and the site finds recognisable patterns in your string. You then select the patterns that you are interested in and it writes a fully fledged program that extracts those patterns from that string. You then copy the program into your editor or IDE and play with it to integrate it into your program.
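To make that description concrete, here is a hand-written Python sketch of the *kind* of program described: one sub-pattern per recognised piece of the pasted string, glued together with non-greedy filler and compiled into a single regex. The sample string, variable names and sub-patterns are assumptions for illustration only; this is not actual txt2re output.

```python
import re

# Hypothetical sample input: the string you would paste into the site.
txt = "Order 1234 shipped on 2009-03-15"

# One sub-pattern per recognised piece, joined by non-greedy filler.
re1 = r'.*?'                              # non-greedy filler
re2 = r'(\d+)'                            # integer 1
re3 = r'.*?'                              # non-greedy filler
re4 = r'((?:19|20)\d\d-[01]\d-[0-3]\d)'   # YYYY-MM-DD date 1

rg = re.compile(re1 + re2 + re3 + re4, re.IGNORECASE | re.DOTALL)


def extract(text):
    """Return (integer, date) strings, or None if the pattern does not match."""
    m = rg.search(text)
    if m:
        return m.group(1), m.group(2)
    return None


print(extract(txt))  # ('1234', '2009-03-15')
```

The result is long-winded, but each piece maps directly onto something you selected, which is what makes it easy to paste into your own code and then trim.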

That's appalling. Where's the subtlety and art of crafting a beautiful regular expression in that?
There is none.

Kind of, but regular expressions can be done any number of ways. While the results from txt2re seem longer than what I usually put together, it seems useful as (1) a check of what you’ve already come up with and (2) a way to get ideas for pattern types you’ve never worked with before.
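Using a generated pattern as "a check of what you've already come up with" can be as simple as running both patterns over the same inputs and listing where they disagree. In this Python sketch both patterns and the sample strings are hypothetical: one mimics the verbose generated style, the other is a compact hand-written attempt at the same job (grab a number and a date).

```python
import re

# Generated-style pattern: filler + integer + filler + date (illustrative).
generated = re.compile(r'.*?(\d+).*?((?:19|20)\d\d-[01]\d-[0-3]\d)', re.DOTALL)
# A hypothetical hand-written pattern for the same task.
handwritten = re.compile(r'(\d+)\D+(\d{4}-\d{2}-\d{2})')


def groups_or_none(pattern, text):
    """Return the captured groups of the first match, or None."""
    m = pattern.search(text)
    return m.groups() if m else None


def disagreements(samples):
    """Return the samples on which the two patterns capture different groups."""
    return [s for s in samples
            if groups_or_none(generated, s) != groups_or_none(handwritten, s)]


samples = [
    "Order 1234 shipped on 2009-03-15",
    "Batch 7 of 12 due 2010-01-02",
]
for s in disagreements(samples):
    print(s, groups_or_none(generated, s), groups_or_none(handwritten, s))
```

On the second sample the generated-style pattern captures `7`, while the hand-written one captures `12`, because `\D+` cannot step over the digits in "12" and the search restarts further along. Neither answer is automatically wrong; the comparison simply surfaces the difference so you can decide which behaviour you actually want.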

Of course it would be awful if people relied on a tool like this to make their regular expressions and never actually figured out what they mean.

As much as I don’t like writing regex at times, I agree. If you don’t know how to write an expression and just auto-magically generate one, you don’t really understand what it’s doing. The question is: is that better or worse than banging on a keyboard until something close to what you want happens, then using that?

Regex can be annoying, but can be an incredibly useful time saver to us code monkeys.

I tend to think that this is a positive thing. The regex will need to be tested anyway, and as Guysmiley points out, many people iterate on a single regex over and over until they get it right; there’s a lot of guesswork as it stands.
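Since the regex has to be tested either way, a small table of inputs with expected outcomes is a cheap safety net, whether the pattern came from a generator or from trial and error. In this Python sketch the date pattern and the test cases are assumptions chosen for illustration; the harness reports every input where `re.fullmatch` disagrees with the expected result.

```python
import re

# A loose YYYY-MM-DD pattern of the kind a generator might emit (illustrative).
DATE = re.compile(r'(?:19|20)\d\d-[01]\d-[0-3]\d')

# (input, should_match) pairs; the invalid dates are deliberate edge cases.
CASES = [
    ("2009-03-15", True),
    ("1999-12-31", True),
    ("2009-3-15", False),   # month not zero-padded
    ("2009-13-15", False),  # month 13 does not exist
    ("2009-02-39", False),  # day 39 does not exist
]


def failing_cases(pattern, cases):
    """Return inputs where a full match disagrees with the expected result."""
    return [text for text, expected in cases
            if (pattern.fullmatch(text) is not None) != expected]


print(failing_cases(DATE, CASES))  # ['2009-13-15', '2009-02-39']
```

The pattern handles the sample date perfectly, yet it also accepts month 13 and day 39: exactly the "returns what you want some of the time" problem that a few deliberate edge cases expose immediately.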

Also, I’ve never encountered anyone’s code that has the regex painstakingly explained; one would simply end up rewriting most of the regex documentation. Certainly the target pattern or function is explained, but not how the regex was put together.
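Explaining a regex doesn't have to mean rewriting the regex documentation. Python's `re.VERBOSE` flag ignores whitespace in the pattern and treats `#` as the start of a comment, so each piece can carry a one-line note. The pattern below is an illustrative, slightly tightened date pattern (named groups, month and day ranges restricted); it is an assumption for the example, not one taken from txt2re.

```python
import re

# re.VERBOSE: whitespace in the pattern is ignored and '#' starts a comment,
# so each part of the regex can be documented where it is defined.
DATE = re.compile(r"""
    (?P<year>(?:19|20)\d\d)        # four-digit year, 1900-2099
    -
    (?P<month>0[1-9]|1[0-2])       # month 01-12
    -
    (?P<day>0[1-9]|[12]\d|3[01])   # day 01-31 (not checked against the month)
""", re.VERBOSE)

m = DATE.fullmatch("2009-03-15")
print(m.group("year"), m.group("month"), m.group("day"))  # 2009 03 15
```

The comments describe intent (and admit the remaining gap, such as 31 February still being accepted) without explaining regex syntax itself.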

But yeah, what all the other guys have said. It might be useful as a way of tinkering and getting a skeleton of what you want, but you really should understand how regexes work sufficiently to be able to tweak and streamline the output of this tool to suit your needs, or not use it at all.

For those who object on the basis that it creates results that are ugly, clunky, inelegant and unmaintainable: I will now relate what I was told by a Real Programmer™ (one who wrote out his programs in octal, in pen, on a code tablet and handed them over to a keypunch operator), which has served me well ever since. Elegance is in the eye of the beholder, and maintainability is produced by documentation. If the code runs the way you need it to run, whether you are optimising for memory usage, processor usage, maintainability or real-time speed, then that’s the elegance. Others don’t have to see it and often never will. If it makes sense to unroll a loop, unroll the loop. If the ugly regex works, and a maintenance programmer debugging it somewhere down the line won’t need to understand some nuanced regex concept or operator, then use it.

At the same time, I will echo what others have asserted: If you don’t know what you’re doing, the code could be doing anything and returning what you want to see some of the time. Please understand the results or it’s just as bad as writing code that never returns the results you want.

The problem is that this machine doesn’t optimize for anything, while being very restricted in what it can do. On the plus side, it is so restricted that it’s almost impossible to be surprised by its output! :)

If you want to understand and extrapolate from someone else’s regex, there are, I think, much nicer tools for that, which use color-coding to show which pieces of the string are captured by which parts of the regex. Google “regex explorer” for many examples.
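If no such tool is to hand, a plain-text stand-in for colour-coding takes only a few lines: print the string, then under it write each capture group's number beneath the characters that group captured. This Python sketch and its `mark_groups` helper are hypothetical, built only on the standard `re` match spans.

```python
import re


def mark_groups(pattern, text):
    """Return a line marking, under each character, which group captured it.

    The digit i (mod 10) is written under the span of group i. Later groups
    overwrite earlier ones where they overlap (e.g. nested groups), and groups
    that did not take part in the match are skipped. Returns None on no match.
    """
    m = re.search(pattern, text)
    if m is None:
        return None
    marks = [" "] * len(text)
    for i in range(1, len(m.groups()) + 1):
        start, end = m.span(i)
        if start == -1:  # group did not participate in the match
            continue
        for pos in range(start, end):
            marks[pos] = str(i % 10)
    return "".join(marks).rstrip()


text = "Order 1234 shipped on 2009-03-15"
print(text)
print(mark_groups(r'(\d+)\D+(\d{4}-\d{2}-\d{2})', text))
```

With the example pattern, the second line shows `1111` under "1234" and `2222222222` under the date, making it obvious at a glance which part of the regex grabbed which part of the string.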

I used to be known as “tool” at my former software job: I sat around all day creating scripted tools for engineers and artists to streamline their workflow. The essence of my job was writing Perl scripts to parse files. I was a RegEx monkey, and I loved it; I hope to return to being a similar monkey, or an SQL squirrel (debugging databases).

The programmer explains, “It’s free because I have been helped in my career so much by the programmers who generated free systems like linux, apache, php and mysql – this is the only free labour that I have ever given back to the community.”

Unless I missed a link, he’s not sharing his source code. Cool tool and all, but it doesn’t look like it’s something one could take and improve upon.

I only use regex for personal projects – filtering RSS feeds via Pipes, for example. I don’t use it enough to have any interest in learning all the ins and outs – I just want to make it work.
This is perfect for me.