(1 minute later than an edit) and if you want to head down that path of question and explanation, you might want to look over at the cs theory exchange. The pumping lemma is the simplest disproof for "can a regular language match a^n b^n" (which is matchable by a Turing machine).
–
MichaelT Sep 21 '12 at 20:54

1

I think he's asking if he can put it on his resume under his "Programming languages" section. The answer in that case is no. That goes under the "Technologies" section.
–
Neil Jan 17 '13 at 15:44

3 Answers
3

Regular expressions are a particular kind of formal grammar used to describe and match sets of strings; the sets they can describe are known as "regular languages" in formal language theory. They are not a programming language as such. They are more of a shorthand for matching code that would otherwise be extremely tedious to implement and even more confusing than the sometimes arcane-looking regex.

Programming languages are typically defined as languages that are Turing complete. Such languages must be able to compute any computable function. Regex does not fit into this category.

+1, I looked but couldn't find a good discussion/disproof of Turing completeness of regular expressions.
–
FrustratedWithFormsDesigner Sep 21 '12 at 20:38

1

@davidk01 - Cellular automata can be Turing complete (though good compilers are hard to find); regular expressions are not. You can do non-trivial computations, yes, but there are fairly trivial things you can't do as well. Turing-complete cellular automata could be considered a programming language, since in principle you can write any program with them that you could with any other language.
–
psr Sep 21 '12 at 22:30

1

It's also important to note that the regex that performs primality testing (montreal.pm.org/tech/neil_kandalgaonkar.shtml#primality_regex) uses features of Perl regexes that are more powerful than "Regular Expressions" in the academic sense, namely stored groups (backreferences). Regular languages can't require arbitrary memory.
–
Eric W. Sep 21 '12 at 22:52

3

@WorldEngineer: There are interesting and useful programming languages that are not Turing complete. Datalog, SQL, and ACL2 are a few examples that come to mind, as well as any number of strongly-normalizing lambda calculi used in things like type-theory-based theorem provers.
–
Ryan Culpepper Sep 22 '12 at 4:03

1

Not all programming languages are Turing complete. For example, purely context-free declarative languages like XML, which aren't Turing complete without being paired with an interpreter, could be considered programming languages. It all depends on your definition of 'programming language'. All you need to transform a 'regular' language to a 'context-free' language is a push-down stack. Then it's turtles all the way down.
–
Evan Plaice May 17 '13 at 18:59

It is difficult to answer questions of the type "is X a Y" if participants in the debate use different definitions of X and Y. It could be that for some definitions the answer is "yes", and for some definitions the answer is "no", especially if the answer depends on technical details where the different definitions differ. This discussion also contains some misinformation, so please have patience with a longer answer.

What do we mean by a "programming language"?

A simple answer could be "a language used to create programs". Sure, but: what kind of programs? What about a language that could be used to create some kinds of programs, but not other kinds of programs? Here are two specific examples to illustrate the extreme cases:

1) An imaginary language called M works like this: If the program contains the single letter "m", it creates a game of Minesweeper. Everything else is a syntax error.

Intuitively, this is not what we mean by saying "a programming language". But the marketing department of M could argue that it technically fulfills the definition, because it can be used to create a program. Sure, the compiler does some critical parts for you, but that's what compilers do, don't they? A compiler of the C language also translates some simple words into dozens of processor instructions. The M compiler just goes further and makes your job even simpler.

2) If you install the original version of the famous Turbo Pascal, you can write many kinds of programs. But you cannot write a game that runs in the web browser, because the necessary API is simply not there.

So what exactly is the thing that makes Turbo Pascal a programming language, but that M doesn't have? Simply speaking, you can do more in Pascal than in M. But imagine we have an M.NET, which creates a Minesweeper game running in a web browser. So now we have something that Pascal can do and M.NET can't, but we also have something that M.NET can do and Pascal can't. Why should we consider the advantages of Pascal important, and the advantages of M.NET irrelevant?

The answer is that you can write all kinds of algorithms in Pascal, but you can't write algorithms in M or M.NET. Sure, M compiles your command "m", and C compiles your command "strcmp". But you can put "strcmp" in a larger context, for example to compare two files line by line, or to read a thousand strings and sort them alphabetically, or... well, millions of other things. And it is precisely this ability to use given commands in any algorithm that makes up the essence of a programming language.

What exactly is an algorithm, and more importantly, what is "any algorithm"? In computer science we use the term Turing-complete. The idea is that there is a set of computer languages, where each of them is able to simulate all of them. One of those languages is the Turing machine, which is why they are called that. Pascal is there, C is there, Java is there, Python is there, Lisp is there, Smalltalk is there, even XSLT is there. Our hypothetical M and M.NET are not there. You can learn more about this at any university providing a decent computer science course, but the idea is that a Turing-complete language can do anything that another Turing-complete language can do, if you give them the minimum necessary API. (If you give some web-browser API to Pascal, you can create all kinds of games in the web browser. If you give a web-browser API to M, you are still only able to create Minesweeper.) We could say metaphorically that if you remove all APIs from a programming language, the important stuff is what remains.

What do we mean by "regular expressions"?

Different programming languages implement them slightly differently. But the original idea was that regular expressions express so-called regular languages. Note that we do not speak about programming languages here, but about (pseudo-)human languages. Imagine that you find some exotic tribe speaking a language consisting only of words "ba", "baba", "bababa" and so on. You could describe this language verbally as "a syllable 'ba' repeated one or more times" or using a regular expression as "(ba)+".
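To make the tribe example concrete, here is a minimal sketch in Python's `re` module (my choice of language for illustration; the function name `is_tribe_word` is made up). It only checks whether a whole word belongs to the language described by "(ba)+".

```python
import re

# Pattern from the text: the syllable "ba" repeated one or more times.
BA_WORDS = re.compile(r"(ba)+")

def is_tribe_word(word: str) -> bool:
    """Return True if the entire word is in the language described by (ba)+."""
    # fullmatch requires the pattern to cover the whole string,
    # so "bab" or "abab" are rejected rather than partially matched.
    return BA_WORDS.fullmatch(word) is not None

for word in ["ba", "baba", "bababa", "", "bab", "abab"]:
    print(repr(word), is_tribe_word(word))
```

Using `fullmatch` rather than `search` matters here: the question is membership in the language, not whether the word contains a matching piece somewhere.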

Regular expressions are supposed to express: "nothing", "this letter", "this, followed by that", "this or that", "this, repeated one or more times", and "not this". That is the mathematical definition. Anything else is just a convenient shortcut built from the previous components. For example "this, repeated two or three times" can be translated as "this, followed by this, followed by (this or nothing)", but it could be more convenient to write "(ba){2,3}" than "baba(ba)?".
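The "convenient shortcut" point can be checked mechanically. The sketch below (Python `re` again; the names `SHORTHAND`, `EXPANDED`, `UNGROUPED` and `same_language` are made up for illustration) compares the counted form with its expansion on a handful of candidate words. It also shows that the parentheses matter: without them the quantifier applies only to the single preceding character.

```python
import re

SHORTHAND = re.compile(r"(ba){2,3}")   # "ba" repeated two or three times
EXPANDED = re.compile(r"baba(ba)?")    # this, this, then (this or nothing)
UNGROUPED = re.compile(r"ba{2,3}")     # quantifier binds to "a" only

def same_language(words):
    """True if SHORTHAND and EXPANDED accept exactly the same words from the sample."""
    return all(
        (SHORTHAND.fullmatch(w) is None) == (EXPANDED.fullmatch(w) is None)
        for w in words
    )

candidates = ["ba" * n for n in range(6)] + ["baa", "baaa"]
print(same_language(candidates))
print([w for w in candidates if UNGROUPED.fullmatch(w)])
```

This is only a spot check on a finite sample, not a proof of equivalence, but it is usually enough to catch a misplaced quantifier.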

In real life, a typical implementation of "regular expressions" implements more than this. For example, using the mathematical definition, a language of "aba", "aabaa", "aaabaaa" and so on (any number of "a"s, followed by a "b", followed by the same number of "a"s) is not a regular language. However, many "regular expressions" used today could detect it, using the additional concept of "the same thing that we found before", written as "(a+)b\1". Using this additional concept, we can do some cool things, for example detect words consisting of a prime number of letters. Still, we can't implement just any algorithm; for an explanation why, please study a textbook on formal languages.
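Both tricks can be tried in any engine with backreferences. Here is a rough sketch in Python's `re` (the constant and function names are made up; the primality pattern is the widely circulated one, written in my own variant, not necessarily the exact regex linked in the comments).

```python
import re

# Pattern from the text: a run of "a"s, a "b", then the same run again.
MIRRORED = re.compile(r"(a+)b\1")

# Matches strings whose length is NOT prime:
#   .?         -> length 0 or 1
#   (..+?)\1+  -> a block of 2+ characters repeated at least twice,
#                 i.e. the length factors as block_size * repeat_count
# Assumes the input has no newlines, since "." does not match them.
NOT_PRIME_LENGTH = re.compile(r".?|(..+?)\1+")

def has_prime_length(word: str) -> bool:
    """True if len(word) is prime, decided only by the backreference regex."""
    return NOT_PRIME_LENGTH.fullmatch(word) is None

print(bool(MIRRORED.fullmatch("aabaa")), bool(MIRRORED.fullmatch("aabaaa")))
print([n for n in range(20) if has_prime_length("x" * n)])
```

The engine does this by backtracking through every candidate block size, so it gets slow quickly on long inputs. It is a curiosity, not a practical primality test.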

So, back to the original topic: are regular expressions (defined either as: expressions describing regular languages in the Chomsky hierarchy; or as: the former, plus the \1 operation) a programming language (defined as: Turing-complete)? The answer is no. No, you cannot implement any algorithm using regular expressions, and the ability to implement any algorithm is what people studying computer science typically understand as the essence of a programming language.
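For contrast, here is the kind of thing a general-purpose language does trivially. The comments above cite a^n b^n (some "a"s followed by the same number of "b"s) as the textbook language that regular expressions in the mathematical sense cannot match, because it needs an unbounded counter. A minimal sketch in Python, with the function name `is_an_bn` made up for illustration:

```python
def is_an_bn(text: str) -> bool:
    """True if text is n "a"s followed by exactly n "b"s, for any n >= 0."""
    count = 0
    i = 0
    # Count the leading run of "a"s.
    while i < len(text) and text[i] == "a":
        count += 1
        i += 1
    # Cancel one count for each "b" that follows.
    while i < len(text) and text[i] == "b":
        count -= 1
        i += 1
    # Accept only if the whole input was consumed and the counts balance.
    return i == len(text) and count == 0

for sample in ["", "ab", "aabb", "aab", "abab", "ba"]:
    print(repr(sample), is_an_bn(sample))
```

The point is not the loop itself but the `count` variable: it can grow without bound, which is exactly the memory a finite automaton does not have.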

Of course, anyone can change the answer by insisting on a different definition. As I wrote at the beginning, the technical details are important here. If you get them wrong, you get a wrong answer.

And if you are not interested in technical details, the answer could be: Can you use regular expressions (and nothing else) to make a program? No. So why call it a programming language? (However, an answer like this was downvoted and deleted here, which is why I wrote this longer version.)

EDIT: Also, anyone can create a library implementing their own new variant of "regular expressions" with some added new features. At some point, the new features may be enough for the whole system to become Turing-complete. A trivial example would be embedding a Turing-complete language using some new syntax; but it can also happen less obviously. Maybe it already happened.

This, for example, is a small snippet I wrote to retrieve an HTML Table. Unlike other regex engines, this controls the stack of capture collections (push, peek, and pop), and can handle nested objects. I have a more complex one, but it's sorta proprietary.

I think in this example, Regex can be looked at as having all the basic requirements of a programming language. It has variables, inline memory, conditionals, input and output, and it compiles using one of multiple regex engines (.NET in this case).

In Response to the over-used squawking to (NEVER) Parse HTML with Regex, I went ahead and posted a pre-typed response that I can post: Parsing HTML

This shows a simpler regex performing loops and conditionals (algorithms?). The only thing missing is actual Mathematical computation. This is a more detailed Regular Expression which just pulls a TD Cell more efficiently than the typical "(.*?)" method.

But even as a Regex enthusiast and self-proclaimed master, I wouldn't go around telling anybody Regex is a Programming language. My own argument against myself is that it can't stand alone, it has to be run through its own engine while being supported by another programming language engine.

If you "test" this and it doesn't work, you must realize that most regex engine "testers" don't handle .Net Regex (Balancing Groups). You'd have to actually use this in a .Net program.
–
Suamere May 17 '13 at 18:14

3

Oh gosh, this is prima facie evidence for why you should never use regexes to parse html. Ever.
–
Tacroy May 17 '13 at 19:00

@Tacroy Nice to see somebody chimed in to parrot advice about parsing HTML with regex. While not for the faint-of-heart, combining regexes like the one above with a stack is a basic (and efficient) recipe for building a context-free parser.
–
Evan Plaice May 17 '13 at 19:04

1

In response to the Parrot Squawking. I have created this: Parsing HTML
–
Suamere May 22 '13 at 1:04