Posted by CmdrTaco on Monday November 08, 2004 @01:19PM
from the zztop-wants-a-perl-necklace dept.

An anonymous reader writes "Perl 6 is finally coming within reach. This article gives you a tour of the grammars and regular expressions of the Perl 6 language, comparing them with the currently available Parse::RecDescent module for Perl 5. Find out what will be new with Perl 6 regular expressions and how to make use of the new, powerful incarnation of the Perl scripting language."

Perl chose a keystroke-efficient syntax that makes regular expressions unreadable to anybody who doesn't know how to read them. It also made them very compact and easy to write for anybody who does know how to read them. They look very intimidating, but underneath they are usually easier to understand than the C-like Perl code surrounding them.

They are amazingly useful. Seriously, if you have never learned about Regular Expressions you owe yourself a lesson in how they work and what they do. I've seen people spend days working on stuff that can be written (more efficiently!) in a regular expression in a matter of minutes. Pattern matching is the sort of thing that every general purpose language should have; it is a shame that the basic Regular Expression library that comes with most Unixes is such a piece of crap. Who wants to deal with the arcane invocation method, the extremely limited syntax, or the syntactic sugar like: "[[:digit:]]{2}:[[:space:]][[:space:]]*[[:alpha:]]*" when you could write "\d{2}:\s+\w*"?
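For anyone who wants to try the shorter pattern, here is a minimal sketch in Python, whose `re` module accepts the same Perl-style shorthand classes. The sample strings and the `first_match` helper are made up for illustration. Note that `\w` is broader than `[[:alpha:]]` (it also matches digits and underscore), so the two patterns quoted above are close but not strictly identical.

```python
import re

# Perl-style shorthand: two digits, a colon, one or more whitespace
# characters, then zero or more word characters.
pattern = re.compile(r"\d{2}:\s+\w*")

def first_match(text):
    """Return the first matched substring, or None if the pattern is absent."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(first_match("log 42:   started"))
print(first_match("no timestamp here"))
```

The point is only that the whole search fits in one line of pattern; the hand-written loop it replaces would be far longer.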

No, it most certainly did not. Regular expressions as they exist in Perl today are a direct descendant of POSIX regular expressions which derive from the original work done by Ken Thompson (which resulted in the grep program, which stands for "global regular expression print"). That syntax further dates back to the giants in the field of computational theory, and was specialized only slightly for text matching.

grep, awk, sed, ed, vi, emacs, and dozens of other programs and languages for Unix used this notation before Perl came along and adopted it, so let's not pretend that this syntax is somehow Perl's doing.

The extended regular expression syntax of today IS perl's doing and in almost all cases it has been a process of making regular expressions both more powerful and more readable, culminating in Perl6's rule syntax which is highly readable by comparison.

There really aren't many choices. The current regular expression syntax is the only form I've seen tried, with only minor variation.

as he has done somewhat with the Perl 6 expressions

Perl 6 regular expressions have almost exactly the same syntax as Perl 5. The parts that are new are not regular expressions. Cosmetic differences (like [] vs <[]>) are fairly ignorable syntactically. It would be like saying that Perl 5 will use // as the comment character instead of # (not that it will, just an example).

All of the inline comments and whitespace are part of Perl 5 extended expressions, though the word-matching on whitespace is new to Perl 6.
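As a rough illustration of what extended expressions buy you, here is a sketch using Python's `re.VERBOSE` flag, which behaves much like Perl 5's /x: whitespace in the pattern is ignored and `#` starts a comment. The time-of-day pattern and the `parse_time` name are invented for the example.

```python
import re

# The same pattern could be written on one line as (\d{1,2}):(\d{2})\s*(am|pm)?
time_re = re.compile(r"""
    (\d{1,2})     # hours
    :             # literal separator
    (\d{2})       # minutes
    \s*           # optional whitespace
    (am|pm)?      # optional meridiem
""", re.VERBOSE | re.IGNORECASE)

def parse_time(text):
    """Return (hours, minutes, meridiem) as strings, or None if text is not a time."""
    m = time_re.fullmatch(text)
    return m.groups() if m else None

print(parse_time("9:30 PM"))
print(parse_time("noon"))
```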

POSIX on the other hand ignored most of that historic syntax and instead chose their horribly bloated keyword syntax.

That's not really part of the regular expression syntax. Having [[:digit:]] as an alias for Perl's \d is hardly a different syntax so much as sugar. The fundamentals of POSIX regular expressions are the fundamentals of all modern regex syntaxes:

alphanumerics are literals

backslash is a character escape

parens are used for grouping

*, +, ? and {} are repeat count specifications

These are the fundamentals of Perl regular expressions, POSIX, and all of the other modern regular expression engines and in turn have only a few small differences from the basic regular expressions which Unix started with.
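The four fundamentals listed above can be checked one at a time. This is a small sketch in Python (its `re` module shares these basics with Perl and POSIX extended syntax); the `matches` helper and the sample strings are assumptions made for the example.

```python
import re

def matches(pattern, text):
    """True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

checks = {
    # Alphanumerics are literals.
    "literal":  matches(r"abc", "abc"),
    # Backslash escapes a metacharacter so it matches itself.
    "escape":   matches(r"a\.c", "a.c") and not matches(r"a\.c", "abc"),
    # Parens group, so the repeat applies to the whole group.
    "grouping": matches(r"(ab)+", "ababab"),
    # *, +, ? and {} are repeat counts.
    "star":     matches(r"ab*", "a"),
    "plus":     not matches(r"ab+", "a"),
    "optional": matches(r"colou?r", "color"),
    "count":    matches(r"\d{2,3}", "123") and not matches(r"\d{2,3}", "1234"),
}
print(checks)
```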

I've often thought that the ease with which regular expressions can be accessed within Perl was a blessing and a curse. So many people like yourself seem to think that Perl championed regular expressions, when in fact it just followed AWK's lead in integration between C and Ken Thompson's regular expression implementation (which in turn inspired the version that was written from scratch by Henry Spencer and used by Larry for Perl).

If you have a new syntax in mind, I suggest introducing it and seeing how it does. Modern regular expressions are an incremental improvement on classical set notations, and have served us well to date, but I'm sure someday someone will see a better way.

When you see a regular expression like that it's a good indicator that the person that wrote it wasn't very familiar with how to write good REs. The above suffers from "leaning toothpick syndrome." If you are trying to match the '/' character, then don't use it as the delimiter of the RE. For example, compare the following REs, which are equivalent:

m/\/\/(.*)\/\//i

and

m,//(.*)//,i

Using ',' for the delimiter of the RE means you don't have to backslash-quote the forward slash to use it in a match.
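To see that the two pattern bodies really are equivalent, here is a quick sketch in Python, where a pattern is an ordinary string and '/' is never a delimiter, so the backslashes are simply redundant. The `capture` helper and the sample text are made up for the example.

```python
import re

escaped = re.compile(r"\/\/(.*)\/\/", re.IGNORECASE)  # body of m/\/\/(.*)\/\//i
plain = re.compile(r"//(.*)//", re.IGNORECASE)        # body of m,//(.*)//,i

def capture(regex, text):
    """Return what the first group captured, or None if there is no match."""
    m = regex.search(text)
    return m.group(1) if m else None

sample = "see //Some Comment// here"
print(capture(escaped, sample))
print(capture(plain, sample))
```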

It is good to see PERL focussing on what makes it great. There is no other language, IMHO, that handles text input as well as PERL does. Adding this level of processing just makes it even more powerful.

Yeah, now that I've RTFA, I realize just how cool these advances are. They've basically taken some of LISP and built it into Perl, but added a few extensions and predefined strings on top of it. Besides looking MUCH cleaner and being MUCH easier to read/maintain, it should be much more powerful for programmers that know LISP.

The idea of :p5 is not just that you can take Perl 5 code and modify it to make it work.

The idea is that if you don't bother to write a zillion-rule grammar to match whatever you're trying to match, you can still use the P5-style regular expressions you know and love. It's another case of Not Swatting A Fly With The Nuke.

Meaning that it is not backward compatible without modifying your source code.

Thus spake Larry Wall in Apocalypse 5:

...we took several large steps in Perl 5 to enhance regex capabilities. We took one large step forwards with the /x option, which allowed whitespace between regex tokens. But we also took several large steps sideways with the (?...) extension syntax. I call them steps sideways, but they were simultaneously steps forward in terms of functionality and steps backwards in terms of readability. At the time, I rationalized it all in the name of backward compatibility, and perhaps that approach was correct for that time and place. It's not correct now, since the Perl 6 approach is to break everything that needs breaking all at once.

And unfortunately, there's a lot of regex culture that needs breaking.

And from Apocalypse 1:

It would be rather bad to suddenly give working code a brand new set of semantics. The answer, I believe, is that it has to be impossible by definition to accidentally feed Perl 5 code to Perl 6. That is, Perl 6 must assume it is being fed Perl 5 code until it knows otherwise.

In other words, it is backwards compatible, it isn't backwards compatible, and when you install Perl 6, you are installing both.

I admit that's an ancient version of Perl, but unfortunately that's what I'm stuck with here. At home it might say perl5.8.5 or so.

I realized a long time ago [perl.org] that I'd better have every program I wrote tied to specific installation of a specific version of Perl, to avoid problems in installing future versions or new modules. Has nothing to do with Perl 6; it's just good configuration management. I can at any time install another

As others have pointed out, Perl 6 interpreters (at least the default one that is Parrot-based) will hand your code off to Ponie [poniecode.org] or something like it by default. You will have to start your program with the module keyword or the use 6 statement to force Perl 6 behavior, or use a special binary (e.g. something like /usr/bin/perl6).

The :p5 modifier is not there for backward compatibility so much as to allow the programmer to choose the model of regular expression to use. There are trade-offs. Here are two Perl 5 regular expressions:

m{[a-z][A-Z]+}

m{^(?:\w+\d|\S+(?:\'s)?)$}

which are written in Perl 6:

m{<[a-z]><[A-Z]>+}

m{^[\w+\d|\S+[\'s]?]$ }

Note that Perl 5 syntax is actually a bit nicer for the first one, so you can continue to use Perl 5 syntax there. In the second case, the new bracket-operator is very handy for enclosing sub-expressions that don't have to be remembered in the positional variables (the same as the Perl 5 (?:...) operator). You can even mix them:

I know what you mean, and really I haven't read into detail about this particular situation. However, if it truly executes code using perl5 by default unless a perl6 declaration is made, then that would be more than just "backwards" compatibility.

Yeah, Perl 5 hasn't changed that much over time. But it has been around for a while. Perl 6 is just different.

From what I have seen from the announcements, the Perl 6 syntax looks far cleaner, probably more consistent and less ugly. Some of the new tricks look genuinely handy. For example, if it seems like type checking would be a good idea, you can have it if you want it, even at compile time!

Especially the regular expressions side seems pretty interesting, as noted in this article. Regular expressions have always been a poor but effective replacement for grammar-based parsing, and now finally Perl is going to have both integrated. There's probably going to be less whining about line noise.

And then there's something that I find especially interesting, though it hasn't been explained in detail yet: Complete tuning of the object system. In case you haven't noticed, Perl 5's object system is a complete and utter mess that looks and smells like it has been added as an afterthought, and rest assured it's going to be changed radically for better in Perl 6. I'm definitely waiting eagerly to see what Perl 6's take is going to look like - I sure hope it's something like Ruby, only it smells like a camel =)

In case you haven't noticed, Perl 5's object system is a complete and utter mess that looks and smells like it has been added as an afterthought

If you even consider it an object system; I use it daily and I'm still skeptical about calling it object oriented programming. Reminds me really of ADT with C with some new 'features' added to make it slightly easier. Not that I don't like it, I still find it very useful but....

Perhaps you should go look up the difference between infer and imply. As I am making the inference, I can infer whatever I so choose.

Faster means faster than perl 5. smaller means smaller than perl 5. Got it? I have no desire to understand programs written by others. I have a staff for that. I'm only concerned about the ones I write myself.

Call your pharmacist, dude. You seem to be running low on anti-psychotics.

> What does Perl6 offer a satisfied Perl5 user? Is it faster? Smaller?

It features better support for key paradigms, including object-oriented programming (finally, a real object model), functional programming (we're getting continuations), and even some improvements for contextual programming. In other words, Perl6 will be a substitute not just for Perl5 but also for Scheme and Smalltalk.

Also, the whole Parrot thingydoo is going to allow software written in one language to seamlessly use libraries written i

Perl 6 is probably producing a GLR parser, as Parse::RecDescent is a GLR parser (it means it would be slower, but more flexible).

Isn't Spirit a LALR parser? Or an LL(1) parser?

It's not going to be faster than Spirit, because GLR parsers are slower than every other kind of parser.

On the other hand, you don't have to do all the weird stuff you have to do with Spirit because it's mostly just syntactic sugar on top of C++, and therefore uses only C++ syntax (which doesn't look much like the natural pseudoco

The intent is that grammars default to recursive descent, but that it be possible to ask for various kinds of optimizations via pragma. The grammar for parsing Perl 6 itself will be a hybrid between top-down and bottom-up techniques to maximize both speed and flexibility.
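For readers who have not met the term, recursive descent means one function per grammar rule, each calling the functions for the rules it refers to. The sketch below is plain Python, not Perl 6 rule syntax, and the tiny arithmetic grammar, the `Parser` class and `evaluate` are all invented for illustration.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Split text into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    # Grammar (one method per rule):
    #   expr   := term (('+' | '-') term)*
    #   term   := factor (('*' | '/') factor)*
    #   factor := NUMBER | '(' expr ')'
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        tok = self.take()
        if tok == "(":
            value = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(text):
    """Parse and evaluate an arithmetic expression top-down."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return result

print(evaluate("(1 + 2) * 3"))
```

Note that the grammar is written with loops rather than left recursion, which is the usual way to keep a top-down parser from calling itself forever.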

Perl 6 will probably not be faster than boost, but keep in mind that you also gain the power of a fully dynamic programming language in Perl 6's rules. Rules act as closures and can also contain Perl 6 code. Hypothetical variables are really going to blow people's minds (I know they took me a while to grasp, and when I did, I just sat around saying "wow" for a while :-)...)

"Perl XS is acknowledged to be a nasty mess. My guess is the Perl guys would drop it like a hot rock for our [Python's] stuff --

that would be as clear a win for them as co-opting Perl-style regexps was for us." [emphasis added]

Maybe I misinterpreted ESR's intended message, but it would be disappointing if hypercompetition prevented Perl's already-influential regex extensions from exerting a positive influence on other platforms. Raymond seems to imply that the Python team only grudgingly included support for Perl-style regex. I understand that development teams in similar niches each want to make a big splash in the industry; hopefully Python's great increase in popularity has softened the survivalist attitude that seems to characterize this Raymond quote from Python-Dev. Evolving regex can benefit everyone.

Note to those ready to mod me Troll/Flamebait: I'm not trying to pick on Python, I just happened to be acquainted with this candid quote.

Yes, you did misinterpret the message. Eric Raymond was a former Perl programmer, and is now a Python programmer. He was saying that Python's native-code-binding facility is superior to Perl's XS, and it would benefit Perl to adopt it. He mentions that Python benefitted from adopting Perl's regex syntax. Nowhere does he say or imply it was "grudgingly" done.

By the way, not long after he wrote that, Perl coders started using the Inline:: modules like Inline::C [cpan.org] instead of XS, which is very easy to use. I do not know if this was an adoption of Python's technique, but I don't think so.

Yes, you did misinterpret the message. Eric Raymond was a former Perl programmer, and is now a Python programmer. He was saying that Python's native-code-binding facility is superior to Perl's XS, and it would benefit Perl to adopt it.

Thanks for mentioning that. You are absolutely right, and shortly after I posted the message I stuck my foot in my mouth when I saw to my horror that I had gotten it totally backwards and maligned Eric Raymond in the process!! Another casualty of the rush to post while the

Exactly. Python and Perl are not really competitors in the strictest sense. They both build on each other. In many ways, I think Larry would have made some of the choices that Python did, had he started out in the 80s knowing what he knows now, and that's evidenced by how much of Perl 6 draws from Python (as well as Ruby, Scheme, LISP, Smalltalk, C++ and Java).

Of course, the basic approaches to language design follow different philosophies (Perl's is one of inclusion, Python's is one of exclusion... both a

See, now, totally is too strong a word.
The point I was developing is that

And likewise, it would be a clear win for the Perl people to use Python-style C extensions.

is an example of small thinking. When all your language objects are belong to Parrot, swapping out various regex engines, database engines, XML parsers, etc. suddenly enters the realm of the possible.
Someday.

I can understand a desire for adding grammars that are more powerful than
regular expressions in Perl 6 but it opens up a whole new can of worms.

The grammars appear to be in a class called "context-free grammars" (CFGs). Some CFGs are ambiguous in the sense that a given "sentence" can be derived from more than one set of rules. Traditional tools such as yacc/bison tell you where there is ambiguity in your rules - even then it isn't always easy to remove the ambiguity (trust me on this). If the Perl 6 system doesn't help the programmer debug the grammar he/she will not be happy when the parsing doesn't work as expected.
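A concrete case of the ambiguity described above: the toy grammar E -> E '-' E | NUM derives "8 - 3 - 2" in two ways, and the two parse trees give different answers. The Python sketch below enumerates every parse; the `parses` function is invented for illustration and assumes well-formed input (numbers and '-' strictly alternating).

```python
from functools import lru_cache

def parses(tokens):
    """Return one value per parse tree of tokens under E -> E '-' E | NUM."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def derive(lo, hi):
        # All results for tokens[lo:hi], one entry per distinct parse tree.
        if hi - lo == 1:
            return (int(tokens[lo]),)
        results = []
        # Each '-' can be the top-level operator; each choice is a different tree.
        for mid in range(lo + 1, hi - 1, 2):
            if tokens[mid] == "-":
                for left in derive(lo, mid):
                    for right in derive(mid + 1, hi):
                        results.append(left - right)
        return tuple(results)

    return list(derive(0, len(tokens)))

# 8 - (3 - 2) and (8 - 3) - 2 are both valid derivations.
print(parses("8 - 3 - 2".split()))
```

A yacc-style tool would flag this grammar with a conflict; a parser that silently picks one tree gives you an answer, just not necessarily the one you meant.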

In addition, the article ends the description of features with "And much more...". It appears that Perl 6 grammars are more powerful than CFGs. If they can simulate a Turing machine...

What bugs me is they don't describe the type of parser being generated. Parse::RecDescent does just what it says... it generates recursive descent parsers. However, recursive descent parsers are not as powerful as the bottom-up parsers generated by, for example, Yacc/Bison (LL vs LR).

However, recursive descent parsers are not as powerful as the bottom-up parsers generated by, for example, Yacc/Bison (LL vs LR).

That's backwards. Recursive descent with backtracking can parse all LL(k) grammars for arbitrary k. OTOH, yacc/bison can only parse LR(1) which, although sufficient for most realistic grammars, definitely is not as general as a full LL(k) method.

Left-recursive grammars are a red herring -- you can always eliminate the recursion, and with backtracking you can deal with arbitrar

Perl 6 grammars are a full citizen of the language on a level with subroutines and classes (loosely speaking, in Perl 6, rule:grammar::method:class, actually). They're effectively Turing-complete as a result, since Perl 6 is obviously Turing-complete.

Perl 5 "regexps", by contrast, are more of a specialized second language bolted onto the side (I use quotes since Perl 5 regexps are already marginally more powerful than "pure" regexps).

I use quotes since Perl 5 regexps are already marginally more powerful than "pure" regexps

Are you sure? I looked into this because my instinct told me you were right and I wanted to know how much more powerful but then I found this line in the Camel Book: "The Perl Engine uses a nondeterministic finite-state automaton (NFA) to find a match" (Programming Perl 2nd ed., page 60). If correct that would suggest that Perl regexps and "pure" automata regexps are equivalent.
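One concrete data point, offered as a sketch rather than a ruling on the Camel Book quote: Perl 5 supports backreferences such as \1, which let a pattern match "some string followed by an exact copy of itself". That language is a textbook example of one that no finite automaton can recognize. The snippet uses Python's `re` module, which has the same backreference syntax; the `is_doubled` name is made up.

```python
import re

# (.+)\1 : a non-empty string immediately followed by an exact copy of itself.
doubled = re.compile(r"(.+)\1")

def is_doubled(text):
    """True if text consists of some non-empty string written twice."""
    return doubled.fullmatch(text) is not None

print(is_doubled("abcabc"))
print(is_doubled("abcabd"))
```

So "NFA" in the engine's description is best read as "backtracking matcher" rather than as a claim about the formal power of the patterns.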

I'm seriously studying the possibility of tackling a worthy coding project, the rewriting of the entire LINUX kernel in a language very much but not unlike C, and was considering doing it in C-INTERCAL, but after seeing things like this http://ozonehouse.com/mark/blog/code/PeriodicTable.html [ozonehouse.com], I changed my mind and will use PERL 6 instead.

I get sick of the 'standard' backlash every time a Perl article is posted. Why do people have such a problem with Perl? It's an excellent, high-level general purpose programming language with a huge range of extension modules available [cpan.org]. I have personally used Perl for many projects, as do TicketMaster [ticketmaster.com], ValueClick [valueclick.com], Morgan Stanley [morganstanley.com] and Ryanair [ryanair.com] and I've also learnt a lot about software engineering and computing through Perl.

Yes, it does include a lot of symbols, but there is payback to learning them, and really most programs won't use much beyond $ % # () [] {}. Unlike some languages [java.com], Perl is not what I would describe as a 'bondage' language. If you want to program sloppy, you can program sloppy. That's fine by Perl. And this generosity is what gives Perl its bad reputation. This is funny since I and most knowledgeable Perl programmers can write perfectly clear and maintainable code. The way we do this is no secret--it's just by commenting appropriately, using meaningful identifier names and following the Perl style guidelines [cpan.org].

People can mock Perl all they like, but it is still a widely used powerful programming language and I am more productive in it than any other language. As a parting comment, a Cisco employee once told me (off the record of course!) that "Cisco would fall apart without Perl".

Why do people have such a problem with Perl? It's an excellent, high-level general purpose programming language with a huge range of extension modules available. I have personally used Perl for many projects, as do TicketMaster, ValueClick, Morgan Stanley and Ryanair

"Much as I hate to say it, the Computer Science view of language design has gotten too inbred in recent years. The Computer Scientists should pay more attention to the Linguists, who have a much better handle on how people prefer to communicate."

a) Is it called GNU/Linux or Linux?

b) Emacs vs Vi

c) "Ok" goes on the left, "Cancel" goes on the right.

d) Security is based on market share - NO! Apache is more secure despite bla bla bla!

e) 45 RPM LPs sound better than reel-to-reel!

Those of us that use Perl as more than just system duct-tape know it's a programming language. Perl 6 will make that even more clear by being based on OO fundamentals rather than being a procedural language with OO tacked on top of it. This is just another debate that makes the OS community look like a bunch of freaks and zealots... just like the GNU/Linux thing. Get over it and start focusing on what the software does, not how to classify/name it.

I've got a fully multithreaded perl script running under Win32. It wasn't too bad to write but some parts sucked. One of the things sucked because Win32 doesn't support alarm() calls and you have to manually poll sockets and I hate that shit, or use vec() and that's just insane (how many people understand how vec() works anyway???) The other big thing that sucked was the crappy mechanism for sharing complex data structures between threads. All's hunky-dory if you're just sharing a scalar variable, but

I tried to absorb the syntax docs one afternoon, but it gave me nightmares. Literally. It was as if the C-programming-part of my brain was in conflict with the oddball operators and constructs presented in the perl language. Ever since I've been haunted by the perverse unreadability of it all. I liken the experience to attempting to think in brainfuck [c2.com].

I find that reaction to Perl by people familiar with highly structured languages is common. I think this is because Perl has things like weak typing and overly flexible syntax, things that make experienced programmers vomit in their own mouths. But what's great about Perl is that you CAN have strict grammar, and you CAN have strong typing, if you desire. It's just not required.

This makes Perl very strong as a teaching tool for beginner programmers. They can start out writing loose, messy code that gets

I think you went about things the wrong way. Why would you ever look at the nitty gritty syntax rules first when trying to learn a language? First do some simple examples to get the general feel of the language. Then learn the nitty gritty stuff as required.

"IMO, "the right job" for perl is about 2% of all programming tasks out there."

76 percent of all statistics..... You get the point. You really don't have any valid point here: every language is designed to do certain things, and people will use it for those things and more. Trying to say what's the best language out there is stupid. Trying to say what percent of projects perl should be used on is also stupid.

"It can accomplish this, but not without the reader having to go through the mental gyrations of what could be best called linguistic decompression."

Have you tried to program in a logic language lately? Have you tried to program in a functional language lately? Have you tried to program in anything but your standard imperative/OO language lately? There are tons of styles of languages, and each one requires its own linguistic decompression. Which one feels more natural is a matter of opinion.

I don't quite understand where you're coming from, because Perl is one of the few languages that has allowed me to code in the way I think. For most languages, I have to think like the computer does, i.e. momentarily turn my brain into an i386 CPU (or whatever arch it is). But not so with Perl. It's been a real joy to be able to write in an almost English-like syntax, e.g.:

foreach $line (@data) {
    chomp $line;
    my ($x, $y, $z) = split(':', $line);
    update_coords($x, $y, $z);
}

I tried to absorb the syntax docs one afternoon, but it gave me nightmares. [...] Ever since I've been haunted by the perverse unreadability of it all.

When I started to learn Perl (coming from a C background) I had quite a different experience. I really felt I had "come home", or something like that. Sure, you can write obscure code, but that's no different from C. But you don't have to, it can be very clear.

I'll give credit to the fact that perl is compact, terse, to the point and has a reputation for strin

Perl is a language, so it follows that it is a communication medium. By that it should be able to communicate something to a party outside just the author and the perl interpreter.

Perl source does communicate, with people who know Perl. That's like saying English is a useless language because it is constructed ad-hoc and because the complainer has never been bothered to learn it. The fact that some people find English difficult makes English no less useful to people who most easily express or comprehend ideas in it.

IMO, "the right job" for perl is about 2% of all programming tasks out there.

Nice statistic. Where's your breakdown of all programming tasks, and the reasoning for the other 98% why Perl is not the right tool for the job?

This is evident by the fact that even though perl was the prominent CGI language of the mid-nineties, it lost the overwhelming majority of that interest with alarming speed.

That has nothing to do with Perl the language, and everything to do with the shift towards languages which are designed to execute within a web server process without forking. mod_perl fills this hole, but as a general purpose language it is not as tightly integrated with a web server environment as something like PHP or ASP.

Why don't you chill out? I am a mod_perl developer. What I meant by that comment is that you don't find mod_perl hosting environments as readily as PHP or ASP hosting environments, because mod_perl isn't provided in turnkey server setups that web hosting farms set up for their co-los. The reason for that is because Perl isn't seen as a web development language like PHP or ASP so it's not on the list of prerequisites for a web-server-in-a-box. Wishful thinking won't change t

Sounds to me like you prefer PHP and therefore spent more time perfecting your understanding of it. If you know and understand a language (any language) your work will require less time and will (surprise, surprise) be easier for you.

I wrote at least some code in: Bash, C/C++, Delphi, i386 ASM, Java, Pascal, Perl, PHP, Prolog, Python, Ruby, Visual Basic

So I have some experience. But I see this all around me too: I also manage 5 other SW guys, and those who write in Perl produce really unmaintainable code, while the same people writing C/PHP produce really nice code (when run through indent ;-)). I know this is a very small sample, but I haven't seen with my own eyes any large project (maybe except /. ;-))) written in Perl...