Monday, June 25, 2007

Perl 6: Round 4

Pugs revision: r16657

(To start off: I'm sorry this took so long; life caught up.)

I'm pretty sure we're at the act you've probably been waiting to see: Rules and Grammars.

What you may know as Perl 5 regular expressions, you now need to know as Perl 6 rules. The change in jargon is not substantial with regard to rules themselves: the new name was simply chosen because 'perl [5] regular expressions' were nowhere near the formal definition of a regular expression.

Grammars are a new addition. They're basically just "Classes for Rules" (a grammar inherits from the base class Rule) and simply act as a namespace to organize your rules in general.

Rules can be tricky. There're a lot of pitfalls and whatnot that you can fall into. It's therefore important that you be patient with them; you can build a lot of really useful things with primitive rules, but things have changed. As we go on, I'll try to address these things. For everything else, you will probably want to refer here, here, and here as well.

Important: It's good to note that a lot of the things I'll describe here are *not* fully implemented (or even partially implemented) in the way they should be according to the synopses, apocalypses, etc. This is merely an introduction; full implementation of the rule engine is a milestone for Pugs, but it is not yet completed. If I show anything that does work (as far as I know), it'll be in the pugs prompt.

Ready, set, begin.

Rules

As I said earlier, Rules are simply regular expressions in Perl 6. They merit their own keyword, rule, and can be used in one of several ways (both constructing a rule and using one give you back an object).

The first and simplest way to use a Rule is matching. Matching is simple enough; match a string against a rule and give me the result. Matching is done in the form of:

if $str ~~ m/.../ { ... }

(Note: ~~ is the 'smart match operator.' It is analogous to perl 5's =~. See S03 for more.)

Rules in the form of m/.../ immediately match. You can also use the substitution form, s/.../.../, which also immediately matches (and, yes, substitutes). Finally, using the simple /.../ form will immediately match given the right context (i.e. when used with the smart match operator; this form can also define a deferred match).

Here're a few examples of using these forms, up to this point:
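A minimal sketch of the three forms, assuming an implementation that follows S05. The variable $str and its contents are made up for illustration, and not every line is guaranteed to behave this way in Pugs r16657:

```perl6
my $str = "hello world";   # hypothetical sample string

# m/.../ matches immediately against the left-hand side of ~~
if $str ~~ m/world/ { say "m// matched"; }

# a bare /.../ used with ~~ also matches immediately
if $str ~~ /hello/ { say "bare // matched"; }

# s/.../.../ matches and then substitutes in place
$str ~~ s/world/there/;
say $str;
```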

You may want your rules to be a little more flexible than that, however, by using deferred matches. Using the form rx// you can define a deferred match that can be stuck in a variable, ex:

pugs> my $r = rx/^abcd$/;

The same could be expressed without the rx prefix (in relation to what I said earlier about /.../ and context).

Without the rx prefix and defining a rule in this manner, you can also prefix the rule with any unary operator (S03) to force that rule to immediately match in a context where it normally wouldn't (it will match against $_). For example:
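A small sketch of the difference, assuming the behaviour S05 describes for a bare /.../ in assignment versus one prefixed with the boolean operator ?. The variable names are hypothetical, and Pugs may not handle every line yet:

```perl6
$_ = "abcd";                    # the unary form matches against $_

my $deferred = /^ abcd $/;      # plain assignment: stored, not matched yet
my $matched  = ?/^ abcd $/;     # prefix ? forces an immediate match against $_

say "deferred rule matches" if "abcd" ~~ $deferred;
say "immediate match succeeded" if $matched;
```

The first variable holds a rule you can match with later; the second holds only the outcome of the match.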

Grammars

Grammars are simply classes for rules. Their declaration is analogous to that of a class; if the grammar keyword is followed by a block:

grammar Dog { ... }

The namespace of the grammar is confined to that block. If that block is absent:

grammar Dog;

It continues until the end of the source file.

The main difference in calling a rule that's defined within a grammar is that you simply have to give the fully qualified name, ex:

grammar Dog {rule bark { ^ < bark woof ruff > $ }}

"woof" ~~ /<Dog.bark>/; # True

Like classes, grammars can do things like inherit and the like.
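As a rough sketch of what inheritance might look like, assuming grammars use the same "is" trait that classes do. Puppy and its yip rule are hypothetical names invented for this example:

```perl6
grammar Dog {
    rule bark { ^ < bark woof ruff > $ }
}

# Puppy is a hypothetical grammar: it inherits bark from Dog and adds yip
grammar Puppy is Dog {
    rule yip { ^ < yip yap > $ }
}
```

Under that assumption, a rule looked up through Puppy that isn't defined there would fall back to the one in Dog, just as a method would in a class.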

Important Detail #1:

I'm going to at this point take some time to tell you something important that is, well, very important: interpolation doesn't exist.

I'll give you a moment to let it sink in.

Whereas in perl 5 you could freely embed scalars and the like into your regexes, you cannot do this any longer in a rule. Rather, they are passed raw to the rule engine, which decides how to deal with them from there. This is because, now, regexes are not strings; they're programs. Wall described this in A05. To quote him:

"The problem with \Q$string\E arises because of the fundamental mistake of using interpolation to build regexes instead of letting the regex control how it treats the variables it references. Regexes aren't strings, they're programs. Or, rather, they're strings only in the sense that any piece of program is a string."

In this fashion, the common misconception is to think of your rule as a string, and therefore to let interpolation come as naturally as it would with any other string. Rather, let the rule engine figure out how to deal with your variables. Now, you will use the general syntax of an assertion with your variable to help the engine determine how things should be treated.

I figure I'd take this time to point something like this out, as it's pretty important. If you really really really need interpolation that badly, you can use the P5 rule modifier to achieve it (see below).

Continuing...
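To make the difference concrete, here is a hedged sketch based on S05's description: a bare scalar in a rule is matched as a literal string, while the same scalar inside an assertion is treated as a rule. The variable $var and its contents are made up for the example:

```perl6
my $var = "a.c";    # hypothetical text containing a metacharacter

# bare variable: the engine matches its contents literally
say "literal: a.c matches" if "a.c" ~~ /$var/;
say "literal: abc matches" if "abc" ~~ /$var/;     # not expected: the dot stays literal

# assertion form: the contents are handed to the engine as a rule
say "subrule: abc matches" if "abc" ~~ /<$var>/;   # expected: the dot is a metacharacter here
```

This is why something like Perl 5's \Q...\E is no longer needed for the common case: the rule, not the string, decides how the variable is treated.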

Special characters & Co.

In Perl 6 rules, like Perl 5 regexes, you have a lot of special characters you can use inside your rules for specific purposes. Here I am merely going to list some of them and provide examples; this isn't a definitive reference to them.

Metacharacters

The general metacharacters you can have inside a rule itself are as follows:

.          Match any single character, including a newline.
^          Match the beginning of a string.
$          Match the end of a string.
^^         Match the beginning of a line.
$$         Match the end of a line.
|          Match alternate patterns (OR).
&          Match multiple patterns (AND).
\          Escape a metacharacter to get a literal character, or escape a literal character to get a metacharacter.
#          Mark a comment (to the end of the line).
:=         Bind the result of a match to a hypothetical variable.
( ... )    Group patterns and capture the result.
[ ... ]    Group patterns without capturing.
{ ... }    Execute a closure (Perl 6 code) within a rule.
< ... >    Match an assertion.

These are mostly new; however, interpreting their meaning should not be hard if you've used Perl 5 regular expressions before. Explanation of these is not really needed; just play around with them (we'll cover more on assertions in a moment, however).

Escape sequences

There are also plenty of escape sequences you can use inside a rule to specify things such as whitespace or group an entire word together. Here's a quick list:

\0[...]    Match a character given in octal (brackets optional).
\b         Match a word boundary.
\B         Match when not on a word boundary.
\c[...]    Match a named character or control character.
\C[...]    Match any character except the bracketed named or control character.
\d         Match a digit.
\D         Match a nondigit.
\e         Match an escape character.
\E         Match anything but an escape character.
\f         Match the form feed character.
\F         Match anything but a form feed.
\n         Match a (logical) newline.
\N         Match anything but a (logical) newline.
\h         Match horizontal whitespace.
\H         Match anything but horizontal whitespace.
\L[...]    Everything within the brackets is lowercase.
\Q[...]    All metacharacters within the brackets match as literal characters.
\r         Match a return.
\R         Match anything but a return.
\s         Match any whitespace character.
\S         Match anything but whitespace.
\t         Match a tab.
\T         Match anything but a tab.
\U[...]    Everything within the brackets is uppercase.
\v         Match vertical whitespace.
\V         Match anything but vertical whitespace.
\w         Match a word character (Unicode alphanumeric plus "_").
\W         Match anything but a word character.
\x[...]    Match a character given in hexadecimal (brackets optional).
\X[...]    Match anything but the character given in hexadecimal (brackets optional).

Most of these should be fairly self explanatory.

"Extensible metasyntax"

From S05: "Both < and > are metacharacters, and are usually (but not always) used in matched pairs. (Some combinations of metacharacters function as standalone tokens, and these may include angles. These are described below.) Most assertions are considered declarative; procedural assertions will be marked as exceptions."

In general, the first leading character after the angle bracket determines the assertion's semantics. Here're a few of them:

If there is whitespace after the opening bracket, and whitespace before the ending one, the characters inside are treated 'quote style' and used in a non-capturing group. Ex:

rx/< hello there how are you >/

Is equivalent to:

rx/[hello|there|how|are|you]/

A leading ? causes the assertion not to capture, given that it matches.

A leading $ causes an indirect subrule to be invoked. I'm pretty sure you've seen this before.

A leading :: causes an indirect subrule to be invoked, yet symbolically. What this means is you use this syntax:

<::($var)>

And the contents of $var will be taken out, and what's inside will be treated as a rule name. If you've ever done PHP, this is analogous to the double dollar sign convention, ex:

rule z { (\d+) };
my $name = "z";

"123" ~~ /<z>/;
# above is the same as:
"123" ~~ /<::($name)>/;

A leading @ makes things act 'array-like.' This:

"..." ~~ /<@arr>/;

Is semantically the same as:

"..." ~~ /[@arr[0] | @arr[1] | @arr[2] | ... ]/;

However, rather than matching as literal, each element of the array will be treated as a subrule. This can be pretty useful, as you can match your text against an array of different rules.

A leading { (also followed by a closing } right before the ending angle bracket) basically allows you to define an in situ closure that is expected to return a rule, which at that point is matched.

A leading & treats a subroutine as if it will return a rule. This:

<&foo()>

Is the same as:

<{ foo() }>

It's pretty much just a shorthand.

A leading [ (like the curly bracket, also ending with a ]) indicates a character class. This class can be negated by instead prefixing your opening bracket with a -. Examples:

pugs> "a" ~~ /<[a..z]>/ # true
pugs> "a" ~~ /<-[a..z]>/ # false
pugs> "1" ~~ /<-[a..z]>/ # true

There is additional flexibility in that you may 'add' and 'subtract' character classes. For example, to check that a string has no vowels:

$str ~~ //
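One way such a check might be written, as a sketch only: it assumes the class arithmetic S05 describes (one bracketed class subtracted from another) and, for simplicity, only considers lowercase ASCII letters. The input string is hypothetical:

```perl6
my $str = "rhythm";   # hypothetical input

# [a..z] minus the vowels, repeated from the start to the end of the string
if $str ~~ /^ <[a..z]-[aeiou]>+ $/ {
    say "no vowels";
}
```

Anchoring with ^ and $ matters here; without them the rule would succeed on any string that merely contains one consonant.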

Leading ! indicates negation (naturally.)

Rule modifiers

Aside from the above, you still have your handy dandy rule modifiers. Their usage is essentially the same, but now they are passed at the front of the rule rather than at the end, as it makes life easier for both you and the parser. Here're a few of the modifiers you can use:

:i      Ignore case.
:g      Match as many times as possible.
:s      Treat whitespace as 'significant,' i.e. whitespace in the rule must match whitespace in the string.
:P5     Use Perl 5's regular expression syntax, rather than Perl 6's.
:Nx     Works like :g; however, the N specifies exactly how many times it must match. The general form is :x(N).
:Nth    Find the Nth occurrence. Useful for substitutions, i.e. s:5th/lbrary/library/ if you said something wrong in your sentence. The general form is :nth(N).

In the case of just declaring a deferred rule (rx/.../) or a match (m/.../), these modifiers are placed after the rx/m token and delimited by a colon, i.e. rx:i/.../, m:g/.../, et cetera et cetera. There are plenty more; however, I'm leaving them out as I assume if you need them, you'll find them (sue me).
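A short sketch of modifiers in use, assuming an engine that implements them as S05 describes. The sample strings are invented for the example:

```perl6
# :i before the delimiter makes the match case-insensitive
say "matched, ignoring case" if "HELLO" ~~ m:i/hello/;

my $s = "aXbXcXd";     # hypothetical sample string
$s ~~ s:2nd/X/-/;      # replace only the second X
say $s;

$s ~~ s:g/X/+/;        # replace every remaining X
say $s;
```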

Built-in rules

Aside from your rules, there're naturally plenty of built in ones you can use. Here are a few:

<alpha>    Match a Unicode alphabetic character.
<digit>    Match a Unicode digit.
<sp>       Match a single-space character (the same as \s).
<ws>       Match any whitespace (the same as \s+).
<null>     Match the null string.
<prior>    Match the same thing as the previous match.

Like other rules, you can change their meanings with assertion semantics.

Hypothetical variables

Hypothetical variables are a new feature of Perl 6 rules. In a Perl 6 rule, a hypothetical variable allows you to bind a variable within a rule. If your match fails, your hypothetical variables are automatically unbound from what they were (in the case that your match failed after the fact). However, the variable must be in lexical scope before you may bind to it via the := operator. This is less complicated than it sounds; here's the example:

my $z;
"I am a person" ~~ m/^$z := (\w+)/;
$z.say; # should print "I"

Fairly simple.

Conclusion

This has been a nice post. Hopefully, your rule-fu has increased. The changes may need a little time to get used to; however, in time all should be good. :) Like I said, a -lot- of this is not implemented, and I have not even breathed upon the technical surface of rules; I'm not exactly the definitive reference on them anyway. This should give you a taste, however.

Perhaps an unofficial 'Round 4b' is in order. We'll see...

Until next time...

Next round: ?? (that means I'm open to recommendations. If none arise, macros seem like a good topic)

@anonymous: thanks! The blog was only recently added to planet six, I'd say about two weeks ago; not totally unsurprising you haven't seen anything on it yet.

@thomas: The problem is like I said, a lot of this stuff is not implemented fully or even at all. At least in Pugs; PGE is a fairly complete implementation of Perl 6 rules, however. This makes it worth looking into for further experimentation. Thanks for the feedback though. :)

Perl is the best scripting language for text processing and handling regexes. I have posted a few articles related to those at my blog: http://icfun.blogspot.com/search/label/perl Also, Perl's CPAN has so much support that I don't even need to think extra while developing a project. I didn't find such help in other programming languages except Java and .NET.
