Saturday, 30 April 2016

Android : A lesson on Regular Expressions by examples

Everyone who has ever worked with Regular Expressions knows how tricky they can be (if you don't understand them completely!). And as frustrated and angry as we might be, we can't ignore how powerful regular expressions are. Now let me put this straight first: I don't plan to pretend here that I am an expert on Regular Expressions. In this blog I am going to share with you what I have learned after spending countless hours in frustration. I am going to explain Regular Expressions with practical examples that you may face at your work. So if you want to brush up your memory on RegEx characters/symbols you can check my blog on Regular Expressions (although that blog was for PowerShell, the symbols have the same meaning). Or there are some excellent articles on Regular Expressions that you can go through first, like this.

Now that you have the basics, let's see some practical examples :

Let's say you have to find a name in a given string. Now we know names start with an uppercase character. So our question boils down to this : write a regular expression to find words that start with an uppercase character followed by lowercase characters.

Solution: Now there are multiple ways to achieve this. I will show you 2 ways.

1- [A-Z][a-z]+
2- \p{Lu}\p{Ll}+

If you have gone carefully through the 2nd link I shared above, which happens to be Google's documentation on Pattern, you can at least recognize the strange symbols in the second solution.

What the first solution tells is: find all strings made of one uppercase character followed by lowercase characters. It's as simple as that. "+" tells that the preceding character or group may occur one or more times. Now the second solution is a more advanced and cleaner way of doing the same thing. \p lets you select characters based on the class name you provide inside the curly braces following it. In the above example Lu and Ll mean uppercase letter and lowercase letter respectively.

Please note that in Android, as in any Java string literal, the backslash itself has to be escaped. So the regex would be something like \\p{Lu}\\p{Ll}+ while compiling it using the Pattern class. So our code would look like this :

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String test = "this is some random text to test REGEX Asutosh Nayak";

// Alternative pattern: "\\p{Lu}\\p{Ll}+"
Pattern pattern_u = Pattern.compile("[A-Z][a-z]+");

Matcher matcher_t = pattern_u.matcher(test);

String res = "";

// Each call to find() moves on to the next match in the string
while (matcher_t.find()) {
    res += "Match:" + matcher_t.group();
    res += "\n";
}

textview_test.setText(res);
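
One practical difference between the two patterns is easier to see with a non-ASCII name. The sketch below is plain Java rather than Android code, and the class name CapitalizedWords, the helper findAll and the sample text are made up for illustration. It assumes the standard java.util.regex behaviour, where \p{Lu} and \p{Ll} follow Unicode letter categories while [A-Z] and [a-z] cover only the ASCII letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class CapitalizedWords {

    // Collects every match of the given regex in the input text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // The first name starts with an accented uppercase E (code point 00C9),
        // which lies outside the ASCII range A-Z.
        String text = "hello \u00C9lodie and Asutosh";

        // ASCII ranges: only the second name is found.
        System.out.println(findAll("[A-Z][a-z]+", text));

        // Unicode categories: both names are found.
        System.out.println(findAll("\\p{Lu}\\p{Ll}+", text));
    }
}
```

With the ASCII ranges only "Asutosh" is reported, while the \p version also picks up the accented name, which is one reason to prefer it when the input is not guaranteed to be plain English.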

Also keep in mind that Matcher.group() or Matcher.group(0) returns the entire match, not the individual groups. So if you have groups in your regex and you want to fetch the match for only a certain group, use Matcher.group(index), where "index" starts from 1. To understand how group() works let's see this example :

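String test = "this is some random text to test REGEX Asutosh Nayak";

// Group 1: a word in all uppercase. Group 2: the capitalized word after it.
Pattern pattern_u = Pattern.compile("([A-Z]+)\\s([A-Z][a-z]+)");

Matcher matcher_t = pattern_u.matcher(test);

String res = "";

while (matcher_t.find()) {
    // Group 0 is the whole match, so the numbered groups start at 1
    for (int c = 1; c <= matcher_t.groupCount(); c++) {
        res += "Group No." + c + ": " + matcher_t.group(c);
        res += "\n";
    }
}

textview_test.setText(res);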

It gives a result like this :

As you can see, “Group No.1” returned “REGEX”, which was found by the regex within the first pair of parentheses, and “Group No.2” returned “Asutosh”, which was our second group. For those of you who are wondering what happened to “Nayak”, note that our regex was for a word with all uppercase characters followed by a word with only its first character in uppercase.
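
To try the group numbering outside an Android project, here is a minimal plain-Java sketch. The pattern is an assumption based on the description above (an all-uppercase word followed by a capitalized word), and the names GroupDemo and groupsOfFirstMatch are made up for illustration. It also returns group 0, so the difference between the whole match and the numbered groups is visible.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original post.
class GroupDemo {

    // Assumed pattern: group 1 is an all-uppercase word,
    // group 2 is the capitalized word that follows it.
    static final Pattern WORDS = Pattern.compile("([A-Z]+)\\s([A-Z][a-z]+)");

    // Returns groups 0..groupCount() of the first match,
    // or an empty array when nothing matches.
    static String[] groupsOfFirstMatch(String text) {
        Matcher m = WORDS.matcher(text);
        if (!m.find()) {
            return new String[0];
        }
        // groupCount() does not count group 0, hence the + 1
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            groups[i] = m.group(i);
        }
        return groups;
    }

    public static void main(String[] args) {
        String test = "this is some random text to test REGEX Asutosh Nayak";
        // prints [REGEX Asutosh, REGEX, Asutosh]
        System.out.println(Arrays.toString(groupsOfFirstMatch(test)));
    }
}
```

Index 0 holds the whole match "REGEX Asutosh", while indexes 1 and 2 hold what each pair of parentheses captured.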

Neat right? Not so difficult. But this was one of the
simplest regular expressions. Now let’s write some RegEx on numbers. Nothing
is complete without numbers.

What if you had a string and you wanted to find numbers in it, but not just any number: an amount in "Rupees" or "INR"? So our regex should be capable of finding numbers preceded by Rs or INR.

(?i) - This tells that the regex that follows is case insensitive. So this regex will treat “RS” and “Rs” the same way. If later you want to add a group to your regex which is case sensitive, just add a (?-i) before it.

(?:) - This is called a non-capturing group. What it means is that the pattern inside the parentheses is used to determine the overall match, but whatever it matches is not stored as a numbered group. As seen in the above image.

\.? - ‘?’ is called the optional quantifier. It means the character or group preceding it can occur at most once (0 or 1 times). The backslash makes the period a literal “.” instead of “any character”. So here it tells that “Rs” or “INR” can be followed by a “.”.

(\d+(\.\d{1,2})?) - This pattern is used to recognize a decimal number. \d+ means one or more digits. So it reads as digits, followed by a period and 1 or 2 (at most) digits, where the period-and-digits part is optional since numbers may not have a decimal portion.

That’s all there is to it. These were the confusing symbols
in this regex.
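
Putting the pieces together, a minimal plain-Java sketch could look like the one below. The combined pattern is an assumption assembled from the symbols explained above, with an optional whitespace between the currency marker and the number. The class name RupeeAmounts and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class RupeeAmounts {

    // Assumed pattern, built from the pieces explained above:
    //   (?i)               case-insensitive matching
    //   (?:Rs|INR)         currency marker, not captured
    //   \.?\s?             optional period, then optional whitespace
    //   (\d+(\.\d{1,2})?)  group 1: the amount, decimals optional
    static final Pattern AMOUNT =
            Pattern.compile("(?i)(?:Rs|INR)\\.?\\s?(\\d+(\\.\\d{1,2})?)");

    // Returns only the numeric part of every amount found in the text.
    static List<String> find(String text) {
        List<String> amounts = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            amounts.add(m.group(1)); // group 1 excludes the Rs/INR marker
        }
        return amounts;
    }

    public static void main(String[] args) {
        // prints [911.10, 50]; the plain "7" has no currency marker
        System.out.println(find("Paid Rs. 911.10 today and inr 50 yesterday, plus 7 apples"));
    }
}
```

Because the currency marker sits in a non-capturing group, group 1 is the amount itself, and the (?i) flag lets the lowercase "inr" match as well.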

Let’s make it even tougher. What if your string now has two numbers and you want to get the ordinary number, not the money value? So we have to build a regular expression to find a number which is not preceded by Rs or INR. Sample String : "this is some random text to test REGEX for amount Rs. 911.10. Let's see 909.98 Test."

Solution: (?i)[^(Rs|INR\.?\s?)](\s\d+(\.\d{1,2})?\s?)

Tip: while writing complicated regular expressions always try to start small. For example, if you have to find numbers not preceded by Rs or INR, first try finding Rs or INR, then find numbers with Rs or INR, and then finally negate the Rs or INR group. This will help you find which portion of the regex is not working.

Using code similar to the above, with the necessary changes to the string and the regular expression, the following result can be found :

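All we did was surround the regex for finding "Rs or INR" with square brackets and add a "^" to it. Inside square brackets "^" is like a logical NOT, but note that the brackets turn the contents into a character class: [^...] matches one single character that is not any of the characters listed, rather than negating "Rs" or "INR" as whole words. It works on the sample string because the number after "Rs." is preceded by ".", which is in the excluded list, while 909.98 comes after the "e" of "see". The same regex would also skip a number that follows an unrelated word ending in r, s, i or n, so for a stricter "not preceded by Rs or INR" a negative lookbehind, (?<!...), expresses the condition directly.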
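
As an alternative to the bracket approach, here is a minimal plain-Java sketch that uses a negative lookbehind instead. This is a swapped-in technique, not the solution shown above. The class name PlainNumbers is made up, and the pattern assumes at most one period and one whitespace between the currency marker and the number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class PlainNumbers {

    // Assumed pattern using negative lookbehinds:
    //   (?<!(?:Rs|INR)\.?\s?)  reject a position that follows Rs/INR
    //   (?<![\d.])             do not start in the middle of a number
    //   \d+(?:\.\d{1,2})?      the number itself, decimals optional
    static final Pattern PLAIN = Pattern.compile(
            "(?i)(?<!(?:Rs|INR)\\.?\\s?)(?<![\\d.])\\d+(?:\\.\\d{1,2})?");

    // Returns every number that is not preceded by Rs or INR.
    static List<String> find(String text) {
        List<String> numbers = new ArrayList<>();
        Matcher m = PLAIN.matcher(text);
        while (m.find()) {
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String sample = "this is some random text to test REGEX for amount "
                + "Rs. 911.10. Let's see 909.98 Test.";
        // prints [909.98]
        System.out.println(find(sample));
    }
}
```

The second lookbehind is there because, without it, the engine could skip the leading "9" of 911.10 and report "11.10" as a number that is not directly preceded by Rs.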

Regular Expressions can be tricky to debug and really frustrating sometimes. But they are a really powerful tool at our hands to quickly search for a pattern.

About Me

Asutosh is a software engineer. He has experience in SharePoint and Android. He also has some experience in web development. He just loves coding. In his free time he develops small applications (Windows, Android) to make life easier. When he is not coding he can be found writing blogs, reading non-fiction books or listening to music (and singing along). He loves road trips. He is also an Artificial Intelligence enthusiast who subscribes to the theory that one day Machines may take over the human race ;).