
Friday, May 30, 2008

Programmers need to know how to manipulate strings for a variety of purposes, regardless of the programming language they are working in. This article will explain the various methods used to manipulate strings in Python.

Introduction

String manipulation is very useful and very widely used in every language. Often, programmers are required to break down strings and examine them closely. For example, in my articles on IRC (http://www.devshed.com/c/a/Python/Python-and-IRC/ and http://www.devshed.com/c/a/Python/Basic-IRC-Tasks/), I used the split method to break down commands to make working with them easier. In my article on sockets (http://www.devshed.com/c/a/Python/Sockets-in-Python/), I used regular expressions to look through a website and extract a currency exchange rate.

This article will take a look at the various methods of manipulating strings, covering things from basic methods to regular expressions in Python. String manipulation is a skill that every Python programmer should be familiar with.

String Methods

The most basic way to manipulate strings is through the methods that are built into them. We can perform a limited number of tasks on strings through these methods. Open up the Python interactive interpreter. Let's create a string and play around with it a bit.

>>> test = 'This is just a simple string.'

Let's take a fast detour and use the len function. It can be used to find the length of a string. I'm not sure why it's a function rather than a method, but that's a whole nother issue:

>>> len ( test )
29

All right, now let's get back to those methods I was talking about. Let's take our string and replace a word using the replace method:
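As a minimal sketch of the replace method (the words being swapped are an illustrative choice, not taken from the article), note that replace returns a new string and leaves the original untouched:

```python
test = 'This is just a simple string.'

# str.replace returns a new string; strings are immutable, so test is unchanged.
replaced = test.replace('simple', 'short')
print(replaced)  # This is just a short string.
print(test)      # This is just a simple string.

# An optional third argument limits how many occurrences are replaced.
# 'is' occurs inside "This" and as the word "is"; only the first is touched.
limited = test.replace('is', 'at', 1)
print(limited)   # That is just a simple string.
```

Because the original string is never modified, the result has to be assigned to a name if it is needed later.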

Regular expressions are a very powerful tool in any language. They allow patterns to be matched against strings. Actions such as replacement can be performed on the string if the regular expression pattern matches. Python's module for regular expressions is the re module. Open the Python interactive interpreter, and let's take a closer look at regular expressions and the re module:

>>> import re

Let's create a simple string we can use to play around with:

>>> test = 'This is for testing regular expressions in Python.'

I spoke of matching special patterns with regular expressions, but let's start with matching a simple string just to get used to regular expressions. There are two methods for matching patterns in strings in the re module: search and match. Let's take a look at search first. It works like this:

>>> result = re.search ( 'This', test )

We can extract the results using the group method:

>>> result.group ( 0 )
'This'

You're probably wondering about the group method right now and why we pass zero to it. It's simple, and I'll explain. You see, patterns can be organized into groups, like this:

>>> result = re.search ( '(Th)(is)', test )

There are two groups, each surrounded by parentheses. We can extract them using the group method:

>>> result.group ( 1 )
'Th'
>>> result.group ( 2 )
'is'

Passing zero to the method returns the entire match, which here spans both groups:

>>> result.group ( 0 )
'This'

The benefit of groups will become more clear once we work our way into actual patterns. First, though, let's take a look at the match function. It works similarly, but there is a crucial difference:
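A minimal sketch of that difference, assuming the test string defined earlier and the word "regular" as the pattern:

```python
import re

test = 'This is for testing regular expressions in Python.'

# match only succeeds if the pattern matches at the very start of the string.
match_result = re.match('regular', test)
print(match_result)  # None

# search scans the whole string and reports the first occurrence anywhere.
search_result = re.search('regular', test)
print(search_result.group(0))  # regular
```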

Notice that None was returned, even though “regular” is in the string. If you haven't figured it out, the match function matches patterns only at the beginning of the string, while the search function examines the whole string. You might be wondering if it's possible, then, to make the match function match “regular,” since it's not at the beginning of the string. The answer is yes. It's possible to match it, and that brings us into patterns.

The character “.” will match any character except a newline. We can get the match function to match “regular” by putting a period for every character before it. Let's split this up into two groups as well. One will contain the periods, and one will contain “regular”:
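A sketch of the pattern described here, assuming the same test string. The twenty periods are written out as four adjacent five-character literals purely so they are easy to count; Python joins adjacent string literals into one.

```python
import re

test = 'This is for testing regular expressions in Python.'

# Twenty periods, one for each character that comes before "regular".
pattern = '(' '.....' '.....' '.....' '.....' ')(regular)'

result = re.match(pattern, test)
print(repr(result.group(1)))  # 'This is for testing '
print(result.group(2))        # regular
```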

Aha! We matched it! However, it's ridiculous to have to type in all those periods. The good news is that we don't have to do that. Take a look at this and remember that there are twenty characters before “regular”:
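A sketch of the shorthand, assuming the same test string: a number in braces repeats the preceding item that many times, and two comma-separated numbers give a minimum and a maximum.

```python
import re

test = 'This is for testing regular expressions in Python.'

# .{20} means "any character, exactly twenty times".
exact = re.match('(.{20})(regular)', test)
print(exact.group(2))  # regular

# .{10,20} means "any character, between ten and twenty times".
ranged = re.match('(.{10,20})(regular)', test)
print(len(ranged.group(1)))  # 20
print(ranged.group(2))       # regular
```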

By entering two arguments, so to speak, you can match any number of characters in a range. In this case, that range is 10-20. Sometimes, however, this can cause undesired behavior. Take a look at this string:
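One common source of such surprises is greediness: a range quantifier grabs as many characters as it is allowed to. The sketch below uses a made-up string (an assumption for illustration, not the article's own example) in which "regular" appears twice, and shows that appending ? to the quantifier makes it take as few characters as possible instead.

```python
import re

# Hypothetical sample string, chosen so that "regular" occurs twice.
sample = 'aaa regular bbb regular ccc'

# Greedy: .{1,20} takes as much as it can, so the match runs to the second "regular".
greedy = re.match('(.{1,20})(regular)', sample)
print(repr(greedy.group(1)))  # 'aaa regular bbb '

# Non-greedy: .{1,20}? takes as little as it can, stopping at the first "regular".
lazy = re.match('(.{1,20}?)(regular)', sample)
print(repr(lazy.group(1)))    # 'aaa '
```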

Finally, there are a number of special sequences. “\A” matches at the start of a string. “\Z” matches at the end of a string. “\d” matches a digit. “\D” matches anything but a digit. “\s” matches whitespace. “\S” matches anything but whitespace.
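A short sketch of these sequences in action, using a made-up sample string (an assumption for illustration). The patterns are written as raw strings (r'...') so that Python passes the backslashes through to the re module unchanged.

```python
import re

# Hypothetical sample string for illustration.
sample = 'Order 66 shipped'

starts = re.search(r'\AOrder', sample)    # \A anchors at the start of the string
not_start = re.search(r'\Ashipped', sample)
ends = re.search(r'shipped\Z', sample)    # \Z anchors at the end of the string

digits = re.findall(r'\d', sample)        # every single digit
non_digits = re.findall(r'\D+', sample)   # runs of non-digit characters
spaces = re.findall(r'\s', sample)        # every whitespace character
words = re.findall(r'\S+', sample)        # runs of non-whitespace characters

print(starts.group(0), not_start, ends.group(0))  # Order None shipped
print(digits)      # ['6', '6']
print(non_digits)  # ['Order ', ' shipped']
print(spaces)      # [' ', ' ']
print(words)       # ['Order', '66', 'shipped']
```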

On a final note, you should not use regular expressions to match or replace simple strings.
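For fixed text, the built-in string operations are usually simpler to read and need no pattern at all. A minimal sketch, assuming the same test string as before (the replacement text is an illustrative choice):

```python
test = 'This is for testing regular expressions in Python.'

# Membership test: is the substring present anywhere?
found = 'regular' in test
print(found)     # True

# find returns the index of the first occurrence, or -1 if there is none.
position = test.find('regular')
print(position)  # 20

# startswith checks the beginning of the string, much like re.match on plain text.
at_start = test.startswith('This')
print(at_start)  # True

# replace swaps fixed text without any pattern syntax.
swapped = test.replace('Python', 'any language')
print(swapped)   # This is for testing regular expressions in any language.
```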

Conclusion

Now you have a basic knowledge of string manipulation in Python behind you. As I explained at the very beginning of the article, string manipulation is necessary to many applications, both large and small. It is used frequently, and a basic knowledge of it is critical.

Wednesday, May 14, 2008

A screen capture / screenshot and edit utility written in Java to be cross-platform. It allows the selection of an area with instant, in-place markup and annotation ability, then copy to clipboard or save to file.

Perhaps text ads are less annoying than banners, and perhaps they’re more effective, but I still don’t want to see them. No, I don’t care if it’s Google, the daahhhling of the “in” geek crowd, serving them up—they’re still ads.

Even though I use Firefox, I was never able to get rid of these text ads. Via a built-in Firefox function, I could easily ignore domains of images (via the right-click menu), but text was a different animal. I thought I was stuck. I was wrong.

Google serves their text ads in an iFrame, which, for the non-techies out there, is basically an area of a page in which another page is loaded. It may not be clearly distinguishable from the parent page, but it’s an external page. Therefore, the key is to block iFrames coming from ad servers, and in this case, Google’s ad servers. Here’s how to do it:

Download and install Mozilla Firefox. It’s a superior browser, and it should be your default browser. Internet Explorer (IE) just doesn’t cut it anymore.

In the New Filter: input box, paste this: http://*.googlesyndication.com/*

Ensure that in the Adblock preferences window, it is set to Remove ads, not Hide ads.

Never see another Google ad!

UPDATE: This userContent.css technique by Neil Jenkins is probably even better. It will catch and block most ads (including Google text ads). The ones it doesn’t get can be defeated manually by Adblock.

BY THE WAY: Yes, I do run ads on my site, and if you block them, well, more power to ya.

Monday, May 12, 2008

The OpenOffice.org Dictionary Installer for Microsoft Windows is no longer being actively maintained.

Recent versions of OpenOffice.org include a built in dictionary installer called DictOOo which has been developed by Laurent Godard and which has many more features, and is more compatible, than the latest version of DictInstall.

To access DictOOo, choose the File menu from within OpenOffice.org, then choose Wizards and click on Install new dictionaries....

These operators work on bits and not logical values. Take two 8-bit bytes, combine them with any of these operators, and you will get another 8-bit byte according to the operator's function. These operators work on the individual bits inside the byte.

A truth table helps to explain each operation. In a truth table, a 1 bit stands for true, and a 0 stands for false.

The OR operation truth table:

0 OR 0 = 0

0 OR 1 = 1

1 OR 0 = 1

1 OR 1 = 1

The AND operation truth table:

0 AND 0 = 0

0 AND 1 = 0

1 AND 0 = 0

1 AND 1 = 1

The XOR operation truth table:

0 XOR 0 = 0

0 XOR 1 = 1

1 XOR 0 = 1

1 XOR 1 = 0

The NOT operator inverts the sense of the bit, so a 1 becomes a 0, and a 0 becomes a 1.

So let's say I have a byte foo that is initialized to 0:

Code:

unsigned char foo = 0;

To set bit 0 in foo and then store the result back into foo:

Code:

foo = foo | 0x01;

The OR operation is used between the variable that we want to change and a constant which is called a BIT MASK or simply the MASK. The mask is used to identify the bit that we want changed.

Remember that we write the constants in hexadecimal because it's shorter than writing them in binary. It is assumed that the reader knows how to convert back and forth between hex and binary.

Usually, though, the statement is made shorter in real programming practice to take advantage of C's compound assignment:

Code:

foo |= 0x01;

This is equivalent to the statement above.

To clear bit 0 in foo requires 2 bit operators:

Code:

foo = foo & ~0x01;

This uses the AND operator and the NOT operator. Why do we use the NOT operator? Most programmers find it easier to specify a mask wherein the bit that they are interested in changing is set. However, this kind of mask can only be used in setting a bit (using the OR operator). To clear a bit, the mask must be inverted and then ANDed with the variable in question. It is up to the reader to do the math to show why this works in clearing the desired bit.
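The math can be checked quickly in Python, whose &, | and ~ operators use the same symbols as C. This is a sketch rather than C code: Python integers are not limited to 8 bits, so the inverted mask is ANDed with 0xFF here to mimic an unsigned char. The starting value of foo is an arbitrary example.

```python
foo = 0b10110101         # example byte with bit 0 set
mask = 0x01              # mask with only bit 0 set

# Inverting the mask gives ones everywhere except bit 0.
inverted = ~mask & 0xFF
print(format(inverted, '08b'))  # 11111110

# ANDing keeps every bit where the inverted mask is 1 and forces bit 0 to 0.
cleared = foo & inverted
print(format(cleared, '08b'))   # 10110100

# ANDing with the original mask would do the opposite: keep bit 0, wipe the rest.
only_bit0 = foo & mask
print(format(only_bit0, '08b')) # 00000001
```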

Again, the statement is made shorter with a compound assignment:

Code:

foo &= ~0x01;

To see if a bit is set or clear just requires the AND operator, but with no assignment. To see if bit 7 is set in the variable foo:

Code:

if(foo & 0x80)

{

}

The condition will be zero if the bit is clear, and the condition will be non-zero if the bit is set. NOTE! The condition will be NON-ZERO when the bit is set. But the condition will not NECESSARILY BE ONE. It is left to the reader to calculate the value of the condition to understand why this is the case.
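As a quick check of that calculation, here is a sketch in Python, whose & and >> operators behave like C's for small non-negative values. The value of foo is an arbitrary example with bit 7 set.

```python
foo = 0b10000001            # example byte with bits 7 and 0 set

# The AND result is the mask value itself (128), not 1.
condition = foo & 0x80
print(condition)            # 128

# It is still "true" in a test, because any non-zero value counts as true.
print(condition != 0)       # True

# To get exactly 0 or 1, shift the bit down to position 0 first.
bit7 = (foo >> 7) & 0x01
print(bit7)                 # 1
```

Comparing the result of the AND directly against 1 would therefore fail for every bit except bit 0.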

There is another useful tool that is not often seen, and that is when you want to flip a bit, but you don't know and you don't care what state the bit is currently in. Then you would use the XOR operator:

Code:

foo = foo ^ 0x01;

Or the shorter statement:

Code:

foo ^= 0x01;

A lot of times the bit mask is built up dynamically in other statements and stored in a variable to be used in the assignment statement:

Code:

foo |= bar;

Sometimes, a programmer wants to specify the bit NUMBER that they want to change and not the bit MASK. The bit number always starts at 0 and increases by 1 for each bit. An 8-bit byte has bit numbers 0-7 inclusive. The way to build a bit mask with only a bit number is to LEFT SHIFT a bit by the bit number. To build a bit mask that has bit number 2 set:

Code:

(0x01 << 2)

To build a bit mask that has bit number 7 set:

Code:

(0x01 << 7)

To build a bit mask that has bit number 0 set:

Code:

(0x01 << 0)

Which ends up shifting the constant 0 bits to the left, leaving it at 0x01.

MACROS

Because there are a number of programmers who don't seem to have a familiarity with bit flipping (because they weren't taught it at school, or they don't need to know it because of working on PCs), most programmers usually write macros for all of these operations. Also, it provides a fast way of understanding what is happening when reading the code, or it provides additional functionality.

Below is a set of macros that works with ANSI C to do bit operations:

Code:

#define bit_get(p,m) ((p) & (m))

#define bit_set(p,m) ((p) |= (m))

#define bit_clear(p,m) ((p) &= ~(m))

#define bit_flip(p,m) ((p) ^= (m))

#define bit_write(c,p,m) ((c) ? bit_set(p,m) : bit_clear(p,m))

#define BIT(x) (0x01 << (x))

#define LONGBIT(x) ((unsigned long)0x00000001 << (x))

To set a bit:

Code:

bit_set(foo, 0x01);

To set bit number 5:

Code:

bit_set(foo, BIT(5));

To clear bit number 6 with a bit mask:

Code:

bit_clear(foo, 0x40);

To flip bit number 0:

Code:

bit_flip(foo, BIT(0));

To check bit number 3:

Code:

if(bit_get(foo, BIT(3)))

{

}

To set or clear a bit based on bit number 4:

Code:

if(bit_get(foo, BIT(4)))

{

bit_set(bar, BIT(0));

}

else

{

bit_clear(bar, BIT(0));

}

To do it with a macro:

Code:

bit_write(bit_get(foo, BIT(4)), bar, BIT(0));

If you are using an unsigned long (32 bit) variable foo, and have to change a bit, use the macro LONGBIT, which creates an unsigned long mask. Otherwise, using the BIT() macro, the compiler will truncate the value to 16 bits on targets where int is 16 bits wide.