I am the founder of the Singapore PowerShell User Group. I am an Exchange administrator, and I have been scripting for more than seven years now. Before Windows PowerShell, I did most of my scripting in VBScript. With Windows PowerShell, I automate Exchange/Active Directory tasks, and I am also good at WMI, ADSI, and generating custom reports. I have developed custom applications in C# for automation. I love to automate things!

Last year, a survey was conducted to figure out the number of email messages sent every day across the globe. It was estimated that approximately 294 billion messages are sent per day, which works out to roughly 3.4 million messages per second. By the way, 90 percent of all email is either spam or viruses.

Every email message you send or receive carries a piece of information called a message header. There are different ways to view this message header, depending on which email client you are using. For example, here are instructions for getting the message header of an email message if you are using Outlook 2010. The following figure shows how a message header looks.

This text contains a lot of details about the message you have received. RFC 822 (since superseded by RFC 2822 and RFC 5322) defines how information about an email message is placed into this header.

For now, focus on the green box in the preceding figure. This section has information about how this message got to your inbox. An email message can pass through several servers before it reaches your inbox. That is our main focus in today’s post: we want to get this data parsed out of this messy text and present it in a nice little table so that you can understand what really happened with that email message.

Whenever you try to make sense of an email message header, read it from the bottom up. The section above has only four entries, but each entry is wrapped across several lines. Here is how it looks after removing the unwanted line breaks:

You see, it already looks a lot cleaner after you remove those extra line breaks. Take a look at the boxes and circles in the image above, and read it like this:

Received the email from server “Corp.red.com ([16.25.5.17])” by server “Singapore.red.com ([15.60.22.16])” with protocol “mapi id 14.01.0323.002” at time “Wed, 13 Jul 2011 18:50:16 +0800”. The +0800 offset means UTC +8, which is Singapore time, and that fits the name of the server that received the message: “Singapore.red.com”.

If that did not make much sense, it might help to visualize it like in the following figure.

Now, this server will send the message again to another server and so on. Each trip from server to server is called a hop. The hop chain continues until it finally reaches your inbox. At times, there might be a delay when going through one of these chains. Maybe a server was busy or there was too much load, which could cause delays to email being delivered to your inbox. In this example, we have four lines, so we have four hops for the email to reach your mailbox.

Enough of theory. Let’s talk about Windows PowerShell, starting with another figure.

We have to extract the piece of information between the above-mentioned sections to form our objects. You can see from the preceding screenshot that it has a pattern. Luckily, this is where regular expressions come to the rescue. Read this great article by PowerShell MVP Tome about regular expressions and Windows PowerShell.

We need to get four pieces of information from each line:

All text after Received: from until the word by. This will be our Received From Server information.

All text after by until the word with. This will be the server that receives the email from the server above.

All text after with until the character ; (a semicolon). This is the protocol.

The next 32 to 36 characters (whitespace or not) after the ; (a semicolon). This data is the date. Here is a sample date:

Wed, 20 Jul 2011 22:28:16 -0700

A standard date like this one is 31 characters long. Sometimes there is an extra space or a new line around it, so I am giving myself a buffer; later we can remove the unwanted data from this string.

Can you believe that the above regular expression pattern can do all four of the things I said above? If you are good at Windows PowerShell and still haven’t used regular expressions, you are missing an important weapon in your Windows PowerShell arsenal.
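As a rough illustration of the four pieces listed above, here is a minimal sketch applied to the single hop from the earlier example. The pattern is an assumption built from the description in this post (lazy `[\s\S]*?` groups and a 32 to 36 character window for the date); it is not necessarily the exact pattern shown in the figure, and the property names are made up for the example.

```powershell
# One hop from the example above, already joined into a single line.
$line = 'Received: from Corp.red.com ([16.25.5.17]) by Singapore.red.com ([15.60.22.16]) with mapi id 14.01.0323.002; Wed, 13 Jul 2011 18:50:16 +0800'

# Assumed pattern: three lazy "any character, including new lines" groups,
# then a window of 32 to 36 characters after the semicolon for the date.
$pattern = 'Received: from([\s\S]*?)\sby\s([\s\S]*?)\swith\s([\s\S]*?);([\s\S]{32,36})'

if ($line -match $pattern) {
    # $Matches[1]..[4] hold the four captured groups; Trim removes the edge whitespace.
    $hop = New-Object PSObject -Property @{
        ReceivedFrom = $Matches[1].Trim()
        ReceivedBy   = $Matches[2].Trim()
        Protocol     = $Matches[3].Trim()
        Date         = $Matches[4].Trim()
    }
    $hop | Format-List
}
```

The `\sby\s` and `\swith\s` parts require whitespace around the keywords, which is a small precaution so that a server name that happens to contain the letters "by" does not end the first group early.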

Note: Check out this webcast by Tome. It is a great introduction to regular expressions.

Because we do not know how the text is going to be laid out in the message header, it is good to read the whole file as one long string and work with that. Here is the technique to read a file into one big string.

# ReadAllText returns the whole file as one string and closes the file when it is done.
$text = [System.IO.File]::ReadAllText("C:\Scripts\msg6.txt")

The $text variable now has the same information as the first screenshot in this post. I wanted to write a function that would take this $text as input, process the string, give out all the parsed data, package it in an array of PSObjects, and return them. I used Select-String along with the regular expression pattern and iterated through all the matches I got.
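To see why one long string matters here, the following sketch compares reading a file line by line with reading it as a single string. The temporary file and its two-line content are made up for the illustration.

```powershell
# Write a small two-line sample to a temporary file (hypothetical content).
$path = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($path, "Received: from A by B`r`n with SMTP; date")

$lines = @(Get-Content $path)                   # array, one element per line
$whole = [System.IO.File]::ReadAllText($path)   # one string, line breaks kept

$lines.Count                          # 2
$whole.GetType().Name                 # String
@($lines -match 'B\s+with').Count     # 0: no single line contains both words
$whole -match 'B\s+with'              # True: the pattern can span the line break

Remove-Item $path
```

A header entry that is wrapped across lines can only be matched as a whole when the text is held in one string.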

Each of those pieces is captured in a group, and you can access the groups through the Matches property. This is true except for the last one (in the world of regular expressions, “?:” means match the text but do not capture it as a group). This is the class the result gets stored in: Microsoft.PowerShell.Commands.MatchInfo.

I just loop through the matches and then build a PSObject for each of the matches. Now, if I output the results of the function to a gridview, I see what is shown in the following figure:
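The loop described above could look roughly like the following sketch. The function name Get-ReceivedHop, the property names, and the sample header (including the Mailbox.red.com server and its address) are hypothetical. One deliberate change from the approach described in this post: the date group matches the shape of a date instead of a fixed 32 to 36 character window, so that a match cannot run into the next Received line.

```powershell
function Get-ReceivedHop {
    # Hypothetical helper: takes the whole header as one string, returns one object per hop.
    param([string]$Text)

    # Assumed pattern; the last group matches dates shaped like "Wed, 13 Jul 2011 18:50:16 +0800".
    $pattern = 'Received: from([\s\S]*?)\sby\s([\s\S]*?)\swith\s([\s\S]*?);\s*' +
               '(\w{3},\s+\d{1,2}\s+\w{3}\s+\d{4}\s+[\d:]{8}\s+[+-]\d{4})'

    # The single input string is searched as a whole; -AllMatches returns every hop.
    $found = $Text | Select-String -Pattern $pattern -AllMatches
    if ($found) {
        foreach ($m in $found.Matches) {
            New-Object PSObject -Property @{
                # -replace collapses line breaks and tabs inside a capture; Trim cleans the edges.
                ReceivedFrom = ($m.Groups[1].Value -replace '\s+', ' ').Trim()
                ReceivedBy   = ($m.Groups[2].Value -replace '\s+', ' ').Trim()
                Protocol     = ($m.Groups[3].Value -replace '\s+', ' ').Trim()
                Date         = ($m.Groups[4].Value -replace '\s+', ' ').Trim()
            }
        }
    }
}

# Made-up two-hop header with wrapped lines, newest hop first.
$sample = @'
Received: from Singapore.red.com (15.60.22.16) by Mailbox.red.com
 (15.60.22.20) with mapi id 14.01.0323.002; Wed, 13 Jul 2011 18:50:20 +0800
Received: from Corp.red.com (16.25.5.17) by Singapore.red.com (15.60.22.16)
 with mapi id 14.01.0323.002; Wed, 13 Jul 2011 18:50:16 +0800
Subject: test
'@

$hops = @(Get-ReceivedHop -Text $sample)
$hops | Format-Table ReceivedFrom, ReceivedBy, Protocol, Date -AutoSize
```

Piping the result to Out-GridView instead of Format-Table gives the grid view mentioned above.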

Read the next part tomorrow, where I show how I put the pieces together to get delay information from different hops and then finally to build a GUI tool for this functionality.

Thiyagu, this is an excellent article. Thank you for sharing your time with us and for sharing your expertise with the Windows PowerShell community. I am really looking forward to part 2 tomorrow!

Thanks jrv for sharing that script. If you haven't already uploaded it to the script repository, may I suggest you upload it there; it might help someone.

About your Trim method: I posted a comment, but it still isn't showing up, maybe there is a delay. What I wanted to say was that Trim will only trim at the edges, and the captures I have in my regex have `r and `n inside the text.

In the above example, it won't remove the `r`n which is between the two hello worlds. Maybe you can try to export the output before trim and after trim and open it in Notepad++, which will show the different new line and tab characters; just select View->Show Symbols->Show all characters. Since I need to remove the CrLf in between as well, I need to use the replace method for this.
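The difference between Trim and replace described in this comment can be shown with a small sketch. The sample string is made up for the illustration.

```powershell
# A string with line breaks at both edges and one in the middle.
$s = "`r`nhello world`r`nhello world`r`n"

$trimmed  = $s.Trim()                      # removes whitespace at the edges only
$replaced = $s -replace '[\r\n]+', ' '     # replaces every run of CR/LF with a space

$trimmed.Contains("`r`n")     # True: the inner line break is still there
$replaced.Contains("`r`n")    # False
$replaced.Trim()              # hello world hello world
```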

Thanks for pointing out the power of regexps to parse text with only little structure!

But I was really stuck when I came to the regexp including the pattern ([\s\S]*?) … because \s is the opposite of \S by definition and the square brackets define a set of characters, I really came to the conclusion that it is as good as using the dot wildcard instead! ( @Chris Warwick: you are right and thanks for detailing it! )

Your pattern works in so far that, when it encounters the keywords "by", "with" and ";", it splits the string into 4 components, which are usually the parts you wanted to extract from the header.

@jrv: agreed! Loading the email as a MAPI mailitem would result in objects where we could extract the desired components using dot notation to retrieve the properties! But as an example of using regexps to parse text … this is useful anyway!
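For readers wondering how the set of \s and \S differs from the dot in practice: in .NET regular expressions the dot does not match a line feed by default, while a set containing both \s and \S matches every character. A minimal sketch with a made-up two-line string:

```powershell
# "from" and "by" sit on different lines.
$s = "from A`r`n by B"

$s -match 'from(.*?)by'         # False: the dot stops at the line feed
$s -match 'from([\s\S]*?)by'    # True: the set matches line breaks too
$s -match '(?s)from(.*?)by'     # True: (?s) lets the dot match line feeds as well
```

With wrapped header lines, the keywords often end up on different lines, which is where the two forms behave differently.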