Included in the extension is a LinkedIn parsing script that no longer works because LinkedIn changed the format of its newsletters. I will post that script here for anyone to read and decipher. Hopefully we can use it as a starting point for more generalized or more specific parsing scripts. For example, I would like to make a parsing script that strips out certain string segments from all emails being processed, but also strips out specific string segments only from certain senders.
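To make that goal concrete, here is a rough sketch of the idea, not anything taken from the extension. It assumes the sender address is available as a plain string; `$sender`, `$globalStrings` and `$perSenderStrings` are names I made up for illustration, and how the real `$email` object exposes the sender would need checking.

```php
<?php
// Hypothetical inputs: in a real parsing script these would come from the extension.
$sender  = 'news@example.com';
$content = 'Story text. Sent via ExampleMail. Click to unsubscribe.';

// Strings stripped from every email.
$globalStrings = array(' Click to unsubscribe.');

// Strings stripped only for particular senders (keyed by lowercase address).
$perSenderStrings = array(
    'news@example.com' => array(' Sent via ExampleMail.'),
);

// Global pass first, then the sender-specific pass if one is defined.
$content = str_ireplace($globalStrings, '', $content);
$key = strtolower($sender);
if (isset($perSenderStrings[$key])) {
    $content = str_ireplace($perSenderStrings[$key], '', $content);
}
```

Keeping the per-sender list keyed by address means a new sender only needs a new array entry rather than another block of code.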

I will attempt to develop a simple parsing script (after posting the LinkedIn script) to accomplish this. Any assistance or ideas are welcome from this community of users of this awesome extension.

It is important to note that the LinkedIn parsing script is designed to take email content with tables and split it into separate articles, so much of it will not be needed for the purpose I outlined.

Others:
$email: the current email object being imported
$params: the parameters associated with it

Any change made here will be saved in the resulting article. If you want to skip the default saving routine (if you save the parsed info while in this file), just add the following line:

$continue = TRUE;

If a single e-mail message is parsed into several articles, fill in the $articles array ('title'=>..., 'content'=>...) and do not set $continue = TRUE; Each article will be saved separately according to the current email object settings.
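As an illustration of the $articles convention described above, here is a minimal sketch that splits one message into several articles. The sample `$content` and the choice to split on `<h2>` headings are my own assumptions for the example; the LinkedIn script splits on tables instead.

```php
<?php
// Stand-in for the email body the extension passes in (assumed sample data).
$content = '<h2>First</h2><p>One</p><h2>Second</h2><p>Two</p>';

$articles = array();

// Split just before each <h2> so every chunk starts with its own heading.
$chunks = preg_split('/(?=<h2>)/i', $content, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chunks as $chunk) {
    // Use the heading text as the title and the rest of the chunk as the body.
    if (preg_match('/<h2>(.*?)<\/h2>/is', $chunk, $m)) {
        $articles[] = array(
            'title'   => trim(strip_tags($m[1])),
            'content' => trim(preg_replace('/<h2>.*?<\/h2>/is', '', $chunk, 1)),
        );
    }
}

// $continue is deliberately NOT set, so each entry in $articles
// goes through the normal saving routine.
```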

Here is my first go at it. The first pattern works, the second pattern doesn't, and I haven't gotten the third CSS script through for a test yet. Any help or suggestions would be much appreciated! I've removed the lines at the top and bottom of the script that I used to test the content length on entry and on exit: echo 'startcontentlength'.strlen($content); and echo 'endcontentlength'.strlen($content);

The first pattern works, which leads me to believe I have done something wrong in the regular expression pattern for the second and that the rest of the process is sound. If not, let me know.

After a lot of struggling to get my PHP regex to recognize the same patterns I was getting from this online regex tester, I finally found a pattern that covers all 128 ASCII characters, which makes life so much easier. That pattern is a hexadecimal range, ([\x00-\x7F]+). It includes carriage returns, semicolons and anything else in the ASCII range that someone can include in an email.
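Here is a small sketch of how that range can be used to remove everything between two markers, line breaks included. The markers and sample text are invented, and I have used the lazy form `+?` (my choice, not part of the pattern above) so the match stops at the first closing marker instead of the last one.

```php
<?php
// Sample body with CRLF line breaks between two made-up markers.
$content = "Keep this.\r\nBEGIN FOOTER\r\nline one;\r\nline two\r\nEND FOOTER\r\nKeep this too.";

// Single quotes matter here: they pass \x00 and \x7F through to the regex
// engine as escapes instead of letting PHP turn them into raw bytes.
$pattern = '/BEGIN FOOTER[\x00-\x7F]+?END FOOTER/';

$result = preg_replace($pattern, '', $content);
```

One thing to keep in mind is that the range only covers bytes 0 to 127, so a block containing accented or other non-ASCII characters would not match. The `s` modifier, which lets `.` match line breaks too, is another way to get a similar effect.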

TIPS

My PHP version: 5.3.28

Some things to notice in the script: some patterns start and end with / and some with |. I have been unable to discern any difference in my testing, but let me know if there is a difference and if it's useful.

The "i" after the ending delimiter (/ or |) makes the pattern match without case sensitivity.
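On the / versus | question, here is a quick comparison using a made-up URL. As far as I know, PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as a delimiter, and the practical difference is only which character then has to be escaped inside the pattern.

```php
<?php
$content = 'Visit HTTP://Example.com/News or http://example.com/news today.';

// With / as the delimiter, every slash in the URL needs a backslash.
$withSlash = preg_replace('/http:\/\/example\.com\/news/i', '[link]', $content);

// With | as the delimiter, the slashes can stay as they are.
$withPipe = preg_replace('|http://example\.com/news|i', '[link]', $content);

// Same pattern without the trailing "i": only the lowercase URL matches.
$caseSensitive = preg_replace('|http://example\.com/news|', '[link]', $content);
```

So | (or another character such as #) is mainly a convenience for patterns full of slashes, such as URLs or closing HTML tags.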

In pattern 3 the specific text is not in brackets, while in pattern 6 you can see that it can be. This is a useful distinction for more complex preg_replace usage.

The replacements array is currently set to strip out every pattern match, but if you'd like to replace the text with something else, set the replacement at the array index key that matches the pattern's index.
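A minimal sketch of pairing patterns with replacements by index follows; the patterns and sample text are invented for illustration. One caveat from the PHP manual: preg_replace pairs the two arrays in the order the elements were added, not by numeric key, so running ksort() on both first is the safe way to rely on the index.

```php
<?php
$content = 'Hello WORLD. View in browser. Sent from my phone.';

$patterns = array();
$replacements = array();

$patterns[0]     = '/view in browser\.\s*/i';
$replacements[0] = '';            // strip the match

$patterns[1]     = '/sent from my phone\.?/i';
$replacements[1] = '';            // strip the match

$patterns[2]     = '/\bworld\b/i';
$replacements[2] = 'everyone';    // replace instead of strip

// Make sure both arrays are in index order before pairing them up.
ksort($patterns);
ksort($replacements);

$content = trim(preg_replace($patterns, $replacements, $content));
```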

The easiest way to test on a remote server is to modify your post by mail cron script so that its output is appended to a file that you can refresh easily over an SFTP connection. This is where your parsing script will show errors and any testing echos or var_dumps.
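If changing the cron script is not an option, a different approach (not the one described above) is to append debug lines to a file from inside the parsing script itself. This is only a sketch: the log location and `$logFile` name are assumptions, and the labels are borrowed from my earlier length checks.

```php
<?php
// Hypothetical log location; pick any path the web/cron user can write to.
$logFile = sys_get_temp_dir() . '/parse_debug.log';

$content = 'Example email body';

// FILE_APPEND adds to the end of the file; LOCK_EX avoids interleaved writes.
file_put_contents($logFile, 'startcontentlength: ' . strlen($content) . PHP_EOL, FILE_APPEND | LOCK_EX);

$content = str_ireplace(' email', '', $content);

file_put_contents($logFile, 'endcontentlength: ' . strlen($content) . PHP_EOL, FILE_APPEND | LOCK_EX);
```

I have kept the calls inline rather than wrapping them in a function, since a function defined in a script that gets included once per email would probably trigger a redeclaration error.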

Here is my latest test file. I sent this from several email accounts because Gmail strips out some things and leaves others, while other webmail services encode or strip out different things, so the more email accounts you test from, the better.

I've found that the preg_replace regex patterns are sound, although some unpredictable results happen depending on the order in which they are applied, combined with Joomla's code cleanup. To better understand what occurs while the parsing script runs, without needing to add a bunch of testing variables to be fed to the cron log, I've just added a replacement code for each pattern.

Alter your preg_replace line as below. This will replace your patterns with strings like r3 or r25 according to their index, for easy troubleshooting. I've also since added a line to remove the annoying "Fwd:" from subject titles if it has been left in by accident; the str_ireplace() function is case insensitive.
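For anyone who wants to try the marker idea, here is a rough sketch of one way to do it, not necessarily the exact line from my script. The sample patterns are invented, and `$title` is an assumed name for wherever the subject ends up.

```php
<?php
$content = 'Top banner | real text | Footer links';
$title   = 'FWD: Weekly update';   // assumed variable name for the subject

$patterns = array(
    '/top banner \| /i',
    '/ \| footer links/i',
);

// Build one marker per pattern: r0, r1, ... so each hit shows which pattern fired.
$debugReplacements = array();
foreach (array_keys($patterns) as $i) {
    $debugReplacements[$i] = 'r' . $i;
}

$content = preg_replace($patterns, $debugReplacements, $content);

// str_ireplace() ignores case, so "Fwd:", "FWD:" and "fwd:" are all removed.
$title = trim(str_ireplace('Fwd:', '', $title));
```

Swapping `$debugReplacements` back to the real replacements array switches the markers off again once the order of the patterns is sorted out.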

I will be setting up a separate array to remove (or alter) static strings using str_ireplace(), because preg_replace() is probably overkill for those. Doing exact string matches and replacements before the regular expression matches will also make it easier to understand the order in which the $content variable is changed.
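Roughly what I have in mind is sketched below, with invented sample strings and a hypothetical `$staticStrings` array: exact, case-insensitive removals run first, and the regular expressions only see what is left.

```php
<?php
$content = 'Hi all,<br>UNSUBSCRIBE HERE<br>Ref: 12345<br>Thanks';

// Pass 1: static strings, matched exactly but without case sensitivity.
$staticStrings = array('Unsubscribe here<br>', 'Hi all,<br>');
$content = str_ireplace($staticStrings, '', $content);
$afterStatic = $content;

// Pass 2: anything that varies from email to email goes through preg_replace.
$content = preg_replace('/Ref: \d+<br>/i', '', $content);
```

Keeping a copy of the content between the two passes (`$afterStatic` here) makes it easy to log which stage changed what.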