This was a program I have wanted for my own use for quite some time. So I encouraged Carl to code it and pestered him the whole way with ideas and requests, and offered some occasional programming advice.

Intro:

LineByter is a utility designed to find and extract patterns from text files.

It's a brand new (free) program coded by DonationCoder member Carl Danley (CodeByter) and released today.

It includes some unique features, such as duplicate removal, the ability to specify multiple match and reject patterns, and the ability to save and load profiles, which make it ideal for repeated jobs like extracting emails or URLs from text files.

Motivation for the program

When we send out the DonationCoder mailing list, a certain number of the newsletter emails bounce back each month as undeliverable. I use phplist to manage the web mailing list, but lately I've been exporting these bounced emails from my email program and running an email extraction utility on the export to get a list of addresses. I then feed that list into a script that turns off email notification for the forum users whose addresses were found.

In the past I've been using a now-discontinued utility designed specifically to extract emails. But it's less than ideal. It's a bit clunky, it sometimes finds things that aren't emails, and it sometimes misses real emails. It also has a bad user interface and doesn't remove duplicates. After running this utility I would bring the output file into a text editor, sort it and remove duplicates, and then go through and remove certain emails, like those that are really donationcoder.com addresses and a few known fake email address patterns that seem to show up regularly.

So that's why I have wanted, for a while now, a little utility that is better at extracting emails and that automatically does some of the things I have been doing manually. Of course I could have written a little Perl or Python script for it, but I am a big fan of custom GUI tools for such things.

LineByter is the program that emerged from my discussions with Carl about this idea. It's actually a much more general-purpose program that can extract and reject all kinds of regular expression patterns, BUT it's also designed to be really easy to use, and it is focused closely enough on the workflow I described above that it's a real joy to use for this kind of stuff.

Features

Some key features of the program:

You can drag and drop as many files to scan as you want.

Nice progress bar so you can see how much more time it's going to take.

Supports preset library of regular expressions so you can easily just select common patterns and add your own presets -- this is super important for letting you quickly reuse patterns and makes it suitable even for those who don't understand regular expression syntax.

Lets you specify a list of multiple patterns that are being searched for and how to extract the data you want from these patterns.

Lets you specify a list of additional patterns which should be rejected even if they match the first list (ideal if you want to find and extract all email patterns except those with certain properties).

Shows a nice complete report of why each pattern was found and/or rejected.

Automatically removes duplicates.

Produces a final list of results in text form that can be copied to clipboard or saved to file.

Can save and load profiles so you can reuse configuration settings for common jobs you perform.
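As a rough illustration of the match/reject/dedupe workflow these features describe, here is a minimal Python sketch. It is not LineByter's actual code: the patterns, the `extract` function, and the choice to compare duplicates case-insensitively are all assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration; these are not LineByter's presets.
MATCH_PATTERNS = [
    re.compile(r"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"),
]
REJECT_PATTERNS = [
    re.compile(r"@donationcoder\.com$", re.IGNORECASE),
]


def extract(lines, match_patterns, reject_patterns):
    """Return unique first-capture-group matches that no reject pattern hits."""
    seen = set()
    results = []
    for line in lines:  # each line is examined on its own
        for pattern in match_patterns:
            for m in pattern.finditer(line):
                value = m.group(1)
                # A match is dropped if any reject pattern matches it.
                if any(r.search(value) for r in reject_patterns):
                    continue
                # Assumption: duplicates are compared case-insensitively.
                key = value.lower()
                if key in seen:
                    continue
                seen.add(key)
                results.append(value)
    return results


sample = [
    "Delivery failed for alice@example.org and bob@example.net",
    "Original sender: admin@donationcoder.com",
    "Retry failed for Alice@example.org",
]
found = extract(sample, MATCH_PATTERNS, REJECT_PATTERNS)
print(found)  # ['alice@example.org', 'bob@example.net']
```

The donationcoder.com address is dropped by the reject list, and the second occurrence of Alice's address is dropped as a duplicate.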

Screenshots

This screenshot shows the Patterns tab. It may look overwhelming, but basically I've just selected and added two presets from the drop-down list at the top to tell it to search for all email and URL patterns and extract them. And I've added some conditions for rejection, so that all .com and .org results will be excluded, and all results shorter than 5 characters will be rejected as well.

This screenshot shows the "Chomping Grid" which basically gives you a full report of what is found and where, and why it was rejected if it was. Note that it uses the "labels" associated with the regular expression patterns on the previous screen, and shows you when duplicates are found. You can also sort by any of these columns. A planned feature for a future version will let you quickly jump to and investigate the found pattern.

This screenshot shows the final tab collecting all of the matches that weren't rejected. It's updated live like the other tabs (don't be concerned that my screenshot is showing email addresses -- this is from a scan of my spam folder so these are all spam addresses). The patterns extracted are the first "capture group" from the regular expression matched.
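For readers unfamiliar with capture groups, here is a small Python sketch of what "the first capture group" means. The URL pattern is a hypothetical example, not one of LineByter's presets: the parentheses mark the part of the match that is kept, and everything outside them is used only to locate it.

```python
import re

# Hypothetical pattern: group 1 is the host name; the scheme and path
# help find the match but are not part of the extracted value.
pattern = re.compile(r"https?://([A-Za-z0-9.-]+)(?:/\S*)?")

line = "See https://www.donationcoder.com/forum/index.php for details"
m = pattern.search(line)

whole_match = m.group(0)  # the full text the pattern matched
extracted = m.group(1)    # only the first capture group

print(whole_match)  # https://www.donationcoder.com/forum/index.php
print(extracted)    # www.donationcoder.com
```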

Summary

This program far exceeded my expectations and desires. It is more generally useful than the program I initially wanted, but the simple interface and the use of presets and saved profiles mean that I can use it to extract emails from a file with just one or two clicks, and once I save a profile with the patterns I want matched and rejected, I never have to mess with those settings again.

For those who are looking for a completely general-purpose regular expression searching tool, this program is not for you -- there are more powerful and flexible grep tools available. This program is aimed much more at people who repeatedly extract data from files, using common patterns that they need again and again. If you need such a program, this is a miracle tool and a joy to use.

I'm honored that CodeByter selected me as a beta tester for LineByter.

I think I turned into the "nightmare on test street" for him.

I did everything I was not supposed to do with his poor app: throwing in binary files, using German umlauts, and so on, and I came up with suggestions and wishes all the time. Most of my wishes were implemented in this first final release; some are on his to-do list for V 2.0.

The plus points for this app, from my point of view, are:

Fast, faster, fastest, lightspeed, LineByter!

Portable! After installing you can copy the Exe to your thumbdrive and carry your search-buddy with you.

Compact! With only a small footprint.

The Match-Regex/Reject-Regex system for narrowing the findings, and the ability to add multiple regexes at once for both, are, as far as I can tell, a unique feature that makes LineByter stand out from most other regex search tools.

User-friendly! A simple, functional, clear and easy-to-use GUI. Only the regexes are still the cryptic beasts they have been since they were invented. But you can't blame LineByter for that, of course.

And last (only important for me): ACCESSIBLE! Also for visually impaired persons like me. Without unnecessary eye candy and colorful bells and whistles.

Well done, CodeByter. I like it very much. As I use it at the office too, the license fee is well invested and on its way.

I just want to say that writing this program was a lot of fun and a big learning experience for me. I found it very addicting to follow through with this software from start to finish and then sit back and watch as people download/view/purchase LineByter. It was a lot of work to create this utility, and programming was only half of it! I hope that any of you newer programmers out there get the same opportunity to learn as much as I did. When I started LineByter last week, I knew nothing about Regex (Regular Expressions), and now I fully understand how they can be used and how powerful they can be! It's truly quite amazing to sit back and watch my software scan 2,000+ lines a second as I realize that no human could ever compete with this! Anyway, long story short, I will be following through with this software as people come up with more ideas/suggestions! Please post comments, suggestions and anything else you can to help me better this piece of software! It is always nice to hear positive remarks about something you created!

I launched the application yesterday and won this Award earlier today. It is nothing too significant, but it might help you decide whether to choose LineByter.

The review seems to claim that the program can extract emails that contain one or more regex patterns. But it looks like the program just extracts single lines in a file. There used to be an old DOS grep program that would extract whole emails in which certain patterns of text were found. This can be very useful when working with a lot of email or message folders and an email program with a weak search function. Example: find all emails that mention the topic "vitamins". Collect each email as a COMPLETE email, not just a line within an email. Then one could read what each email says about the topic.

Any way to do this with LineByter? Its very name makes me skeptical, but perhaps I am asking for a task the author never had in mind. Still, there are so many programs out there (email, news, treepads) that collect text in some kind of note bundles that this would be a generally useful function.

LineByter definitely cannot do what you want (currently), but it might be nice if it could be expanded to do something like that at some point, maybe by looking at a larger context. Currently, though, it only matches patterns one line at a time, in total isolation from all other lines.
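To make the difference concrete, here is a rough Python sketch of the whole-message search being asked about. It has nothing to do with how LineByter works; it assumes an mbox-style text file where each message begins with a line starting with "From ", and the function names and sample data are made up for the example.

```python
import re


def split_messages(text):
    """Split mbox-style text into messages.

    Assumption: a line starting with 'From ' begins a new message.
    Real mbox files have escaping rules that this sketch ignores.
    """
    messages = []
    current = []
    for line in text.splitlines():
        if line.startswith("From ") and current:
            messages.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("\n".join(current))
    return messages


def messages_matching(text, pattern):
    """Return every complete message in which the pattern is found."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [msg for msg in split_messages(text) if regex.search(msg)]


mailbox_text = """From alice@example.org Mon Jan  1 10:00:00 2007
Subject: Supplements

I have been reading about vitamins lately.
From bob@example.net Mon Jan  1 11:00:00 2007
Subject: Lunch

Sandwiches at noon?
"""
hits = messages_matching(mailbox_text, r"vitamins?")
print(len(hits))  # 1
```

The pattern is tested against each message as a unit, so a hit returns the complete message rather than the single line that matched.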

Mailbag Assistant appears to be shareware costing about $30. There are a lot of shareware message readers still out there from the days of bulletin boards. They vary considerably in quality and features. I used to use an old DOS program called Readmail to read email messages. It allows you to define the header format of a list of messages. But its search facility was very limited, and it was buggy.

I may have to write one of these myself in, for instance, awk, which should not be too difficult.

Well, good news and bad news... Due to a recent robbery, my laptop was stolen and the source for LineByter now exists only in my head. I will be redoing LineByter very soon, so if you'd like, you could write up the specifics for me and let me know. Since I will be redoing the application anyway, I might be able to incorporate this myself, if it's something you'd be willing to wait for.

LineByter has been very useful in scanning and filtering the error logs from my Visual Studio builds. I'm overdue in paying for it, so I'll be sure to do that soon. I'm looking forward to the new version.