cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Howdy bros. I have an application where I need to search through unstructured text and output anything that looks like a date. Is there an existing Perl solution for this? It seems like I have seen one in the past, but after about 20 min of searching I can't locate anything.

If there's not a solution does anyone have advice about how to approach the task? I guess a person could write a regex for it, but it would be pretty hairy given all the different ways a date could be expressed.

Date::Manip is good at parsing arbitrarily formatted dates, if its performance is good enough for your data volumes.

But you should think hard about whether you trust the results, whichever parser you use to generate them. We once had a daily business report that our users loaded into Excel, which helpfully parsed a value in one field (let us say "MAR6") as a date, when it actually represented something else altogether.

If there isn't an existing solution, I would start by training a Bayesian classifier (e.g. Algorithm::NaiveBayes) to find the date-like bits of text, and then use those as examples from which to write regular expressions.

Actually, this would be a bad idea for at least two reasons: First, you have to segment the text before determining whether or not the segments are dates. Second, you have to have labeled data to train a classifier. A better approach would be to look through your data by hand and generalize to create a set of regular expressions (or, more generally, date-identifying functions). Once you have some of these, run them on more of your data, and refine them to include dates that they missed, and to exclude non-dates that they picked up. Keep doing this until you get the performance you need.
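A minimal sketch of that refine-and-rerun loop in core Perl might look like the following. Everything in it is a hypothetical illustration, not an existing module: the four starter patterns, the helper names `find_dates` and `evaluate`, and the sample sentences are assumptions you would replace with patterns and labelled examples drawn from your own data.

```perl
use strict;
use warnings;

# Hypothetical starter patterns; extend or tighten them as you inspect your data.
my $MONTH = qr/(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?
               |Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)/xi;

# One alternation with named groups, so %+ tells us which pattern fired.
my $DATE_RE = qr{
    \b (?:
        (?<iso>     \d{4}-\d{2}-\d{2} )                                # 2010-03-06
      | (?<numeric> \d{1,2}/\d{1,2}/(?:\d{4}|\d{2}) )                  # 3/6/2010, 03/06/10
      | (?<mdy>     $MONTH \s+ \d{1,2}(?:st|nd|rd|th)? ,? \s+ \d{4} )  # March 6, 2010
      | (?<dmy>     \d{1,2}(?:st|nd|rd|th)? \s+ $MONTH \s+ \d{4} )     # 6 March 2010
    ) \b
}x;

# Return a list of [kind, text] pairs for every date-like span in $text.
sub find_dates {
    my ($text) = @_;
    my @found;
    while ( $text =~ /$DATE_RE/g ) {
        my ($kind) = keys %+;    # %+ only lists the named group that matched
        push @found, [ $kind, $+{$kind} ];
    }
    return @found;
}

# Compare the patterns with hand-labelled samples; report missed dates
# and spurious matches (non-dates that the patterns picked up).
sub evaluate {
    my (@samples) = @_;    # each sample: { text => '...', dates => [ ... ] }
    my ( @missed, @spurious );
    for my $s (@samples) {
        my %expected = map { $_ => 1 } @{ $s->{dates} };
        my %got      = map { ( $_->[1] => 1 ) } find_dates( $s->{text} );
        push @missed,   grep { !$got{$_} }      @{ $s->{dates} };
        push @spurious, grep { !$expected{$_} } sort keys %got;
    }
    return ( \@missed, \@spurious );
}

my @samples = (
    { text => 'Shipped 2010-03-06, invoiced on March 6, 2010.',
      dates => [ '2010-03-06', 'March 6, 2010' ] },
    { text => 'Part MAR6 is back-ordered until 6 Mar 2010.',
      dates => ['6 Mar 2010'] },
    { text => 'Meet again on the 12th.',         dates => ['the 12th'] },
    { text => 'See section 1/2/10 of the spec.', dates => [] },
);

my ( $missed, $spurious ) = evaluate(@samples);
print "missed:   @$missed\n";      # dates the patterns failed to find
print "spurious: @$spurious\n";    # non-dates the patterns matched
```

With these samples it should print `missed:   the 12th` and `spurious: 1/2/10`, which is exactly the feedback the loop needs: add a pattern for the first, tighten the numeric one for the second, and rerun. Keeping every pattern in one alternation means the leftmost match wins and `/g` never reports overlapping spans twice; the trade-off is that alternation order decides ties, so put the more specific patterns first.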