timelib
Well, date_parse performs very well, and it was educational to learn why. The PHP function date_parse is part of ext/date/lib, also known as timelib. Despite the lack of proper documentation, its C implementation makes it a clever tool. That implementation was written by Derick Rethans and is exposed to PHP through the Zend Engine function macros in ext/date:
- date_parse is already fuzzy. The documentation page carries many warnings (and complaints) that the function tolerates and parses too much. That is a feature rather than a bug: for strict parsing one should use date_parse_from_format() or the corresponding DateTime::createFromFormat().
- date_parse uses (a lot of) regular expressions in a relatively smart way. Its scanner is generated by re2c, which compiles the rules into C code.
- In addition to filtering, this "scanner" tries all known combinations of words and date formats, using its lists of known month names and timezones. Finally, it makes a "blind" guess by looking for YYYY, MM and DD separately, which is very similar to what I need to do.
- date_parse is a true compiled scanner with look-ahead logic and error reporting that the caller can handle further. It throws no exceptions; it just puts messages inside the nested warnings and errors arrays of the result.
- There is even a Python package wrapping the C code of timelib. So I am not even sure which is ultimately better at "parsing the monkey business": timelib or python-dateutil.
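To make the error reporting described above concrete, here is a minimal sketch. The helper name describeParse is hypothetical and not part of PHP or timelib. It reads the year, month and day fields, which date_parse sets to false when it could not determine them. It then appends the messages from the warnings and errors arrays, which are keyed by character position in the input.

```php
<?php
// Hypothetical helper (not part of PHP or timelib): summarise what date_parse()
// recovered from a string. Components the scanner could not find are false,
// and problems are reported as messages keyed by character position, not thrown.
function describeParse(string $text): string
{
    $result = date_parse($text);

    $parts = [];
    foreach (['year', 'month', 'day'] as $key) {
        // Show '?' for any component date_parse() left undetermined.
        $parts[] = $key . '=' . ($result[$key] === false ? '?' : $result[$key]);
    }
    $summary = implode(' ', $parts);

    // Append every warning and error together with its position in the input.
    foreach ($result['warnings'] as $pos => $message) {
        $summary .= " | warning at $pos: $message";
    }
    foreach ($result['errors'] as $pos => $message) {
        $summary .= " | error at $pos: $message";
    }
    return $summary;
}

echo describeParse('2010-07-10'), PHP_EOL; // year=2010 month=7 day=10
echo describeParse('!'), PHP_EOL;
```

Because nothing is thrown, the caller decides whether a result with errors is still usable. This is what makes date_parse convenient as the first stage of a fuzzy parser.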
testing and examples
For my part, I could not find any input in my dataset that date_parse failed to parse, e.g.:
echo FuzzyDateParser::fromText('banana 1/2/3');
echo FuzzyDateParser::fromText('Joe Soap was born on 12 February 1981');
echo FuzzyDateParser::fromText('2005 Feb., reprint');
echo FuzzyDateParser::fromText('!'); # will fail to parse, producing an empty string.
echo FuzzyDateParser::fromText('monkey 2010-07-10 loves bananas and php');
The code for the FuzzyDateParser class can be found in this gist. It can serve as a template for handling errors and for implementing a fallback from date_parse results to your own custom logic. In the end I did not need that fallback for my case.
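For readers without access to the gist, here is a hypothetical sketch of such a template. It is not the gist's code. The output contract is an assumption: 'YYYY-MM-DD', 'YYYY-MM' or 'YYYY' depending on which parts were found, and an empty string when nothing was recognised. The guessYear fallback and its 1500 to 2099 range are likewise illustrative choices.

```php
<?php
// Hypothetical sketch only: this is NOT the FuzzyDateParser from the gist.
// Assumed contract: fromText() returns 'YYYY-MM-DD', 'YYYY-MM' or 'YYYY'
// depending on which parts were found, and '' when no date was recognised.
final class FuzzyDateParser
{
    public static function fromText(string $text): string
    {
        $result = date_parse($text);

        // date_parse() may report unrecognised words as errors while still
        // filling in the fields it did recognise, so use the fields instead
        // of rejecting every result with a non-zero error_count.
        $date = self::formatParts($result['year'], $result['month'], $result['day']);
        if ($date !== '') {
            return $date;
        }

        // Fallback to custom logic (assumed heuristic).
        return self::guessYear($text);
    }

    // Hypothetical fallback: a standalone four-digit year from 1500 to 2099.
    public static function guessYear(string $text): string
    {
        return preg_match('/\b(1[5-9]\d\d|20\d\d)\b/', $text, $m) === 1 ? $m[1] : '';
    }

    // Build the most specific date string the found components allow.
    private static function formatParts($year, $month, $day): string
    {
        if ($year === false) {
            return '';
        }
        $out = sprintf('%04d', $year);
        if ($month !== false) {
            $out .= sprintf('-%02d', $month);
            if ($day !== false) {
                $out .= sprintf('-%02d', $day);
            }
        }
        return $out;
    }
}

echo FuzzyDateParser::fromText('monkey 2010-07-10 loves bananas and php'), PHP_EOL;
```

Trusting the fields despite reported errors leans on date_parse's fuzziness. A stricter variant could instead reject any result whose errors array is non-empty and go straight to the fallback.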