Question

I have a number of strings that contain dates in different formats, and I would like to extract the date from each string. For example:

  • Today is August 2012. Tomorrow isn't
  • Another day 12 August, another time
  • 12/08 is another format
  • have another ? 08/12/12 could be
  • finally august 12 would be

The results I would expect, in order, are 2012-08-01 00:00:00, 2013-08-12 00:00:00, 2013-08-12 00:00:00, 2012-08-12 00:00:00 and 2013-08-12 00:00:00.

I currently have this code:

from dateutil import parser
print(parser.parse("Today is August 2012. Tomorrow isn't", fuzzy=True))

This prints 2012-08-27 00:00:00, because the string contains no day and the parser fills it in from today's date (today is the 27th). In this example I want 2012-08-01 00:00:00.
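The 27 comes from dateutil's fallback: when default is not given, parse() starts from today's date at midnight and replaces only the fields it finds in the string. A small check, assuming Python 3 and dateutil 2.x (the variable names are only illustrative), shows that passing that template explicitly gives the same result:

```python
from datetime import datetime
from dateutil import parser

text = "Today is August 2012. Tomorrow isn't"

# Today's date at midnight: the template dateutil uses when `default` is omitted.
today_midnight = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)

implicit = parser.parse(text, fuzzy=True)
explicit = parser.parse(text, fuzzy=True, default=today_midnight)

# Year and month come from the string; the day is copied from the template.
print(implicit)
print(explicit)
print(implicit == explicit)
```

Because only the day is missing from this string, whatever day the template carries is the day you get back.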

How do I make it use the first of the month whenever no day is given? For example, "August 2012" should return 2012-08-01, while "12 August 2012" should return 2012-08-12.


Solution

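Use the default argument to set the date that missing fields are taken from. With a default of 1 January 2013, a string with no day gets day 1 and a string with no year gets 2013. This handles all the cases except the third one ("12/08"), which is ambiguous: without a year, dateutil reads it as month/day, giving 8 December rather than 12 August. That one probably needs some parser tweaking, or a mindreader: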

In [15]: from datetime import datetime

In [16]: from dateutil import parser

In [17]: DEFAULT_DATE = datetime(2013, 1, 1)

In [18]: dates = ["Today is August 2012. Tomorrow isn't",
    ...:          "Another day 12 August, another time",
    ...:          "12/08 is another format",
    ...:          "have another ? 08/12/12 could be",
    ...:          "finally august 12 would be"]

In [19]: for date in dates:
    ...:     print(parser.parse(date, fuzzy=True, default=DEFAULT_DATE))
    ...:
2012-08-01 00:00:00
2013-08-12 00:00:00
2013-12-08 00:00:00  # wrong: expected 2013-08-12
2012-08-12 00:00:00
2013-08-12 00:00:00
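The third and fourth strings pull in opposite directions. dateutil's dayfirst flag only changes how ambiguous all-numeric dates are read, and the expected results treat "12/08" as day/month but "08/12/12" as month/day/year. A comparison of both settings, assuming Python 3 and dateutil 2.x and using the same DEFAULT_DATE:

```python
from datetime import datetime
from dateutil import parser

DEFAULT_DATE = datetime(2013, 1, 1)

# The two slash-separated examples from the question.
ambiguous = ["12/08 is another format",
             "have another ? 08/12/12 could be"]

results = {}
for text in ambiguous:
    # dayfirst=False (dateutil's default): prefer month/day(/year) when ambiguous
    month_first = parser.parse(text, fuzzy=True, default=DEFAULT_DATE)
    # dayfirst=True: prefer day/month(/year) when both readings are valid
    day_first = parser.parse(text, fuzzy=True, default=DEFAULT_DATE, dayfirst=True)
    results[text] = (month_first, day_first)
    print(f"{text!r}: month-first {month_first}, day-first {day_first}")
```

Month-first gives 2013-12-08 and 2012-08-12; day-first gives 2013-08-12 and 2012-12-08. Neither setting matches both expected values, so if the real data follows one convention consistently, pick the dayfirst value that matches it; if it mixes conventions, the strings alone do not say which reading is meant.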
Licensed under: CC-BY-SA with attribution