Question

I have a large data set with a variety of date information in the following formats:

I am familiar with Python's time module and its strptime() and strftime() methods. However, I am not sure what the date formats above are called, or whether there is a Python module I can use to convert them.

Any idea how to get the %Y%m%d format from these unusual date formats without writing my own calculator?

Thanks.


Solution

You can try something like the following:

In [1]: import datetime

In [2]: s = '2012265'

In [3]: datetime.datetime.strptime(s, '%Y%j')
Out[3]: datetime.datetime(2012, 9, 21, 0, 0)

In [4]: d = '41213'

In [5]: datetime.date(1900, 1, 1) + datetime.timedelta(int(d))
Out[5]: datetime.date(2012, 11, 2)

The first one is the trickier one: it uses the %j directive to interpret the day of the year you provide (after a four-digit year, represented by %Y). The second one is simply the number of days since January 1, 1900.

This is the general approach. I'm not sure of your exact input format, but hopefully this can be tweaked to suit it.
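Since the question asks for a YYYYMMDD result, both conversions can be finished with strftime('%Y%m%d'). A minimal sketch, assuming the day-of-year strings always start with a four-digit year and the integer strings count days from January 1, 1900 as in the session above; the helper names are hypothetical:

```python
import datetime

def yyyyddd_to_yyyymmdd(s):
    # '%Y%j' reads a four-digit year followed by the day of the year
    return datetime.datetime.strptime(s, '%Y%j').strftime('%Y%m%d')

def day_count_to_yyyymmdd(s):
    # Treat the value as a number of days after 1900-01-01
    d = datetime.date(1900, 1, 1) + datetime.timedelta(days=int(s))
    return d.strftime('%Y%m%d')

print(yyyyddd_to_yyyymmdd('2012265'))  # 20120921
print(day_count_to_yyyymmdd('41213'))  # 20121102
```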

Other tips

On the Excel integer to Python datetime bit:

Note that there are two Excel date systems (one based on 1-Jan-1900 and another based on 1-Jan-1904); see https://support.microsoft.com/en-us/help/214330/differences-between-the-1900-and-the-1904-date-system-in-excel for more information.

Also note that the system is NOT zero-based. So, in the 1900 system, 1-Jan-1900 is day 1 (not day 0).

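import datetime

EXCEL_DATE_SYSTEM_PC = 1900
EXCEL_DATE_SYSTEM_MAC = 1904

i = 42129  # Excel number for 5-May-2015
# Subtract 2: one because 1-Jan-1900 is day 1 (not day 0), and one because Excel
# counts a nonexistent 29-Feb-1900 (so this offset holds for serials from 61 onward)
d = datetime.date(EXCEL_DATE_SYSTEM_PC, 1, 1) + datetime.timedelta(i - 2)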
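The same offsets can be wrapped in one helper that covers both systems. This is a sketch under two assumptions: in the 1904 system serial 0 is 1-Jan-1904, and in the 1900 system the second subtracted day comes from Excel treating 1900 as a leap year (kept for Lotus 1-2-3 compatibility), so serial 60 has no real date. The function name is hypothetical:

```python
import datetime

EXCEL_DATE_SYSTEM_PC = 1900
EXCEL_DATE_SYSTEM_MAC = 1904

def excel_serial_to_date(serial, system=EXCEL_DATE_SYSTEM_PC):
    if system == EXCEL_DATE_SYSTEM_MAC:
        # 1904 system: serial 0 is 1-Jan-1904
        return datetime.date(1904, 1, 1) + datetime.timedelta(days=serial)
    if serial < 1:
        raise ValueError('the 1900 system starts at serial 1')
    if serial == 60:
        raise ValueError('serial 60 is 29-Feb-1900, which does not exist')
    # Serial 1 is 1-Jan-1900; serials after 60 also skip the phantom leap day
    offset = 1 if serial < 60 else 2
    return datetime.date(1900, 1, 1) + datetime.timedelta(days=serial - offset)

print(excel_serial_to_date(42129))                                # 2015-05-05
print(excel_serial_to_date(42129 - 1462, EXCEL_DATE_SYSTEM_MAC))  # 2015-05-05
```

The two systems differ by 1462 days, which is why the second call lands on the same date.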

Both of these formats seem pretty straightforward to work with. The first one, in fact, is just an integer, so why not do something like this?

import datetime

def days_since_jan_1_1900_to_datetime(d):
    # d is an integer day count; 0 maps to 1900-01-01
    return datetime.datetime(1900, 1, 1) + datetime.timedelta(days=d)

For the second one, the details depend on exactly how the format is defined. For example, can you always expect 3 digits after the year even when the day number is less than 100, or could there be only 2 or 1? And if so, is the year always 4 digits? Once you've got that part down, it can be done very similarly.
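On the padding question: as far as I know, strptime()'s %Y matches exactly four digits while %j accepts one to three, so once the year is fixed at four digits the rest of the string is unambiguous whether or not the day is zero-padded. A small sketch, assuming a four-digit year; the function name is hypothetical:

```python
import datetime

def parse_year_day(s):
    # %Y consumes exactly four digits; %j takes the remaining 1-3 digits
    return datetime.datetime.strptime(s, '%Y%j').date()

print(parse_year_day('2012265'))  # 2012-09-21
print(parse_year_day('2012005'))  # 2012-01-05 (zero-padded day)
print(parse_year_day('20125'))    # 2012-01-05 (unpadded day)
```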

According to http://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior, the day of the year is "%j", whereas the first case can be solved with toordinal() and fromordinal(): date.fromordinal(date(1900, 1, 1).toordinal() + x)
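The fromordinal() route written out, assuming x is the integer day count (41213 here). It gives the same result as adding a timedelta:

```python
from datetime import date, timedelta

x = 41213  # day count after 1900-01-01
via_ordinal = date.fromordinal(date(1900, 1, 1).toordinal() + x)
via_timedelta = date(1900, 1, 1) + timedelta(days=x)

print(via_ordinal)                   # 2012-11-02
print(via_ordinal == via_timedelta)  # True
```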

I'd use timedelta.

import datetime
d = datetime.timedelta(days=41213)
start = datetime.datetime(year=1900, month=1, day=1)
the_date = start + d

For the second one, you can slice the string ('2012265'[:4]) to get the year and use the same method for the day of the year.
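A sketch of the slicing approach, assuming the value arrives as a string with a four-digit year followed by the day of the year. Day 1 is January 1, hence the minus one. The function name is hypothetical:

```python
import datetime

def year_day_by_slicing(s):
    year = int(s[:4])         # first four characters: the year
    day_of_year = int(s[4:])  # the rest: 1-based day of the year
    start = datetime.datetime(year=year, month=1, day=1)
    return start + datetime.timedelta(days=day_of_year - 1)

print(year_day_by_slicing('2012265'))  # 2012-09-21 00:00:00
```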

Edit: see the answer using %j for the second one.

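import pandas as pd
from datetime import datetime

# df: an existing DataFrame whose 'timeelapsed' column holds 'HH:MM:SS' strings;
# the parsed times land on 1900-01-01, so subtracting that date leaves elapsed seconds
df['timeelapsed'] = (pd.to_datetime(df['timeelapsed'], format='%H:%M:%S') - datetime(1900, 1, 1)).dt.total_seconds()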
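The same subtraction trick works without pandas for a single value: strptime() fills in 1900-01-01 as the date when only a time is parsed, so subtracting that date leaves the elapsed time. A sketch, assuming 'HH:MM:SS' strings with hours below 24 (the function name is hypothetical):

```python
from datetime import datetime

def hms_to_seconds(s):
    # strptime uses 1900-01-01 as the date when only a time is parsed
    elapsed = datetime.strptime(s, '%H:%M:%S') - datetime(1900, 1, 1)
    return elapsed.total_seconds()

print(hms_to_seconds('01:02:03'))  # 3723.0
```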
Licensed under: CC-BY-SA with attribution