Question

I am working with a timestamp dataset and have to calculate the consecutive differences between the timestamps of the observations. The timestamps are of datetime64[ns] type, and dfnew is a pandas DataFrame.

    import numpy as np
    from pandas import Timestamp

    dfnew['timestamp'] = dfnew['timestamp'].astype('datetime64[ns]')
    dfnew['dates'] = dfnew['timestamp'].map(Timestamp.date)
    uniqueDates = list(set(dfnew['dates']))  # unique values of date in a list
    # making a numpy array of timestamp for a particular date
    x = np.array(dfnew[dfnew['dates'] == uniqueDates[0]]['timestamp'])
    y = np.ediff1d(x)  # calculating consecutive difference of timestamp
    print(max(y))
    49573580000000 nanoseconds
    print(min(y))
    -86391523000000 nanoseconds

    print(y[1:20])
    [ 92210000000 388030000000            0 211607000000 249337000000
      19283000000  91407000000 120180000000 240050000000  30406000000
                0 480337000000     13000000 491424000000            0
      80980000000 388103000000  88850000000 120333000000]
    dfnew['timestamp'][0:20]
    0    2013-12-19 09:03:21.223000
    1    2013-12-19 11:34:23.037000
    2    2013-12-19 11:34:23.050000
    3    2013-12-19 11:34:23.067000
    4    2013-12-19 11:34:23.067000
    5    2013-12-19 11:34:23.067000
    6    2013-12-19 11:34:23.067000
    7    2013-12-19 11:34:23.067000
    8    2013-12-19 11:34:23.067000
    9    2013-12-19 11:34:23.080000
    10   2013-12-19 11:34:23.080000
    11   2013-12-19 11:34:23.080000
    12   2013-12-19 11:34:23.080000
    13   2013-12-19 11:34:23.080000
    14   2013-12-19 11:34:23.080000
    15   2013-12-19 11:34:23.097000
    16   2013-12-19 11:34:23.097000
    17   2013-12-19 11:34:23.097000
    18   2013-12-19 11:34:23.097000
    19   2013-12-19 11:34:23.097000
    Name: timestamp 

Is there any way to get the output in hours rather than nanoseconds? I could convert it with ordinary division, but I am looking for an alternative. Also, when I save this to a txt file, the term 'nanoseconds' is written as well. How can I keep this unit out of the txt file so that only the number is saved? Any help is appreciated.
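A minimal NumPy sketch for both points, assuming the differences come from np.ediff1d on a datetime64[ns] array as in the code above. The three sample timestamps are copied from the listing, and the file name in the comment is hypothetical. Dividing a timedelta64 array by np.timedelta64(1, 'h') returns plain float64 hours, and np.savetxt writes only the numbers, so no unit ends up in the file:

```python
import io

import numpy as np

# Sample datetime64[ns] values copied from the listing in the question
x = np.array(["2013-12-19T09:03:21.223",
              "2013-12-19T11:34:23.037",
              "2013-12-19T11:34:23.050"], dtype="datetime64[ns]")

# Consecutive differences; the result has dtype timedelta64[ns]
y = np.ediff1d(x)

# Dividing by a one-hour timedelta gives unitless float64 hours
hours = y / np.timedelta64(1, "h")

# savetxt writes bare numbers, one per line. A StringIO buffer stands in for
# a real file here; pass a path such as "diffs_hours.txt" (hypothetical) instead.
buf = io.StringIO()
np.savetxt(buf, hours, fmt="%.6f")
saved_text = buf.getvalue()
print(saved_text)
```

The same division works on the full y array from the question, since it is also timedelta64[ns].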


Solution

Try Series.diff():

    import io

    import pandas as pd

    txt = """0    2013-12-19 09:03:21.223000
    1    2013-12-19 11:34:23.037000
    2    2013-12-19 11:34:23.050000
    3    2013-12-19 11:34:23.067000
    4    2013-12-19 11:34:23.067000
    5    2013-12-19 11:34:23.067000
    6    2013-12-19 11:34:23.067000
    7    2013-12-19 11:34:23.067000
    8    2013-12-19 11:34:23.067000
    9    2013-12-19 11:34:23.080000
    10   2013-12-19 11:34:23.080000
    11   2013-12-19 11:34:23.080000
    12   2013-12-19 11:34:23.080000
    13   2013-12-19 11:34:23.080000
    14   2013-12-19 11:34:23.080000
    15   2013-12-19 11:34:23.097000
    16   2013-12-19 11:34:23.097000
    17   2013-12-19 11:34:23.097000
    18   2013-12-19 11:34:23.097000
    19   2013-12-19 11:34:23.097000
    """

    # Whitespace-separated columns: row label (0), date (1) and time (2)
    df = pd.read_csv(io.StringIO(txt), sep=r"\s+", header=None, index_col=0)
    # Join date and time into one datetime64[ns] Series named "1_2"
    s = pd.to_datetime(df[1] + " " + df[2]).rename("1_2")

    s.diff()

Result:

    0                NaT
    1    02:31:01.814000
    2    00:00:00.013000
    3    00:00:00.017000
    4           00:00:00
    5           00:00:00
    6           00:00:00
    7           00:00:00
    8           00:00:00
    9    00:00:00.013000
    10          00:00:00
    11          00:00:00
    12          00:00:00
    13          00:00:00
    14          00:00:00
    15   00:00:00.017000
    16          00:00:00
    17          00:00:00
    18          00:00:00
    19          00:00:00
    Name: 1_2, dtype: timedelta64[ns]
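Building on Series.diff(), here is a pandas sketch that also covers the per-date grouping from the question and the conversion to hours. The small dfnew frame is made-up sample data, and sorting by timestamp first is an assumption so that each difference runs forward in time. groupby(...).diff() restarts at every calendar date, dt.total_seconds() turns each Timedelta into float seconds, and to_csv writes the resulting numbers without any unit:

```python
import pandas as pd

# Made-up sample data: a 'timestamp' column spanning two calendar dates
dfnew = pd.DataFrame({"timestamp": pd.to_datetime([
    "2013-12-19 09:03:21.223", "2013-12-19 11:34:23.037",
    "2013-12-20 08:00:00.000", "2013-12-20 10:30:00.000",
])})

# Assumption: sort first so every difference is taken in chronological order
dfnew = dfnew.sort_values("timestamp")

# Consecutive differences within each date; the first row of a date is NaT
deltas = dfnew.groupby(dfnew["timestamp"].dt.date)["timestamp"].diff()

# total_seconds() gives float seconds; dividing by 3600 gives hours
dfnew["diff_hours"] = deltas.dt.total_seconds() / 3600

# to_csv writes plain numbers with no unit; NaN rows become empty fields
text = dfnew["diff_hours"].to_csv(index=False, header=False)
print(text)
```

Passing a file path as the first argument of to_csv writes the same text to disk instead of returning it.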
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow