Convert tick data to daily

https://stackoverflow.com/questions/22132978

19-10-2022

Question

I'd like to convert a CSV file with tick data to daily prices and volume. The CSV file is formatted as: unix,price,volume.

The groupby function has only gotten me as far as grouping by Unix second. What is a good way to get the daily close price AND the sum of the volume for each day?

I'm working with Python 2.7 and also have pandas installed, but I'm not very familiar with it yet.

So far, the furthest I've gotten is this:

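import pandas as pd

data = pd.read_csv('file.csv', names=['unix', 'price', 'vol'])

# Group the ticks by their Unix timestamp (one group per second)
datagr = data.groupby('unix')
dataPrice = datagr['price'].last()  # last trade price within each second
dataVol = datagr['vol'].sum()       # total volume traded within each second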

Sample data:

1391067323,772.000000000000,0.020200000000
1391067323,772.000000000000,0.020000000000
1391067323,771.379000000000,1.389480000000
1391067323,772.000000000000,1.244540000000
1391067326,774.955000000000,0.084830600000
1391067326,774.955000000000,0.084833400000
1391067327,774.955000000000,0.084830600000
1391067331,774.953000000000,0.200000000000
1391067336,774.951000000000,0.101202000000

This retrieves the last price per Unix second and sums the volume of the trades that took place within that second. The problem is that it groups by the Unix second rather than by day, and I don't want to use an overly convoluted method because of time constraints.

Solution

Assuming the ticks are read into a DataFrame df (as data is in the question), convert the Unix timestamps to pandas datetimes using to_datetime:

df['unix'] = pd.to_datetime(df['unix'], unit='s')
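To check what unit='s' does, here is a small sketch using three timestamps from the sample data. Without unit='s', pandas would read the integers as nanoseconds since the epoch and produce dates in 1970.

```python
import pandas as pd

# A few Unix timestamps (seconds since 1970-01-01 UTC) from the sample data
unix = pd.Series([1391067323, 1391067326, 1391067336])

# unit='s' tells pandas the integers are seconds, not the default nanoseconds
dt = pd.to_datetime(unix, unit='s')
print(dt)  # first value: 2014-01-30 07:35:23
```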

Now set this as the index and resample:

df = df.set_index('unix')

df.resample('D').agg({'vol': 'sum', 'price': 'last'})

Note: a different aggregation is applied to each column: the volume is summed, and the last price of the day is kept as the close.
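Putting the steps together, here is a minimal end-to-end sketch on the sample rows from the question, assuming a recent pandas where resample returns an object with an .agg method. The rows are inlined through io.StringIO so the sketch runs without file.csv; the column names follow the question's read_csv call.

```python
import io

import pandas as pd

# The sample rows from the question (unix, price, volume), inlined so the
# sketch runs without file.csv
csv_text = """1391067323,772.000000000000,0.020200000000
1391067323,772.000000000000,0.020000000000
1391067323,771.379000000000,1.389480000000
1391067323,772.000000000000,1.244540000000
1391067326,774.955000000000,0.084830600000
1391067326,774.955000000000,0.084833400000
1391067327,774.955000000000,0.084830600000
1391067331,774.953000000000,0.200000000000
1391067336,774.951000000000,0.101202000000
"""

df = pd.read_csv(io.StringIO(csv_text), names=['unix', 'price', 'vol'])
df['unix'] = pd.to_datetime(df['unix'], unit='s')
df = df.set_index('unix')

# One row per calendar day (UTC): last trade price as the close, summed volume
daily = df.resample('D').agg({'price': 'last', 'vol': 'sum'})
print(daily)
```

All sample ticks fall on 2014-01-30, so this yields a single row. On a longer file, days with no trades still appear in the resampled output, with NaN as the price and 0 as the volume; append .dropna() if you only want trading days.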

Example:

In [10]: import numpy as np

In [11]: df = pd.DataFrame(np.random.randn(5, 2), pd.date_range('2014-01-01', periods=5, freq='H'), columns=list('AB'))

In [12]: df
Out[12]:
                            A         B
2014-01-01 00:00:00 -1.185459 -0.854037
2014-01-01 01:00:00 -1.232376 -0.817346
2014-01-01 02:00:00  0.478683 -0.467169
2014-01-01 03:00:00 -0.407009  0.290612
2014-01-01 04:00:00  0.181207 -0.171356

In [13]: df.resample('D').agg({'A': 'sum', 'B': 'last'})
Out[13]:
                   A         B
2014-01-01 -2.164955 -0.171356
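If you would rather stay close to the groupby code in the question, a sketch of an alternative is to group by a derived day key instead of the raw second. Unix time counts exactly 86400 seconds per day, so unix // 86400 is the number of whole UTC days since the epoch. The three ticks below are hypothetical, chosen so that two fall on 2014-01-30 and one on 2014-01-31.

```python
import pandas as pd

# Hypothetical ticks: two on 2014-01-30 and one on 2014-01-31 (UTC)
data = pd.DataFrame({
    'unix': [1391067323, 1391067336, 1391126400],
    'price': [772.0, 774.951, 780.0],
    'vol': [0.5, 0.1, 2.0],
})

# Integer division by 86400 buckets each tick into its UTC day;
# unit='D' turns the day count back into a date
data['day'] = pd.to_datetime(data['unix'] // 86400, unit='D')

# Last price per day as the close, total volume per day
daily = data.groupby('day').agg({'price': 'last', 'vol': 'sum'})
print(daily)
```

Unlike resample, groupby only produces rows for days that actually had trades. Both approaches use UTC day boundaries; a different trading-day cutoff would need a time-zone or offset adjustment first.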

Licensed under: CC-BY-SA with attribution
