Question

I've read the answers to "How to deal with this Pandas warning?" but I can't figure out whether I should ignore the SettingWithCopyWarning or whether I'm doing something really wrong.

I have this function that resamples some data to a specific time frame (1h for instance) and then fills the NaN values accordingly.

def resample_data(raw_data, time_frame):
    # resamples the ticker data in ohlc
    ohlc_dict = {
        'open': 'first',
        'high': 'max',
        'low': 'min',
        'close': 'last',
        'price': 'mean'
    }

    volume_dict = {'volume': 'sum', 'volume_quote': 'sum'}

    resampled_data = raw_data.resample(time_frame, how={'price': ohlc_dict, 'amount': volume_dict})
    resampled_data['amount'] = resampled_data['amount']['volume'].fillna(0.0)
    resampled_data['amount']['volume_quote'] = resampled_data['amount']['volume']
    resampled_data['price']['close'] = resampled_data['price']['close'].fillna(method='pad')
    resampled_data['price']['open'] = resampled_data['price']['open'].fillna(resampled_data['price']['close'])
    resampled_data['price']['high'] = resampled_data['price']['high'].fillna(resampled_data['price']['close'])
    resampled_data['price']['low'] = resampled_data['price']['low'].fillna(resampled_data['price']['close'])
    resampled_data['price']['price'] = resampled_data['price']['price'].fillna(resampled_data['price']['close'])

    # ugly hack to remove multi index, must be better way
    output_data = resampled_data['price']
    output_data['volume'] = resampled_data['amount']['volume']
    output_data['volume_quote'] = resampled_data['amount']['volume_quote']

    return output_data

Is this the right way to do it and should I ignore the warning?

Edit: If I try to use .loc as suggested in the warning:

resampled_data = raw_data.resample(time_frame, how={'price': ohlc_dict, 'amount': volume_dict})
resampled_data.loc['amount'] = resampled_data['amount']['volume'].fillna(0.0)
resampled_data.loc['amount']['volume_quote'] = resampled_data['amount']['volume']
resampled_data.loc['price']['close'] = resampled_data['price']['close'].fillna(method='pad')
resampled_data.loc['price']['open'] = resampled_data['price']['open'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['high'] = resampled_data['price']['high'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['low'] = resampled_data['price']['low'].fillna(resampled_data['price']['close'])
resampled_data.loc['price']['price'] = resampled_data['price']['price'].fillna(resampled_data['price']['close'])

I get the following error, referring to the line resampled_data.loc['price']['close'] = resampled_data['price']['close'].fillna(method='pad'):

KeyError: 'the label [price] is not in the [index]'


Solution

As Jeff points out, since the columns are a MultiIndex you should access a column with a single tuple key rather than with chained []:

resampled_data['price']['close']  # chained: two separate lookups

resampled_data[('price', 'close')]  # single tuple key
resampled_data.loc[:, ('price', 'close')]  # equivalent
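To illustrate the tuple access, here is a small sketch on a made-up frame with two-level columns (the labels and values are hypothetical, chosen only to resemble the resample output in the question). The tuple selects one column in a single lookup, and df[...] and df.loc[:, ...] return the same Series; a bare first-level label returns a sub-frame instead.

```python
import numpy as np
import pandas as pd

# Hypothetical two-level columns, kept lexsorted to avoid performance warnings
cols = pd.MultiIndex.from_tuples(
    [("amount", "volume"), ("price", "close"), ("price", "open")]
)
df = pd.DataFrame([[10.0, 2.0, 1.0], [20.0, np.nan, 3.0]], columns=cols)

# One lookup with a tuple key: a single column (a Series)
by_getitem = df[("price", "close")]
by_loc = df.loc[:, ("price", "close")]

# A bare first-level label returns a sub-DataFrame with that level dropped
sub = df["price"]
```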

This also disambiguates it from a (row, column) lookup such as:

resampled_data.loc['close', 'price']

(Likewise, resampled_data.loc['price'] treats 'price' as a row label; that row lookup is what pandas was trying to do when it gave the KeyError.)
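A quick way to see that the first key given to .loc is a row label: on a toy frame (hypothetical row labels "r0" and "r1"), .loc["price"] raises KeyError because there is no row called "price", while the (row, column) form with a tuple column key works.

```python
import pandas as pd

# Toy frame: string row labels, two-level column labels
cols = pd.MultiIndex.from_tuples([("amount", "volume"), ("price", "close")])
df = pd.DataFrame([[10.0, 1.0], [20.0, 2.0]], index=["r0", "r1"], columns=cols)

# .loc[key] looks key up in the row index, so this fails
try:
    df.loc["price"]
    row_lookup_failed = False
except KeyError:
    row_lookup_failed = True

# (row, column) key: all rows, the ('price', 'close') column
close = df.loc[:, ("price", "close")]
```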

You'll usually see the SettingWithCopy warning if you use consecutive [] in your code; these are best combined into a single [] call, e.g. using loc:

resampled_data.loc['price']['close'] = ...  # this *may* set to a copy
resampled_data.loc[:, ('price', 'close')] = ...  # one call, sets on resampled_data itself

If you do set to a copy (sometimes the above may actually not be a copy, but pandas makes no guarantee here), the copy will be correctly updated but then immediately garbage collected, so resampled_data itself is left unchanged.
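A minimal sketch of the single-call pattern applied to the question's fill steps, on a made-up three-row frame. Each assignment indexes df once, so it writes into df rather than into a temporary object. (Whether the chained version happens to update df depends on the pandas version; with Copy-on-Write enabled, chained assignment never updates the original, so it is not relied on here.)

```python
import numpy as np
import pandas as pd

# Made-up data with gaps, shaped like the 'price' part of the resample output
cols = pd.MultiIndex.from_tuples([("price", "close"), ("price", "open")])
df = pd.DataFrame([[1.0, 1.0], [np.nan, np.nan], [3.0, np.nan]], columns=cols)

# Forward-fill close, then fill open from close, each with one .loc call on df
df.loc[:, ("price", "close")] = df[("price", "close")].ffill()
df.loc[:, ("price", "open")] = df[("price", "open")].fillna(df[("price", "close")])
```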

Aside: as mentioned in the comments, resample offers how='ohlc', so you may be better off doing this, then padding and filling, and then joining with the resampled volumes.
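A minimal sketch of that approach, written with the method-based resample API (the how= argument was deprecated in pandas 0.18 in favour of calls such as .resample(...).ohlc()). It assumes raw_data has a DatetimeIndex and numeric 'price' and 'amount' columns, as the question's how dict suggests; volume_quote is left out because the original code only copies volume into it. Since each piece is resampled separately, the result has flat columns and no MultiIndex to undo.

```python
import pandas as pd

def resample_data(raw_data: pd.DataFrame, time_frame: str) -> pd.DataFrame:
    # Assumes raw_data has a DatetimeIndex plus 'price' and 'amount' columns
    ohlc = raw_data["price"].resample(time_frame).ohlc()  # open/high/low/close
    ohlc["price"] = raw_data["price"].resample(time_frame).mean()

    # Pad close into empty bins, then use it for the other price columns
    ohlc["close"] = ohlc["close"].ffill()
    for col in ["open", "high", "low", "price"]:
        ohlc[col] = ohlc[col].fillna(ohlc["close"])

    # Summed volume per bin (an empty bin sums to 0), joined on the shared bins
    volume = raw_data["amount"].resample(time_frame).sum().rename("volume")
    return ohlc.join(volume)
```

For example, with ticks at 00:10, 00:40 and 02:15 and time_frame="60min", the empty 01:00 bar takes its open, high, low and price from the padded close of the 00:00 bar and gets a volume of 0.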

Licensed under: CC-BY-SA with attribution