Question

I have several classes that all reference the same pandas DataFrame, but only part of the DataFrame is relevant to each class. I also want to make it easy to access the relevant rows without advanced indexing, which gets repetitive because of the number of levels in the index. To do this, I wrote code that generates partial functions so that each class can view its own slice.

from functools import partial
import pandas as pd
import numpy as np
import dateutil.relativedelta as rd
import datetime as dt

class baz(object):
    pass

groups = ['foo', 'foo', 'bar', 'bar']
items = ['x','y', 'x', 'y']
diff = rd.relativedelta(years=1)

dates = [dt.date(2013, 1, 1) + (diff * shift) for shift in range(4)] * 2
index = pd.MultiIndex.from_arrays([groups, items], names=['groups', 'items'])
values = np.random.randn(4,8)

data = pd.DataFrame(values, index=index, columns=dates)

def view_data(group, item):
    # .ix was removed in pandas 1.0; .loc selects the (group, item) row
    return data.loc[(group, item), :]

foo = baz()
bar = baz()

# I use partial because I want lazy evaluation
foo.x = partial(view_data, 'foo', 'x')
foo.y = partial(view_data, 'foo', 'y')
bar.x = partial(view_data, 'bar', 'x')
bar.y = partial(view_data, 'bar', 'y')

foo.x()
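To illustrate why partial gives lazy evaluation, here is a minimal sketch on a small made-up frame (the labels, the columns c1/c2 and the values are assumptions, not the question's data). Each call of the partial runs the lookup again, so an in-place change made after the partial was created shows up on the next call.

```python
from functools import partial

import pandas as pd

# Small stand-in for the question's frame (hypothetical labels and values).
index = pd.MultiIndex.from_tuples(
    [("bar", "x"), ("bar", "y"), ("foo", "x"), ("foo", "y")],
    names=["groups", "items"],
)
data = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
                    index=index, columns=["c1", "c2"])


def view_data(group, item):
    # Looks the row up again every time it is called.
    return data.loc[(group, item), :]


foo_x = partial(view_data, "foo", "x")  # nothing is evaluated yet

before = foo_x()["c1"]
data.loc[("foo", "x"), "c1"] = 50.0  # modify the frame in place
after = foo_x()["c1"]                # the new call sees the change

print(before, after)  # 5.0 50.0
```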

However, I would prefer if the reference did not have to look like foo.x()[date] but could instead look like foo.x[date].

As a result, I created a decorator that would wrap the function and return the value.

def execute_func(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    # note: inner() is called right here, so func runs once, at assignment time
    return inner()

foo.x = execute_func(partial(view_data, 'foo', 'x'))
foo.y = execute_func(partial(view_data, 'foo', 'y'))
bar.x = execute_func(partial(view_data, 'bar', 'x'))
bar.y = execute_func(partial(view_data, 'bar', 'y'))

My concern is that I will not always get the current state of the dataframe.
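One way to check this concern is a small experiment (the frame, the column names c1/c2 and the helper name lazy_x are assumptions for illustration). Because execute_func calls inner() immediately, foo.x holds the Series computed at assignment time, while the plain partial recomputes on every call. Adding a column afterwards makes the difference visible; whether in-place edits to existing values leak into the stored Series depends on pandas' view/copy behavior (and on Copy-on-Write in newer versions), so it is safest to treat it as a snapshot.

```python
from functools import partial

import pandas as pd

index = pd.MultiIndex.from_tuples([("bar", "x"), ("foo", "x")],
                                  names=["groups", "items"])
data = pd.DataFrame({"c1": [1.0, 2.0]}, index=index)


def view_data(group, item):
    return data.loc[(group, item), :]


def execute_func(func):
    def inner(*args, **kwargs):
        return func(*args, **kwargs)
    return inner()  # evaluated immediately, not on later access


class Baz:
    pass


foo = Baz()
foo.x = execute_func(partial(view_data, "foo", "x"))  # a Series, computed now
lazy_x = partial(view_data, "foo", "x")               # hypothetical lazy version

data["c2"] = 9.0  # change the frame after the fact

print(list(foo.x.index))     # ['c1']        (snapshot)
print(list(lazy_x().index))  # ['c1', 'c2']  (current state)
```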

Is this the right way to go about achieving my goal?


Solution

I would suggest wrapping your DataFrame in an object, like so:

class MyDataFrameView(object):

    def __init__(self, df):
        self.data = df  # keeps a reference to the DataFrame, not a copy

    def x(self):
        return self.data.loc[('foo', 'x'), :]

    def y(self):
        return self.data.loc[('bar', 'y'), :]

You use it like so:

df = MyDataFrameView(data)
df.x()

You can go further and turn the methods into properties if that is more intuitive. Then access looks like df.y[date] instead of df.y()[date]:

@property
def y(self):
    return self.data.loc[('bar', 'y'), :]

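It does essentially the same thing as your current code, but it is more straightforward object-oriented programming and, at least in my opinion, much easier to understand. Because the lookup runs every time the method or property is used, it reflects the current contents of the DataFrame that self.data refers to (if you rebind data to a new DataFrame, self.data must be updated too).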
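A minimal runnable sketch of the property version, on a small made-up frame (the labels and the columns c1/c2 are assumptions). It shows the view.x[col] syntax without call parentheses, and that an in-place change to the wrapped DataFrame is visible on the next access.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("bar", "x"), ("bar", "y"), ("foo", "x"), ("foo", "y")],
    names=["groups", "items"],
)
data = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
                    index=index, columns=["c1", "c2"])


class MyDataFrameView:
    """Wraps a DataFrame and exposes fixed row selections as properties."""

    def __init__(self, df):
        self.data = df  # a reference, so in-place edits are visible

    @property
    def x(self):
        # Re-evaluated on every attribute access.
        return self.data.loc[("foo", "x"), :]

    @property
    def y(self):
        return self.data.loc[("bar", "y"), :]


view = MyDataFrameView(data)
first = view.x["c2"]                  # 6.0, no parentheses needed
data.loc[("foo", "x"), "c2"] = 60.0   # edit the wrapped frame in place
second = view.x["c2"]                 # 60.0
print(first, second)
```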

You can always access the underlying DataFrame like so:

df.data

Alternatively, you could implement more pandas methods directly on your view object, for example:

@property
def loc(self):
    # .ix no longer exists in pandas; forward .loc instead
    return self.data.loc

def __getitem__(self, key):
    return self.data[key]

so that your object behaves more like a DataFrame.
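Put together in a class, the forwarding members might look like this minimal sketch (the tiny frame with columns a/b and rows r1/r2 is a made-up example). Indexing the view with [] or .loc simply hands the request to the wrapped DataFrame.

```python
import pandas as pd


class MyDataFrameView:
    def __init__(self, df):
        self.data = df

    @property
    def loc(self):
        # Return the wrapped frame's own .loc indexer.
        return self.data.loc

    def __getitem__(self, key):
        # view[key] behaves like data[key] (column selection).
        return self.data[key]


data = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r1", "r2"])
view = MyDataFrameView(data)

print(view["a"].tolist())   # [1, 2]
print(view.loc["r2", "b"])  # 4
```

Forwarding explicitly keeps the view's surface small and obvious; only the members you write are exposed.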

Note that this is not really "dynamic": every view has to be written out by hand. If you want a truly dynamic approach, you could implement the __getattr__ method:

def __getattr__(self, attr):
    # called only when normal lookup fails; "route" attr here,
    # e.g. simply delegate to the wrapped DataFrame:
    return getattr(self.data, attr)
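As one possible routing, here is a sketch that gives the question's foo.x[date] syntax without writing one attribute per item. GroupView and its fields are hypothetical names, and the frame and columns c1/c2 are made up. __getattr__ turns an unknown attribute name into a (group, item) row lookup and converts a missing label into AttributeError so that hasattr() keeps working.

```python
import pandas as pd


class GroupView:
    """Hypothetical wrapper: view.<item> returns the (group, item) row."""

    def __init__(self, df, group):
        self._df = df
        self._group = group

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails.
        if attr.startswith("_"):
            raise AttributeError(attr)  # avoid routing private/special names
        try:
            return self._df.loc[(self._group, attr), :]
        except KeyError:
            raise AttributeError(attr) from None


index = pd.MultiIndex.from_tuples(
    [("bar", "x"), ("bar", "y"), ("foo", "x"), ("foo", "y")],
    names=["groups", "items"],
)
data = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
                    index=index, columns=["c1", "c2"])

foo = GroupView(data, "foo")
bar = GroupView(data, "bar")
print(foo.x["c1"])        # 5.0
print(bar.y["c2"])        # 4.0
print(hasattr(foo, "z"))  # False
```

One caveat of this design: an item whose name collides with a real attribute of the class would be shadowed, since __getattr__ only runs when normal lookup fails.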

This pattern is generally called composition, and it is my favorite way of solving this kind of problem.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow