Does this solve your problem?
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': np.random.randn(12)})
bins = {'one': (-10, -1, 0, 1, 10), 'two': (-100, 0, 100), 'three': (-999, 0, 1, 2, 3)}

def func(row):
    # Bin the single value in 'B' using the edges selected by 'A'
    return pd.cut([row['B']], bins=bins[row['A']])[0]

df['C'] = df.apply(func, axis=1)
This gives the following DataFrame (the values in B are random, so yours will differ):
        A         B          C
0     one  1.440957    (1, 10]
1     one  0.394580     (0, 1]
2     two -0.039619  (-100, 0]
3   three -0.500325  (-999, 0]
4     one  0.497256     (0, 1]
5     one  0.342222     (0, 1]
6     two -0.968390  (-100, 0]
7   three -0.772321  (-999, 0]
8     one  0.803178     (0, 1]
9     one  0.201513     (0, 1]
10    two  1.178546   (0, 100]
11  three -0.149662  (-999, 0]
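
The labels in column C, such as (0, 1], are intervals that exclude the left edge and include the right one, which is the default behaviour of pd.cut. Here is a minimal sketch of that, using the 'one' edges from above; the sample values are hypothetical and chosen only to hit the edge cases:

```python
import pandas as pd

# Edges for key 'one'; they define four right-closed intervals.
edges = (-10, -1, 0, 1, 10)

# Hypothetical sample values, picked to land on and outside the edges.
values = pd.Series([-5.0, -1.0, 0.5, 1.0, 20.0])

binned = pd.cut(values, bins=edges)

# -1.0 and 1.0 sit exactly on an edge and go to the interval that ends there.
# 20.0 lies outside the outermost edges and becomes NaN.
print(binned)
```

So if a value can fall outside the edges given for its key, column C will contain NaN for that row.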
A faster version of binnize:
def binize2(df):
    # Modifies df in place: adds column 'C' and fills it one key at a time
    df['C'] = ''
    for key, values in bins.items():
        mask = df['A'] == key
        df.loc[mask, 'C'] = pd.cut(df.loc[mask, 'B'], bins=values)
%%timeit
df3 = binnize(df1)
10 loops, best of 3: 56.2 ms per loop
%%timeit
binize2(df2)
100 loops, best of 3: 6.64 ms per loop
The speed-up is probably because binize2 changes the DataFrame in place and doesn't create a new one.
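
If you want to check where the time goes outside of IPython, here is a rough sketch using the standard timeit module. The names by_row and by_group are hypothetical helpers written for this comparison, not the binnize and binize2 timed above, and the fixed seed is my own assumption so that runs are repeatable. Neither helper modifies its input, so any difference in timing comes from calling pd.cut once per row versus once per key:

```python
import timeit

import numpy as np
import pandas as pd

bins = {'one': (-10, -1, 0, 1, 10), 'two': (-100, 0, 100), 'three': (-999, 0, 1, 2, 3)}

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatable runs
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 3,
                   'B': rng.standard_normal(12)})

def by_row(frame):
    # Hypothetical helper: one pd.cut call per row, as in func above.
    return frame.apply(lambda row: pd.cut([row['B']], bins=bins[row['A']])[0], axis=1)

def by_group(frame):
    # Hypothetical helper: one pd.cut call per key, returned as a new Series.
    pieces = [pd.cut(frame.loc[frame['A'] == key, 'B'], bins=edges).astype(object)
              for key, edges in bins.items()]
    # Put the rows back in the original order.
    return pd.concat(pieces).reindex(frame.index)

row_result = by_row(df)
group_result = by_group(df)

# Both approaches should assign the same interval to every row.
same = [str(a) == str(b) for a, b in zip(row_result, group_result)]
print(all(same))

# Total seconds for 20 calls each; absolute numbers depend on the machine.
print(timeit.timeit(lambda: by_row(df), number=20))
print(timeit.timeit(lambda: by_group(df), number=20))
```

With only 12 rows the gap may be small; try a larger DataFrame to see how each approach scales.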