Question

I'm trying to compute a range (a confidence interval), returning two values from a lambda mapped over a column.

M=12.4; n=10; T=1.3
dt =  pd.DataFrame( { 'vc' : np.random.randn(10) } )    
ci = lambda c : M + np.asarray( -c*T/np.sqrt(n) , c*T/np.sqrt(n) )
dt['ci'] = dt['vc'].map( ci )
print '\n confidence interval ', dt['ci'][:,1]

..er , so how does this get done?

Then, how do I unpack the tuple in a lambda? (I want to check whether the range is > 0, i.e. contains the mean.) Neither of the following works:

appnd = lambda c2: c2[0]*c2[1] > 0 and 1 or 0
app2 = lambda x,y: x*y >0 and 1 or 0
dt[cnt] = dt['ci'].map(app2)

Solution

It's probably easier to see by defining a proper function for the CI, rather than a lambda.

As far as the unpacking goes, maybe you could let the function take an argument for whether to add or subtract, and then apply it twice.

You should also calculate the mean and size in the function, instead of assigning them ahead of time.

In [40]: def ci(arr, op, t=2.0):
   ....:     M = arr.mean()
   ....:     n = len(arr)
   ....:     rhs = arr * t / np.sqrt(n)
   ....:     return np.array(op(M, rhs))

You can import the add and sub functions from the operator module.
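A minimal sketch of that import, assuming Python 3 and the standard-library operator module. The names center and half_width and their values are made up here purely for illustration:

```python
from operator import add, sub

import numpy as np

# add and sub are ordinary functions equivalent to + and -,
# so they can be passed to another function as arguments.
center = 12.4                     # illustrative value
half_width = 1.3 / np.sqrt(10)    # illustrative value

lower = sub(center, half_width)   # same as center - half_width
upper = add(center, half_width)   # same as center + half_width
```

Passing the operator in as an argument is what lets a single function produce either end of the interval.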

From there it's just a one-liner:

In [47]: pd.concat([dt.apply(ci, axis=1, op=x) for x in [sub, add]], axis=1)
Out[47]: 
         vc        vc
0 -0.374189  1.122568
1  0.217528 -0.652584
2 -0.636278  1.908835
3 -1.132730  3.398191
4  0.945839 -2.837518
5 -0.053275  0.159826
6 -0.031626  0.094879
7  0.931007 -2.793022
8 -1.016031  3.048093
9  0.051007 -0.153022

[10 rows x 2 columns]

I'd recommend breaking that into a few steps for clarity. Get the minus side with r1 = dt.apply(ci, axis=1, op=sub) and the plus side with r2 = dt.apply(ci, axis=1, op=add), then combine them with pd.concat([r1, r2], axis=1).
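The steps above can be sketched as a self-contained script. This is only a sketch under a few assumptions: it targets Python 3 and a recent pandas, it seeds the random data so the run is repeatable, and ci returns the Series directly instead of wrapping it in np.array, since how apply handles array return values has differed across pandas versions.

```python
from operator import add, sub

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)    # seeded only to make the run repeatable
dt = pd.DataFrame({'vc': rng.standard_normal(10)})

def ci(arr, op, t=2.0):
    # With axis=1, arr is one row: a Series holding the single 'vc' value,
    # so M is that value and n is 1.
    M = arr.mean()
    n = len(arr)
    rhs = arr * t / np.sqrt(n)
    # Assumption: return the Series itself (not np.array) so that apply
    # assembles the per-row results into a DataFrame.
    return op(M, rhs)

r1 = dt.apply(ci, axis=1, op=sub)        # M - rhs for each row
r2 = dt.apply(ci, axis=1, op=add)        # M + rhs for each row
result = pd.concat([r1, r2], axis=1)     # two columns, both named 'vc'
print(result.shape)
```

Because each row contains one value, the minus column comes out as -vc and the plus column as 3*vc with the default t=2.0, which matches the pattern in the output shown earlier.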

Basically, it's hard to tell from dt.apply what the output should look like, just seeing some tuples. By applying separately, we get two 10 x 1 arrays.
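For the second part of the question (flagging rows where both ends of the range have the same sign), one option is to keep the two ends in separate columns and compare them directly, rather than unpacking a tuple inside a lambda. A rough sketch, assuming the constants M, n and T from the question; the column names lower, upper and same_sign are invented here:

```python
import numpy as np
import pandas as pd

M, n, T = 12.4, 10, 1.3           # constants taken from the question
rng = np.random.default_rng(0)    # seeded only to make the run repeatable
dt = pd.DataFrame({'vc': rng.standard_normal(10)})

# Vectorised version of the question's formula M -/+ c*T/sqrt(n).
# half_width is negative where vc is negative, so 'lower' and 'upper'
# can swap roles; the sign test below is unaffected by that.
half_width = dt['vc'] * T / np.sqrt(n)
dt['lower'] = M - half_width
dt['upper'] = M + half_width

# 1 where both ends have the same sign (product > 0), else 0.
dt['same_sign'] = ((dt['lower'] * dt['upper']) > 0).astype(int)
```

Working on whole columns avoids both the tuple unpacking and the "and 1 or 0" idiom from the question.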

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow