Question

I would like to use pandas dataframes to create a two-dimensional table. The table should associate two values alpha and epsilon with a third value. alpha and epsilon come from a variable range, like:

alphaRange = numpy.arange(0.01, 0.26, 0.01)
epsilonRange = numpy.arange(0.01, 0.11, 0.01)

(The goal is to find out which combination of alpha and epsilon leads to the highest values, or more generally, find a correlation between parameters and values.)

What is the best way to construct such a dataframe and later fill it with values?

Solution

It might be easier to use NumPy to compute the values first, and then load the result into a DataFrame:

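import numpy as np
import pandas as pd

alphaRange = np.arange(0.01, 0.26, 0.01)
epsilonRange = np.arange(0.01, 0.11, 0.01)
# X varies along columns (alpha), Y along rows (epsilon)
X, Y = np.meshgrid(alphaRange, epsilonRange)
vals = X + Y  # placeholder; replace with the actual computation
print(vals.shape)  # (len(epsilonRange), len(alphaRange))
df = pd.DataFrame(vals, index=epsilonRange, columns=alphaRange)
print(df)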
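
For the stated goal of finding the best combination, the position of the maximum in the 2-D table can be mapped back to the parameter values directly, without looking anything up by float label. This is only a minimal sketch: vals = X + Y is still a placeholder for the real computation, and the names best_alpha and best_epsilon are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

alphaRange = np.arange(0.01, 0.26, 0.01)
epsilonRange = np.arange(0.01, 0.11, 0.01)
X, Y = np.meshgrid(alphaRange, epsilonRange)
vals = X + Y  # placeholder values; substitute the real computation

df = pd.DataFrame(vals, index=epsilonRange, columns=alphaRange)

# Flat position of the largest entry, converted to (row, column)
row, col = np.unravel_index(np.argmax(df.to_numpy()), df.shape)

# Rows are labelled by epsilon, columns by alpha
best_epsilon = df.index[row]
best_alpha = df.columns[col]
print(best_alpha, best_epsilon, df.iloc[row, col])
```

Using positions (iloc, index[row], columns[col]) avoids typing a float literal that has to match a label exactly.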

Edit: PaulH is right -- floats do not make good column or index labels, because looking up a label requires an exact match, and computed floats such as those produced by np.arange can differ from the literal you type by a rounding error. So it would be better to make alpha and epsilon DataFrame columns:

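# vals has shape (len(epsilonRange), len(alphaRange)); transpose before
# flattening so the values follow the alpha-major order of from_product
df = pd.DataFrame({'vals': vals.T.ravel()},
                  index=pd.MultiIndex.from_product([alphaRange, epsilonRange],
                                                   names=['alpha', 'epsilon']))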
df.reset_index(inplace=True)
print(df.head())

yields

   alpha  epsilon  vals
0   0.01     0.01  0.02
1   0.01     0.02  0.03
2   0.01     0.03  0.04
3   0.01     0.04  0.05
4   0.01     0.05  0.06

[5 rows x 3 columns]
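
With alpha and epsilon as ordinary columns, the original goal reduces to simple row operations. The sketch below is one possible approach, not the only one: it builds the same kind of long-form table directly from the meshgrid arrays, picks the row with the largest value, and selects rows for a given alpha with np.isclose rather than ==. vals = X + Y remains a placeholder, and target_alpha is an illustrative value.

```python
import numpy as np
import pandas as pd

alphaRange = np.arange(0.01, 0.26, 0.01)
epsilonRange = np.arange(0.01, 0.11, 0.01)
X, Y = np.meshgrid(alphaRange, epsilonRange)
vals = X + Y  # placeholder values; substitute the real computation

# X, Y and vals share one shape, so raveling all three keeps rows aligned
df = pd.DataFrame({'alpha': X.ravel(),
                   'epsilon': Y.ravel(),
                   'vals': vals.ravel()})

# Row with the largest value
best = df.loc[df['vals'].idxmax()]
print(best)

# Select all rows for one alpha by approximate match instead of ==
target_alpha = 0.05  # illustrative value
subset = df[np.isclose(df['alpha'], target_alpha)]
print(subset)

# Wide view (epsilon rows, alpha columns) when a 2-D layout is wanted
wide = df.pivot(index='epsilon', columns='alpha', values='vals')
print(wide.shape)
```

The long form is convenient for filtering, sorting, and correlation checks, while pivot recovers the 2-D layout when it is needed for display.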

pd.MultiIndex.from_product was added in pandas 0.13.1. For earlier versions of pandas, you could use:

def from_product(iterables, sortorder=None, names=None):
    from pandas.tools.util import cartesian_product
    product = cartesian_product(iterables)
    return pd.MultiIndex.from_arrays(product, sortorder=sortorder,
                                  names=names)

# Transpose before flattening so values match the alpha-major index order
df = pd.DataFrame({'vals': vals.T.ravel()},
                  index=from_product([alphaRange, epsilonRange],
                                     names=['alpha', 'epsilon']))
Licensed under: CC-BY-SA with attribution