Question

I melted a pandas dataframe for plotting use with ggplot (which often requires long form of dataframes), as follows:

test = pandas.melt(iris, id_vars=["Name"], value_vars=["SepalLength", "SepalWidth"])

This keeps the Name field of the iris dataset in the index, but transforms the columns SepalLength and SepalWidth into long form:

test.ix[0:10]
Out:
           Name     variable  value
0   Iris-setosa  SepalLength    5.1
1   Iris-setosa  SepalLength    4.9
2   Iris-setosa  SepalLength    4.7
3   Iris-setosa  SepalLength    4.6
4   Iris-setosa  SepalLength    5.0
5   Iris-setosa  SepalLength    5.4
6   Iris-setosa  SepalLength    4.6
7   Iris-setosa  SepalLength    5.0
8   Iris-setosa  SepalLength    4.4
9   Iris-setosa  SepalLength    4.9
10  Iris-setosa  SepalLength    5.4

How can I "unmelt" this dataframe back? I want the Name column to be kept, but the values of variable field to be transformed into separate columns. The Name field is not unique, so I don't think it can be used as an index. My impression was that pivot is the right function to do this but it is not right:

test.pivot(columns="variable", values="value")
KeyError: u'no item named '

How could I do this? Also, could I unmelt dataframes where there are multiple columns that are in long form, i.e. multiple columns in test that are like the variable column above? It would mean that the columns will have to accept a list of columns, not a single value, it seems. thanks.

Was it helpful?

Solution

I think this situation is ambiguous since the test dataframe doesn't have an index that identifies each unique row. If melt simply stacked the rows with value_vars SepalLength and SepalWidth, then you can manually create an index to pivot on; and it looks like the result ends up the same as the original:

In [15]: test['index'] = range(len(test) / 2) * 2
In [16]: test[:10]
Out[16]: 
          Name     variable  value  index
0  Iris-setosa  SepalLength    5.1      0
1  Iris-setosa  SepalLength    4.9      1
2  Iris-setosa  SepalLength    4.7      2
3  Iris-setosa  SepalLength    4.6      3
4  Iris-setosa  SepalLength    5.0      4
5  Iris-setosa  SepalLength    5.4      5
6  Iris-setosa  SepalLength    4.6      6
7  Iris-setosa  SepalLength    5.0      7
8  Iris-setosa  SepalLength    4.4      8
9  Iris-setosa  SepalLength    4.9      9

In [17]: test[-10:]
Out[17]: 
               Name    variable  value  index
290  Iris-virginica  SepalWidth    3.1    140
291  Iris-virginica  SepalWidth    3.1    141
292  Iris-virginica  SepalWidth    2.7    142
293  Iris-virginica  SepalWidth    3.2    143
294  Iris-virginica  SepalWidth    3.3    144
295  Iris-virginica  SepalWidth    3.0    145
296  Iris-virginica  SepalWidth    2.5    146
297  Iris-virginica  SepalWidth    3.0    147
298  Iris-virginica  SepalWidth    3.4    148
299  Iris-virginica  SepalWidth    3.0    149

In [18]: df = test.pivot(index='index', columns='variable', values='value')
In [19]: df['Name'] = test['Name']
In [20]: df[:10]
Out[20]: 
variable  SepalLength  SepalWidth         Name
index                                         
0                 5.1         3.5  Iris-setosa
1                 4.9         3.0  Iris-setosa
2                 4.7         3.2  Iris-setosa
3                 4.6         3.1  Iris-setosa
4                 5.0         3.6  Iris-setosa
5                 5.4         3.9  Iris-setosa
6                 4.6         3.4  Iris-setosa
7                 5.0         3.4  Iris-setosa
8                 4.4         2.9  Iris-setosa
9                 4.9         3.1  Iris-setosa

In [21]: (iris[["SepalLength", "SepalWidth", "Name"]] == df[["SepalLength", "SepalWidth", "Name"]]).all()
Out[21]: 
SepalLength    True
SepalWidth     True
Name           True
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top