Question

I have purchase data in a CSV file:

|    Name    |     Sex     |     Week
|------------|-------------|--------------
|   Apple    |      F      |     Mon
|   Orange   |      F      |     Tue
|   Apple    |      M      |     Fri        ...
|   Grape    |      M      |     Mon

and I want to convert it to a CSV like this:

| Name:Apple | Name:Orange | Name:Grape | Sex:F | Sex:M | Week:Mon | Week:Tue |
|     1      |      0      |     0      |   1   |   0   |    1     |    0     |
|     0      |      1      |     0      |   1   |   0   |    0     |    1     | ...
|     1      |      0      |     0      |   0   |   1   |    0     |    0     |
|     0      |      0      |     1      |   0   |   1   |    1     |    0     |

Does R or Python have a good method for this conversion? Thanks.


Solution

Here's one way to do this in R using the "reshape2" package. The columns come out in alphabetical order, so you'll have to rearrange them to match your layout.

Assuming your data.frame is called "mydf":

library(reshape2)
x <- melt(as.matrix(mydf))
dcast(x, Var1 ~ value, fun.aggregate = length, value.var="value")
#   Var1 Apple F Fri Grape M Mon Orange Tue
# 1    1     1 1   0     0 0   1      0   0
# 2    2     0 1   0     0 0   0      1   1
# 3    3     1 0   1     0 1   0      0   0
# 4    4     0 0   0     1 1   1      0   0

I haven't used Python or pandas before, but pandas has a get_dummies function that should do what you want.

import pandas as pd

# Sample data matching the table in the question
data = {'name': ['apple', 'orange', 'apple', 'grape'],
        'sex': ['F', 'F', 'M', 'M'],
        'week': ['mon', 'tue', 'fri', 'mon']}
frame = pd.DataFrame(data)
print(frame)


     name sex week
0   apple   F  mon
1  orange   F  tue
2   apple   M  fri
3   grape   M  mon

# Stack all columns into one Series, one-hot encode it, then sum back per original row
print(pd.get_dummies(frame.unstack().dropna()).groupby(level=1).sum())

   F  M  apple  fri  grape  mon  orange  tue
0  1  0      1    0      0    1       0    0
1  1  0      0    0      0    0       1    1
2  0  1      1    1      0    0       0    0
3  0  1      0    0      1    1       0    0
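
The question asks for column names such as Name:Apple rather than bare values. A minimal sketch of one way to get there, assuming a pandas version in which get_dummies accepts a whole DataFrame together with the prefix_sep and dtype arguments. The inline csv_text is only a stand-in for the real file, which would normally be read with pd.read_csv on its path.

```python
import io

import pandas as pd

# Stand-in for the real CSV file (assumption: comma-separated with a header row)
csv_text = """Name,Sex,Week
Apple,F,Mon
Orange,F,Tue
Apple,M,Fri
Grape,M,Mon
"""

frame = pd.read_csv(io.StringIO(csv_text))

# One-hot encode every column; each new column is named "<column>:<value>"
dummies = pd.get_dummies(frame, prefix_sep=":", dtype=int)
print(dummies)

# Serialize back to CSV text; pass a file path to to_csv to write a file instead
csv_out = dummies.to_csv(index=False)
```

Within each prefix the columns are sorted alphabetically (Name:Apple, Name:Grape, Name:Orange), so they may still need reordering to match the layout in the question.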
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow