I am trying to create a frequency table from a dataframe like this:
import pandas as pd

scm = pd.read_csv('carac_scm.csv')
scm = scm[0:30][['Hora_inicio', 'Forma', 'AreaMax']]
scm
            Hora_inicio  Forma  AreaMax
0   2004-04-09 22:45:00  MBCCM       58
1   2004-04-12 22:45:00  MBSCL       86
2   2004-04-24 03:45:00    SCL      141
3   2004-05-02 06:45:00    SCL      108
4   2004-05-30 04:45:00  MBCCM       64
5   2004-05-31 03:15:00  MBCCM       77
6   2004-06-08 00:15:00  MBSCL       51
7   2004-06-12 22:15:00    CCM       73
8   2004-06-13 02:45:00  MBCCM       87
9   2004-06-13 23:45:00  MBSCL       54
10  2004-06-14 03:15:00  MBSCL       70
11  2004-06-17 08:15:00  MBCCM       47
12  2004-06-17 11:45:00  MBCCM       76
13  2004-06-22 00:15:00    SCL       76
14  2004-06-22 07:45:00  MBCCM      115
15  2004-06-22 22:45:00    CCM       98
16  2004-07-01 05:15:00  MBCCM       57
17  2004-07-02 00:15:00  MBSCL       61
18  2004-07-04 11:45:00  MBCCM       50
19  2004-07-06 03:45:00    SCL       77
20  2004-07-07 04:15:00    CCM       51
21  2004-07-08 02:45:00  MBCCM       49
22  2004-07-08 11:45:00  MBCCM       40
23  2004-07-08 02:15:00  MBCCM       74
24  2004-07-09 04:45:00    CCM       39
25  2004-07-11 18:15:00  MBSCL       59
26  2004-07-11 23:15:00  MBSCL       85
27  2004-07-15 10:45:00    CCM       51
28  2004-07-16 12:15:00  MBCCM       53
29  2004-07-17 02:15:00  MBCCM       80
Next I sort the frame by scm.AreaMax to choose suitable bin edges, then use pd.cut to add a new column called bins containing the generated intervals:
scm = scm.sort(columns=['AreaMax'])
scm['bins'] = pd.cut(scm.AreaMax, bins=[30, 50, 70, 90, 110, 130, 150])
            Hora_inicio  Forma  AreaMax        bins
24  2004-07-09 04:45:00    CCM       39    (30, 50]
22  2004-07-08 11:45:00  MBCCM       40    (30, 50]
11  2004-06-17 08:15:00  MBCCM       47    (30, 50]
21  2004-07-08 02:45:00  MBCCM       49    (30, 50]
18  2004-07-04 11:45:00  MBCCM       50    (30, 50]
27  2004-07-15 10:45:00    CCM       51    (50, 70]
6   2004-06-08 00:15:00  MBSCL       51    (50, 70]
20  2004-07-07 04:15:00    CCM       51    (50, 70]
28  2004-07-16 12:15:00  MBCCM       53    (50, 70]
9   2004-06-13 23:45:00  MBSCL       54    (50, 70]
16  2004-07-01 05:15:00  MBCCM       57    (50, 70]
0   2004-04-09 22:45:00  MBCCM       58    (50, 70]
25  2004-07-11 18:15:00  MBSCL       59    (50, 70]
17  2004-07-02 00:15:00  MBSCL       61    (50, 70]
4   2004-05-30 04:45:00  MBCCM       64    (50, 70]
10  2004-06-14 03:15:00  MBSCL       70    (50, 70]
7   2004-06-12 22:15:00    CCM       73    (70, 90]
23  2004-07-08 02:15:00  MBCCM       74    (70, 90]
12  2004-06-17 11:45:00  MBCCM       76    (70, 90]
13  2004-06-22 00:15:00    SCL       76    (70, 90]
5   2004-05-31 03:15:00  MBCCM       77    (70, 90]
19  2004-07-06 03:45:00    SCL       77    (70, 90]
29  2004-07-17 02:15:00  MBCCM       80    (70, 90]
26  2004-07-11 23:15:00  MBSCL       85    (70, 90]
1   2004-04-12 22:45:00  MBSCL       86    (70, 90]
8   2004-06-13 02:45:00  MBCCM       87    (70, 90]
15  2004-06-22 22:45:00    CCM       98   (90, 110]
3   2004-05-02 06:45:00    SCL      108   (90, 110]
14  2004-06-22 07:45:00  MBCCM      115  (110, 130]
2   2004-04-24 03:45:00    SCL      141  (130, 150]
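For reference, a minimal sketch of the same sort-and-bin step on a recent pandas version, where `DataFrame.sort` is no longer available and `sort_values` takes its place. The small frame below is a hand-copied subset of the rows above, not the full CSV, and it assumes nothing about the file beyond the two columns shown. Note that `pd.cut` does not need the data to be sorted first; the sort only makes the result easier to read.

```python
import pandas as pd

# Illustrative subset of the rows shown above (not the full file).
scm = pd.DataFrame({
    "Forma": ["MBCCM", "MBSCL", "SCL", "SCL", "CCM", "MBCCM"],
    "AreaMax": [58, 86, 141, 108, 39, 115],
})

# sort_values replaces the older sort(columns=...) call.
scm = scm.sort_values("AreaMax")

# Intervals are closed on the right: 50 falls in (30, 50], 51 in (50, 70].
edges = [30, 50, 70, 90, 110, 130, 150]
scm["bins"] = pd.cut(scm["AreaMax"], bins=edges)

print(scm)
```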
Now I create a frequency table to plot as a stacked bar chart, and I get the following:
df = pd.crosstab(rows=[scm['bins']], cols=[scm['Forma']], margins=False)
df
Forma       CCM  MBCCM  MBSCL  SCL
bins
(110, 130]    0      1      0    0
(130, 150]    0      0      0    1
(30, 50]      1      4      0    0
(50, 70]      2      4      5    0
(70, 90]      1      5      2    2
(90, 110]     1      0      0    1
df.plot(kind='bar', stacked=True)
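A minimal sketch of the same cross-tabulation using the current keyword names, `index` and `columns` (the `rows`/`cols` spelling comes from older pandas releases). The data is again a small hand-copied subset of the rows above, deliberately left unsorted. On recent versions `pd.cut` returns an ordered categorical, so the rows here are expected to come out in bin order rather than in string order. The plotting call is left as a comment because it needs matplotlib.

```python
import pandas as pd

# Illustrative, unsorted subset of the rows shown above.
scm = pd.DataFrame({
    "Forma": ["MBCCM", "SCL", "CCM", "SCL", "MBSCL", "MBCCM", "SCL", "MBCCM"],
    "AreaMax": [58, 141, 39, 108, 51, 115, 76, 40],
})
scm["bins"] = pd.cut(scm["AreaMax"], bins=[30, 50, 70, 90, 110, 130, 150])

# One row per bin, one column per Forma value; each cell is a count.
df = pd.crosstab(index=scm["bins"], columns=scm["Forma"])
print(df)

# df.plot(kind="bar", stacked=True)  # requires matplotlib
```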
How can I order the bins to get a table like this?
Forma       CCM  MBCCM  MBSCL  SCL
bins
(30, 50]      1      4      0    0
(50, 70]      2      4      5    0
(70, 90]      1      5      2    2
(90, 110]     1      0      0    1
(110, 130]    0      1      0    0
(130, 150]    0      0      0    1
I tried the following lines of code, but none of them gives the desired result:
df.sort()        # gives the same table
df.sort_index()  # gives the same table
df.sort_index(ascending=False)
Forma       CCM  MBCCM  MBSCL  SCL
bins
(90, 110]     1      0      0    1
(70, 90]      1      5      2    2
(50, 70]      2      4      5    0
(30, 50]      1      4      0    0
(130, 150]    0      0      0    1
(110, 130]    0      1      0    0
Can anyone suggest an approach?
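The row order in the tables above matches plain string sorting: the labels are compared character by character, so "(110" sorts before "(30". A minimal sketch, assuming the bin labels are stored as strings, that reproduces this order and then sorts on the numeric lower bound instead. The helper `left_edge` is hypothetical and written only for this illustration.

```python
# Bin labels as plain strings, listed in the desired order.
labels = ["(30, 50]", "(50, 70]", "(70, 90]",
          "(90, 110]", "(110, 130]", "(130, 150]"]

# Default string sorting compares character by character,
# so "(110, 130]" comes before "(30, 50]".
print(sorted(labels))

def left_edge(label):
    """Hypothetical helper: parse the lower bound out of a label like '(30, 50]'."""
    return int(label.strip("(]").split(",")[0])

# Sorting on the numeric lower bound restores the intended order.
print(sorted(labels, key=left_edge))
```

With a table indexed by such strings, the same key could be applied through something like `df.reindex(sorted(df.index, key=left_edge))`, though whether that is the most idiomatic route depends on the pandas version in use.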