How to order bins from a crosstab

https://stackoverflow.com/questions/23690938

29-07-2023

Question

I am trying to create a frequency table from a dataframe like this:

scm=pd.read_csv('carac_scm.csv')
scm=scm[0:30][['Hora_inicio','Forma','AreaMax']]
scm
            Hora_inicio  Forma    AreaMax
0   2004-04-09 22:45:00  MBCCM       58
1   2004-04-12 22:45:00  MBSCL       86
2   2004-04-24 03:45:00    SCL      141
3   2004-05-02 06:45:00    SCL      108
4   2004-05-30 04:45:00  MBCCM       64
5   2004-05-31 03:15:00  MBCCM       77
6   2004-06-08 00:15:00  MBSCL       51
7   2004-06-12 22:15:00    CCM       73
8   2004-06-13 02:45:00  MBCCM       87
9   2004-06-13 23:45:00  MBSCL       54
10  2004-06-14 03:15:00  MBSCL       70
11  2004-06-17 08:15:00  MBCCM       47
12  2004-06-17 11:45:00  MBCCM       76
13  2004-06-22 00:15:00    SCL       76
14  2004-06-22 07:45:00  MBCCM      115
15  2004-06-22 22:45:00    CCM       98
16  2004-07-01 05:15:00  MBCCM       57
17  2004-07-02 00:15:00  MBSCL       61
18  2004-07-04 11:45:00  MBCCM       50
19  2004-07-06 03:45:00    SCL       77
20  2004-07-07 04:15:00    CCM       51  
21  2004-07-08 02:45:00  MBCCM       49
22  2004-07-08 11:45:00  MBCCM       40
23  2004-07-08 02:15:00  MBCCM       74
24  2004-07-09 04:45:00    CCM       39
25  2004-07-11 18:15:00  MBSCL       59
26  2004-07-11 23:15:00  MBSCL       85   
27  2004-07-15 10:45:00    CCM       51
28  2004-07-16 12:15:00  MBCCM       53
29  2004-07-17 02:15:00  MBCCM       80

Next I sorted by scm.AreaMax to pick suitable bins. I then used pd.cut to add a new column called bins containing the generated intervals. The following code shows this:

scm=scm.sort(columns=['AreaMax'])
scm['bins'] = pd.cut(scm.AreaMax, bins=[30, 50, 70, 90, 110, 130, 150])

            Hora_inicio  Forma     AreaMax   bins
24  2004-07-09 04:45:00    CCM       39    (30, 50]
22  2004-07-08 11:45:00  MBCCM       40    (30, 50]
11  2004-06-17 08:15:00  MBCCM       47    (30, 50]
21  2004-07-08 02:45:00  MBCCM       49    (30, 50]
18  2004-07-04 11:45:00  MBCCM       50    (30, 50]
27  2004-07-15 10:45:00    CCM       51    (50, 70]
6   2004-06-08 00:15:00  MBSCL       51    (50, 70]
20  2004-07-07 04:15:00    CCM       51    (50, 70]
28  2004-07-16 12:15:00  MBCCM       53    (50, 70]
9   2004-06-13 23:45:00  MBSCL       54    (50, 70]
16  2004-07-01 05:15:00  MBCCM       57    (50, 70]
0   2004-04-09 22:45:00  MBCCM       58    (50, 70] 
25  2004-07-11 18:15:00  MBSCL       59    (50, 70] 
17  2004-07-02 00:15:00  MBSCL       61    (50, 70]
4   2004-05-30 04:45:00  MBCCM       64    (50, 70]
10  2004-06-14 03:15:00  MBSCL       70    (50, 70]
7   2004-06-12 22:15:00    CCM       73    (70, 90]
23  2004-07-08 02:15:00  MBCCM       74    (70, 90]
12  2004-06-17 11:45:00  MBCCM       76    (70, 90]
13  2004-06-22 00:15:00    SCL       76    (70, 90]
5   2004-05-31 03:15:00  MBCCM       77    (70, 90]
19  2004-07-06 03:45:00    SCL       77    (70, 90]
29  2004-07-17 02:15:00  MBCCM       80    (70, 90]
26  2004-07-11 23:15:00  MBSCL       85    (70, 90]
1   2004-04-12 22:45:00  MBSCL       86    (70, 90]
8   2004-06-13 02:45:00  MBCCM       87    (70, 90]
15  2004-06-22 22:45:00    CCM       98   (90, 110]
3   2004-05-02 06:45:00    SCL      108   (90, 110]
14  2004-06-22 07:45:00  MBCCM      115  (110, 130]
2   2004-04-24 03:45:00    SCL      141  (130, 150]
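
As a small illustration of what pd.cut produces here (a sketch, assuming a recent pandas release; the names area and edges are made up for this example): the intervals are closed on the right by default, so 50 lands in (30, 50] and 51 in (50, 70], and the result is an ordered categorical whose categories follow the bin edges.

```python
import pandas as pd

# A few AreaMax values taken from the table above.
area = pd.Series([39, 50, 51, 70, 115, 141], name="AreaMax")
edges = [30, 50, 70, 90, 110, 130, 150]

# By default pd.cut builds right-closed intervals: (30, 50], (50, 70], ...
bins = pd.cut(area, bins=edges)

# The result is an ordered categorical; its categories follow the edges.
print(bins.cat.ordered)
print(list(bins.cat.categories.left))
print(bins.tolist())
```

The ordering is a property of the categorical, not of the printed labels, which matters once the labels are treated as plain strings.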

Next I created a frequency table to plot a stacked bar chart, and got the following:

df=pd.crosstab(rows=[scm['bins']],cols=[scm['Forma']],margins=False)
df
Forma       CCM  MBCCM  MBSCL  SCL
bins                              
(110, 130]    0      1      0    0
(130, 150]    0      0      0    1
(30, 50]      1      4      0    0
(50, 70]      2      4      5    0
(70, 90]      1      5      2    2
(90, 110]     1      0      0    1

df.plot(kind='bar', stacked=True)

How can I order the bins to get a table like this?

Forma       CCM  MBCCM  MBSCL  SCL
bins                              
(30, 50]      1      4      0    0
(50, 70]      2      4      5    0
(70, 90]      1      5      2    2
(90, 110]     1      0      0    1
(110, 130]    0      1      0    0
(130, 150]    0      0      0    1

I tried the following lines of code but did not get the desired result:

df.sort()  #Get the same table 
df.sort_index()   # Get the same table
df.sort_index(ascending=False)

Forma       CCM  MBCCM  MBSCL  SCL
bins                              
(90, 110]     1      0      0    1
(70, 90]      1      5      2    2
(50, 70]      2      4      5    0
(30, 50]      1      4      0    0
(130, 150]    0      0      0    1
(110, 130]    0      1      0    0

Can anyone suggest an approach?

No correct solution

OTHER TIPS

This happens because the index holds strings (str/unicode), which are compared character by character, so '30' > '110'. You can create a numeric column to sort on and then delete it.

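# Temporary numeric column: the left edge of each bin, parsed from its label
df['sort_col'] = [float(s.split(',')[0][1:]) for s in df.index]
# Older pandas: df.sort(columns='sort_col', inplace=True)
df.sort_values(by='sort_col', inplace=True)
del df['sort_col']  # You don't want to plot this column
df.plot(kind='bar', stacked=True)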
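
An alternative sketch that avoids the temporary column: since the bin edges are known, the labels can be built in the desired order and used to reindex the crosstab. This assumes a recent pandas release, where pd.crosstab takes index= and columns= keywords. The names edges, labels, lexical and order are illustrative, and the string labels are created explicitly to reproduce the string index from the question.

```python
import pandas as pd

# Five rows from the question's data.
scm = pd.DataFrame({
    "Forma": ["CCM", "MBCCM", "SCL", "MBCCM", "SCL"],
    "AreaMax": [39, 40, 108, 115, 141],
})

edges = [30, 50, 70, 90, 110, 130, 150]
# String labels in ascending bin order: "(30, 50]", "(50, 70]", ...
labels = [f"({lo}, {hi}]" for lo, hi in zip(edges[:-1], edges[1:])]

# astype(str) turns the bins into plain strings, as in the question.
scm["bins"] = pd.cut(scm["AreaMax"], bins=edges, labels=labels).astype(str)

df = pd.crosstab(index=scm["bins"], columns=scm["Forma"])
lexical = list(df.index)  # strings sort character by character

# Keep only the labels that occur, in numeric bin order, and reindex.
order = [lab for lab in labels if lab in df.index]
df = df.reindex(order)

print(lexical)
print(list(df.index))
```

With the index in this order, df.plot(kind='bar', stacked=True) draws the bars from the smallest bin to the largest.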

Licensed under: CC-BY-SA with attribution
