Question

I am pretty new to R and was trying to analyse this example dataset to get started with Naive Bayes classification.

Day     Outlook  Temperature    Humidity    Wind    Play
1       Sunny    Hot            High        Weak    No  
2       Sunny    Hot            High        Strong  No  
3       Overcast Hot            High        Weak    Yes  
4       Rain     Mild           High        Weak    Yes  
5       Rain     Cool           Normal      Weak    Yes  
6       Rain     Cool           Normal      Strong  No  
7       Overcast Cool           Normal      Strong  Yes  
8       Sunny    Mild           High        Weak    No  
9       Sunny    Cool           Normal      Weak    Yes  
10      Rain     Mild           Normal      Weak    Yes  
11      Sunny    Mild           Normal      Strong  Yes  
12      Overcast Mild           High        Strong  Yes  
13      Overcast Hot            Normal      Weak    Yes  
14      Rain     Mild           High        Strong  No  

I have been able to use the table() function to get the number of occurrences of each value of the categorical variables Outlook, Temperature, Humidity, Wind and Play. For the next stage I need the number of occurrences of each value of each categorical variable within each target class (Play = Yes and Play = No). For example, the count for (Outlook = Sunny, Play = No) is 3 in the dataset above. What command should I use to get these counts?

Note: I know that Naive Bayes works with probabilities, but in this case I am more interested in the frequencies.


Solution

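Are you looking for this?

by(DF[-1], DF$Play, sapply, table)

This assumes DF is your data frame and that its columns are factors. DF[-1] drops the Day column; by() then splits the remaining columns by Play and tabulates each one. With character columns, values that have a zero count in a group (such as Overcast under No) are omitted instead of being shown as 0.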

Result:

DF$Play: No
$Outlook

Overcast     Rain    Sunny 
       0        2        3 

$Temperature

Cool  Hot Mild 
   1    2    2 

$Humidity

  High Normal 
     4      1 

$Wind

Strong   Weak 
     3      2 

$Play

 No Yes 
  5   0 

----------------------------------------------------------------------------------------------------------------------------- 
DF$Play: Yes
$Outlook

Overcast     Rain    Sunny 
       4        3        2 

$Temperature

Cool  Hot Mild 
   3    2    4 

$Humidity

  High Normal 
     3      6 

$Wind

Strong   Weak 
     3      6 

$Play

 No Yes 
  0   9 
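
If one cross-table per predictor is easier to work with than the nested by() output, the same counts can be sketched with lapply() and a two-argument table(). This is only an illustration: the data frame is rebuilt by hand from the question, and the names predictors and per_feature are made up for this example.

```r
# Rebuild the question's data set by hand (values copied from the table above).
DF <- data.frame(
  Day = 1:14,
  Outlook = c("Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
              "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"),
  Temperature = c("Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                  "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"),
  Humidity = c("High", "High", "High", "High", "Normal", "Normal", "Normal",
               "High", "Normal", "Normal", "Normal", "High", "Normal", "High"),
  Wind = c("Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
           "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"),
  Play = c("No", "No", "Yes", "Yes", "Yes", "No", "Yes",
           "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  stringsAsFactors = TRUE  # assumption: factor columns, as the output above suggests
)

# Hypothetical helper names: one value-by-Play table per predictor column.
predictors <- c("Outlook", "Temperature", "Humidity", "Wind")
per_feature <- lapply(DF[predictors], function(column) table(column, DF$Play))

per_feature$Outlook                  # rows: Outlook values, columns: No / Yes
per_feature$Outlook["Sunny", "No"]   # 3
```

Each element of per_feature is an ordinary two-way table, so a single frequency can be read off by row and column name.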

Other tips

By passing multiple arguments to table(), you can get contingency tables (cross-tabulated counts). For example, if we have this data frame:

    outlook play
1      rain   no
2  overcast   no
3       sun   no
4      rain  yes
5      rain   no
6      rain  yes
7  overcast   no
8      rain  yes
9  overcast  yes
10     rain  yes

Then:

> table(df$outlook)

overcast     rain      sun 
       3        6        1 
> table(df$outlook,df$play)

           no yes
  overcast  2   1
  rain      2   4
  sun       1   0
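
As a follow-up sketch using the same ten rows, a single cell can be pulled out of the contingency table by name, and as.data.frame() turns the table into one row per combination. The variable names tab and long are chosen only for this illustration.

```r
# The ten-row example data frame from this answer, typed in by hand.
df <- data.frame(
  outlook = c("rain", "overcast", "sun", "rain", "rain",
              "rain", "overcast", "rain", "overcast", "rain"),
  play    = c("no", "no", "no", "yes", "no",
              "yes", "no", "yes", "yes", "yes")
)

# Naming the arguments labels the two dimensions of the table.
tab <- table(outlook = df$outlook, play = df$play)

tab["rain", "yes"]   # 4: frequency of outlook = rain together with play = yes

# Long format: one row per (outlook, play) pair, with the count in column Freq.
long <- as.data.frame(tab)
long
```

The long format is convenient when the frequencies need to be filtered or merged like any other data frame.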
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow