Question

I would like to understand how to find an association between users, spam, and email age. My dataset looks as follows:

User         Spam  Age (yr)
porn_23      1     1
Mary_g       0     6
cricket_s54  0     4
rewuoiou     1     0
pure75       1     2
giogio35     0     10
viv3roe      1     1

I am looking at the correlation using Pearson. Is that the right approach? I would like to determine the correlation between age and user: spam should more likely come from users with recently created email addresses (fake accounts / emails).


Solution

If you are using pandas, all you need to do is:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

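# df is assumed to be a DataFrame holding the User, Spam and Age columns;
# numeric_only=True skips the non-numeric User column (pandas 1.5+)
corrMatrix = df.corr(numeric_only=True)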

Then you can print the correlation matrix and also plot it using seaborn or any other plotting method.

sns.heatmap(corrMatrix, annot=True)
plt.show()
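
For a concrete check, here is a minimal sketch that builds a DataFrame from the seven sample rows in the question and computes the Pearson coefficient between Spam and Age directly. The column names `Spam` and `Age` are assumptions chosen for illustration. With a 0/1 variable such as Spam, Pearson's r is the same quantity as the point-biserial correlation, so using it here is reasonable, although seven rows are far too few to draw firm conclusions.

```python
import pandas as pd

# Sample rows copied from the question; column names are illustrative assumptions.
df = pd.DataFrame({
    "User": ["porn_23", "Mary_g", "cricket_s54", "rewuoiou",
             "pure75", "giogio35", "viv3roe"],
    "Spam": [1, 0, 0, 1, 1, 0, 1],
    "Age": [1, 6, 4, 0, 2, 10, 1],
})

# Pearson correlation between the binary Spam flag and the account age in years.
r = df["Spam"].corr(df["Age"])
print(round(r, 3))  # -0.853
```

A negative value means that, in this sample, spam tends to come from younger accounts, which matches the intuition stated in the question.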

Hope this helps.

Licensed under: CC-BY-SA with attribution