Question

I have a table where one of the variables is country of registration.

table(df$reg_country)

returns:

   AR    BR    ES    FR    IT
  123   202   578   642   263

Now, if I subset the original table to exclude one of the countries

df_subset <- subset(df, reg_country != 'AR')
table(df_subset$reg_country)

returns:

   AR    BR    ES    FR    IT
    0   202   578   642   263

This second result is very surprising to me, as R seems to somehow magically know that I have removed the entries from AR.

Why does that happen?

Does it affect the size of the second data frame (df_subset)? If so, is there a more efficient way to subset in order to minimize the size?


Solution

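df$reg_country is a factor variable, which stores all of its possible levels in the levels attribute. subset() removes the AR rows but keeps that attribute unchanged, so table() still lists AR, now with a count of 0. Check levels(df_subset$reg_country).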
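A minimal sketch of the behaviour, assuming a small made-up data frame in place of the original df (the six rows below are illustrative only, not the asker's data):

```r
# Hypothetical stand-in for df. factor() is called explicitly because
# data.frame() only converts strings to factors by default in R < 4.0.0.
df <- data.frame(reg_country = factor(c("AR", "BR", "BR", "ES", "FR", "IT")))

# Remove the AR rows, as in the question
df_subset <- subset(df, reg_country != "AR")

# The levels attribute is untouched by subsetting, so "AR" is still listed
lv <- levels(df_subset$reg_country)
print(lv)

# table() counts every level, including those with no remaining rows
tab <- table(df_subset$reg_country)
print(tab)
```

The AR column shows up with a count of 0 because table() tabulates over the factor's levels, not over the values that happen to be present.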

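Factor levels only have a significant impact on data size if you have a huge number of them, which I wouldn't expect to be the case here. If you do want to remove unused levels, note that droplevels() returns a new object rather than modifying in place, so assign the result back: df_subset$reg_country <- droplevels(df_subset$reg_country). Alternatively, droplevels(df_subset) drops unused levels from every factor column of the data frame.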
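As a rough illustration, again assuming a small made-up data frame rather than the original df, dropping the unused level and comparing sizes could look like this:

```r
# Hypothetical stand-in for df (illustrative rows only)
df <- data.frame(reg_country = factor(c("AR", "BR", "BR", "ES", "FR", "IT")))
df_subset <- subset(df, reg_country != "AR")

# droplevels() on a data frame drops unused levels from all factor columns;
# the result must be assigned, the input is not modified
df_clean <- droplevels(df_subset)

clean_levels <- levels(df_clean$reg_country)
print(clean_levels)                  # "BR" "ES" "FR" "IT"
print(table(df_clean$reg_country))   # no AR column any more

# Compare memory footprints; with only a handful of levels
# the difference is a few bytes at most
print(object.size(df_subset))
print(object.size(df_clean))
```

With five country codes the saving is negligible, so droplevels() is mainly useful here for getting clean table() output rather than for reducing memory use.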

License: CC-BY-SA with attribution
Not affiliated with StackOverflow