df$reg_country is a factor variable, which stores all possible levels in its levels attribute, so a subset still carries levels that no longer occur in it. Check levels(df_subset$reg_country).
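Factor levels only have a significant impact on data size if you have a huge number of them, which I wouldn't expect to be the case here. However, you could use df_subset$reg_country <- droplevels(df_subset$reg_country) to remove unused levels. Note that droplevels returns a new factor rather than modifying its argument, so the result has to be assigned back.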
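A minimal sketch of the behaviour described above. The example data (the country codes and the value column) is made up for illustration; only the names df, df_subset and reg_country come from the question.

```r
# Hypothetical data standing in for the question's df
df <- data.frame(
  reg_country = factor(c("DE", "FR", "US", "DE", "JP")),
  value = 1:5
)

# Subsetting rows keeps every level of the original factor
df_subset <- df[df$reg_country %in% c("DE", "FR"), ]
levels_before <- levels(df_subset$reg_country)
print(levels_before)   # includes levels that no longer occur in the subset

# droplevels() returns a new factor, so assign the result back
df_subset$reg_country <- droplevels(df_subset$reg_country)
levels_after <- levels(df_subset$reg_country)
print(levels_after)    # only the levels actually present

# droplevels() also accepts a whole data frame and cleans all factor columns
df_subset <- droplevels(df_subset)
```

Calling droplevels on the data frame is probably the more convenient form if several factor columns are affected by the subset.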