Question
I'm trying to generate some synthetic data for experiments. For data sets with numerical features this is rather easy: I just use a Gaussian mixture (using Netlab, a package for Matlab) and that's done.
Now I also need to generate some data sets with both numerical and categorical features. The numerical part I can easily do using the above method, but what about the categorical part?
I was thinking of generating a categorical feature with (say) 3 categories with probabilities of 68.2% (within +/- 1 sigma), 27.2% (between +/- 1 sigma and +/- 2 sigma), and 4.6% (the rest) within the objects with the same label.
And perhaps another categorical feature with 5 categories, with probabilities of 34.1%, 34.1%, 13.6%, 13.6%, 4.6% - again, within the objects with the same label.
Does that make sense to you? Any thoughts?
I can easily write the code for the above, but if you know of a function that does it for me, please let me know.
Thanks!
No correct solution
Other tips
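It's easy to do in Python using numpy:
import numpy as np
# Each of the 10 rows is a one-hot draw over the 3 categories; .argmax(axis=1) gives the category index.
np.random.multinomial(n=1, pvals=[.3,.3,.4], size=10)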
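To get the "within the objects with the same label" part from the question, one option is to keep a separate probability vector per label and sample each group on its own. The following is only a rough sketch under some assumptions: the labels, the two-label setup, the reversed probabilities for the second label, and the helper name sample_categorical are all made up for illustration, and it uses numpy's Generator.choice instead of multinomial so that it returns category indices directly.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, for reproducibility

# Hypothetical class labels, e.g. the mixture component each object came from.
labels = rng.integers(0, 2, size=1000)

# Assumed per-label category probabilities (rows: labels, columns: categories).
# Label 0 uses the 68.2 / 27.2 / 4.6 split from the question; label 1 reverses it.
probs = np.array([
    [0.682, 0.272, 0.046],
    [0.046, 0.272, 0.682],
])

def sample_categorical(labels, probs, rng):
    """Draw one category index per object, using the row of probs for its label."""
    feature = np.empty(labels.shape[0], dtype=int)
    for label in np.unique(labels):
        mask = labels == label
        feature[mask] = rng.choice(probs.shape[1], size=mask.sum(), p=probs[label])
    return feature

feature = sample_categorical(labels, probs, rng)
```

The 5-category feature from the question would work the same way, with one row of 5 probabilities per label. The resulting index column can be stacked next to the numerical features from the Gaussian mixture.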