Find optimal P(X|Y) given I have a model that has good performance when trained on P(Y|X)

https://datascience.stackexchange.com/questions/26016

31-10-2019

Question

Input Data:

$X$ -> features of a t-shirt (colour, logo, etc.)

$Y$ -> profit margin

I have trained a random forest on the above $X$ and $Y$ and achieved reasonable accuracy on test data. So I have $P(Y|X)$.

Now I would like to find $P(X|Y)$, i.e. the probability distribution of the $X$ features given the profit margin I am expecting.
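By Bayes' rule the two conditionals are linked through the marginal $P(X)$. A sketch, assuming $Y$ is treated as discrete (for example, profit-margin bins) and $X$ ranges over a finite set of feature combinations:

```latex
P(X = x \mid Y = y) = \frac{P(Y = y \mid X = x)\,P(X = x)}{\sum_{x'} P(Y = y \mid X = x')\,P(X = x')}
```

The trained model supplies only $P(Y \mid X)$; $P(X)$, the distribution of the t-shirt features themselves, has to be estimated separately, e.g. from feature frequencies in the training data.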

How do I do that with a random forest (or any other discriminative model)?
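A minimal sketch of that inversion over a finite candidate set, assuming the margin has been binned (a random forest regressor predicts a point estimate of the margin, not a distribution, so a classifier such as scikit-learn's `RandomForestClassifier`, whose `predict_proba` returns class probabilities, would stand in for it). $P(X)$ is taken as the empirical frequency of each distinct shirt in the data. The function, the toy probability table and the shirt tuples are hypothetical:

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence


def posterior_over_candidates(
    observed_x: Sequence[Hashable],
    p_y_given_x: Callable[[Hashable, Hashable], float],
    y: Hashable,
) -> Dict[Hashable, float]:
    """Estimate P(X = x | Y = y) by Bayes' rule.

    P(X) is the empirical frequency of each distinct feature combination
    in observed_x; P(Y = y | X = x) comes from any discriminative model
    wrapped as p_y_given_x(x, y).
    """
    counts = Counter(observed_x)
    n = len(observed_x)
    # Unnormalised posterior: model likelihood times empirical prior.
    scores = {x: p_y_given_x(x, y) * c / n for x, c in counts.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate has non-zero probability for this y")
    # Normalise so the posterior sums to 1 over the candidates.
    return {x: s / total for x, s in scores.items()}


# Hypothetical stand-in for a trained classifier over margin bins
# ("low"/"high"); with scikit-learn this is where predict_proba would be used.
P_HIGH = {
    ("red", "logo"): 0.8,
    ("red", "plain"): 0.4,
    ("blue", "logo"): 0.6,
    ("blue", "plain"): 0.2,
}


def toy_model(x, y):
    return P_HIGH[x] if y == "high" else 1.0 - P_HIGH[x]


shirts = [("red", "logo"), ("red", "plain"), ("blue", "logo"), ("blue", "plain")]
post = posterior_over_candidates(shirts, toy_model, "high")
```

A limitation of this design: only feature combinations that appear in the data get non-zero prior weight, so it ranks known shirts rather than proposing new ones.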

One suggestion for me could be to start with a generative model rather than a discriminative one. But my understanding is that generative models generally require a lot of training data unless they make some very restrictive assumptions, such as the conditional independence of the $X$'s in the case of Naive Bayes. Is that correct?
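For the generative route, a minimal sketch of a categorical Naive Bayes model, assuming every feature is categorical and $Y$ is binned: it estimates each $P(X_j \mid Y)$ from add-$\alpha$ smoothed counts and multiplies the factors, which is exactly the conditional-independence assumption mentioned above. The function name and toy data are hypothetical:

```python
from collections import Counter, defaultdict
from itertools import product


def fit_naive_bayes(rows, labels, alpha=1.0):
    """Fit per-feature categorical P(X_j | Y) with add-alpha smoothing.

    rows: list of tuples of categorical features; labels: list of y values.
    Returns p_x_given_y(x, y), which assumes the features are conditionally
    independent given y, plus the observed values of each feature.
    """
    n_features = len(rows[0])
    # Values each feature can take, collected from the data.
    values = [sorted({r[j] for r in rows}) for j in range(n_features)]
    label_counts = Counter(labels)
    # counts[j][y][v] = how often feature j equals v among rows with label y
    counts = [defaultdict(Counter) for _ in range(n_features)]
    for r, y in zip(rows, labels):
        for j, v in enumerate(r):
            counts[j][y][v] += 1

    def p_x_given_y(x, y):
        p = 1.0
        for j, v in enumerate(x):
            num = counts[j][y][v] + alpha
            den = label_counts[y] + alpha * len(values[j])
            p *= num / den
        return p

    return p_x_given_y, values


rows = [("red", "logo"), ("red", "plain"), ("blue", "logo"), ("red", "logo")]
labels = ["high", "low", "high", "high"]
p, values = fit_naive_bayes(rows, labels)
# Summing over every feature combination should give 1 for each label.
total = sum(p(x, "high") for x in product(*values))
```

Because each factor is normalised over the values seen in the data, the probabilities over all combinations sum to 1 for each label. The price of the independence assumption is that interactions (e.g. a logo that only sells well on one colour) are lost.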

Another suggestion could be to just swap $X$ and $Y$ and train a discriminative model. Now $X$ will be the profit margin and $Y$ the features of a t-shirt, so $P(Y|X)$ will directly give me the probability distribution of t-shirt features given a target profit margin. But this approach doesn't seem right to me, as I have always thought of $X$ as the causal variables and $Y$ as the effect.
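On the causal concern: $P(X \mid Y)$ and $P(Y \mid X)$ are both conditionals of the same joint distribution, so estimating either one makes no claim about which variable causes which. A small sketch on hypothetical toy data shows that, with plain count (maximum-likelihood) estimates, the "swapped" conditional and the Bayes-inverted forward conditional give identical numbers:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy data: (shirt features, margin bin) pairs.
data = [
    (("red", "logo"), "high"),
    (("red", "logo"), "high"),
    (("red", "plain"), "low"),
    (("blue", "logo"), "high"),
    (("blue", "plain"), "low"),
    (("blue", "plain"), "high"),
]

joint = Counter(data)                   # counts of (x, y)
x_counts = Counter(x for x, _ in data)  # counts of x
y_counts = Counter(y for _, y in data)  # counts of y
n = len(data)


def p_x_given_y_direct(x, y):
    # "Swapped" model: condition on y and read off the frequency of x.
    return Fraction(joint[(x, y)], y_counts[y])


def p_x_given_y_bayes(x, y):
    # Bayes' rule using the forward conditional P(y | x) and the marginals.
    p_y_given_x = Fraction(joint[(x, y)], x_counts[x])
    p_x = Fraction(x_counts[x], n)
    p_y = Fraction(y_counts[y], n)
    return p_y_given_x * p_x / p_y
```

With fitted models instead of counts the two routes no longer coincide exactly; the practical question is which conditional is easier to model well. With several features, $P(X \mid Y)$ is a distribution over combinations, so a swapped model still has to capture how the features co-vary.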

Also, from what I have heard, a similar question has been posed in drug discovery, and algorithms have been designed that propose new candidate drugs with a high degree of success. Can someone point me to research literature in this domain?

Update:

I have come across this and this, which talk about GANs being used for drug discovery. Generative adversarial networks seem like a good fit for my problem, so I have been reading about them. But one thing I understood is that GANs generate samples in an unsupervised way: they effectively capture the underlying distribution of $X$ and then sample from that distribution. But I am interested in $X|Y$, with $X$ and $Y$ as defined above. Should I explore something other than GANs? Any pointers, please?
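One way to turn an unconditional generator into a conditional sampler is rejection sampling with the existing discriminative model (a swapped-in technique, not something from the linked work): draw $x$ from the generator and keep it with probability $P(Y = y \mid X = x)$. Kept draws then follow $P(x)\,P(y \mid x) \propto P(x \mid y)$, to the extent that the generator matches $P(X)$ and the classifier is calibrated. The generator and model below are hypothetical toy stand-ins; a conditional GAN, which feeds $Y$ to both the generator and the discriminator, is the more direct alternative.

```python
import random

# Hypothetical toy stand-ins for a trained generator and classifier.
SHIRTS = [("red", "logo"), ("red", "plain"), ("blue", "logo"), ("blue", "plain")]
P_HIGH = {
    ("red", "logo"): 0.8,
    ("red", "plain"): 0.4,
    ("blue", "logo"): 0.6,
    ("blue", "plain"): 0.2,
}


def toy_generator(rng):
    # Stands in for an unconditional GAN: here P(X) is uniform over SHIRTS.
    return rng.choice(SHIRTS)


def toy_model(x, y):
    return P_HIGH[x] if y == "high" else 1.0 - P_HIGH[x]


def sample_conditional(generate, p_y_given_x, y, n_samples, rng, max_tries=100_000):
    """Draw samples approximately from P(X | Y = y) by rejection sampling.

    Each draw from generate(rng) is kept with probability P(Y = y | X = x).
    Returns fewer than n_samples items if max_tries is exhausted first.
    """
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_samples:
            break
        x = generate(rng)
        if rng.random() < p_y_given_x(x, y):
            accepted.append(x)
    return accepted
```

The acceptance rate equals $P(Y = y)$ under the generator, so for a rare target margin most draws are thrown away and this becomes slow.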

Follow up Question:

Imagine I have a trained GAN that has learned how to make t-shirts (it outputs sample $X$s). How can I get the top 5 shirts for a given $Y$?
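A minimal sketch for the follow-up, assuming the generator can be called repeatedly and "top" means the highest $P(Y = y \mid X = x)$ under the discriminative model: draw many shirts, drop duplicates, and keep the five best-scoring ones with `heapq.nlargest`. All names and scores are hypothetical:

```python
import heapq
import random

# Hypothetical scores P(Y = "high" | X = x) from a trained classifier.
SCORES = {
    ("red", "logo"): 0.9,
    ("red", "plain"): 0.5,
    ("blue", "logo"): 0.7,
    ("blue", "plain"): 0.2,
    ("green", "logo"): 0.6,
    ("green", "plain"): 0.3,
}
SHIRTS = list(SCORES)


def toy_generator(rng):
    # Stands in for sampling one t-shirt from a trained GAN.
    return rng.choice(SHIRTS)


def toy_model(x, y):
    return SCORES[x] if y == "high" else 1.0 - SCORES[x]


def top_k_for_target(generate, p_y_given_x, y, k=5, n_draws=1000, seed=0):
    """Return the k distinct generated shirts with the highest P(Y = y | X = x)."""
    rng = random.Random(seed)
    candidates = {generate(rng) for _ in range(n_draws)}  # de-duplicate
    # nlargest returns the k best candidates in descending order of score.
    return heapq.nlargest(k, candidates, key=lambda x: p_y_given_x(x, y))


top = top_k_for_target(toy_generator, toy_model, "high")
```

Ranking by $P(y \mid x)$ alone ignores how typical a shirt is. To rank by $P(x \mid y)$ instead, each distinct shirt's score could be multiplied by how often the generator produced it, which estimates $P(x)$ up to a constant.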


Licensed under: CC-BY-SA with attribution
