That's the right way to think about it. It's overloading the API a fair bit, but still principled.
Whether it actually helps the results depends on whether users who like A also tend to like B because the two share a product family. That's plausible for music; unlikely for things you buy once, like a toaster.
Variability comes from the random starting point, so you will get a different model each time you build from scratch. If the difference between builds is significant, you are likely over-fitting. It may be that your number of features is too high, or lambda too low, for the data set.
You should also run an eval to see whether the scores are good at all. If it scores poorly then, yes, the parameters are probably well off their best values.
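To make those two checks concrete, here is a rough sketch in plain NumPy, not the API of any particular recommender. It trains a toy ALS model from several random seeds, then reports the RMSE on held-out ratings and how much the held-out predictions move between seeds. The data is synthetic, and the names `als`, `evaluate`, `train` and `held_out` are made up for illustration, as are the two parameter settings.

```python
import numpy as np

def als(R, mask, features, lam, seed, iters=30):
    """Plain ALS for explicit ratings; mask[u, i] is True where R[u, i] is observed."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, features))  # random starting point
    Y = rng.normal(scale=0.1, size=(n_items, features))
    reg = lam * np.eye(features)                          # lambda: regularization
    for _ in range(iters):
        for u in range(n_users):                          # fix items, solve users
            Yo = Y[mask[u]]
            X[u] = np.linalg.solve(Yo.T @ Yo + reg, Yo.T @ R[u, mask[u]])
        for i in range(n_items):                          # fix users, solve items
            Xo = X[mask[:, i]]
            Y[i] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ R[mask[:, i], i])
    return X @ Y.T  # predicted score for every (user, item) pair

def evaluate(R, train, held_out, features, lam, seeds=(1, 2, 3)):
    """Return (mean held-out RMSE, spread of held-out predictions across seeds)."""
    preds = np.stack([als(R, train, features, lam, s) for s in seeds])
    rmse = [np.sqrt(np.mean((p[held_out] - R[held_out]) ** 2)) for p in preds]
    spread = np.mean(np.std(preds[:, held_out], axis=0))
    return float(np.mean(rmse)), float(spread)

# Synthetic ratings (an assumption for the demo): rank-2 structure plus noise.
rng = np.random.default_rng(0)
truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
R = truth + 0.1 * rng.normal(size=truth.shape)
observed = rng.random(R.shape) < 0.5
held_out = observed & (rng.random(R.shape) < 0.2)   # kept aside for the eval
train = observed & ~held_out

for features, lam in [(2, 0.1), (10, 0.001)]:
    rmse, spread = evaluate(R, train, held_out, features, lam)
    print(f"features={features} lambda={lam}: rmse={rmse:.3f} spread={spread:.3f}")
```

A large spread between seeds together with a poor held-out RMSE is the pattern to look for; if you see it, try fewer features or a larger lambda and run the comparison again.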
The idea, though, is that you need not build a new model from scratch every time.
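One way to picture that is a warm start: continue from the previous model's factors instead of a fresh random point. The sketch below is a toy NumPy version under that assumption, not a description of how any specific system does it; `als_sweeps`, `train_rmse`, `random_start` and the synthetic data are all hypothetical.

```python
import numpy as np

def als_sweeps(R, mask, X, Y, lam, sweeps):
    """Run `sweeps` ALS passes starting from the given factors; returns new copies."""
    X, Y = X.copy(), Y.copy()
    reg = lam * np.eye(X.shape[1])
    for _ in range(sweeps):
        for u in range(R.shape[0]):
            Yo = Y[mask[u]]
            X[u] = np.linalg.solve(Yo.T @ Yo + reg, Yo.T @ R[u, mask[u]])
        for i in range(R.shape[1]):
            Xo = X[mask[:, i]]
            Y[i] = np.linalg.solve(Xo.T @ Xo + reg, Xo.T @ R[mask[:, i], i])
    return X, Y

def train_rmse(R, mask, X, Y):
    """RMSE of the factor model on the observed cells."""
    return float(np.sqrt(np.mean(((X @ Y.T)[mask] - R[mask]) ** 2)))

rng = np.random.default_rng(0)
R = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # synthetic ratings
old = rng.random(R.shape) < 0.4                # ratings seen by the previous build
new = old | (rng.random(R.shape) < 0.05)       # a few more ratings have arrived

def random_start():
    return rng.normal(scale=0.1, size=(30, 2)), rng.normal(scale=0.1, size=(20, 2))

prev = als_sweeps(R, old, *random_start(), lam=0.1, sweeps=30)  # previous model
warm = als_sweeps(R, new, *prev, lam=0.1, sweeps=2)             # continue from it
cold = als_sweeps(R, new, *random_start(), lam=0.1, sweeps=2)   # start over
print("warm:", train_rmse(R, new, *warm), "cold:", train_rmse(R, new, *cold))
```

Starting from the old factors also means successive models tend to stay close to each other, which takes some of the run-to-run variability out of the picture.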