Question

I am working with a big set of data right now, and I wrote a program that calculates a result based on some inputs. I have 10 inputs, each of which has about 20 possible values. I am not sure which technique to use to find the combination of these inputs that will produce the largest result.

Here is a made-up example that is equivalent in essence to the real problem, but simpler to demonstrate:

There are movies, users and user ratings. Let's say we have information about users' age, country, gender, zodiac sign, hair color, etc. The goal in this scenario would be to find the combination of age interval, country, gender, etc. that results in the largest average rating for a particular movie. Finally, let's add a minimum-votes restriction, so that if a combination of inputs matches only a single user who gave the movie a perfect score, we ignore that combination.
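For concreteness, the objective with the minimum-votes restriction might look like this sketch. All names here (`ratings`, `filters`, `min_votes`) and the data layout are illustrative, not from the question:

```python
def average_rating(ratings, filters, min_votes=10):
    """Average rating over users matching all attribute filters.

    ratings: list of (user_attributes: dict, rating: float) pairs.
    filters: dict mapping attribute name -> required value.
    Returns None if fewer than min_votes users match, implementing
    the 'ignore tiny groups' restriction from the example.
    """
    matching = [r for attrs, r in ratings
                if all(attrs.get(k) == v for k, v in filters.items())]
    if len(matching) < min_votes:
        return None  # too few votes: ignore this combination
    return sum(matching) / len(matching)
```

The search problem is then: over all ~20^10 filter combinations, find the one maximizing this function.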

What I've already tried:

  1. Nested for loops. This tests every possible combination, but it would run for a month, which is far too long.
  2. Some kind of genetic algorithm. I let the program choose random values for the inputs, then save and reuse the values that contributed to the best results, applying some mutation when the program gets stuck on the same values for too long. I got some good results with this method, but I couldn't reproduce them reliably across runs, so I suspect this approach is missing even better results.
  3. I tried analyzing each input separately, giving the rest of them default values, and then combining the best individual inputs. Same result as method #2.
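Approach #2 can be sketched roughly as below. This is a toy version with selection, one-point crossover, and mutation; all names and parameters are my own, and fixing the random seed makes runs reproducible, which addresses the run-to-run variance mentioned in #2:

```python
import random

def genetic_search(score, domains, pop_size=30, generations=50,
                   mutation_rate=0.1, seed=0):
    """Toy genetic algorithm over discrete input combinations.

    score:   tuple of chosen values -> number (use float('-inf')
             for combinations that should be ignored).
    domains: list of lists, one list of candidate values per input.
    """
    rng = random.Random(seed)
    pop = [[rng.choice(d) for d in domains] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ind: score(tuple(ind)), reverse=True)
        survivors = ranked[: pop_size // 2]        # selection: keep best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(domains))   # one-point crossover
            child = a[:cut] + b[cut:]
            for i, d in enumerate(domains):        # random mutation
                if rng.random() < mutation_rate:
                    child[i] = rng.choice(d)
            children.append(child)
        pop = survivors + children
    best = max(pop, key=lambda ind: score(tuple(ind)))
    return tuple(best), score(tuple(best))
```

Each generation evaluates at most `pop_size` combinations, so the total work is `pop_size * generations` evaluations rather than 20^10.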

I would like to know if there are known algorithms/techniques to solve this kind of problem.


Solution

This is of little concrete help, but maybe some encouragement.

  • You've already calculated that you can't afford to check all 20^10 value combinations. That's okay. It means that you may have to live with an approximate solution, but in most real problems approximate solutions are not so bad compared to the theoretical optimum.
  • This means you have to vary your values non-systematically. Doing it entirely at random amounts to genetic programming, where mutations are random and only survival is directed.
  • Varying values systematically probably means keeping individual values that brought an improvement and changing others. The hope would be that a setting that works well in one context might also work well in another context, so that you can approximate the optimal combined settings by combining individually optimal settings.
  • Whether or not this assumption is justified usually depends on the domain in question. A sudoku puzzle would be a nightmare: every choice in every cell depends on all row and column neighbours. Clearly guessing the values on their own and combining them will lead you nowhere. But a complex fabrication process with many inputs, outputs and material flows might be approximated quite well by optimizing distinct subcomponents one at a time.
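The systematic variation described in these bullets can be sketched as cyclic coordinate ascent: optimize one input at a time with the others held fixed, and repeat until a full sweep brings no improvement. The function and parameter names below are my own:

```python
def coordinate_ascent(score, domains, start=None, max_sweeps=10):
    """Cyclic coordinate ascent over discrete inputs.

    score:   tuple of chosen values -> number.
    domains: list of lists, one list of candidate values per input.
    start:   optional starting combination (defaults to each input's
             first value).

    Works well when inputs interact weakly; on Sudoku-like problems,
    where every value constrains the others, it can get stuck.
    """
    current = list(start) if start else [d[0] for d in domains]
    cur = score(tuple(current))
    for _ in range(max_sweeps):
        improved = False
        for i, domain in enumerate(domains):
            for v in domain:              # best value for input i alone
                cand = current[:]
                cand[i] = v
                s = score(tuple(cand))
                if s > cur:
                    current, cur, improved = cand, s, True
        if not improved:                  # a full sweep changed nothing
            break
    return tuple(current), cur
```

Each sweep costs only about 10 x 20 = 200 evaluations here, and repeating sweeps lets improvements in one input feed back into the others, unlike optimizing each input once in isolation (approach #3).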

Other Tips

Take a look at linear programming. There are commercial tools that perform such optimizations (e.g. IBM ILOG CPLEX, GAMS).

You can go through the description on the Wikipedia page to see the most important algorithms that can be used.
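As a small illustration, here is a toy linear program solved with SciPy's `linprog`, an open-source alternative to the commercial solvers above (this assumes SciPy is installed; note that plain LP requires a linear objective over continuous variables, so a discrete search like the one in the question may need integer programming instead):

```python
from scipy.optimize import linprog

# Toy problem: maximize x + 2y  subject to  x + y <= 4,  0 <= x, y <= 3.
# linprog minimizes, so we negate the objective coefficients.
res = linprog(c=[-1, -2], A_ub=[[1, 1]], b_ub=[4], bounds=[(0, 3), (0, 3)])
print(res.x, -res.fun)  # optimum at x = 1, y = 3, objective value 7
```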

Licensed under: CC-BY-SA with attribution