Question

I have a data set with attributes like this:

Marital_status = {M,S,W,D}
IsBlind = {Y,N}
IsDisabled = {Y,N}
IsVetaran = {Y,N}

etc. There are about 200 such variables.

I need an algorithm to generate combinations of the attributes, with one value at a time.

In other words, my first combination would be:

Marital_status = M, IsBlind = Y, IsDisabled = Y, IsVeteran = Y

The next set would be:

Marital_status = M, IsBlind = Y, IsDisabled = Y, IsVeteran = N

I tried to use a simple combination generator, treating each value for each attribute as an attribute itself. It did not work because the mutually exclusive choices are included in the combinations and the number of possible combinations was really huge (133873417996074857185490633899939406700260683726864088366400 to be precise)

Could you please suggest an algorithm (preferably coded in Java)?

Thanks!!

Was it helpful?

Solution

As others have pointed out (and yourself also), it is impossible to test exhaustively this.

I suggest you take the sampling approach, and test with that. You have strong theoretical background, so you will be able to find your way in the internet to find and understand this.


But let me give a small example. For now, I will ignore possible "clusters" of parameters (that are strongly related).

  • Create a sample of one data, containing all possible values for all your 200 parameters. This exhaustivity ensures that no parameter value could be forgotten.

    It doesn't have to be created upfront, the values can be created by a loop.

  • To each sample of one data, you need to add the other values. A simple approach would be to choose a number of times you want to test each one-sample (say N = 100). For each sample of one data, you would generate randomly N times the other values.

If there are 1000 possible values using all 200 parameters, and N=100, that would give us 100K tests.


You could elaborate on this basic idea in many ways:

  • If you want your test to be repeatable, you could generate it only once, store it, and then reuse the same sets in all future tests.
  • You could control your distribution so that each value gets selected a fair number of times.
  • In real life, all 200 parameters wouldn't be without connections. Many parameters would actually be connected to some others, in that the probability of finding the values together are not even. Instead of making the initial exhaustive set on only one parameter as I did previously,
    I would run the exhaustive set on a cluster of connected parameters.

OTHER TIPS

Find another way. If you have 200 variables, and each one has at least 2 choices, you're going to have >= 2^200 combinations. If you generate one combination each nanosecond, it would take about 10^43 years to enumerate 2^200 choices.

As Keith pointed out, the number of combinations will be impossibly large if there are no excluded combinations, which would make your need unmeetable. However, since you've already said that you have mutually exclusive choices, the solution space will be smaller.

How much smaller? Depends on how many choices are mutually exclusive. I recommend doing some math on that before going too hard.

Assuming that enough choices are exclusive, you're still going to have to essentially brute-force it, but you're very unlikely to find an existing, useful algorithm.

Which brings me to the question: what's your reason for doing this - exhaustive testing? Sounds good, but you may find that that's not possible. I've encountered this issue myself, and in the end, you may well be forced to some combination of carefully selected "edge" cases, plus some quasi-randomly selected other cases.

Having read your comment above, it appears you define "mutual exclusion" differently than I do, and I fear that you may have a problem.

So a given patient is not both blind and not blind. Great. But that's not what I (and I suspect everyone else here) understood when you mentioned mutual exclusions.

By those, I'm talking about e.g., if blind, cannot be non-disabled, or something like that.

Without a significant number of mutually exclusive inter-relationships between your attributes which limit their combinations, you will be unable to complete your exhaustive testing.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top