Question

Structure

I have a structure like this:

  • level 1 items represented by a capital letter (A, B, C, D,...)
  • level 2 items represented by lower case letter (a, b, c, d,...)
  • level 3 items repredented by numbers (1, 2, 3, 4,...)

these items are grouped into "combination" consisting of:
(level 1 item, level 2 item, level 3 item) always in this order.
e.g. (A, c, 5)

Let's say level 1 items are only 4: A, B, C, D
level 2 items are the first 10 letters : a, b, c, d, e, f, g, h, i, j
level 3 items are represented by natural numbers up to 30

Not all possible combinations are considered valid! The suitable combinations are grouped into a list:

(A, f, 3)
(A, f, 8)
(A, f, 10)
(A, j, 23)
(B, h, 1)
(D, d, 30)
(D, g, 18) 

The combination list does not allow duplicates, so every combination is unique.

Process

  • Randomly select 1 lvl 1 item from all the possibles (A, B, C, D)
    e.g. random selection gives: A
  • Retrieve all combinations that have A as lvl 1 item:
(A, a, 12)
(A, f, 3)
(A, f, 8)
(A, f, 10)
(A, j, 23)
  • Now from lvl 2 items remained in these 5 combinations (a, f, j), one item is randomly selected. Let's say selection gives f.
    Remark: I need to avoid that numerosity of a single lvl 2 item influence the random selection. So in this case the random selection cannot be done simply picking one of the 5 combinations above because it is more likely to pick f (3 of 5) than a or j (1 of 5 each).
    Retrieve all combinations that have f as lvl 2 item:
(A, f, 3)
(A, f, 8)
(A, f, 10)
  • From lvl 3 items remained in these 3 combinations (3, 8, 10), one item is randomly selected. Let's say 8. identify the unique combination:
(A, f, 8)

Moreover this process is repeated to pick a 2nd random combination. But in this case there is another limitation. The new combination cannot contain the same lvl 1 item. So it has the following form:
(everything but A, lvl 2 item, lvl 3 item) or
(not A, lvl 2 item, lvl 3 item)

All these operation are performed to pass the combination to another application as input.

Questions

  1. What do you think could be the most efficient way to implement such a process?
  2. Is it worth using a relational database? (I expect very complex query)
  3. Is it better to perform this type of operation using a programming language? e.g. pandas dataframe in Python)

PS: I'm not sure if this questions belongs in this section so please give me feedback on this.

Was it helpful?

Solution

If I understood your question well, you don't want to pick from a pregenerated list of possible combinations because it will mess up the probabilities per level. One way to deal with that is to make a random pick level-by-level, after you've filtered out disallowed items based on previous picks. Since you've said you're doing this in Python, here's some code to help flesh out the idea (for me, Python is more like an occasional hobby, so the code might not too pythonic; feel free to change and adapt it as you see fit).

import random

level1 = ['A', 'B', 'C', 'D']
level2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
level3 = list(range(1, 31))

possible_combinations = [
        ('A', 'f', 3),
        ('A', 'f', 8),
        ('A', 'f', 10),
        ('A', 'j', 23),
        ('B', 'h', 1),
        ('D', 'd', 30),
        ('D', 'g', 18)
        ]

def level1_filter(items_to_ignore, collection_element):
    allowed = []
    if not items_to_ignore:
        allowed = [c[0] for c in possible_combinations]
    else:
        allowed = [c[0] for c in possible_combinations if c[0] not in items_to_ignore]
    return collection_element in allowed

def level2_filter(level1_item, collection_element):
    allowed = [c[1] for c in possible_combinations if c[0] == level1_item]
    return collection_element in allowed

def level3_filter(level1_item, level2_item, collection_element):
    allowed = [c[2] for c in possible_combinations if c[0] == level1_item and c[1] == level2_item]
    return collection_element in allowed

def pick_one(collection):
    if len(collection) > 0:
        return random.choice(collection)
    return None

def pick_n_combinations(n):
    picked_level1_items = []
    for i in range(n):
        item1 = pick_one([e for e in level1 if level1_filter(picked_level1_items, e)])
        item2 = pick_one([e for e in level2 if level2_filter(item1, e)])
        item3 = pick_one([e for e in level3 if level3_filter(item1, item2, e)])
        picked_level1_items.append(item1)
        yield (item1, item2, item3)


print(list(pick_n_combinations(3)))

Note: the allowed collections in the filter functions will have repeated items, but this doesn't affect the probability since they are not used for picking, just for filtering.

Licensed under: CC-BY-SA with attribution
scroll top