How to doctest random.sample() when used on a set?

https://stackoverflow.com/questions/22630185

20-06-2023

Question

I am trying to write a doctest for a function that calls random.sample() on a set. Unfortunately, it seems that seeding the generator is not sufficient to guarantee reproducible output.

Consider the following:

>>> import random
>>> random.seed(1)
>>> s = set(('Ut', 'Duis', 'Lorem', 'Excepteur'))
>>> for _ in range(5): print(random.sample(s,1))
... 
['Duis']
['Ut']
['Excepteur']
['Ut']
['Lorem']
>>> random.seed(1)
>>> for _ in range(5): print(random.sample(s,1))
... 
['Duis']
['Ut']
['Excepteur']
['Ut']
['Lorem']

But if I reinstantiate the Python interpreter:

>>> import random
>>> random.seed(1)
>>> s = set(('Ut', 'Duis', 'Lorem', 'Excepteur'))
>>> for _ in range(5): print(random.sample(s,1))
... 
['Duis']
['Lorem']
['Ut']
['Lorem']
['Excepteur']

In other words, seeding random with the same value does not guarantee the same output across Python interpreter instances. I suspect that this problem is specific to the implementation of set in Python. Any ideas for how to write a doctest for this scenario?

Thank you in advance for your help.

Solution

This occurs because random.sample(s, 1) first converts the set into a sequence internally, and the iteration order of a set of strings is not guaranteed to be the same from one interpreter run to the next. The conversion happens before the random number generator is consulted, so seeding has no effect on it. The problem with writing a doctest here is the same as writing a doctest that checks a set: you can't compare against the raw output, so you need workarounds like checking sorted(s).
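
To see where the run-to-run variation comes from: in Python 3.3 and later, string hashes are randomized per process by default, and the PYTHONHASHSEED environment variable pins them. The following is only a sketch of that effect; the helper name set_order and the snippet it runs are made up for illustration.

```python
import os
import subprocess
import sys

# Code executed in a fresh interpreter: print the set's iteration order.
SNIPPET = "print(list({'Ut', 'Duis', 'Lorem', 'Excepteur'}))"

def set_order(hash_seed):
    """Run SNIPPET in a new interpreter with PYTHONHASHSEED set to hash_seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# With the hash seed pinned, two separate interpreters agree on the order.
print(set_order(0) == set_order(0))
```

Pinning PYTHONHASHSEED for the test run is an environment-level workaround; it does not make the function itself deterministic.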

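In the simplest cases you can solve it by calling random.sample(sorted(s), 1), which gives the sampler a sequence in a fixed order. (Since Python 3.11, random.sample() no longer accepts a set at all, so some conversion to a sequence is required anyway.) If the code is more involved and it doesn't make sense to add sorted() there in production, all I can say is good luck...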
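
A minimal sketch of the sorted() workaround inside a doctested function. The function name pick_words and its seed parameter are hypothetical, not from the question. The doctest deliberately avoids a literal expected list, since the exact words drawn for a given seed may differ between Python versions; if you want a literal, run it once on your target version and paste that output in.

```python
import random

def pick_words(words, k, seed=None):
    """Return k words sampled from an iterable of unique strings.

    Sorting first fixes the order of the population, so the result
    depends only on the seed and not on set iteration order.

    >>> s = {'Ut', 'Duis', 'Lorem', 'Excepteur'}
    >>> first = pick_words(s, 2, seed=1)
    >>> first == pick_words(s, 2, seed=1)
    True
    >>> set(first) <= s
    True
    """
    rng = random.Random(seed)  # private generator; global state is untouched
    return rng.sample(sorted(words), k)
```

Using a private random.Random instance rather than random.seed() is a design choice of this sketch: it keeps the doctest from disturbing, or being disturbed by, other code that uses the module-level generator.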

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow