Which is the best data structure to use when you want to randomly pick elements & use them, but also delete them after use?

softwareengineering.stackexchange https://softwareengineering.stackexchange.com/questions/423197

Question

I have 1000 lines in a text file. I want to read them into some data structure (DS).

After reading them, I will randomly pick 50 lines from the DS (using a random number generator). Next time, 50 more, and 50 more the time after that. Each time I use 50 lines from the DS, I want to delete them from the DS. So after the first iteration I will have 950 items, after the second 900, and so on.

So 3 operations

  1. Read from file & insert lines into the DS
  2. Random access from the DS
  3. Delete members from the DS in a random manner.
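If you do want a structure that performs the three operations above literally, a plain Python list is enough, provided deletion uses the swap-with-last trick: move the last element into the chosen slot and pop from the end. That is O(1), whereas `lst.pop(i)` or `del lst[i]` for an index in the middle shifts every later element and costs O(n). This is a minimal sketch, not the asker's code; the class `RandomBag` and its methods are hypothetical names, and generated strings stand in for the file's lines.

```python
import random


class RandomBag:
    """List-backed container for: bulk insert, random access, random delete."""

    def __init__(self, items=()):
        # Operation 1: read everything into a plain list
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def pop_random(self, rng=random):
        # Operation 2: choose a uniformly random index
        i = rng.randrange(len(self._items))
        # Operation 3: delete in O(1) by swapping the chosen item with the
        # last one and popping the end (order of the rest is not preserved)
        self._items[i], self._items[-1] = self._items[-1], self._items[i]
        return self._items.pop()

    def take(self, n, rng=random):
        # Remove and return up to n randomly chosen items
        return [self.pop_random(rng) for _ in range(min(n, len(self)))]


# Usage with 1000 stand-in lines instead of a real file
bag = RandomBag(f"line {k}\n" for k in range(1000))
first = bag.take(50)
print(len(first), len(bag))  # 50 950
```

Because removal reorders the remaining items, this only suits cases like this one, where the order of the leftover lines doesn't matter.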

Considering these requirements, which would be the best DS to use?

FWIW, I will be coding this in Python.


Solution

To consume the 1000 lines of a text file in random order, in chunks of 50 each, you don't need to worry much about data structures; Python's built-in lists and iterables work perfectly well. Shuffling the 1000 lines first and then slicing them into chunks of 50 elements is equivalent to the scenario you described:

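import random

def random_chunks(file, chunksize):
    # Read all lines into a list (cheap for 1000 lines)
    with open(file, 'r') as f:
        lines = f.readlines()
    # Shuffle in place, then hand out consecutive slices
    random.shuffle(lines)
    for i in range(0, len(lines), chunksize):
        yield lines[i:i + chunksize]

for chunk in random_chunks('README.md', 50):
    for line in chunk:
        # Each line keeps its trailing newline, so suppress print's own
        print(line, end='')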

Other answers

Use whatever you like.

1000 lines aren’t enough to tax any modern computer, or even a mobile device. And you probably don’t need to actually delete the entries. Read the contents into a list or array, sort it using the RNG as your sort criterion (which amounts to shuffling it), then advance an index (or pointer) over the collection. If you want to advance 50 at a time, go nuts.

I’m not familiar enough with Python to know what to recommend in particular.
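In Python, the sort-and-index approach described above might look like the following sketch; the generator name `batches_by_random_sort` is hypothetical. `sorted` calls its `key` function once per element, so sorting on a random key is a valid shuffle (unlike sorting with a random comparison function). `random.shuffle` does the same job more directly.

```python
import random


def batches_by_random_sort(lines, batch_size=50, rng=random):
    # Sort using a freshly drawn random number as each element's key;
    # this effectively shuffles the list
    order = sorted(lines, key=lambda _line: rng.random())
    # Nothing is deleted: an index simply advances batch_size at a time
    index = 0
    while index < len(order):
        yield order[index:index + batch_size]
        index += batch_size


lines = [f"line {k}\n" for k in range(1000)]
batches = list(batches_by_random_sort(lines))
print(len(batches), len(batches[0]))  # 20 50
```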

For larger problem sets, you might consider a kind of hash table in which the random number generator's results split the line numbers into 50-item buckets; then you take one bucket at a time.
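One possible reading of the bucket idea, sketched under stated assumptions: the buckets hold line numbers rather than the lines themselves (so that, for a large file, only integers need to sit in memory), and a random permutation of those numbers is cut into consecutive 50-item groups stored in a dict keyed by bucket index. `make_buckets` is a hypothetical helper, and fetching the actual text for a bucket (for example by seeking to stored byte offsets) is not shown.

```python
import random


def make_buckets(num_lines, bucket_size=50, rng=random):
    # Randomly permute the line numbers (not the line text)
    line_numbers = rng.sample(range(num_lines), num_lines)
    buckets = {}
    for position, line_no in enumerate(line_numbers):
        # Consecutive positions in the permutation share a bucket id
        buckets.setdefault(position // bucket_size, []).append(line_no)
    return buckets


buckets = make_buckets(1000)
first_bucket = buckets.pop(0)  # take (and remove) one bucket at a time
print(len(first_bucket), len(buckets))  # 50 19
```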

Licensed under: CC-BY-SA with attribution