Question

I'm still figuring out tuples in Python. If I'm creating a list of tuples, which I can't do via list comprehension, should I preallocate the list with some object? Or should I just append tuples as I go along? I currently have:

def get_priorities(words):
    priorities = [0]*len(words)

    for idx, word in enumerate(words):

        # ...calculate priority using word...

        priorities[idx] = (word, priority)

    return set(priorities)

Should I replace [0]*len(words) with just [] and append a tuple for each item in the loop? Which would be faster? I assume the compiler would have to reallocate storage in either instance.

Was it helpful?

Solution 2

For this, I'll use the time complexities from the Python site: https://wiki.python.org/moin/TimeComplexity

Using append

def get_priorities(words):
    priorities = []
    for idx, word in enumerate(words):
        ...
        priorities.append(word, priority)
    return set(priorities)

This saves you the cost pre-allocating the array of size which takes O(nk) time 1 * len(words) in this case, but you substitute that for the cost of appending which according to the Python documents is O(1) on average, which should give a time complexity of O(n) where n is the length of the words for your for loop.

On the other hand using a yield to save memory / avoid re-reading while maintaining the same O(n) complexity (What does the "yield" keyword do in Python?):

def get_priorities(words):
    for idx, word in enumerate(words):
        ...
        yield (word, priority)  

I would argue for the second approach because you don't need to allocate memory for the priorities list or risk the cost of an append. However, since you're using set, I take it that there are cases of duplicates that you're trying to eliminate? Using the set then would add an additional n to your running time so O(2n) in the first case and O(n) using the yield, though O(2n) is essentially n running time. Regardless, the cost of allocating priorities in the first case is O(1) if you allocate it as an empty list.

OTHER TIPS

I may not fully understand your use case, but I'm not sure you need to pre-allocate anything. Wouldn't you get the same result by doing the following?

return set((word, calc_priority(word)) for word in words)

(Presuming, of course, that calc_priority() is a defined function).

A list of tuples is just a list. There is no need to work differently than you would work with a list of strings, integers or even more complex data structures like lists,dictionaries or objects. For more on lists and list comprehension (with examples) have a look here.

From the example that you have given I see that what you are trying to do is build a kind of dictionary that holds the word and the priority at a certain index in the list. I cannot really understand why would you convert everything into a set after you are done (logically there cannot be two words with the same index in a list, so there is no need to eliminate duplicates).

I do not know the use case for the above mentioned code, but from the looks of it you would be much better off using a dictionary where the words are the keys and the values are the priorities. This would allow also to eliminate duplicates (dictionaries do not allow multiple identical keys) and would be easy to use in retrieving per word priorities.

You can use both options.
If you want use indices (priorities[idx] = (word, priority)) then you must initialize list with n elements. But alternative with empty list (priorities = []) and append() would look better. I don't expect significant difference in speed between these two alternatives.
And by the way this question isn't about tuples. You can use lists this way with any other elements.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top