Python Is there a performance impact in using lists?
This might seem like a dumb question, but I have a list of 400,000 items and the performance appears to be the same as if the list was at 100 items. It seems to me like you are only limited to the amount of RAM you may have, and the max size of a list?
To be specific:
- If I was to search this list (
item in bigList) is there a performance hit?
- Is there a performance hit if I append say 200,000 items onto this 400,000 item list?
- Is there only a performance hit if I loop through each item in the list?
- If there are performance hits in using lists, what would be the typical max size?
If I was to search this list (item in bigList)
then you would use a
set. To check if a item is in a list you have to look at every item of the list. A
set can jump right to the place where the item would be stored and see if it's there.
Is there a performance hit if I append say 200,000 items onto this 400,000 item list?
No, appends take the same time, no matter the size of the list.
If there are performance hits in using lists, what would be the typical max size?
Doesn't make sense, Lists can be used in many different ways. In short, lists are good at storing and bad at finding things.
Those questions are highly hypothetical and with a glance at the answers and comments not really appreciated. If you want to get a perspective on how efficient the list-type works for you, you might try to profile it yourself.
I provided a small script as a starting point:
from sys import getsizeof from timeit import timeit from random import randint MIN_EXP, MAX_EXP = 1, 8 # list-size from 10^1 to 10^8 as default TEST_ITERATIONS = 1 class ListTest(object): def __init__(self, number, item): self.number = number self.item = item self.size = '0' self.time = 0.0 def test_01_creation_list_comprehension(self): list_ =  def profile(): list_ = [self.item for n in range(self.number)] self.time = timeit(profile, number = TEST_ITERATIONS) self.size = "%.3f" % (getsizeof(list_) / 1024.0,) def test_02_creation_list_append(self): list_ =  def profile(): for n in range(self.number): list_.append(self.item) self.time = timeit(profile, number = TEST_ITERATIONS) self.size = "%.3f" % (getsizeof(list_) / 1024.0,) def test_03_find_item(self): list_ = [self.item for n in range(self.number)] searchstr = 'find me!' list_[randint(0,self.number)] = searchstr def profile(): foundya = list_.index(searchstr) self.time = timeit(profile, number = TEST_ITERATIONS) tests = [ListTest(10**n,'string-item') for n in range(MIN_EXP,MAX_EXP)] for test in tests: print "-"*78,"\nListTest with %d items:" % test.number for subtest in [st for st in dir(test) if st.startswith('test_')]: getattr(test, subtest)() print "%15s" % "%s: %.4f s" % (subtest, test.time) print "%32s" % "%s %14s %s" % ("Memory used:", test.size, "kB")
i get for a list with 10 million entries these results (100 million doesn't compute with my memory-seetings)
>>> ListTest with 10000000 items: >>> test_01_creation_list_comprehension: 1.7200 s >>> Memory used: 0.031 kB >>> test_02_creation_list_append: 2.8455 s >>> Memory used: 39808.621 kB >>> test_03_find_item: 0.1657 s >>> Memory used: 39808.621 kB
The memory usage is more an indicator for magnitude, then the real consumption. The sys.getsizeof function is mostly correct for standard-types, includes the gc-overhead, but doesn't work well for complex objects or foreign(external) objects.