Вопрос

This might seem like a dumb question, but I have a list of 400,000 items and the performance appears to be the same as if the list was at 100 items. It seems to me like you are only limited to the amount of RAM you may have, and the max size of a list?

To be specific:

  1. If I was to search this list (item in bigList) is there a performance hit?
  2. Is there a performance hit if I append say 200,000 items onto this 400,000 item list?
  3. Is there only a performance hit if I loop through each item in the list?
  4. If there are performance hits in using lists, what would be the typical max size?

Нет правильного решения

Другие советы

from the manual :

Python’s lists are really variable-length arrays

this means that searching is done in O(N) where N is the list length. You can look at collections if you need a different implementation. or use a set (internally a hash table)

If I was to search this list (item in bigList)

then you would use a set. To check if a item is in a list you have to look at every item of the list. A set can jump right to the place where the item would be stored and see if it's there.

Is there a performance hit if I append say 200,000 items onto this 400,000 item list?

No, appends take the same time, no matter the size of the list.

If there are performance hits in using lists, what would be the typical max size?

Doesn't make sense, Lists can be used in many different ways. In short, lists are good at storing and bad at finding things.

Those questions are highly hypothetical and with a glance at the answers and comments not really appreciated. If you want to get a perspective on how efficient the list-type works for you, you might try to profile it yourself.

I provided a small script as a starting point:

from sys import getsizeof
from timeit import timeit
from random import randint

MIN_EXP, MAX_EXP = 1, 8 # list-size from 10^1 to 10^8 as default
TEST_ITERATIONS = 1

class ListTest(object):
    def __init__(self, number, item):
        self.number = number
        self.item = item
        self.size = '0'
        self.time = 0.0

    def test_01_creation_list_comprehension(self):
        list_ = []
        def profile():
            list_ = [self.item for n in range(self.number)]
        self.time = timeit(profile, number = TEST_ITERATIONS)
        self.size = "%.3f" % (getsizeof(list_) / 1024.0,)

    def test_02_creation_list_append(self):
        list_ = []
        def profile():
            for n in range(self.number):
                list_.append(self.item)
        self.time = timeit(profile, number = TEST_ITERATIONS)                
        self.size = "%.3f" % (getsizeof(list_) / 1024.0,)

    def test_03_find_item(self):        
        list_ = [self.item for n in range(self.number)]
        searchstr = 'find me!'
        list_[randint(0,self.number)] = searchstr
        def profile():
            foundya = list_.index(searchstr)            
        self.time = timeit(profile, number = TEST_ITERATIONS)                


tests = [ListTest(10**n,'string-item') for n in range(MIN_EXP,MAX_EXP)]
for test in tests:
    print "-"*78,"\nListTest with %d items:" % test.number
    for subtest in [st for st in dir(test) if st.startswith('test_')]:
        getattr(test, subtest)()
        print "%15s" % "%s: %.4f s" % (subtest, test.time)
        print "%32s" % "%s %14s %s"  % ("Memory used:", test.size, "kB")

i get for a list with 10 million entries these results (100 million doesn't compute with my memory-seetings)

 >>>   ListTest with 10000000 items:
 >>>        test_01_creation_list_comprehension: 1.7200 s
 >>>                         Memory used:          0.031 kB
 >>>        test_02_creation_list_append: 2.8455 s
 >>>                         Memory used:      39808.621 kB
 >>>        test_03_find_item: 0.1657 s
 >>>                         Memory used:      39808.621 kB

The memory usage is more an indicator for magnitude, then the real consumption. The sys.getsizeof function is mostly correct for standard-types, includes the gc-overhead, but doesn't work well for complex objects or foreign(external) objects.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top