Question

I have written two programs and I want to check which one is more 'efficient' and uses less memory. The first one creates a numpy array up front and fills it in; the second one starts with an empty Python list and appends values to it. Which is better? First program:

import numpy as np

f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
lines = f.readlines()
f.close()
zeros = np.zeros((60343, 4917))

for l in lines:
    row = l.split(",")
    for element in row:
        zeros[lines.index(l), row.index(element)] = element

X = zeros[1, :]
Y = zeros[:, 0]
one_hot = np.ones((counter, 2))

The second one:

import numpy as np

f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
lines = f.readlines()
f.close()
X = []
Y = []

for l in lines:
    row = l.split(",")
    X.append([float(elem) for elem in row[1:]])
    Y.append(float(row[0]))

X = np.array(X)
Y = np.array(Y)
one_hot = np.ones((counter, 2))

My theory is that the first one is slower but uses less memory and is more 'stable' when working with large files, while the second one is faster but uses a lot of memory and is less stable with large files (543 MB, 70,000 lines).
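
One way to check the theory instead of guessing is to time each version and read back the process's peak memory afterwards; a minimal sketch (resource is Unix-only, and ru_maxrss is kilobytes on Linux but bytes on macOS):

import timeit
import resource

start = timeit.default_timer()
# ... run one of the two loading loops here ...
elapsed = timeit.default_timer() - start

# Peak resident memory of the whole process so far.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(elapsed)
print(peak)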

Thanks!


Solution 2

Well, I finally made some changes thanks to the answers. My two programs:

import timeit
import numpy as np

f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
zeros = np.zeros((60343, 4917))
counter = 0

start = timeit.default_timer()
for l in f:
    row = l.split(",")
    counter2 = 0
    for element in row:
        zeros[counter, counter2] = element
        counter2 += 1
    counter += 1
stop = timeit.default_timer()
print(stop - start)
f.close()

Time of the first program: 122.243036032 seconds.

Second program:

import timeit
import numpy as np

f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
zeros = np.zeros((60343, 4917))
counter = 0

start = timeit.default_timer()
for l in f:
    row = l.split(",")
    zeros[counter, :] = row  # numpy converts the whole row of strings to float64 at once
    counter += 1
stop = timeit.default_timer()
print(stop - start)
f.close()

Time of the second program: 102.208696127 seconds! Thanks.
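
Another per-line parser worth trying (a sketch I have not benchmarked against the versions above) is np.fromstring with a separator, which skips the Python-level split entirely:

import numpy as np

zeros = np.zeros((60343, 4917))

with open('model_training.txt') as f:  # path shortened for the example
    for counter, l in enumerate(f):
        # Parse the comma-separated line straight into float64 values.
        zeros[counter, :] = np.fromstring(l, dtype=np.float64, sep=",")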

OTHER TIPS

The problem with both versions is that you load the whole file into memory first with file.readlines(); you should iterate over the file object directly to get one line at a time.

from itertools import izip  # Python 2; on Python 3 the built-in zip does the same
import numpy as np

# Generator function: yields one (features, label) pair per line,
# so the file is never read into memory all at once.
def func():
    with open('filename.txt') as f:
        for line in f:
            row = map(float, line.split(","))
            yield row[1:], row[0]

X, Y = izip(*func())
X = np.array(X)
Y = np.array(Y)
...

I am sure a pure numpy solution is going to be faster than this.
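
For instance, np.loadtxt can parse the whole comma-separated file in a single call (a sketch, untested on the file above, and it still materializes the full array in memory):

import numpy as np

# Parse the entire CSV into one (n_rows, n_cols) float64 array.
data = np.loadtxt('model_training.txt', delimiter=',')
Y = data[:, 0]   # first column: labels
X = data[:, 1:]  # remaining columns: features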

Python has a useful profiler in its standard library. It's easy to use: just wrap your code in a function and call cProfile.run in the following fashion:

import cProfile
cProfile.run('my_function()')
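
For example, to profile a hypothetical load() wrapper around the parsing loop, sorted by cumulative time:

import cProfile

def load():  # hypothetical wrapper, just for illustration
    with open('model_training.txt') as f:
        for line in f:
            line.split(",")

cProfile.run('load()', sort='cumulative')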

One piece of advice for both cases: you really do not need to read all the lines into a list first. If you just iterate over the file, you get the lines one at a time without storing them all in memory:

f = open('some_file.txt')
for line in f:
    pass  # process the current line here; only one line is held in memory at a time
f.close()

In terms of memory usage, a numpy array of numbers is significantly more compact than a Python list of the same values.
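
A rough way to see the difference (a sketch; exact sizes depend on the CPython build and platform):

import sys
import numpy as np

n = 100000
as_list = [float(i) for i in range(n)]     # n distinct boxed Python floats
as_array = np.arange(n, dtype=np.float64)  # one contiguous 8-bytes-per-element buffer

# Pointer table plus every boxed float object: roughly 32 bytes per element
# on a 64-bit CPython.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
print(list_bytes)
# Raw data buffer only: exactly n * 8 bytes for float64.
print(as_array.nbytes)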

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow