python csv distorts tell

Question 1

The csv module reads your file through a buffer, so the file pointer advances in large blocks rather than line by line. That is why tell() runs ahead of the row you are currently processing.

The data is read in larger chunks to make parsing easier. Because newlines can be embedded in quoted values, one line of the file is not necessarily one CSV record, so treating the data strictly line by line would not work.
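To illustrate the point about embedded newlines, here is a minimal sketch, written for Python 3 (an assumption; the other snippets in this answer are Python 2). The sample data is made up. It uses the reader's line_num attribute to show that one record can span more than one physical line:

```python
import csv
import io

# Made-up sample: the second record has a newline inside a quoted field.
sample = io.StringIO('id,comment\n1,"first line\nsecond line"\n2,plain\n')

reader = csv.reader(sample)
records = []
for row in reader:
    # line_num counts physical lines consumed so far, not records returned.
    records.append((reader.line_num, row))
    print(reader.line_num, row)
```

Three records come out of four physical lines, which is why a plain line count can overstate the number of rows.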

If you have to give a progress report, then you need to pre-count the number of lines. The following will only work if your input CSV file does not embed newlines in column values:

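import csv

with open(FILE_PERSON, 'rb') as csvfile:
    # First pass: count physical lines, then rewind for parsing.
    linecount = sum(1 for _ in csvfile)
    csvfile.seek(0)
    spamreader = csv.reader(csvfile)
    # Start at 1 so the last row reports "N of N".
    for line, row in enumerate(spamreader, 1):
        print '{} of {}'.format(line, linecount)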

There are other ways to count the number of lines (see How to get line count cheaply in Python?), but since you'll be reading the file anyway to process it as a CSV, you may as well use the file you already have open. I'm not certain that opening the file as a memory map and then reading it again as a normal file would perform any better.
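If column values may contain embedded newlines, one option is to let csv.reader do the counting pass as well, so that records rather than physical lines are counted. This is only a sketch, written for Python 3 (an assumption; the snippets above are Python 2). count_records and the sample data are hypothetical names used for illustration:

```python
import csv
import io

def count_records(csvfile):
    """Count CSV records (not physical lines), then rewind the file."""
    total = sum(1 for _ in csv.reader(csvfile))
    csvfile.seek(0)
    return total

# Made-up sample: 4 physical lines, but only 3 CSV records.
sample = io.StringIO('name,note\n"Ann","line one\nline two"\nBob,ok\n')

total = count_records(sample)
for number, row in enumerate(csv.reader(sample), 1):
    print('{} of {}'.format(number, total))
```

The cost is that the file is parsed twice, so this trades speed for a count that matches the rows you will actually iterate over.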

Question 2

The csv.reader docs say:

... csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called ...

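So the reader can be fed from a generator instead of the file object itself. Here is a small change to the OP's original code that does this, with the original loop first for comparison: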

import csv
import os
filename = "tar.data"
with open(filename, 'rb') as csvfile:
    spamreader = csv.reader(csvfile)
    justtesting = csvfile.tell()
    size = os.fstat(csvfile.fileno()).st_size
    for row in spamreader:
        pos = csvfile.tell()
        print pos, "of", size, "|", justtesting
###############################################
def generator(csvfile):
    # readline seems to be the key: unlike iterating over the file,
    # it does not go through the file iterator's read-ahead buffer
    while True:
        line = csvfile.readline()
        if not line:
            break
        yield line
###############################################
print
with open(filename, 'rb', 0) as csvfile:
    spamreader = csv.reader(generator(csvfile))
    justtesting = csvfile.tell()
    size = os.fstat(csvfile.fileno()).st_size
    for row in spamreader:
        pos = csvfile.tell()
        print pos, "of", size, "-", justtesting

Running this against my test data gives the following, showing that the two different approaches produce different results.

224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0
224 of 224 | 0

16 of 224 - 0
32 of 224 - 0
48 of 224 - 0
64 of 224 - 0
80 of 224 - 0
96 of 224 - 0
112 of 224 - 0
128 of 224 - 0
144 of 224 - 0
160 of 224 - 0
176 of 224 - 0
192 of 224 - 0
208 of 224 - 0
224 of 224 - 0

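I set zero buffering on the second open(), but it made no difference; what matters is the readline() call in the generator. Iterating over a Python 2 file object goes through a hidden read-ahead buffer, whereas readline() does not, so tell() stays in step with the rows.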
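For what it's worth, the same readline() idea can be carried over to Python 3. The following is a sketch under a few assumptions rather than a tested port of the code above: the file is opened in binary mode, the encoding is UTF-8 (or another encoding where the newline byte never occurs inside a multi-byte character), and decoded_lines is a hypothetical helper name. An in-memory io.BytesIO stands in for the real file:

```python
import csv
import io

def decoded_lines(binary_file, encoding='utf-8'):
    """Yield decoded lines using readline(), so tell() tracks each line."""
    while True:
        line = binary_file.readline()
        if not line:
            break
        yield line.decode(encoding)

# Made-up data standing in for open(filename, 'rb').
data = io.BytesIO(b'a,b,c\n1,2,3\n4,5,6\n')
size = len(data.getvalue())

positions = []
for row in csv.reader(decoded_lines(data)):
    pos = data.tell()  # byte offset just past the line(s) of this row
    positions.append(pos)
    print(pos, 'of', size)
```

With a real file you would take size from os.fstat(csvfile.fileno()).st_size, as in the code above.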