You need to skip the first line of your file because it contains a header.
Use the csv module to read your data more efficiently, instead of reading it all into memory at once:
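    import csv
    import gzip

    # open in text mode ('rt'); csv.reader needs str lines on Python 3
    with gzip.open(args.file, 'rt', newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        next(reader, None)  # skip the first row of the file (the header)
        for words in reader:
            # words is already a list of column values; no split("\t") needed
            print(words)  # placeholder: replace with your own processing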
Using next() on the reader iterable reads one row from the file, which we ignore. If there are no rows in the file, the function returns the default, None, instead of raising StopIteration.
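
To see the header skip and the None default in action without your own data, here is a minimal, self-contained sketch. It assumes Python 3; the helper names read_rows_skip_header and write_gzipped and the sample file contents are made up for illustration and are not part of your script. The rows are collected into a list only to keep the demo short; for a large file you would loop over the reader as above.

```python
import csv
import gzip
import os
import tempfile


def read_rows_skip_header(path):
    """Return the data rows of a gzipped TSV file, skipping the header row.

    Hypothetical helper for illustration only.
    """
    with gzip.open(path, 'rt', newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        next(reader, None)  # discard the header; returns None if the file is empty
        return list(reader)  # fine for a demo; iterate lazily for big files


def write_gzipped(path, text):
    """Write text to a gzip-compressed file (hypothetical demo helper)."""
    with gzip.open(path, 'wt', newline='') as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    # A small made-up file: one header row followed by two data rows.
    sample = os.path.join(tmp, 'sample.tsv.gz')
    write_gzipped(sample, 'word\tcount\nfoo\t1\nbar\t2\n')
    rows = read_rows_skip_header(sample)

    # An empty file: next(reader, None) returns None instead of raising.
    empty = os.path.join(tmp, 'empty.tsv.gz')
    write_gzipped(empty, '')
    empty_rows = read_rows_skip_header(empty)

print(rows)        # [['foo', '1'], ['bar', '2']]
print(empty_rows)  # []
```

Without the None default, calling next(reader) on the empty file would raise StopIteration, so passing the default keeps the code safe for empty input.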