1) No. They are passing a generator expression, which hands the dictionary constructor one line at a time. Apart from some buffering that Python does internally, only about one line of the file is held in memory at any moment.
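Once the dictionary is built, it will take up a fair amount of memory, since it has to hold everything it collected. As far as I know it keeps each distinct token only once (with an ID and counts) rather than the raw text, so expect its size to grow with the vocabulary rather than to match the original file.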
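To see the laziness for yourself, here is a minimal sketch that does not need gensim. The helper `tracked_lines`, the `log` list, and the in-memory sample text are all made up for illustration; `io.StringIO` simply stands in for an open file.

```python
import io

def tracked_lines(f, log):
    # Hypothetical helper: record each line at the moment it is pulled from f.
    for line in f:
        log.append(line.strip())
        yield line

log = []
# An in-memory "file" so the example is self-contained.
f = io.StringIO("Human machine interface\nA survey of user opinion\nThe EPS user interface\n")

# Same shape as the expression passed to the dictionary constructor.
tokens = (line.lower().split() for line in tracked_lines(f, log))

# Creating the generator expression reads nothing yet.
lines_read_before = len(log)

# Asking for one item pulls exactly one line from the file.
first = next(tokens)
lines_read_after = len(log)
```

Whatever consumes `tokens` drives the reading, one line per request, which is why the whole file never has to sit in memory at once.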
2) As for recoding it, you can write a new generator that performs your action and then yields each line exactly as before:
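    from gensim import corpora

    def generator(f):
        # Pass lines through unchanged, printing the line count every 1000 lines.
        for i, line in enumerate(f):
            if i % 1000 == 0:
                print(i)
            yield line

    with open('mycorpus.txt') as f:
        dictionary = corpora.Dictionary(line.lower().split() for line in generator(f))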
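If you want to try the wrapper idea without gensim or a real corpus, something like the following sketch should work. Note the assumptions: `with_progress`, its `every` and `report` parameters, and the sample text are hypothetical names of my own, and a plain `set` of tokens stands in for `corpora.Dictionary`.

```python
import io

def with_progress(lines, every=1000, report=print):
    # Hypothetical generalization of the generator above:
    # call report(i) on every `every`-th line, then pass the line through untouched.
    for i, line in enumerate(lines):
        if i % every == 0:
            report(i)
        yield line

reported = []
# In-memory stand-in for open('mycorpus.txt').
f = io.StringIO("a b\nc d\ne f\ng h\ni j\n")

# A plain set of tokens stands in for corpora.Dictionary here.
vocab = set()
for tokens in (line.lower().split()
               for line in with_progress(f, every=2, report=reported.append)):
    vocab.update(tokens)
```

Passing `report` as a parameter is just a convenience: the default prints like the original, while a different callable (here `reported.append`) lets you collect or log the progress instead.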