Assuming you have a couple of GBs of memory, I would recommend just keeping them in memory. It will be slow enough as it is to download that much data. To save it to disk unnecessarily would only add to that painful process.
Since this will be a very long-running process, I would also recommend keeping track of the files that have already been extracted. That way, when it crashes, you can pick up where you left off.
I am going to use requests because it is very developer-friendly.
Pseudocode:

for pdf_url in pdf_urls:
    if already_got_it(pdf_url):
        continue
    req = requests.get(pdf_url)
    if req.status_code < 400:
        text = read_text(req.content)
        store_word_assoc(text)
        mark_completed(pdf_url)
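Here is one minimal way to flesh out that bookkeeping, assuming a simple append-only checkpoint file (`completed_urls.txt` is a name I made up). `read_text` and `store_word_assoc` stand in for your own PDF-to-text and word-association steps, so they are taken as parameters here:

```python
import os
import requests

COMPLETED_LOG = "completed_urls.txt"  # hypothetical checkpoint file


def load_completed(path=COMPLETED_LOG):
    """Return the set of URLs already processed in earlier runs."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def mark_completed(url, path=COMPLETED_LOG):
    """Append a finished URL so a restart can skip it."""
    with open(path, "a") as f:
        f.write(url + "\n")


def crawl(pdf_urls, read_text, store_word_assoc):
    """Download each PDF, extract text, and record progress as we go."""
    completed = load_completed()
    for pdf_url in pdf_urls:
        if pdf_url in completed:
            continue
        req = requests.get(pdf_url)
        if req.status_code < 400:
            text = read_text(req.content)  # your PDF-to-text step
            store_word_assoc(text)         # your word-association step
            mark_completed(pdf_url)
```

Appending one line per URL means a crash mid-run loses at most the file currently being processed, and restarting is just running the same command again.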
If you do not have enough memory, your proposed solution of saving each PDF to disk will work and will not stress your disk much. It is a fair amount of writing, but assuming you do not have an SSD (where heavy write volume causes wear), it should have little ill effect.
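If you do go the disk route, a sketch of that approach might stream each response in chunks so the whole file is never held in memory (the helper names here are my own):

```python
import requests


def save_stream(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path."""
    with open(dest_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip keep-alive chunks
                f.write(chunk)


def download_to_disk(pdf_url, dest_path, chunk_size=8192):
    """Stream a PDF to disk without buffering it all in memory."""
    with requests.get(pdf_url, stream=True) as resp:
        resp.raise_for_status()
        save_stream(resp.iter_content(chunk_size=chunk_size), dest_path)
```

With `stream=True`, requests defers downloading the body until you iterate over it, so memory use stays bounded by the chunk size rather than the file size.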