Download .doc files from a .csv containing URLs, then batch convert them to plain text

StackOverflow https://stackoverflow.com/questions/14705843

Question

I have a .csv file containing a column of URLs pointing to .doc files. I need to download all of these files and then batch convert them to .txt locally, with the aim of eventually writing some code to add the text contained in these files back to the original .csv file/database. I have over 3000 .doc files to download, but they are all less than 100 KB each. I am not great at Python, but I am trying to learn, so any help with this would be much appreciated.

So far I have this code:

import csv

def get_urls(path):
    # Read the CSV and return the URL column, skipping the header row.
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header
        return [row[0] for row in reader]

url_file_path = 'urllist.csv'
urllist = get_urls(url_file_path)

Is this the best approach, i.e. extracting the URLs as a list and then downloading them with urllib?
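
A minimal sketch of that urllib approach might look like the following, using urllib.request from the Python 3 standard library (download_all and the docs/ directory are just placeholder names, and deriving each local filename from the last path component of its URL is an assumption):

import os
import urllib.request

def download_all(urls, dest='docs'):
    # Create the destination directory if it does not already exist
    os.makedirs(dest, exist_ok=True)
    for url in urls:
        # Placeholder naming scheme: reuse the URL's last path component
        filename = os.path.join(dest, url.rsplit('/', 1)[-1])
        urllib.request.urlretrieve(url, filename)

download_all(urllist)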

Thanks

Solution 2

I decided to stay away from Python for these first steps, so I used wget to download the files and then antiword to convert all the .doc files to .txt.
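
For anyone who wants to drive those same tools from Python instead, a rough sketch along these lines should work (it assumes wget and antiword are installed and on PATH, and that the URL column has been written out to urls.txt, one URL per line):

import glob
import subprocess

# wget -i reads one URL per line; -P sets the download directory
subprocess.run(['wget', '-i', 'urls.txt', '-P', 'docs'], check=True)

# antiword prints the extracted text to stdout; redirect it into a .txt file
for doc in glob.glob('docs/*.doc'):
    with open(doc[:-4] + '.txt', 'w') as out:
        subprocess.run(['antiword', doc], stdout=out, check=True)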

Thanks

Other tips

Have a look at links.py; I use it to download .png files.

License: CC BY-SA with attribution