I have a .csv file containing a column of URLs pointing to .doc files. I need to download all of these files and then batch-convert them to .txt locally, with the aim of eventually writing some code to add the text contained in these files back into the original .csv file/database. There are over 3000 .doc files to download, but they are all less than 100 KB each. I am not great at Python, but I am trying to learn, so any help on this would be much appreciated.

So far I have this code:

import csv

def get_urls(path):
    # Read the CSV and return the URL from the first column of each row.
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [row[0] for row in reader]

url_file_path = 'urllist.csv'
urllist = get_urls(url_file_path)

Is this the best approach, i.e. extracting the URLs as a list and then downloading them using urllib?
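For illustration, here is an untested sketch of what I have in mind for the download step, using urllib.request from the standard library (naming each file after the last path segment of its URL is just a guess on my part):

import os
import urllib.request

def download_all(urls, dest_dir='docs'):
    # Save each URL into dest_dir, naming the file after the last
    # path segment of the URL.
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        filename = url.rsplit('/', 1)[-1]
        urllib.request.urlretrieve(url, os.path.join(dest_dir, filename))

download_all(urllist)  # urllist from get_urls() above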

Thanks


Solution

I decided to stay away from Python for these initial steps, so I used wget to download the files and then antiword to convert all the .doc files to .txt.
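If you would rather keep it in Python, the same two tools can be driven via subprocess; here is a rough sketch (assuming wget and antiword are installed and on your PATH, and that the URLs have been written one per line to urls.txt, a filename I picked for illustration):

import glob
import subprocess

# Download every URL listed (one per line) in urls.txt into the docs/ directory.
subprocess.run(['wget', '-i', 'urls.txt', '-P', 'docs'], check=True)

# antiword prints the extracted text to stdout; capture it and write a
# .txt file alongside each .doc file.
for doc_path in glob.glob('docs/*.doc'):
    result = subprocess.run(['antiword', doc_path],
                            capture_output=True, text=True, check=True)
    with open(doc_path[:-4] + '.txt', 'w') as out:
        out.write(result.stdout)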

Thanks

Other tips

Have a look at links.py; I use it to download .png files.
