Question

I have a .csv file containing a column of URLs pointing to .doc files. I need to download all of these files and then batch convert them to .txt locally, with the aim of eventually writing some code to add the extracted text back to the original .csv file/database. I have over 3,000 .doc files to download, but they are all less than 100 KB. I am not great at Python, but I am trying to learn, so any help with this would be much appreciated.

So far I have this code:

import csv

def get_urls(path):
    # Open the CSV, skip the header row, and collect the first column.
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip header
        return [row[0] for row in reader]

url_file_path='urllist.csv'
urllist=get_urls(url_file_path)

Is this the best approach, i.e. extracting the URLs as a list and then downloading them with urllib?
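For what it's worth, the standard library's `urllib.request` can handle the download step directly. Below is a minimal sketch of that idea; the function names (`filename_from_url`, `download_all`) are my own, and the example URL is hypothetical:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

def filename_from_url(url):
    """Derive a local filename from the last path segment of the URL."""
    name = os.path.basename(urlparse(url).path)
    return name or "unnamed.doc"

def download_all(urls, dest_dir="docs"):
    """Download each URL into dest_dir, skipping files already present."""
    os.makedirs(dest_dir, exist_ok=True)
    for url in urls:
        dest = os.path.join(dest_dir, filename_from_url(url))
        if not os.path.exists(dest):
            urlretrieve(url, dest)
```

You could then call `download_all(urllist)` on the list returned by `get_urls`. Skipping files that already exist makes the script safe to re-run if it is interrupted partway through 3,000 downloads.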

Thanks


Solution 2

I decided to stay away from Python for these initial steps, so I used wget to download the files and then antiword to convert all the .doc files to .txt.

Thanks

Other tips

Have a look at links.py — I use it to download png files.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow