Question

I'm trying to scrape, in Python, the set of tables that appears at each link of a web URL, and to save the tables in one CSV per link. The code is the following:

import csv
import urllib2
from bs4 import BeautifulSoup

# Collect every link found inside the table rows of the index page.
first = urllib2.urlopen("http://www.admision.unmsm.edu.pe/res20130914/A.html").read()
soup = BeautifulSoup(first)
w = []
for q in soup.find_all('tr'):
    for link in q.find_all('a'):
        w.append(link["href"])

# Drop the leading "." from each relative href,
# e.g. "./A/012/0.html" becomes "/A/012/0.html".
l = [t.replace(".", "", 1) for t in w]

def record(part):
    # Download the page for this link and collect the anchors inside
    # its <center> blocks, skipping the first two in each block.
    url = "http://www.admision.unmsm.edu.pe/res20130914{}".format(part)
    u = urllib2.urlopen(url)
    try:
        html = u.read()
    finally:
        u.close()
    soup = BeautifulSoup(html)
    c = []
    for n in soup.find_all('center'):
        for b in n.find_all('a')[2:]:
            c.append(b.text)

    t = len(c) / 2      # number of result pages to fetch
    part = part[:-6]    # strip the trailing "0.html"

    with open('{}.csv'.format(part), 'wb') as f:
        writer = csv.writer(f)
        for i in range(t):
            url = "http://www.admision.unmsm.edu.pe/res20130914{}{}.html".format(part, i)
            u = urllib2.urlopen(url)
            try:
                html = u.read()
            finally:
                u.close()
            soup = BeautifulSoup(html)
            # Skip the header row and write the first six cells of each row.
            for tr in soup.find_all('tr')[1:]:
                tds = tr.find_all('td')
                row = [elem.text.encode('utf-8') for elem in tds[:6]]
                writer.writerow(row)

Then, using the function created above, I try to scrape the tables and create one CSV per link. The code is below:

for n in l:
    record(n)

Unfortunately, the result is an error:


IOError                                   Traceback (most recent call last)
<ipython-input-44-da894016f419> in <module>()
     60 
     61 for n in l:
---> 62     record(n)
     63 
     64 

<ipython-input-44-da894016f419> in record(part)
     43 
     44     
---> 45         with open('{}.csv'.format(part), 'wb') as f:
     46             writer = csv.writer(f)
     47             for i in range(t):

IOError: [Errno 2] No such file or directory: '/A/011/.csv'

EDIT:

I've just found out what's really happening, and I came up with a solution.

The problem is that when I run record('/A/012/0.html'), my function also uses '/A/012/0.html' to build the name of the output file. However, Python interprets each "/" in that name as a directory separator, and those directories do not exist.
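To see this concretely, here is a minimal sketch using the file name from the traceback above: os.path splits it into a directory part, which open() expects to already exist, and a base name.

import os.path

name = '/A/011/.csv'            # the file name from the traceback
print os.path.dirname(name)     # '/A/011' -- open() expects this directory to exist
print os.path.basename(name)    # '.csv'   -- the actual file name that would be created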

So, I made a slight change:

part = part[:-6]
# Below is the line where I made the small change.
name = part.replace("/", "")
with open('{}.csv'.format(name), 'wb') as f:

I removed the "/" characters and the web scraping worked OK (record('/A/012/0.html') now writes A012.csv).

I would like to know whether someone can suggest an approach that lets me keep the "/" character in the name of a CSV file.
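For what it's worth, on Unix-like systems "/" is the path separator and cannot literally appear in a file name. The closest alternative, sketched below under that assumption, is to keep the slashes as real subdirectories created relative to the working directory before opening the file; the base name table.csv is made up for the example.

import os

def open_csv(part):
    # part looks like '/A/012/'; keep its slashes as real directories
    # under the current working directory instead of stripping them.
    path = os.path.join('.' + part, 'table.csv')   # './A/012/table.csv'
    directory = os.path.dirname(path)
    if not os.path.isdir(directory):
        os.makedirs(directory)                     # create ./A/012/ on first use
    return open(path, 'wb')

Otherwise, sanitizing is the usual route; replacing "/" with some other separator, e.g. part.strip('/').replace('/', '-'), keeps one unique file name per link without touching the directory layout.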
