CSV not found in web scraping - Python
19-10-2022
Question
I'm trying to scrape, in Python, the set of tables that appears at each link of a web URL, and to save the tables in one CSV per link. The code follows:
import csv
import urllib2
from bs4 import BeautifulSoup

first = urllib2.urlopen("http://www.admision.unmsm.edu.pe/res20130914/A.html").read()
soup = BeautifulSoup(first)

w = []
for q in soup.find_all('tr'):
    for link in q.find_all('a'):
        w.append(link["href"])

s = [i.replace(".", "", 1) for i in w]

l = []
for t in w:
    l.append(t.replace(".", "", 1))

def record(part):
    url = "http://www.admision.unmsm.edu.pe/res20130914{}".format(part)
    u = urllib2.urlopen(url)
    try:
        html = u.read()
    finally:
        u.close()
    soup = BeautifulSoup(html)
    c = []
    for n in soup.find_all('center'):
        for b in n.find_all('a')[2:]:
            c.append(b.text)
    t = len(c) / 2
    part = part[:-6]
    with open('{}.csv'.format(part), 'wb') as f:
        writer = csv.writer(f)
        for i in range(t):
            url = "http://www.admision.unmsm.edu.pe/res20130914{}{}.html".format(part, i)
            u = urllib2.urlopen(url)
            try:
                html = u.read()
            finally:
                u.close()
            soup = BeautifulSoup(html)
            for tr in soup.find_all('tr')[1:]:
                tds = tr.find_all('td')
                row = [elem.text.encode('utf-8') for elem in tds[:6]]
                writer.writerow(row)
Then, using the created function, I try to scrape the tables and create a CSV per link. The code is below:
for n in l:
    record(n)
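For illustration, the link-normalization step at the top can be exercised offline on hypothetical href values (the real hrefs on the page are not shown here; the pattern below is an assumption consistent with what replace(".", "", 1) implies):

```python
# Hypothetical relative hrefs, assumed to look like the ones scraped
# from the index page's <a> tags.
w = ["./A/011/0.html", "./A/012/0.html"]

# Strip only the FIRST "." so each href becomes a path fragment that
# can be appended to the base URL; the "." in ".html" is untouched.
l = [t.replace(".", "", 1) for t in w]

print(l)  # ['/A/011/0.html', '/A/012/0.html']
```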
Unfortunately, the result is an error:
IOError Traceback (most recent call last)
<ipython-input-44-da894016f419> in <module>()
60
61 for n in l:
---> 62 record(n)
63
64
<ipython-input-44-da894016f419> in record(part)
43
44
---> 45 with open('{}.csv'.format(part), 'wb') as f:
46 writer = csv.writer(f)
47 for i in range(t):
IOError: [Errno 2] No such file or directory: '/A/011/.csv'
EDIT:
I've just found out what's really happening, and I came up with a solution.
The problem is that when I run record('/A/012/0.html'), my function also uses '/A/012/0.html' as the name of the output file. However, Python interprets "/" as a directory separator, so it tries to write into directories that don't exist.
So, I make a slight change:
part = part[:-6]
# below is the line where I made the small change
name = part.replace("/", "")
with open('{}.csv'.format(name), 'wb') as f:
I removed the "/" characters, and the web scraping worked OK.
I would like to know if someone can suggest an approach that allows using the "/" character in a CSV file name.
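One note: on every major OS the "/" character is the path separator, so it cannot appear literally inside a single file name. What you can do instead is keep the hierarchy by creating the matching directories first and writing the CSV inside them. A minimal sketch (the part value and the results.csv name are illustrative):

```python
import os

part = '/A/011/'  # example value as passed to record()

# Option 1: flatten the path into the file name, as in the fix above.
flat_name = '{}.csv'.format(part.replace('/', ''))
print(flat_name)  # A011.csv

# Option 2: mirror the URL structure on disk. Create the directories,
# then write the CSV inside them under a fixed name.
rel_dir = part.strip('/')  # 'A/011'
if not os.path.isdir(rel_dir):
    os.makedirs(rel_dir)
csv_path = os.path.join(rel_dir, 'results.csv')
print(csv_path)  # e.g. A/011/results.csv on POSIX
```

With option 2, each link gets its own directory tree that matches the URL, so name collisions between links like /A/011/ and /B/011/ are avoided without mangling the name.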
No correct solution