Question

I am trying to save scraped URLs to a text file, but the contents of the file differ from what gets printed: the file only contains the last set of links, while the print output shows all of them.

import urllib2
from bs4 import BeautifulSoup

urls = ["http://google.com/page=", "http://yahoo.com"]
for url in urls:
    for number in range(1, 10):
        conn = urllib2.urlopen(url + str(number))
        html = conn.read()
        soup = BeautifulSoup(html)
        links = soup.find_all('a')
        file = open("file.txt", "w")
        for tag in links:
            link = tag.get('href')
            print >> file, link
            print link
        file.close()

Solution

Because you open the file in 'w' (write) mode inside the loop, the file is truncated on every iteration, so only the last batch of links survives. Open the file in append mode instead:

file = open("file.txt", "a")
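A minimal, self-contained sketch of the difference between the two modes (the file path and sample "links" here are placeholders, not the asker's actual data):

```python
# Demonstrate why reopening a file in 'w' mode loses earlier writes:
# each open(path, "w") truncates the file, so only the last batch
# survives. Opening in 'a' (append) mode preserves earlier batches.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file.txt")

# Simulate the original loop: reopen in 'w' on every iteration.
for batch in (["a", "b"], ["c", "d"]):
    f = open(path, "w")            # truncates the file on every open
    for link in batch:
        f.write(link + "\n")
    f.close()
print(open(path).read())           # only "c\nd\n" remains

# Reopen in 'a' instead: earlier batches survive.
open(path, "w").close()            # start again from an empty file
for batch in (["a", "b"], ["c", "d"]):
    f = open(path, "a")            # appends instead of truncating
    for link in batch:
        f.write(link + "\n")
    f.close()
print(open(path).read())           # "a\nb\nc\nd\n"
```

Alternatively, open the file once in 'w' mode *before* both loops and close it after them; that avoids reopening the file on every page and gives the same result.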
Licensed under: CC-BY-SA with attribution