Question

It seems critical to be able to move data you've scraped using BeautifulSoup into a CSV file. I'm close to succeeding, but each column in the CSV file ends up holding a single letter of the scraped text, and only the very last scraped item is written.

Here's my code:

import urllib2
import csv
from bs4 import BeautifulSoup
url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
page = urllib2.urlopen(url)
soup_package = BeautifulSoup(page)
page.close()

#find everything in the div class="bestOfItem". This works.
all_categories = soup_package.findAll("div",class_="bestOfItem")
print(all_categories) #print out all winner categories to see if working

#grab just the text in a tag:
for match_categories in all_categories:
    winner_category = match_categories.a.string

#Move to csv file:
f = file("file.csv", 'a')
csv_writer = csv.writer(f)
csv_writer.writerow(winner_category)
print("Check your dropbox for file")

Solution

The problem is that writerow() expects an iterable of fields. Here it receives a string, which it iterates character by character, so each letter becomes its own column. Wrap each value in a list instead.
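
A quick way to see this behavior, as a minimal sketch in Python 3 using an in-memory buffer instead of a real file (the sample string "Best" is only for illustration):

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

# A bare string is iterated character by character: one column per letter.
writer.writerow("Best")
# Wrapping the string in a list yields a single column.
writer.writerow(["Best"])

rows = buffer.getvalue().splitlines()
print(rows[0])  # B,e,s,t
print(rows[1])  # Best
```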

You also need to call writerow() inside the loop; otherwise only the last value is written.

Also, you can pass urllib2.urlopen(url) directly to the BeautifulSoup constructor.

You should also use the with context manager when working with files, so that the file is closed automatically.

Here's the code with modifications:

import urllib2
import csv
from bs4 import BeautifulSoup


url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
soup_package = BeautifulSoup(urllib2.urlopen(url))
all_categories = soup_package.find_all("div", class_="bestOfItem")

with open("file.csv", 'wb') as f:  # binary mode, as the Python 2 csv module expects
    csv_writer = csv.writer(f)
    for match_categories in all_categories:
        value = match_categories.a.string
        if value:
            csv_writer.writerow([value.encode('utf-8')])

The contents of file.csv after running the script are:

Best View From a Performance Space
Best Amateur Hip-Hop Dancer Who's Also a Professional Wrestler
Best Dance Venue in New Digs
Best Outré Dance
Best (and Most Vocal) Mime
Best Performance in a Fat Suit
Best Theatrical Use of Unruly Facial Hair
...
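
Note that the script above is written for Python 2 (urllib2, and encoding to bytes before writing). As a rough sketch of the same writing step in Python 3, assuming the titles have already been extracted into a list of strings; the helper name write_titles, the temporary output path, and the sample titles are made up for illustration:

```python
import csv
import os
import tempfile


def write_titles(titles, path):
    """Write each non-empty title as its own one-column CSV row."""
    # newline="" lets the csv module control line endings;
    # encoding="utf-8" replaces the manual .encode('utf-8') call.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for title in titles:
            if title:  # skip None or empty strings, like the "if value:" check
                writer.writerow([title])


# Hypothetical sample data standing in for the scraped values.
out_path = os.path.join(tempfile.gettempdir(), "file.csv")
write_titles(["Best Outré Dance", None, "Best (and Most Vocal) Mime"], out_path)
```

In Python 3 the csv module works with text rather than bytes, so the strings are passed to writerow() as they are and the encoding is chosen when the file is opened.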

Besides, I'm not sure you need the csv module at all, since each row holds only a single value.
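
To illustrate that last point: with one value per row, writing one line per item is enough as long as no value contains a comma, a quote character, or a line break (the csv module would quote those for you). A minimal Python 3 sketch, where the helper name write_lines, the output path, and the sample titles are assumptions for illustration:

```python
import os
import tempfile


def write_lines(titles, path):
    """Write each non-empty title on its own line, without the csv module."""
    with open(path, "w", encoding="utf-8") as f:
        for title in titles:
            if title:  # skip None or empty strings
                f.write(title + "\n")


# Hypothetical sample data standing in for the scraped values.
out_path = os.path.join(tempfile.gettempdir(), "titles.txt")
write_lines(["Best Dance Venue in New Digs", "Best Outré Dance"], out_path)
```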

OTHER TIPS

Move the "#Move to csv file:" part inside the for loop.

It also seems that you are overwriting winner_category on every iteration of the for loop. It might be a better idea to use a different variable.

Something like this (untested) should help:

#grab just the text in a tag:
f = open("file.csv", 'a')
csv_writer = csv.writer(f)  #create the writer once, outside the loop

for match_categories in all_categories:
    fwinner = match_categories.a.string

    #Move to csv file (wrap the string in a list so it stays in one column):
    csv_writer.writerow([fwinner])

f.close()
print("Check your dropbox for file")
Licensed under: CC-BY-SA with attribution