Python - Web Scraping - BeautifulSoup & CSV

Question 1

Answers:

Question 1: the class of the span is a list, so you need to check whether expensiver is in that list. In other words, replace:
```
if percent_difference_title == "expensiver":
```
with:
```
if "expensiver" in percent_difference.span['class']:
```
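
To see why the equality test never matches, here is a minimal sketch that uses a plain Python list as a stand-in for the value of `span['class']`. The class name `price` is made up for illustration; only `expensiver` comes from the page.

```python
# Stand-in for percent_difference.span['class']; BeautifulSoup returns
# multi-valued attributes such as class as a list of strings.
# "price" is a hypothetical extra class used only for illustration.
span_classes = ["price", "expensiver"]

# Comparing a list to a string is always False.
equality_matches = span_classes == "expensiver"

# Membership works no matter how many classes there are or in what order.
membership_matches = "expensiver" in span_classes

print(equality_matches, membership_matches)
```

The membership test also keeps working if the page adds further classes to the span, which indexing a fixed position would not.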
Questions 2 and 3: you need to pass a list of column values to writerow(), not a string. And, since you want only one record per city, call writerow() outside of the loop (over the trs).

Other issues:

open csv file for writing before the loop
use with context managers while working with files
try to follow PEP8 style guide

Here's the code with modifications:

import requests
import csv
from bs4 import BeautifulSoup

BASE_URL = 'http://www.expatistan.com/cost-of-living/comparison/{home_city}/{city}'
home_city = 'Phoenix'

with open('City.txt') as input_file:
    with open("Expatistan.csv", "w") as output_file:
        writer = csv.writer(output_file)
        writer.writerow(["City", "Food", "Housing", "Clothes", "Transportation", "Personal Care", "Entertainment"])
        for line in input_file:
            city = line.strip()
            url = BASE_URL.format(home_city=home_city, city=city)
            soup = BeautifulSoup(requests.get(url).text)

            table = soup.find("table", class_="comparison")
            differences = []
            for title in table.find_all("tr", class_="expandable"):
                percent_difference = title.find("th", class_="percent")
                if "expensiver" in percent_difference.span['class']:
                    differences.append('+' + percent_difference.span.string)
                else:
                    differences.append('-' + percent_difference.span.string)
            writer.writerow([city] + differences)

For the City.txt containing just one new-york-city line, it produces Expatistan.csv with the following content:

City,Food,Housing,Clothes,Transportation,Personal Care,Entertainment
new-york-city,+48%,+129%,+63%,+43%,+42%,+42%

Make sure you understand what changes I have made. Let me know if you need further help.

Question 2

csv.writer.writerow() takes a sequence and writes each element as a column. Normally you'd give it a list of column values, but you are passing in a string, so each individual character becomes its own column.

Just build a list, then write it to the CSV file.
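
As a quick check of that behaviour, here is a small sketch that writes to an in-memory buffer instead of a file. It assumes Python 3, and the helper name `row_as_text` is made up for this example.

```python
import csv
import io


def row_as_text(row):
    """Write one row to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(row)
    return buffer.getvalue()


# A string is a sequence of characters, so each character becomes a column.
from_string = row_as_text("-48%")

# A list gives one column per element, which is what you want.
from_list = row_as_text(["new-york-city", "-48%"])

print(repr(from_string))
print(repr(from_list))
```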

First, open the CSV file once, not for every separate city; you are clearing out the file every time you open it.

import requests
import csv
from bs4 import BeautifulSoup

HomeCity = 'Phoenix'

with open("City.txt") as cities, open("Expatistan.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["City", "Food", "Housing", "Clothes",
                     "Transportation", "Personal Care", "Entertainment"])

    for line in cities:
        city = line.strip()
        url = "http://www.expatistan.com/cost-of-living/comparison/{}/{}".format(
            HomeCity, city)
        resp = requests.get(url)
        soup = BeautifulSoup(resp.content, from_encoding=resp.encoding)

        titles = soup.select("table.comparison tr.expandable")

        row = [city]
        for title in titles:
            percent_difference = title.find("th", class_="percent")
            changeclass = percent_difference.span['class']
            change = percent_difference.span.string
            if "expensiver" in changeclass:
                change = '+' + change
            else:
                change = '-' + change
            row.append(change)
        writer.writerow(row)
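
Note that the "wb" mode above is the Python 2 idiom. On Python 3 the csv module expects a text-mode file, and its documentation recommends opening it with newline="". A minimal sketch, assuming Python 3 and a throwaway path in a temporary directory:

```python
import csv
import os
import tempfile

# Hypothetical output path, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "Expatistan.csv")

# Text mode plus newline="" lets the csv module control the line endings.
with open(path, "w", newline="") as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["City", "Food"])
    writer.writerow(["new-york-city", "+48%"])

# Read the file back to confirm what was written.
with open(path, newline="") as infile:
    rows = list(csv.reader(infile))

print(rows)
```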

Question 3

So, first of all, one passes the writerow method an iterable, and each object in that iterable gets written with commas separating them. So if you give it a string, then each character gets separated:

WriteResultsFile.writerow('hello there')

writes

h,e,l,l,o, ,t,h,e,r,e

But

WriteResultsFile.writerow(['hello', 'there'])

writes

hello,there

That's why you are getting results like

n,e,w,-,y,o,r,k,-,c,i,t,y,-,4,8,%

The rest of your problems are errors in your webscraping. First of all, when I scrape the site, searching for tables with CSS class "comparison" gives me None. So I had to use

expatistan_table = soup_expatistan.find("table","comparison")

Now, the reason your "if statement is broken" is that

percent_difference.span['class']

returns a list. If we modify that to

percent_difference.span['class'][0]

things will work the way you expect.

Now, your real issue is that inside the innermost loop you are finding the % change in price for the individual items. You want these as items in your row of price differences, not as individual rows. So I declare an empty list, items, to which I append percent_difference.span.string, and then write the row outside the innermost loop, like so:

items = []
for expatistan_title in expatistan_titles:
    percent_difference = expatistan_title.find("th", "percent")
    percent_difference_title = percent_difference.span["class"][0]
    print(percent_difference_title)
    if percent_difference_title == "expensiver":
        items.append('+' + percent_difference.span.string)
    else:
        items.append('-' + percent_difference.span.string)
row = [Textfilelistsplit[i]]
row.extend(items)
WriteResultsFile.writerow(row)

The final error is that in the while loop you re-open the CSV file and overwrite everything, so you only have the final city in the end. Accounting for all these errors (many of which you should have been able to find without help) leaves us with:

# Prepare the CSV writer once, before the loop, so the file is not overwritten.
WriteResultsFile = csv.writer(open("Expatistan.csv", "w"))
# Write the header row a single time, not once per city.
WriteResultsFile.writerow(["City", "Food", "Housing", "Clothes", "Transportation", "Personal Care", "Entertainment"])

i = 0
while i < len(Textfilelistsplit):
    url = "http://www.expatistan.com/cost-of-living/comparison/" + HomeCity + "/" + Textfilelistsplit[i]
    page = requests.get(url).text
    print(url)
    soup_expatistan = BeautifulSoup(page)

    expatistan_table = soup_expatistan.find("table", "comparison")
    expatistan_titles = expatistan_table.find_all("tr", "expandable")

    # Collect every percentage for this city, then write them as one row.
    items = []
    for expatistan_title in expatistan_titles:
        percent_difference = expatistan_title.find("th", "percent")
        percent_difference_title = percent_difference.span["class"][0]
        print(percent_difference_title)
        if percent_difference_title == "expensiver":
            items.append('+' + percent_difference.span.string)
        else:
            items.append('-' + percent_difference.span.string)
    row = [Textfilelistsplit[i]]
    row.extend(items)
    WriteResultsFile.writerow(row)
    i += 1
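
The overwrite problem can be reproduced without any scraping. This sketch uses a made-up list of cities and a temporary file to compare reopening the file with "w" on every pass against opening it once before the loop:

```python
import os
import tempfile

# Hypothetical file name and city list, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")
cities = ["london", "paris", "tokyo"]

# Reopening with "w" inside the loop truncates the file on every pass.
for city in cities:
    with open(path, "w") as handle:
        handle.write(city + "\n")
with open(path) as handle:
    reopened_each_time = handle.read().splitlines()

# Opening once before the loop keeps every line.
with open(path, "w") as handle:
    for city in cities:
        handle.write(city + "\n")
with open(path) as handle:
    opened_once = handle.read().splitlines()

print(reopened_each_time)
print(opened_once)
```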

Question 4

YAA - Yet Another Answer.

Unlike the other answers, this treats the data as a series of key-value pairs, i.e. a list of dictionaries, which are then written to CSV. A list of wanted fields is provided to the csv writer (DictWriter), which discards additional information (beyond the specified fields) and blanks missing information. Also, should the order of the information on the original page change, this solution is unaffected.

I also assume you are going to open the CSV file in something like Excel. Additional parameters need to be given to the csv writer for this to happen nicely (see dialect parameter). Given that we are not sanitising the returned data, we should explicitly delimit it with unconditional quoting (see quoting parameter).

import csv
import requests

from bs4 import BeautifulSoup

#Read text file
with open("City.txt") as cities_h:
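    # Strip the trailing newline from each city and skip blank lines,
    # otherwise the newline ends up inside the URL.
    cities = [line.strip() for line in cities_h if line.strip()]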
home_city = "Phoenix"

city_data = []
for city in cities:
    url    = "http://www.expatistan.com/cost-of-living/comparison/%s/%s" % (home_city, city)
    resp   = requests.get(url)
    soup   = BeautifulSoup(resp.content, from_encoding = resp.encoding)
    titles = soup.select("table.comparison tr.expandable")
    if titles:
        data = {}
        for title in titles:
            name = title.find("th", class_ = "clickable")
            diff = title.find("th", class_ = "percent")
            exp  = bool(diff.find("span", class_ = "expensiver"))
            data[name.text] = ("+" if exp else "-") + diff.span.text
        data["City"] = soup.find("strong", class_ = "city-2").text
        city_data.append(data)

with open("Expatistan.csv","w") as csv_h:
    fields = \
    [
        "City",
        "Food",
        "Housing",
        "Clothes",
        "Transportation",
        "Personal Care",
        "Entertainment"
    ]

    #Prepare CSV writer.
    writer = csv.DictWriter\
    (
        csv_h,
        fields,
        quoting        = csv.QUOTE_ALL,
        extrasaction   = "ignore",
        dialect        = "excel",
        lineterminator = "\n",
    )
    writer.writeheader()
    writer.writerows(city_data)
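
The DictWriter behaviour relied on above (extra keys dropped, missing keys left blank, every field quoted) can be checked on in-memory data. This is a minimal sketch assuming Python 3. The rows are made up for illustration, including the "Extra" key and the London figures.

```python
import csv
import io

fields = ["City", "Food", "Housing"]

# Hypothetical rows: the first has a key that is not in fields,
# the second is missing "Housing".
rows = [
    {"City": "New York City", "Food": "+48%", "Housing": "+129%", "Extra": "x"},
    {"City": "London", "Food": "+30%"},
]

buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fields,
    quoting=csv.QUOTE_ALL,    # quote every field unconditionally
    extrasaction="ignore",    # silently drop keys that are not in fields
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
output = buffer.getvalue()

print(output)
```

Without extrasaction="ignore", the first row would raise a ValueError because of the unexpected key.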