Question

I am geocoding a list of addresses using pygeocoder. This is my code:

import csv
import pandas as pd
from pygeocoder import Geocoder
from pygeocoder import GeocoderError

df = pd.read_csv('C:\Users\L\Desktop\germanfdiaddress.csv', encoding="iso-8859-1")

address = df.Address
print address
add=[]
lat=[]
lng=[]
pcode=[]

for a in address:
    try:
        result = Geocoder.geocode(a)
        lat.extend([result[0].coordinates[0]])
        lng.extend([result[0].coordinates[1]])
        pcode.extend([result[0].postal_code])
    except GeocoderError:
        continue
    result = Geocoder.geocode(a)
    lat.extend([result[0].coordinates[0]])
    lng.extend([result[0].coordinates[1]])
    pcode.extend([result[0].postal_code])

fields= 'add','lat', 'lng', 'pcode'
rows=zip(address,lat,lng,pcode)

with open('C:\Users\L\Desktop\myfile.csv', 'wb') as outfile:
    w = csv.writer(outfile)
    w.writerow(fields)
    for i in rows:
        w.writerow(i)

However I receive the following error:

Traceback (most recent call last):
  File "C:\Users\Jesus\Dropbox\coding\python\geocoder with uft-8, with complete output.py", line 42, in <module>
    w.writerow(i)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 13: ordinal not in range(128)

Any ideas on what is happening? I know my code works except for the writing to a csv file.

Here is the csv file: https://www.dropbox.com/s/6yprg2u1ghuygye/germanfdiaddress.csv

Solution 2

The Python 2 csv module has well-documented problems with text that is not ASCII:

This version of the csv module doesn’t support Unicode input. Also, there are currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe;
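
The traceback follows from that limitation: when the Python 2 csv writer is handed a unicode value, it converts it to a byte string with the default ASCII codec, which cannot represent u'\xfc' (ü). The sketch below reproduces the same kind of failure with an explicit encode call and shows that UTF-8 can encode the character. The sample string is made up for illustration; the snippet should run on both Python 2 and Python 3.

```python
# Made-up sample value containing u'\xfc' (the letter ü)
text = u'M\xfcnchen'

try:
    # ASCII covers only code points 0-127, so this raises UnicodeEncodeError
    text.encode('ascii')
except UnicodeEncodeError as exc:
    ascii_error = exc
else:
    ascii_error = None

# UTF-8 can represent every Unicode character
utf8_bytes = text.encode('utf-8')
```

Here `ascii_error.start` points at the offending character, much like "position 13" in the traceback above.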

As you are doing simple reads and writes, you can use the example UnicodeWriter class from the documentation.

Or, you can simplify your code thus:

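import codecs

# ...

with codecs.open(r'C:\Users\L\Desktop\myfile.csv',
                 mode='w', encoding='utf-8') as outfile:

    # Use unicode literals: a byte-string '{}'.format(...) would try to
    # encode the text as ASCII and raise the same UnicodeEncodeError.
    outfile.write(u'{}\n'.format(u','.join(fields)))
    for i in rows:
        # lat and lng are floats, so convert every value to text before joining.
        # Note that values containing commas are not quoted here.
        outfile.write(u'{}\n'.format(u','.join(unicode(v) for v in i)))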
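
Joining the fields by hand also skips CSV quoting, so an address that contains a comma would be split across columns. If Python 3 is an option, its csv module accepts Unicode text directly, which avoids both problems. A minimal sketch under that assumption; `write_rows` and the sample row are made up for illustration:

```python
import csv
import io

def write_rows(outfile, fields, rows):
    """Write a header line and data rows to an open text-mode file object."""
    writer = csv.writer(outfile)
    writer.writerow(fields)
    writer.writerows(rows)

# Made-up sample row: the address holds a comma and non-ASCII letters
fields = ('add', 'lat', 'lng', 'pcode')
rows = [(u'Hauptstra\xdfe 1, M\xfcnchen', 48.1, 11.5, u'80331')]

# An in-memory buffer stands in for a real file here
buffer = io.StringIO(newline='')
write_rows(buffer, fields, rows)
csv_text = buffer.getvalue()
```

For a real file, pass `open(path, 'w', newline='', encoding='utf-8')` instead of the buffer; the csv writer then quotes the address and the file object takes care of the encoding.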

Please use a raw string, r'C:\Users\L\Desktop\myfile.csv', when \ is the path separator. This prevents a path such as 'C:\newfile' from being misread, since \n is the newline escape.

You can also use forward slashes (even on Windows), which removes the need for raw strings.

Alternatively, you can use os.path.join to build your file paths.

The point is to avoid bare backslashes in string literals.
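
The three options can be compared side by side. This is only a sketch: it uses ntpath, the Windows flavour of os.path, so that the result is the same on any platform, and the paths are the illustrative ones from this answer.

```python
import ntpath  # Windows implementation of os.path, importable on any platform

plain = 'C:\newfile'    # \n is read as a single newline character
raw = r'C:\newfile'     # raw string: the backslash and the n are kept as typed

# Forward slashes need no escaping at all
forward = 'C:/Users/L/Desktop/myfile.csv'

# On Windows, os.path.join is ntpath.join and inserts the separators for you
joined = ntpath.join('C:\\', 'Users', 'L', 'Desktop', 'myfile.csv')
```

The plain literal ends up one character shorter than the raw one because the \n pair collapses into a newline.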

OTHER TIPS

I replaced the csv module with unicodecsv and it now works. Here is the new code:

import unicodecsv
import pandas as pd
from pygeocoder import Geocoder
from pygeocoder import GeocoderError

df = pd.read_csv(r'C:\Users\L\Desktop\germanfdiaddress.csv', encoding="iso-8859-1")

address = df.Address
print address
rows = []

for a in address:
    try:
        result = Geocoder.geocode(a)
    except GeocoderError:
        continue  # skip addresses that could not be geocoded
    # Build the whole row at once so a skipped address cannot shift
    # the coordinates onto the wrong address
    rows.append((a,
                 result[0].coordinates[0],
                 result[0].coordinates[1],
                 result[0].postal_code))

fields = 'add', 'lat', 'lng', 'pcode'

with open(r'C:\Users\L\Desktop\myfile.csv', 'wb') as outfile:
    w = unicodecsv.writer(outfile, encoding='iso-8859-1')
    w.writerow(fields)
    for i in rows:
        w.writerow(i)

For a cleaner, more Pythonic look, you can use the geocoder package (available on GitHub and PyPI) instead of pygeocoder. For the Unicode issues, unicodecsv works very well and keeps the same look and feel as DictWriter and DictReader. Here's a code example:

import geocoder
import unicodecsv
import logging

# Show INFO messages too; by default only WARNING and above are printed
logging.basicConfig(level=logging.INFO)

fieldnames = ['source', 'address', 'lat', 'lng', 'postal']

# unicodecsv reads and writes bytes, so open both files in binary mode;
# the with-block closes them even if geocoding fails part-way through
with open('address.csv', 'rb') as f, open('address_out.csv', 'wb') as csvfile:
    # CSV Writer
    writer = unicodecsv.DictWriter(csvfile, fieldnames=fieldnames, encoding='utf-8')
    writer.writeheader()

    # CSV Reader
    reader = unicodecsv.DictReader(f, encoding='iso-8859-1')
    for line in reader:
        address = line['Address']

        # Geocoding
        g = geocoder.google(address)
        if g.ok:
            row = {}
            row['source'] = address
            row['address'] = g.address
            row['lat'] = g.lat
            row['lng'] = g.lng
            row['postal'] = g.postal
            writer.writerow(row)
            logging.info('Geocoding SUCCESS: ' + address)
        else:
            logging.warning('Geocoding ERROR: ' + address)
Licensed under: CC-BY-SA with attribution