Question

This is for part of an assignment on which I am stuck. I am new to Python and I want to create a program which searches through a file that looks like this, only longer:

Afghanistan,    647500.0,   25500100
Albania,    28748.0,    2821977
Algeria,    2381740.0,  38700000
American Samoa, 199.0,  55519

As you can see, the spacing is not always even. I want to convert the text part to a string, the area (second column) to a float, and the population (third column) to an integer, but I have no idea how to approach it. Here is what I have so far:

def readcountries():
    with open("countries.txt") as file:
        lines = [line.split() for line in file]

This produces a 2-dimensional list (which is required), but I can't figure out how to convert the area and population to their appropriate types. I must then run a binary search on the country names. Any hints? I know how to do this on numbers, but how does it work on names?

Solution

Don't use a list comprehension here; it can be done but becomes ugly fast:

def readcountries():
    with open("countries.txt") as fh:
        rows = []
        for line in fh:
            # split on commas; float() and int() ignore surrounding whitespace
            name, area, population = line.split(',')
            rows.append([name.strip(), float(area), int(population)])
    return rows

The list comprehension version would be:

def readcountries():
    with open("countries.txt") as fh:
        rows = [[n.strip(), float(a), int(p)]
                for line in fh for n, a, p in (line.split(','),)]
    return rows

Using the csv module would save you some processing:

import csv

def readcountries():
    with open("countries.txt") as fh:
        # skipinitialspace=True drops the whitespace after each comma
        reader = csv.reader(fh, skipinitialspace=True)
        rows = [[n, float(a), int(p)] for n, a, p in reader]
    return rows

Here the module handles the splitting and stripping, producing list objects for each line.
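To see what skipinitialspace=True does without needing the file on disk, here is a small sketch that feeds two of the sample lines through csv.reader via io.StringIO. The parse_rows helper and the in-memory SAMPLE string are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io

# Two lines copied from the question's sample data (held in memory for the demo)
SAMPLE = (
    "Afghanistan,    647500.0,   25500100\n"
    "American Samoa, 199.0,  55519\n"
)

def parse_rows(fh):
    """Parse 'name, area, population' lines into [str, float, int] rows."""
    # skipinitialspace=True discards the whitespace that follows each comma
    reader = csv.reader(fh, skipinitialspace=True)
    return [[name, float(area), int(population)]
            for name, area, population in reader]

rows = parse_rows(io.StringIO(SAMPLE))
print(rows)
# [['Afghanistan', 647500.0, 25500100], ['American Samoa', 199.0, 55519]]
```

Note that the space inside "American Samoa" survives, because only whitespace directly after a delimiter is skipped.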

For a binary search, Python lets you compare strings with < and > just fine; strings are compared lexicographically. 'ab' is smaller than 'ac', but 'ba' is greater than 'ab'. In other words, a string that would be sorted before another is considered 'smaller'.
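A quick way to convince yourself of this is to try a few comparisons directly. The sketch below uses names from the question's sample data; the variable names are made up for the demo.

```python
# Strings compare character by character, in dictionary-like order
ab_before_ac = "ab" < "ac"
ba_after_ab = "ba" > "ab"

# sorted() relies on the same ordering, so a sorted list of names
# is exactly what a binary search needs
names = ["Algeria", "Afghanistan", "American Samoa", "Albania"]
sorted_names = sorted(names)

# Comparison is case-sensitive: uppercase ASCII letters sort before lowercase
upper_before_lower = "Zimbabwe" < "albania"

print(ab_before_ac, ba_after_ab, upper_before_lower)  # True True True
print(sorted_names)
# ['Afghanistan', 'Albania', 'Algeria', 'American Samoa']
```

The case-sensitivity matters if the search term may be typed in a different case than the names in the file; normalising both sides (for example with str.lower()) is one possible way around that.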

As such, binary search on a sorted list of strings is no different from a binary search on a sorted list of numbers. Do make sure you only compare against the first element of each row (the country name):

def bisect_right(rows, country, lo=0, hi=None):
    if hi is None:
        hi = len(rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if country < rows[mid][0]:
            hi = mid
        else:
            lo = mid + 1
    return lo

def bisect_left(rows, country, lo=0, hi=None):
    if hi is None:
        hi = len(rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if rows[mid][0] < country:
            lo = mid + 1
        else:
            hi = mid
    return lo
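
These functions return an insertion point, not the row itself, so a lookup still needs an equality check at that position. A minimal sketch of such a lookup follows; find_country is a hypothetical helper name, it repeats the left-bisection loop so that it runs on its own, and it assumes the rows are already sorted by name.

```python
def find_country(rows, country):
    """Return the row whose name equals `country`, or None if absent.

    Assumes `rows` is sorted by name (the first element of each row).
    """
    lo, hi = 0, len(rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if rows[mid][0] < country:
            lo = mid + 1
        else:
            hi = mid
    # lo is now the leftmost position where `country` could be inserted;
    # it is a match only if that position exists and holds the same name
    if lo < len(rows) and rows[lo][0] == country:
        return rows[lo]
    return None

rows = [
    ["Afghanistan", 647500.0, 25500100],
    ["Albania", 28748.0, 2821977],
    ["Algeria", 2381740.0, 38700000],
    ["American Samoa", 199.0, 55519],
]
print(find_country(rows, "Algeria"))   # ['Algeria', 2381740.0, 38700000]
print(find_country(rows, "Atlantis"))  # None
```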

OTHER TIPS

Split using a comma as the separator rather than the default which is whitespace. split takes an argument for this purpose. Each line will be split into a three-element list. You'll need to convert the second and third entries from strings to numbers using the int or float functions.
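
As a rough illustration of the difference, here is one sample line split both ways. The variable names are chosen for the demo only.

```python
line = "American Samoa, 199.0,  55519\n"

# Default split() breaks on whitespace, which cuts the country name in two
# and leaves commas attached to the pieces
by_whitespace = line.split()
print(by_whitespace)  # ['American', 'Samoa,', '199.0,', '55519']

# Splitting on the comma gives exactly three fields
name, area, population = line.split(",")

# float() and int() tolerate surrounding whitespace (including the newline),
# so only the name needs an explicit strip()
row = [name.strip(), float(area), int(population)]
print(row)  # ['American Samoa', 199.0, 55519]
```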

Edit: This part of the Python tutorial has some information about lists.

You can create a class Country that contains the members name, area and population:

class Country:
    def __init__(self,name,area,population):
        self.name = name
        self.area = area
        self.population = population

Try this code to read the file, parse it, and then sort the list of Country objects by name:

import operator

def readcountries():
    countries_array = []
    with open("countries.txt") as fh:
        for line in fh:
            name, area, population = line.split(',')
            # convert the area to a float and the population to an int
            countries_array.append(
                Country(name.strip(), float(area), int(population)))

    # sort by name so that a binary search on the names is possible
    sorted_countries = sorted(countries_array, key=operator.attrgetter('name'))
    print([country.name for country in sorted_countries])
    return sorted_countries
Licensed under: CC-BY-SA with attribution