Don't use a list comprehension here; it can be done but becomes ugly fast:
def readcountries():
with open("countries.txt") as fh:
rows = []
for line in fh:
name, area, population = line.split(',')
rows.append([name.strip(), float(area), int(population)])
The list comprehension version would be:
def readcountries():
with open("countries.txt") as fh:
rows = [[n.strip(), float(a), int(p)]
for line in fh for n, a, p in (line.split(','),)]
Using the csv
module would save you some processing:
import csv
def readcountries():
with open("countries.txt") as fh:
reader = csv.reader(fh, skipinitialspace=True)
rows = [[n, float(a), int(p)] for n, a, p in reader]
Here the module handles the splitting and stripping, producing list objects for each line.
For a binary search, Python lets you compare strings with <
and >
just fine; strings are compared lexicographically. ab
is smaller than ac
, but ba
is greater than ab
. In other words, a string that would be sorted before another are considered 'smaller'.
As such, binary search on a sorted list of strings is no different from a binary search on a sorted list of numbers. Do make sure you only look at the first element of the tuples:
def bisect_right(rows, country, lo=0, hi=None):
if hi is None:
hi = len(rows)
while lo < hi:
mid = (lo + hi) // 2
if country < rows[mid][0]:
hi = mid
else:
lo = mid + 1
return lo
def bisect_left(rows, country, lo=0, hi=None):
if hi is None:
hi = len(rows)
while lo < hi:
mid = (lo + hi) // 2
if rows[mid][0] < country:
lo = mid + 1
else:
hi = mid
return lo