Question

Using Python 2.6, I'm trying to handle tables in a growing variety of formats (xls, csv, shp, json, xml, html table data) and feed the content into an ArcGIS database table (stay with me please, this is more about the Python part of the process than the GIS part). In the current design, my base class formats the target database table and populates it with the content of the source format. The subclasses are currently designed to feed the content into a dictionary so that the base class can handle the content no matter what the source format was.

The problem is that my users could be feeding a file or table of any one of these formats into the script, so the subclass would optimally be determined at runtime. I do not know how to do this other than by running a really involved if-elif-elif-... block. The structure kind of looks like this:

import os

class Input:
  def __init__(self, name): # name is the filename, including path
    self.name = name
    self.ext = os.path.splitext(name)[1][1:].lower() # extension without the dot, e.g. 'xls'
    self.d = {} # content goes here
    ... # dictionary content written to database table here

# each subclass writes to d
class xls(Input):
  ...

class xml(Input):
  ...

class csv(Input):
  ...

x = Input(r"c:\foo.xls") # raw strings, so \f and \b are not read as escape sequences
y = Input(r"c:\bar.xml")

My understanding of duck-typing and polymorphism suggests this is not the way to go about it, but I'm having a tough time figuring out a better design. Help on that front would help, but what I'm really after is how to turn x.ext or y.ext into the fork at which the subclass (and thus the input-handling) is determined.

If it helps, let's say that foo.xls and bar.xml have the same data, and so x.d and y.d will eventually have the same items, such as {'name':'Somegrad', 'lat':52.91025, 'lon':47.88267}.


Solution

This problem is commonly solved with a factory function that knows about subclasses.

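import os

# maps a lowercase file extension (without the dot) to the subclass that handles it
input_implementations = { 'xls':xls, 'xml':xml, 'csv':csv }

def input_factory(filename):
    ext = os.path.splitext(filename)[1][1:].lower()
    impl = input_implementations.get(ext, None)
    if impl is None:
        # rain fire from the skies
        raise ValueError('unsupported input format: %r' % ext)
    return impl(filename)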

It's harder to do from the base class itself (Input('file.xyz')) because the subclasses aren't defined yet when Input is defined. You can get tricky, but a simple factory is easy.
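For what "getting tricky" might look like, here is a minimal sketch, not a recommendation over the factory. It assumes a class-level registry that is filled in after Input is defined, and a `__new__` that swaps in the matching subclass when Input itself is called. The names `_registry`, `handles`, `XlsInput` and `XmlInput` are made up for this illustration (they stand in for the question's xls and xml classes).

```python
import os

class Input(object):
    _registry = {}  # hypothetical: extension -> subclass, filled in by @handles

    def __new__(cls, name):
        # Only dispatch when the base class itself is called.
        if cls is Input:
            ext = os.path.splitext(name)[1][1:].lower()
            try:
                cls = Input._registry[ext]
            except KeyError:
                raise ValueError('unsupported input format: %r' % ext)
        return object.__new__(cls)

    def __init__(self, name):
        self.name = name
        self.d = {}  # content goes here


def handles(*extensions):
    """Hypothetical class decorator: register a subclass for some extensions."""
    def register(subclass):
        for ext in extensions:
            Input._registry[ext] = subclass
        return subclass
    return register


@handles('xls')
class XlsInput(Input):
    pass


@handles('xml')
class XmlInput(Input):
    pass


x = Input(r"c:\foo.xls")  # x is actually an XlsInput instance
```

Because the returned object is still an instance of Input, Python goes on to call `__init__` on it as usual. The cost is that `Input(...)` no longer returns a plain Input, which can surprise readers; that is part of why the plain factory function is usually preferred.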

OTHER TIPS

How about if each derived class contained a list of possible file extensions that it could parse? Then you could try to match the input file's extension with one of these to decide which subclass to use.
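A rough sketch of that idea, assuming each subclass carries a hypothetical `extensions` attribute and that a helper (called `open_input` here, also made up) walks the direct subclasses of Input looking for a match:

```python
import os

class Input(object):
    extensions = ()  # hypothetical: extensions this class can parse

    def __init__(self, name):
        self.name = name
        self.d = {}  # content goes here


class XlsInput(Input):
    extensions = ('xls',)


class XmlInput(Input):
    extensions = ('xml',)


class HtmlInput(Input):
    extensions = ('htm', 'html')


def open_input(name):
    """Return an instance of the first subclass that claims the extension."""
    ext = os.path.splitext(name)[1][1:].lower()
    for subclass in Input.__subclasses__():
        if ext in subclass.extensions:
            return subclass(name)
    raise ValueError('no handler for extension %r' % ext)


y = open_input(r"c:\bar.xml")  # an XmlInput instance
```

Note that `__subclasses__()` only lists direct subclasses, so a deeper hierarchy would need a recursive walk or an explicit registry. The upside is that adding a new format only means adding a new class; no central table has to be edited.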

You're on the right track. Use your subclasses:

x = xls(r"c:\foo.xls") # raw strings, so \f and \b are not read as escape sequences
y = xml(r"c:\bar.xml")

Write methods in each subclass to parse the appropriate data type, and use the base class (Input) to write the data to a database.
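As a sketch of that split, assuming a hypothetical `parse()` method that each subclass overrides and a `write()` method on the base class, with a plain list standing in for the ArcGIS table (the real insert code is not shown in the question). JSON is used for the example subclass only because it needs nothing beyond the standard library:

```python
import json

class Input(object):
    def __init__(self, name):
        self.name = name
        self.d = self.parse()  # each subclass supplies parse()

    def parse(self):
        # Hypothetical hook: return the file's content as a dictionary.
        raise NotImplementedError('subclasses must implement parse()')

    def write(self, rows):
        # Stand-in for the database insert: `rows` is just a list here.
        rows.append(dict(self.d))


class JsonInput(Input):
    def parse(self):
        with open(self.name) as f:
            return json.load(f)
```

With this layout, `JsonInput("places.json").write(rows)` would parse the (hypothetical) file in the subclass and hand the resulting dictionary to the shared writing code, which never needs to know what the source format was.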

Licensed under: CC-BY-SA with attribution