Question

I have a Django app where multiple teams upload content that will be parsed. The app keeps track of certain common information in the parsed content. The issue here is that each team has content that needs to be parsed by a different parser because the content is in a different format (e.g. some teams have XML content, some have text, some have JSON, etc...). Each of the teams has provided a parser (a python module) that grabs the necessary info that is placed into corresponding Django models after parsing.

My question is: what is the best way to architect this in Django where each team can have their own parser setup correctly? It can be purely backend, no need for a user form or anything like that. My first thought was that I would create a Parser model with ForeignKey to each team like so:

class Parser(models.Model):
    team = models.ForeignKey('Team')
    module_path = models.CharField(max_length=..., blank=False)

and that module_path would be something like "parsers.teamA.XMLparser" which would reside in my app code in that path like so:

parsers/
    teamA/
        __init__.py
        XMLparser.py
    teamB/

Then when my app is coming to parse the uploaded content, I would have this:

team = Team.objects.get(id=team_id)
parser = Parser.objects.get(team=team)

theParser = __import__(parser.module_path)
theParser.parse(theStuffToBeParsed)

What problems does anyone see with this? The only other option that I can think of is to create separate Django apps for each parser, but how would I reference which team uses which app in the database (same as done here?)?

Was it helpful?

Solution

The approach you're taking seems valid to me. Keep in mind that you are effectively running arbitrary python code so I would never be using something like this on a public facing site and ensure you trust the teams writing their parsers.

You might make this a bit nicer by writing a custom Field to represent the module path that will remove the need to handle the import everytime and will instead handle the import for you and return the parse method (or maybe even better a Parser object where you could tell the teams to implement an interface)

The best example might be to look at the source for django's ImageField or even CharField. Instead of having having your model have a CharField, you'd have a "ModuleField": parser = ModuleField(). The database stored value would indeed be the path to the module (so simply make it a subclass of CharField), but override the to_python method. In your new to_python method handle importing the module and return a python object.

That python object could be anything you want it to be, from your example you could return theParser.parse. This would result in if you have a Parser instance foo you could the_parser_method = foo.parser

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top