Question

I have to work with government-provided data that is sometimes broken in strange ways. My code already contains snippets like:

for row in governmental_data:
    # XXX Workaround for that one row among thousands
    # that was mislabeled by a clerk and will not be fixed
    # before form A-320-Tango-5 is completed and submitted
    # on the first Sunday after a solstice.
    if row is the_spawn_of_satan:
        row = fix_row_A320(row)
    # XXX end of workaround
    process_row(row)

which before the error was just

for row in governmental_data:
    process_row(row)

I cannot keep a mirror of the data with the fixes applied, because the data is dynamic.

What can I do to manage these workarounds as they grow in number? Are there any best practices (besides "do not provide broken data to begin with")?


Solution

I suggest using the Decorator design pattern to handle this data conversion issue. The Wikipedia page on the pattern has a coffee-making example. In the same way, every data conversion should be a decorator that takes a row, performs some operation on it, and returns a row. This is a well-established design pattern. The Intercepting Filter design pattern is a similar idea, and it is implemented in both Java (servlet filters) and .NET (ASP.NET MVC filters).

Your code could then look like this:

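listOfDataConversionFilters = [XXXWorkaround,formA_320Tango5,...]
for row in governmental_data:
    # Chain the filters: each one receives the output of the previous one.
    filteredRow = row
    for conversionFilter in listOfDataConversionFilters:
        filteredRow = conversionFilter(filteredRow)
    process_row(filteredRow)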
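
As the number of workarounds grows, it may help to keep each one as a small, documented entry in a single list, so the reason for the fix lives next to the fix itself. The following is only a sketch under assumptions: rows are taken to be dicts, and the `Workaround` class, the `id` and `label` fields, and the sample data are all hypothetical names invented for illustration, not part of the original code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Row = dict  # assumption: each row is a plain dict


@dataclass(frozen=True)
class Workaround:
    """One documented fix for one known defect (hypothetical helper)."""
    name: str
    reason: str                        # why the fix exists and when it can go
    applies_to: Callable[[Row], bool]  # detects the broken row
    fix: Callable[[Row], Row]          # returns a repaired copy


def apply_workarounds(row: Row, workarounds: Iterable[Workaround]) -> Row:
    # Chain the fixes: each one sees the result of the previous one.
    for workaround in workarounds:
        if workaround.applies_to(row):
            row = workaround.fix(row)
    return row


def cleaned(rows: Iterable[Row], workarounds: Iterable[Workaround]) -> Iterator[Row]:
    # Lazily yields repaired rows, so it also works on dynamic data sources.
    workarounds = list(workarounds)
    for row in rows:
        yield apply_workarounds(row, workarounds)


# All known workarounds live in one place (values here are made up).
WORKAROUNDS = [
    Workaround(
        name="A320 mislabeled row",
        reason="Mislabeled by a clerk; remove once form A-320-Tango-5 is processed",
        applies_to=lambda row: row.get("id") == 666,
        fix=lambda row: {**row, "label": "corrected"},
    ),
]

governmental_data = [{"id": 1, "label": "ok"}, {"id": 666, "label": "wrong"}]
result = list(cleaned(governmental_data, WORKAROUNDS))
```

With this layout the main loop shrinks back to `for row in cleaned(governmental_data, WORKAROUNDS): process_row(row)`, and retiring a workaround means deleting one list entry.
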
Licensed under: CC-BY-SA with attribution