Can I apply a PMML model that includes DefineFunction using Augustus (Python)?
Question
I'm using Augustus as a PMML model consumer. I've modified the add two numbers example to include a DefineFunction element, like this:
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
<Header/>
<DataDictionary>
<DataField name="x" dataType="double" optype="continuous"/>
<DataField name="y" dataType="double" optype="continuous"/>
</DataDictionary>
<TransformationDictionary>
<DefineFunction dataType="float" optype="continuous" name="add">
<ParameterField optype="continuous" name="first"></ParameterField>
<ParameterField optype="continuous" name="second"></ParameterField>
<Apply function="+" invalidValueTreatment="returnInvalid">
<FieldRef field="first"></FieldRef>
<FieldRef field="second"></FieldRef>
</Apply>
</DefineFunction>
<DerivedField name="z" dataType="double" optype="continuous">
<Apply function="add">
<FieldRef field="x"/>
<FieldRef field="y"/>
</Apply>
</DerivedField>
</TransformationDictionary>
</PMML>
I save this model in a file and try to run it like so:
from resources import add_two_numbers_file # this is just the path to my model file
from augustus.strict import modelLoader
# Load model
with open(add_two_numbers_file, 'r') as model_file:
    model_str = model_file.read()
model = modelLoader.loadXml(model_str)
# Run model
print model.calc({'x':[1,2,3],'y':[4,5,6]}).look()
However, I get an error:
AttributeError: 'DefineFunction' object has no attribute '_setupCalculate'
I'm using the latest trunk (revision 794) and am able to run the unmodified example (without a DefineFunction) without a problem. Is DefineFunction supported by Augustus?
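For reference, the result the model is meant to produce can be sketched in plain Python without Augustus. This is only an illustration of the arithmetic the PMML describes (z = add(x, y), applied row by row); the Python function add below is a stand-in for the DefineFunction element, not part of any Augustus API.

```python
# Plain-Python stand-in for the PMML DefineFunction named "add".
def add(first, second):
    return first + second

# Same input columns as passed to model.calc in the script above.
x = [1, 2, 3]
y = [4, 5, 6]

# The DerivedField "z" applies add to x and y row by row.
z = [add(a, b) for a, b in zip(x, y)]
```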
Solution
jcrudy, you are right: this was a bug. (An API changed and DefineFunction was not brought up-to-date.) It is now fixed in the public SVN repository: with Augustus >= r795, you can run your example as originally intended.
By the way, your PMML is coming from an external file, yet you load it into a string and then into a PMML DOM. You can skip the intermediate step by passing the file name directly to loadXml:
model = modelLoader.loadXml(add_two_numbers_file)
(This could be relevant for very large PMML files; also note that they can be GZipped.)
OTHER TIPS
I was able to solve this by making two changes. After looking at the Augustus source and confirming that _setupCalculate is indeed not defined anywhere, I monkey-patched it in. My script now looks like this:
# Monkey-patch augustus
import augustus.pmml.DefineFunction
def _setupCalculate(self, dataTable, functionTable, performanceTable):
    return (dataTable, functionTable, performanceTable)
augustus.pmml.DefineFunction.DefineFunction._setupCalculate = _setupCalculate
# Now the actual script
from augustus.strict import modelLoader
# Load model
add_two_numbers_file = 'addTwoNumbers.pmml'
with open(add_two_numbers_file, 'r') as model_file:
    model_str = model_file.read()
model = modelLoader.loadXml(model_str)
# Run model
print model.calc({'x':[1,2,3],'y':[4,5,6]}).look()
I made the naive assumption that _setupCalculate does not need to do anything important. With the patch in place, I got a different and more inscrutable error:
ValueError: assignment destination is read-only
at the line
mask[mask2] = defs.MISSING
in FieldType.py. After a few trips through the debugger, I saw that this line was only executed during type casting, and noticed that my PMML mixed the float and double types. By removing the unnecessary dataType="float" attribute from the DefineFunction element, I was able to get the following to work:
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
<Header/>
<DataDictionary>
<DataField name="x" dataType="double" optype="continuous"/>
<DataField name="y" dataType="double" optype="continuous"/>
</DataDictionary>
<TransformationDictionary>
<DefineFunction optype="continuous" name="add">
<ParameterField optype="continuous" name="first"></ParameterField>
<ParameterField optype="continuous" name="second"></ParameterField>
<Apply function="+" invalidValueTreatment="returnInvalid">
<FieldRef field="first"></FieldRef>
<FieldRef field="second"></FieldRef>
</Apply>
</DefineFunction>
<DerivedField name="z" dataType="double" optype="continuous">
<Apply function="add">
<FieldRef field="x"/>
<FieldRef field="y"/>
</Apply>
</DerivedField>
</TransformationDictionary>
</PMML>
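The ValueError above is the message NumPy raises when code assigns into an array whose writeable flag is off. The following is a minimal sketch that reproduces that message outside Augustus; it is not the code in FieldType.py. The names mask and mask2 are borrowed from the traceback line, and MISSING is a hypothetical stand-in for defs.MISSING.

```python
import numpy as np

# Stand-in for a mask array that an earlier step left read-only.
mask = np.zeros(3, dtype=np.int8)
mask.setflags(write=False)

# Boolean selector, playing the role of mask2 in the traceback line.
mask2 = np.array([True, False, True])

MISSING = 1  # hypothetical stand-in for defs.MISSING

try:
    mask[mask2] = MISSING  # same shape of statement as in the traceback
    error_message = None
except ValueError as err:
    error_message = str(err)

# Assigning into a writable copy succeeds.
writable = mask.copy()
writable[mask2] = MISSING
```

Whether Augustus actually marks the array read-only this way is an assumption; the sketch only shows that this kind of masked assignment fails on a read-only array and works on a copy.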
The trunk version of Augustus I used is equivalent to version 0.6-beta3. The problems I had appear to be bugs, so the workarounds in this answer are likely to become unnecessary in the near future.
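Since the monkey patch above may stop being needed once the bug is fixed, one option is to apply it only when the method is actually missing. This is a sketch under assumptions: patch_if_missing is a hypothetical helper (not part of Augustus), and DefineFunctionStandIn is a placeholder class used so the example runs without Augustus installed.

```python
def patch_if_missing(cls, name, func):
    """Attach func to cls under `name` only if cls lacks that attribute.

    Returns True if the patch was applied, False if it was skipped.
    """
    if hasattr(cls, name):
        return False
    setattr(cls, name, func)
    return True

# Placeholder for augustus.pmml.DefineFunction.DefineFunction.
class DefineFunctionStandIn(object):
    pass

# Same pass-through body as the monkey patch in the script above.
def _setupCalculate(self, dataTable, functionTable, performanceTable):
    return (dataTable, functionTable, performanceTable)

applied_first = patch_if_missing(DefineFunctionStandIn, "_setupCalculate", _setupCalculate)
applied_again = patch_if_missing(DefineFunctionStandIn, "_setupCalculate", _setupCalculate)
```

With Augustus itself, the first argument would be augustus.pmml.DefineFunction.DefineFunction, as in the script above. On a version where the method already exists, the helper leaves the class untouched.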