Answering my own question here for the ages.
There's currently a bug in how both Oracle's JDK and OpenJDK process the XML entity expansion limit: the expansion counter is shared across parses instead of being reset per document, so parsing many XML documents eventually hits the default upper bound.
- https://blogs.oracle.com/joew/entry/jdk_7u45_aws_issue_123
- https://bugs.openjdk.java.net/browse/JDK-8028111
- https://github.com/aws/aws-sdk-java/issues/123
Although I had assumed that our version (6b27-1.12.6-1ubuntu0.12.04.4) wasn't affected, running the sample code given in the OpenJDK bug report confirmed that we were susceptible to the bug.
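For anyone who wants to run a similar check, here is a minimal sketch of that kind of test. It is not the code from the bug report. It assumes the problem shows up with StAX readers created from a single XMLInputFactory, and that the default limit is the 64,000 used by JDKs of that era. The class name, method names, and document shape are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EntityLimitCheck {

    /** Builds a small document that references one internal entity the given number of times. */
    public static String buildDocument(int refs) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\"?>");
        xml.append("<!DOCTYPE doc [<!ENTITY e \"x\">]>");
        xml.append("<doc>");
        for (int i = 0; i < refs; i++) {
            xml.append("&e;");
        }
        xml.append("</doc>");
        return xml.toString();
    }

    /**
     * Parses the same document repeatedly with readers from one shared factory.
     * Returns how many documents parsed cleanly before the first failure,
     * or the full count if none failed.
     */
    public static int countParsedBeforeFailure(int documents, int refsPerDocument) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        String xml = buildDocument(refsPerDocument);
        for (int parsed = 0; parsed < documents; parsed++) {
            try {
                XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
                while (reader.hasNext()) {
                    reader.next(); // walking the events forces entity expansion
                }
                reader.close();
            } catch (XMLStreamException e) {
                // On an affected JDK this is where the accumulated count trips the limit.
                return parsed;
            }
        }
        return documents;
    }

    public static void main(String[] args) {
        // 7000 documents x 10 references = 70,000 expansions in total,
        // which is above the assumed 64,000 default but tiny per document.
        int documents = 7000;
        int parsed = countParsedBeforeFailure(documents, 10);
        System.out.println("Parsed " + parsed + " of " + documents + " documents");
    }
}
```

If every document parses, the counter is most likely being reset per document. If the run stops partway through, the JVM is probably affected.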
To work around the issue, I needed to pass jdk.xml.entityExpansionLimit=0 to the Storm workers. By adding the following to storm.yaml across my cluster, I was able to mitigate this problem:
supervisor.childopts: "-Djdk.xml.entityExpansionLimit=0"
worker.childopts: "-Djdk.xml.entityExpansionLimit=0"
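To confirm the flag actually reached a worker JVM, something like the following could be called from inside a bolt or run standalone with the same -D option. This is a sketch only: the class and method names are hypothetical, and it relies on the documented JAXP behavior that a value less than or equal to 0 means no limit.

```java
public class EntityLimitSetting {

    public static final String PROPERTY = "jdk.xml.entityExpansionLimit";

    /** Describes the entity expansion limit as seen by the current JVM. */
    public static String describe() {
        String raw = System.getProperty(PROPERTY);
        if (raw == null) {
            return "unset (JDK default applies)";
        }
        try {
            long limit = Long.parseLong(raw.trim());
            // A value <= 0 disables the limit.
            return limit <= 0 ? "no limit" : "limit " + limit;
        } catch (NumberFormatException e) {
            return "unparseable value: " + raw;
        }
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + ": " + describe());
    }
}
```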
I should note that setting the limit to 0 removes it entirely, which technically opens you up to a denial-of-service attack through entity expansion. Since our XML documents only come from SQS, I'm not worried about someone forging malicious XML to kill our workers.