Question

I am fairly new to Avro, so please excuse me if I am missing anything obvious. Is there an Avro validator or command-line utility that validates input against an Avro schema? Ideally it would also point to where the error is in the JSON input.

Solution

Not that I'm aware of. I wrote this little Python script that tells you whether a JSON file matches a schema, but it won't tell you where the error is if there is one.

It depends on the Python avro library.

#!/usr/bin/env python3

from json import loads
from sys import argv

from avro.io import validate
# Older avro-python3 releases name this function Parse instead of parse.
from avro.schema import parse


def main(argv):
    valid = set()
    invalid_avro = set()
    invalid_json = set()

    if len(argv) < 3:
        print("Give me an avro schema file and a whitespace-separated list of json files to validate against it.")
        return

    with open(argv[1]) as schema_file:
        schema = parse(schema_file.read())

    for arg in argv[2:]:
        try:
            with open(arg) as json_file:
                datum = loads(json_file.read())
        except ValueError:
            # The file is not well-formed JSON.
            invalid_json.add(arg)
            continue
        # validate() only returns True or False; it does not say which field failed.
        if validate(schema, datum):
            valid.add(arg)
        else:
            invalid_avro.add(arg)

    print(' Valid files:\n\t' + '\n\t'.join(sorted(valid)))
    print('Invalid avro:\n\t' + '\n\t'.join(sorted(invalid_avro)))
    print('Invalid json:\n\t' + '\n\t'.join(sorted(invalid_json)))


if __name__ == '__main__':
    main(argv)
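
To get at where the error is, one option is to walk the schema and the datum together and record the path of each mismatch. The following is only a minimal sketch of that idea, not part of the avro library: `find_errors` and `PRIMITIVES` are hypothetical names, the schema is taken as the plain object returned by `json.loads` on the schema file, and only primitives, arrays, records and unions are handled. It does not check numeric ranges, and it assumes that a missing record field counts as an error.

```python
# Python types accepted for each Avro primitive handled by this sketch.
PRIMITIVES = {
    "null": type(None),
    "boolean": bool,
    "int": int,
    "long": int,
    "float": (int, float),
    "double": (int, float),
    "string": str,
}


def find_errors(schema, datum, path="$"):
    """Return a list of 'path: problem' strings; an empty list means no errors."""
    if isinstance(schema, list):
        # Union: the datum must satisfy at least one branch.
        if any(not find_errors(branch, datum, path) for branch in schema):
            return []
        return [f"{path}: value matches no branch of the union"]

    kind = schema["type"] if isinstance(schema, dict) else schema
    if isinstance(kind, (dict, list)):
        # Nested form such as {"type": {"type": "array", "items": "string"}}.
        return find_errors(kind, datum, path)

    if kind in PRIMITIVES:
        ok = isinstance(datum, PRIMITIVES[kind])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(datum, bool) and kind != "boolean":
            ok = False
        if ok:
            return []
        return [f"{path}: expected {kind}, got {type(datum).__name__}"]

    if kind == "array":
        if not isinstance(datum, list):
            return [f"{path}: expected array, got {type(datum).__name__}"]
        errors = []
        for index, item in enumerate(datum):
            errors += find_errors(schema["items"], item, f"{path}[{index}]")
        return errors

    if kind == "record":
        if not isinstance(datum, dict):
            return [f"{path}: expected record, got {type(datum).__name__}"]
        errors = []
        for field in schema["fields"]:
            name = field["name"]
            if name not in datum:
                # Assumption of this sketch: a missing field is an error.
                errors.append(f"{path}.{name}: missing field")
            else:
                errors += find_errors(field["type"], datum[name], f"{path}.{name}")
        return errors

    return [f"{path}: schema type {kind!r} is not handled by this sketch"]


# Made-up schema and datum, for illustration only.
schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
    ],
}
errors = find_errors(schema, {"name": "a", "age": "x", "tags": ["ok", 3]})
for line in errors:
    print(line)
# $.age: expected int, got str
# $.tags[1]: expected string, got int
```

Returning a list rather than stopping at the first mismatch means one run reports every bad field in a file.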

OTHER TIPS

I am not sure your question makes sense: since an Avro schema is mandatory when processing Avro data, the data is basically always validated by default. Put another way, the act of parsing Avro data will by necessity validate it.

Unfortunately, because there is very little metadata in Avro data, any incompatible change is essentially data corruption, and you may well just get garbage. This is because there are no field ids or separators: all data is interpreted based on what the schema says must follow. This lack of redundancy makes the data very compact, but it also means that even the smallest corruption may make the whole data stream useless.
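
The point about missing field ids and separators can be seen in a hand-rolled sketch of the raw binary encoding. This assumes the encoding rules as I understand them from the Avro specification (longs as zigzag varints, strings as a length followed by UTF-8 bytes, records as their fields written back to back). The helper names are made up for illustration, and this is not how you would use the avro library in practice.

```python
def encode_long(n):
    """Zigzag-encode n, then emit it as a base-128 varint, low group first."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)  # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """A string is its byte length (as a long) followed by the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def decode_long(buf, pos):
    """Read one zigzag varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1), pos


def decode_string(buf, pos):
    """Read a length-prefixed UTF-8 string; return (value, next position)."""
    length, pos = decode_long(buf, pos)
    return buf[pos:pos + length].decode("utf-8"), pos + length


# Writer's schema (hypothetical): record with fields id: long, name: string.
payload = encode_long(1) + encode_string("hi!")
print(payload)  # b'\x02\x06hi!' : five bytes, no field names or separators

# Reading with the writer's field order recovers the data.
right_id, pos = decode_long(payload, 0)
right_name, pos = decode_string(payload, pos)
print(right_id, repr(right_name))  # 1 'hi!'

# Reading with the fields swapped (name: string, id: long) raises no error
# here; it silently yields garbage and leaves two bytes unread.
wrong_name, pos = decode_string(payload, 0)
wrong_id, pos = decode_long(payload, pos)
print(repr(wrong_name), wrong_id)  # '\x06' 52
```

With other payloads the mismatched read may instead fail with an index or decoding error. Either way, nothing in the bytes themselves identifies which field is which.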

I made an Avro validator for JavaScript that you can run on JSON. It's not yet part of an Avro release, but it should be committed soon. You can find the patch at https://issues.apache.org/jira/browse/AVRO-485.

Licensed under: CC-BY-SA with attribution