Design an Avro schema based on my JSON document

Question

The simplest way to define an Avro schema like the one you have outlined above is to start from what they call IDL. Avro IDL is a higher-level language than the JSON form of an Avro schema, and it makes writing schemas much more straightforward.

See the Avro IDL documentation here: http://avro.apache.org/docs/current/idl.html

To define what you've got above in JSON, you're going to define a set of records in IDL that look like this:

@namespace("com.sample")
protocol sample {
   record Category {
      union {null, string} price_score = null;
      union {null, string} confidence_score = null;
   }
   record vObject {
      int site_id = 0;
      union {null, map<Category>} categories = null;
      union {null, float} price_score = null;
      union {null, float} confidence_score = null;
   }

   record SampleObject {
      union {null, array<vObject>} lv = null;
      long lmd = -1;
   }
}

When you run the IDL compiler (the idl tool from avro-tools, as listed on the page above), you will get an Avro protocol file generated like so, with your record schemas listed under "types":

{
  "protocol" : "sample",
  "namespace" : "com.sample",
  "types" : [ {
    "type" : "record",
    "name" : "Category",
    "fields" : [ {
      "name" : "price_score",
      "type" : [ "null", "string" ],
      "default" : null
    }, {
      "name" : "confidence_score",
      "type" : [ "null", "string" ],
      "default" : null
    } ]
  }, {
    "type" : "record",
    "name" : "vObject",
    "fields" : [ {
      "name" : "site_id",
      "type" : "int",
      "default" : 0
    }, {
      "name" : "categories",
      "type" : [ "null", {
        "type" : "map",
        "values" : "Category"
      } ],
      "default" : null
    }, {
      "name" : "price_score",
      "type" : [ "null", "float" ],
      "default" : null
    }, {
      "name" : "confidence_score",
      "type" : [ "null", "float" ],
      "default" : null
    } ]
  }, {
    "type" : "record",
    "name" : "SampleObject",
    "fields" : [ {
      "name" : "lv",
      "type" : [ "null", {
        "type" : "array",
        "items" : "vObject"
      } ],
      "default" : null
    }, {
      "name" : "lmd",
      "type" : "long",
      "default" : -1
    } ]
  } ],
  "messages" : {
  }
}

Using whatever language you'd like, you can now generate a set of objects from this schema, and their default "toString" operation outputs JSON in the form you have above. However, the true power of Avro comes from its compact binary encoding. You should write your data out in Avro binary format to see the real benefits of Avro.
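To give a rough feel for the size difference, here is a minimal sketch in plain Python that hand-encodes one SampleObject following the binary encoding rules in the Avro specification (zigzag varints for int/long, length-prefixed UTF-8 strings, a branch index in front of each union value, and count-prefixed blocks terminated by a zero for arrays and maps). It is not the Avro library, it skips the container file format entirely, and the sample values are made up for illustration. In real code you would use the Avro bindings for your language rather than helpers like these.

```python
import json
import struct


def zigzag_long(n: int) -> bytes:
    """Avro int/long: zigzag-encode, then write as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def enc_string(s: str) -> bytes:
    raw = s.encode("utf-8")
    return zigzag_long(len(raw)) + raw


def enc_float(x: float) -> bytes:
    return struct.pack("<f", x)  # 4 bytes, little-endian IEEE 754


def enc_optional(value, enc) -> bytes:
    """union {null, T}: branch index 0 for null, 1 followed by the value otherwise."""
    if value is None:
        return zigzag_long(0)
    return zigzag_long(1) + enc(value)


def enc_block(encoded_items: list) -> bytes:
    """Array/map body: one block (count + items), then a zero count as terminator."""
    if not encoded_items:
        return zigzag_long(0)
    return zigzag_long(len(encoded_items)) + b"".join(encoded_items) + zigzag_long(0)


def enc_category(c: dict) -> bytes:
    return (enc_optional(c.get("price_score"), enc_string)
            + enc_optional(c.get("confidence_score"), enc_string))


def enc_categories(m: dict) -> bytes:
    return enc_block([enc_string(k) + enc_category(v) for k, v in m.items()])


def enc_vobject(v: dict) -> bytes:
    # Fields are written in schema order, with no names or separators.
    return (zigzag_long(v["site_id"])
            + enc_optional(v.get("categories"), enc_categories)
            + enc_optional(v.get("price_score"), enc_float)
            + enc_optional(v.get("confidence_score"), enc_float))


def enc_sample(obj: dict) -> bytes:
    lv = enc_optional(obj.get("lv"),
                      lambda items: enc_block([enc_vobject(v) for v in items]))
    return lv + zigzag_long(obj["lmd"])


# Hypothetical sample values, chosen only to exercise every field.
sample = {
    "lv": [{
        "site_id": 0,
        "categories": {"123": {"price_score": "0.5", "confidence_score": "0.9"}},
        "price_score": 0.5,
        "confidence_score": 0.5,
    }],
    "lmd": 1400000000000,
}

binary = enc_sample(sample)
as_json = json.dumps(sample).encode("utf-8")
print(f"avro binary: {len(binary)} bytes, json: {len(as_json)} bytes")
```

The binary form is smaller mainly because field names are never written: the reader gets them from the schema, so only the values (plus a few small length and union markers) end up on the wire.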

Hope this helps!