Question

I have most of a parent/child-doc solution for a problem I'm working on, but I ran into a hitch: from inside a facet that iterates over the child docs I need to access the value of a parent doc field. I have (or I can get) the parent doc ID (from the _parent field of the child doc, or worst case by indexing it again as a normal field) but that's an "external" ID, not the node-internal ID that I need to load the field value from the field cache. (I'm using default routing so the parent doc is definitely in the same shard as the children.)

More concretely, here's what I have in the FacetCollector so far (ES 0.20.6):

protected void doSetNextReader(IndexReader reader, int docBase) throws IOException {
    /* not sure this will work, otherwise I can index the field separately */
    parentFieldData = (LongFieldData) fieldDataCache.cache(FieldDataType.DefaultTypes.LONG, reader, "_parent");
    parentSpringinessFieldData = (FloatFieldData) fieldDataCache.cache(FieldDataType.DefaultTypes.FLOAT, reader, "springiness");
    /* ... */
}

protected void doCollect(int doc) throws IOException {
    long parentID = parentFieldData.value(doc);  // or whatever the correct equivalent here is
    // here's the problem:
    parentSpringiness = parentSpringinessFieldData.value(parentID);
    // type error: expected int (node-internal ID), got long (external ID)
}

Any suggestions? (I can't upgrade to 0.90 yet but would be interested to hear if that would help.)


Solution

Honking great disclaimer: (1) I ended up not using this approach at all, so this is only slightly tested code, and (2) as far as I can see it will be pretty horribly inefficient, with the same memory overhead as parent queries. If another approach will work for you, do consider it. For my use case I ended up using nested documents, with a custom facet collector that iterates over both the nested and the parent documents, so that the field values of both are easy to access.

The example within the ES code to work from is org.elasticsearch.index.search.child.ChildCollector. The first element you need is in the Collector initialisation:

    try {
        context.idCache().refresh(context.searcher().subReaders());
    } catch (Exception e) {
        throw new FacetPhaseExecutionException(facetName, "Failed to load parent-ID cache", e);
    }

This makes possible the following line in doSetNextReader():

    typeCache = context.idCache().reader(reader).type(parentType);

which gives you a lookup of the parent doc's UId in doCollect(int childDocId):

    HashedBytesArray postingUid = typeCache.parentIdByDoc(childDocId);

The parent document won't necessarily be found in the same reader as the child doc, so when the Collector initialises you also need to store all the readers (needed to access the field value) and, for each reader, an IdReaderTypeCache (needed to resolve the parent doc's UId to a reader-internal docId).

    this.readers = new Tuple[context.searcher().subReaders().length];
    for (int i = 0; i < readers.length; i++) {
        IndexReader reader = context.searcher().subReaders()[i];
        readers[i] = new Tuple<IndexReader, IdReaderTypeCache>(reader, context.idCache().reader(reader).type(parentType));
    }
    this.context = context;

Then when you need the parent doc field, you have to iterate over the reader/typecache pairs looking for the right one:

        int parentDocId = -1;
        for (Tuple<IndexReader, IdReaderTypeCache> tuple : readers) {
            IndexReader indexReader = tuple.v1();
            IdReaderTypeCache idReaderTypeCache = tuple.v2();
            if (idReaderTypeCache == null) { // might be if we don't have that doc with that type in this reader
                continue;
            }
            parentDocId = idReaderTypeCache.docById(postingUid);
            if (parentDocId != -1 && !indexReader.isDeleted(parentDocId)) {
                FloatFieldData parentSpringinessFieldData = (FloatFieldData) fieldDataCache.cache(
                        FieldDataType.DefaultTypes.FLOAT,
                        indexReader,
                        "springiness");
                parentSpringiness = parentSpringinessFieldData.value(parentDocId);
                break;
            }
        }
        if (parentDocId == -1) {
            throw new FacetPhaseExecutionException(facetName, "Parent doc " + postingUid + " could not be found!");
        }
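
To make the shape of that lookup easier to see, here is a toy model of it with no Elasticsearch dependency. It is only a sketch: Segment, docIdByUid, springinessByDocId and demoSegments are hypothetical stand-ins invented for illustration (they play the roles of the reader, IdReaderTypeCache.docById() and FloatFieldData.value() above), and the deleted-document check is left out. The point it shows is that doc IDs are only meaningful within one reader, so the field value has to be read from the same reader that resolved the UID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the per-reader parent lookup. None of these types are Elasticsearch classes. */
public class ParentLookupSketch {

    /** Hypothetical stand-in for one segment reader with its ID cache and field data. */
    static final class Segment {
        // Plays the role of IdReaderTypeCache.docById(): parent UID -> segment-local doc ID.
        final Map<String, Integer> docIdByUid = new HashMap<String, Integer>();
        // Plays the role of FloatFieldData.value(): segment-local doc ID -> field value.
        final Map<Integer, Float> springinessByDocId = new HashMap<Integer, Float>();

        Segment addParent(String uid, int docId, float springiness) {
            docIdByUid.put(uid, docId);
            springinessByDocId.put(docId, springiness);
            return this;
        }
    }

    /** Returns the parent's field value, or null if no segment contains the UID. */
    static Float parentSpringiness(List<Segment> segments, String parentUid) {
        for (Segment segment : segments) {
            Integer docId = segment.docIdByUid.get(parentUid);
            if (docId != null) {
                // Read the value from the same segment that resolved the ID.
                return segment.springinessByDocId.get(docId);
            }
        }
        return null;
    }

    /** Two segments that both use local doc ID 0, for different parents. */
    static List<Segment> demoSegments() {
        List<Segment> segments = new ArrayList<Segment>();
        segments.add(new Segment().addParent("p1", 0, 1.5f));
        segments.add(new Segment().addParent("p2", 0, 2.5f));
        return segments;
    }

    public static void main(String[] args) {
        List<Segment> segments = demoSegments();
        System.out.println(parentSpringiness(segments, "p1")); // prints 1.5
        System.out.println(parentSpringiness(segments, "p2")); // prints 2.5
        System.out.println(parentSpringiness(segments, "missing")); // prints null
    }
}
```

Both demo segments use local doc ID 0 for different parents, which is why a doc ID resolved in one reader must never be used against another reader's field data.
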
Licensed under: CC-BY-SA with attribution