Question

I'm using Apache Solr for the matching functionality of my webapp, and I ran into a problem with the following scenario:

I have three programmers. The "skill" field lists their skills, and "weight" indicates how good he/she is at that skill:

{
    name: "John",
    skill: [
        {name: "java", weight: 90},
        {name: "oracle", weight: 90},
        {name: "linux", weight: 70}
    ]
},
{
    name: "Sam",
    skill: [
        {name: "C#", weight: 98},
        {name: "java", weight: 75},
        {name: "oracle", weight: 70},
        {name: "tomcat", weight: 70}
    ]
},
{
    name: "Bob",
    skill: [
        {name: "oracle", weight: 90},
        {name: "java", weight: 85}
    ]
}

And I have a job that is looking for a programmer:

{
    name: "webapp development",
    skillRequired: [
        {name: "java", weight: 85},
        {name: "oracle", weight: 85}
    ]
}

I want to use the job's "skillRequired" to match those programmers (to find the best people for the job). In this case, the result should be John and Bob; Sam is ruled out because his java and oracle skills are not good enough. John should score higher than Bob because he knows java better (90 vs. 85; both have oracle at 90).
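To make the desired behavior concrete, here is a minimal plain-Python sketch of the matching described above. It does not involve Solr at all. The function name `match` and the scoring rule (sum of the programmer's weights on the required skills) are assumptions for illustration; the post only states that John should rank above Bob.

```python
# Reference behavior only: plain Python, no Solr involved.
programmers = [
    {"name": "John", "skill": [{"name": "java", "weight": 90},
                               {"name": "oracle", "weight": 90},
                               {"name": "linux", "weight": 70}]},
    {"name": "Sam", "skill": [{"name": "C#", "weight": 98},
                              {"name": "java", "weight": 75},
                              {"name": "oracle", "weight": 70},
                              {"name": "tomcat", "weight": 70}]},
    {"name": "Bob", "skill": [{"name": "oracle", "weight": 90},
                              {"name": "java", "weight": 85}]},
]

job = {"name": "webapp development",
       "skillRequired": [{"name": "java", "weight": 85},
                         {"name": "oracle", "weight": 85}]}


def match(job, programmers):
    """Return names of programmers meeting every requirement, best first."""
    required = job["skillRequired"]
    ranked = []
    for p in programmers:
        have = {s["name"]: s["weight"] for s in p["skill"]}
        # Every required skill must be present with at least the required weight.
        if all(have.get(r["name"], 0) >= r["weight"] for r in required):
            # Assumed scoring rule: sum of weights on the required skills.
            score = sum(have[r["name"]] for r in required)
            ranked.append((score, p["name"]))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked]


print(match(job, programmers))  # ['John', 'Bob']
```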

The problem is that Solr can't index nested objects; the best format I think I can get is:

name: "John",
skill-name: ["java", "oracle", "linux"],
skill-weight: [90, 90, 70]

and so on. So I don't know whether it is possible to construct a query that makes this scenario work.
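The difficulty with the flattened format can be illustrated with a small plain-Python simulation. It assumes the usual behavior of multi-valued fields, where each query clause is satisfied by any value of its field and the positions of the two arrays are not kept paired. The helper `flat_match` is hypothetical and only mimics a query along the lines of "skill-name:java AND skill-weight:[85 TO *]".

```python
# Simulates matching against flattened multi-valued fields: each clause is
# checked against ANY value of its field, independently of the other field.
flat_docs = [
    {"name": "John", "skill-name": ["java", "oracle", "linux"],
     "skill-weight": [90, 90, 70]},
    {"name": "Sam", "skill-name": ["C#", "java", "oracle", "tomcat"],
     "skill-weight": [98, 75, 70, 70]},
    {"name": "Bob", "skill-name": ["oracle", "java"],
     "skill-weight": [90, 85]},
]


def flat_match(doc, skill, min_weight):
    """True if the doc has the skill and ANY weight >= min_weight."""
    has_skill = skill in doc["skill-name"]
    has_weight = any(w >= min_weight for w in doc["skill-weight"])
    return has_skill and has_weight


matches = [d["name"] for d in flat_docs
           if flat_match(d, "java", 85) and flat_match(d, "oracle", 85)]
print(matches)  # ['John', 'Sam', 'Bob']
```

Sam slips through because his C# weight of 98 satisfies the weight clause even though his java and oracle weights are below 85, which is why the name/weight pairing has to survive indexing.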

Is there a better schema structure for this? Or should I use index-time/query-time boosts?

I have read almost all of the Solr wiki and googled around with no luck; any tips and workarounds are welcome.

Problem solved. I'm logging my solution here in case it helps:

1st, my data format is JSON, so I need solr-4.8.0, which supports indexing nested data with JSON. If the data were in XML format, solr-4.7.2 would still work.

2nd, solr-4.8.0 needs Java 7 (u55 is the officially recommended version).

3rd, nested documents/objects should be submitted to Solr under the "_childDocuments_" key. To identify the type of each parent/child document, I added a "type" field. So with the example above, it looks like this:

    {
        type: "programmer",
        name: "John",
        _childDocuments_: [
            {type:"skill", name: "java", weight: 90},
            {type:"skill", name: "oracle", weight: 90},
            {type:"skill", name: "linux", weight: 70}
        ]
    },
    {
        type: "programmer",
        name: "Sam",
        _childDocuments_: [
            {type:"skill",name: "C#", weight: 98},
            {type:"skill", name: "java", weight: 75},
            {type:"skill", name: "oracle", weight: 70},
            {type:"skill", name: "tomcat", weight: 70}
        ]
    },
    {
        type: "programmer",
        name: "Bob",
        _childDocuments_: [
            {type:"skill", name: "oracle", weight: 90},
            {type:"skill", name: "java", weight: 85}
        ]
    }
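
As a rough sketch of how the original nested data could be converted into this shape and sent to Solr, something like the following Python might work. The update URL (a local Solr with a core named "collection1") and the helper names `to_block` and `post_blocks` are assumptions, not part of the setup described here. A schema may also require a unique key field such as "id" on every document; none is shown above, so none is added in this sketch.

```python
import json
import urllib.request

# Assumed endpoint: a local Solr instance with a core named "collection1".
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update?commit=true"


def to_block(programmer):
    """Turn {name, skill: [...]} into a parent doc with _childDocuments_."""
    return {
        "type": "programmer",
        "name": programmer["name"],
        "_childDocuments_": [
            {"type": "skill", "name": s["name"], "weight": s["weight"]}
            for s in programmer["skill"]
        ],
    }


def post_blocks(blocks, url=SOLR_UPDATE_URL):
    """POST the documents as a JSON array; needs a running Solr instance."""
    request = urllib.request.Request(
        url,
        data=json.dumps(blocks).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


bob = {"name": "Bob", "skill": [{"name": "oracle", "weight": 90},
                                {"name": "java", "weight": 85}]}
payload = [to_block(bob)]
print(json.dumps(payload, indent=2))
# post_blocks(payload)  # uncomment when a Solr instance is available
```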

4th, after submitting and committing to Solr, I can match the job with a block join query (as a filter query), one fq per required skill:

fq={!parent which='type:programmer'}type:skill AND name:java AND weight:[85 TO *]&
fq={!parent which='type:programmer'}type:skill AND name:oracle AND weight:[85 TO *]
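
These filter queries can be generated from the job's "skillRequired" list. The following Python is a minimal sketch: the helper name `build_filter_queries` and the main query `q=*:*` are assumptions, and skill names are inserted without escaping, so a name containing query-syntax characters or spaces would need extra handling.

```python
from urllib.parse import urlencode


def build_filter_queries(job, parent_filter="type:programmer"):
    """Build one block join fq per required skill."""
    fqs = []
    for req in job["skillRequired"]:
        fqs.append(
            "{!parent which='%s'}type:skill AND name:%s AND weight:[%d TO *]"
            % (parent_filter, req["name"], req["weight"])
        )
    return fqs


job = {"name": "webapp development",
       "skillRequired": [{"name": "java", "weight": 85},
                         {"name": "oracle", "weight": 85}]}

fqs = build_filter_queries(job)
# Assumed main query: match everything and let the fq clauses do the filtering.
query_string = urlencode([("q", "*:*")] + [("fq", fq) for fq in fqs])

for fq in fqs:
    print(fq)
print(query_string)
```

Note that filter queries only restrict the result set and do not contribute to the score, so ranking John above Bob would need something in addition to these fq clauses.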

Solution

You can try BlockJoinQuery.

Licensed under: CC-BY-SA with attribution