I found the answer. My setup was working fine; I was simply expecting the wrong output.
I expected that, since I had tokenized the field using the PathHierarchyTokenizerFactory
and the field was multivalued, I would get a result of
"url_tokens": [
"http://en.wikipedia.org/wiki/Main_Page",
"http://en.wikipedia.org/wiki",
"http://en.wikipedia.org"
],
But the reason I got
"url_tokens": [
"http://en.wikipedia.org/wiki/Main_Page"
],
in the search results was that the field was stored, and search results return the stored value, which is the original, untokenized input. The tokenization still happens because the field is also indexed, but those tokens never show up in the search results; they are only used to select which results to show.
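As an illustration, a schema.xml fragment along these lines would behave as described. It is only a sketch: the field type name `url_path` and the attribute values shown are assumptions, not the actual configuration from this setup.

```xml
<!-- "url_path" is a hypothetical field type name chosen for this example. -->
<fieldType name="url_path" class="solr.TextField">
  <analyzer>
    <!-- Splits a path-like value into progressively longer prefixes at index time. -->
    <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
  </analyzer>
</fieldType>

<!-- indexed="true": the analyzer's tokens go into the index and are used for matching.
     stored="true": the original value is kept verbatim and is what search results return. -->
<field name="url_tokens" type="url_path" indexed="true" stored="true" multiValued="true"/>
```

With a definition like this, a query can match on the indexed path prefixes, while the response still contains only the stored original URL.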
I had not previously used the Analysis screen of the Solr admin GUI, but I have now used it to confirm that the URLs are tokenized correctly.