Autocomplete feature using Solr4 on multivalued fields

Question 1

I think the best approach is to create a separate collection or core (depending on whether you are using SolrCloud or not) and index your data in a way that lets it be queried directly for the desired result. This may not be possible in every case, but if it is in yours, go for it. Such a core contains only the fields and data relevant to your autocomplete, so in most cases it will be smaller than the original core and hold fewer terms, which should result in faster queries. In addition, such a core or collection can be optimized for autocomplete queries, which gains you even more performance.

However, if you can't go with the multiple cores/collections approach, then highlighting may be the best way to go if you need filtering. In that case you may want to turn term vectors on and use the FastVectorHighlighter to get better highlighting performance (http://solr.pl/en/2011/06/13/solr-3-1-fastvectorhighlighting/).
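As a rough sketch of what such a request could look like, the snippet below builds a select URL that asks Solr to use the FastVectorHighlighter. The base URL, the field name suggest_text and the helper name build_fvh_query are made up for illustration, and the user input is not escaped for the query parser. It also assumes the field is declared with termVectors, termPositions and termOffsets enabled in the schema.

```python
from urllib.parse import urlencode


def build_fvh_query(select_url, field, user_input, rows=10):
    """Build a select URL that requests FastVectorHighlighter snippets.

    Assumption: `field` is indexed with termVectors="true",
    termPositions="true" and termOffsets="true"; without them Solr
    cannot use the FastVectorHighlighter on that field.
    """
    params = {
        "q": f"{field}:{user_input}",          # hypothetical autocomplete field
        "rows": rows,
        "hl": "true",
        "hl.fl": field,                        # highlight only this field
        "hl.useFastVectorHighlighter": "true",
        "wt": "json",
    }
    return f"{select_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and core name.
    print(build_fvh_query(
        "http://localhost:8983/solr/collection1/select", "suggest_text", "new"))
```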

Question 2

I have used these two ways, so far:

(A) stick to using facets and accept that you have to reduce the result yourself via a regular expression or String.startsWith. This might actually not be so bad if you use a frontend component like the YUI3 Autocomplete plugin, which already offers this filtering with little extra work on your side.
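If you do the filtering yourself rather than in a frontend component, it could look roughly like the sketch below. It assumes the default JSON facet layout, where facet_counts.facet_fields.<field> is a flat list alternating value and count; the helper name and the sample values are invented for illustration.

```python
def filter_facet_values(facet_field, prefix):
    """Keep only facet values that start with `prefix` (case-insensitive).

    `facet_field` is the flat [value, count, value, count, ...] list that
    Solr returns for one facet field in its default JSON layout.
    """
    values = facet_field[0::2]   # every even position is a value
    counts = facet_field[1::2]   # every odd position is its count
    wanted = prefix.lower()
    return [(v, c) for v, c in zip(values, counts)
            if v.lower().startswith(wanted)]


if __name__ == "__main__":
    # Invented sample data, not taken from a real index.
    sample = ["New York Press", 12, "Penguin", 7, "Newbury House", 3]
    print(filter_facet_values(sample, "new"))
```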

(B) use highlighting by adding to your query:

&hl=true&hl.fl=publisherText-ac

For each hit, the highlighting component will return the matching value, including highlighting tags (by default <em>). This is even more helpful if your autocomplete field is sourced from several input fields and you don't want to search through the results to find out which field contains the matching value. The resulting list may contain duplicates, however.

I am using both approaches: (A) for autocomplete on single fields, (B) when sourcing autocomplete from multiple fields. I tried to get rid of the <em> tags included in the highlighting results, but that proved impossible for me (I could only change them, not remove them completely).

(using Solr 4.0 over here)
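For approach (B), one way to deal with both the <em> tags and the duplicates is to clean up on the client. The sketch below assumes the usual JSON shape of the highlighting section (document id, then field name, then a list of snippets) and the default <em> tags; the helper name and the sample response are invented for illustration.

```python
import re

# Matches the default highlighting tags; adjust if hl.simple.pre/post differ.
EM_TAG = re.compile(r"</?em>")


def suggestions_from_highlighting(highlighting, field):
    """Collect de-duplicated, tag-free suggestions from a highlighting section."""
    seen = set()
    suggestions = []
    for fields in highlighting.values():
        for snippet in fields.get(field, []):
            text = EM_TAG.sub("", snippet)
            if text not in seen:          # drop duplicates, keep first-seen order
                seen.add(text)
                suggestions.append(text)
    return suggestions


if __name__ == "__main__":
    # Invented sample response fragment.
    sample = {
        "doc1": {"publisherText-ac": ["<em>New</em> York Press"]},
        "doc2": {"publisherText-ac": ["<em>New</em> York Press", "<em>New</em>bury House"]},
        "doc3": {},
    }
    print(suggestions_from_highlighting(sample, "publisherText-ac"))
```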

Question 3

You can just use the facet.prefix=new parameter and let Solr filter those entries out for you. I would also consider avoiding ngrams here: making a facet and using facet.prefix already does the trick. As long as you do not have too many unique terms, performance should be fine.
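A minimal sketch of such a request is shown below. The base URL, the field name publisher and the helper name are assumptions for illustration. Keep in mind that facet.prefix is matched against the indexed terms, so it is case-sensitive and works most predictably on a field that is not split into tokens.

```python
from urllib.parse import urlencode


def build_facet_prefix_query(select_url, field, prefix, limit=10):
    """Build a select URL that returns only facet values starting with `prefix`."""
    params = {
        "q": "*:*",
        "rows": 0,                 # only the facet counts are needed, no documents
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix,    # Solr filters the facet values server-side
        "facet.limit": limit,
        "wt": "json",
    }
    return f"{select_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, core and field name.
    print(build_facet_prefix_query(
        "http://localhost:8983/solr/collection1/select", "publisher", "new"))
```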