Carrot2+ElasticSearch Basic Flow of Information

Question 1

Yes, if you want to use the plugin straight off the ES installation, you need to make the REST calls yourself. I believe you are using Python, so take a look at requests. It is a delightful HTTP library for Python.

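To make a POST request, you can do the following:

import json
import requests

# The scheme (http://) is required; requests cannot send to a bare host:port URL.
url = 'http://localhost:9200/article-index/article/_search_with_clusters'
payload = {'some': 'data'}  # placeholder request body
r = requests.post(url, data=json.dumps(payload))
print(r.text)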

You can find more information in the requests documentation.

Question 2

Will clustering work only on newly indexed documents or even old documents?

It will work on old documents as well, since clustering is applied to the search results at query time.

How can I specify which fields to look at for clustering?

Here's an example using the Shakespeare dataset. The query is: which of Shakespeare's plays are about war?

$ curl -XPOST 'http://localhost:9200/shakespeare/_search_with_clusters?pretty' -d '
{
  "search_request": {
    "query": {"match" : { "_all": "war" }},
    "size": 100
  },

  "max_hits": 0,
  "query_hint": "war",
  "field_mapping": {
    "title": ["_source.play_name"],
    "content": ["_source.text_entry"]
  },
  "algorithm": "lingo"
}'

Running this, you'll get back plays like Richard, Henry... The field mapped to title (play_name here) is what Carrot2 uses to develop the cluster names, and the field mapped to content (text_entry here) is what it uses to build the clusters.
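If you would rather send the same request from Python, here is a minimal sketch of the curl request above using requests. The function names build_cluster_request and search_with_clusters are hypothetical helpers made up for this example, and the base URL and index name are simply the ones from the curl command. Treat it as a starting point, not a definitive implementation.

```python
import json


def build_cluster_request(query_text, title_field, content_field, size=100):
    """Build the JSON body for _search_with_clusters (hypothetical helper)."""
    return {
        "search_request": {
            "query": {"match": {"_all": query_text}},
            "size": size,
        },
        "max_hits": 0,
        "query_hint": query_text,
        # Map document fields onto the logical fields used for clustering.
        "field_mapping": {
            "title": ["_source." + title_field],
            "content": ["_source." + content_field],
        },
        "algorithm": "lingo",
    }


def search_with_clusters(base_url, index, body):
    """POST the body to the plugin endpoint and return the parsed JSON."""
    # Third-party library; imported here so the builder works without it.
    import requests

    url = "{}/{}/_search_with_clusters".format(base_url.rstrip("/"), index)
    # json= serializes the body and sets the Content-Type header.
    response = requests.post(url, json=body)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    body = build_cluster_request("war", "play_name", "text_entry")
    print(json.dumps(body, indent=2))
    # Needs a running Elasticsearch node with the clustering plugin installed:
    # result = search_with_clusters("http://localhost:9200", "shakespeare", body)
```

To cluster on different fields, change the two field names passed to build_cluster_request; the rest of the body stays the same.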

The curl command is working and giving some results. How can I get the curl command which takes a JSON as input to a REST API url of the form localhost:9200/article-index/article/_search_with_clusters?.....

Typically, you would use the Elasticsearch client library for your language of choice.