Question

I have accidentally loaded some data into Elasticsearch from Logstash.

Basically, I forgot to include start_position => "beginning" in the Logstash config, so if I delete the .sincedb_* files and re-run, a small portion of the data will be duplicated.
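For reference, the corrected file input would look something like this (the path below is just a placeholder, not my real log path):

input {
  file {
    path => "/var/log/myapp/*.log"        # placeholder path
    start_position => "beginning"         # the option I forgot: read files from the start
    # sincedb_path => "/dev/null"         # optionally, don't persist the read position at all
  }
}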

I've used Kibana to look at this data and clicked the "inspect" button to see the query it ran:

curl -XGET 'http://els-server:9200/logstash-2014.02.19,logstash-2014.02.18/_search?pretty' -d '{
  "facets": {
    "0": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "10m"
      },
      "facet_filter": {
        "fquery": {
          "query": {
            "filtered": {
              "query": {
                "query_string": {
                  "query": "tags:\"a-tag-that-uniquely-matches-the-mistake\""
                }
              },
              "filter": {
                "bool": {
                  "must": [
                    {
                      "match_all": {}
                    },
                    {
                      "range": {
                        "@timestamp": {
                          "from": 1392723206360,
                          "to": "now"
                        }
                      }
                    },
                    {
                      "bool": {
                        "must": [
                          {
                            "match_all": {}
                          }
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}'

If I run this on the ELS server, it finds the same result set (as expected):

{
  "took" : 23,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "failed" : 0
  },
  "hits" : {
    "total" : 558829,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "facets" : {
    "0" : {
      "_type" : "date_histogram",
      "entries" : [ {
        "time" : 1392799200000,
        "count" : 91
      } ]
    }
  }
}

The line "count" : 91 matches the same number of events shown in Kibana.

How do I turn this into a DELETE operation to remove these 91 entries?

Thanks,
KB


Solution

You can delete by query; I believe the API is available in 1.0 and later.

See the Elasticsearch documentation for the delete by query API.

I use the Chrome plugin Sense to run my queries manually against ES.

Example:

DELETE /twitter/tweet/_query
{
    "query" : {
        "term" : { "user" : "kimchy" }
    }
}
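
Since you are already using curl, the same request can be sent that way too. This is just a sketch of the equivalent call, with localhost standing in for your own host (e.g. els-server):

curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}'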

In your case, take the query part of the Kibana query above and run it as a delete by query against your own indices:

DELETE /logstash-2014.02.19,logstash-2014.02.18/_query
{
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "tags:\"a-tag-that-uniquely-matches-the-mistake\""
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "match_all": {}
            },
            {
              "range": {
                "@timestamp": {
                  "from": 1392723206360,
                  "to": "now"
                }
              }
            },
            {
              "bool": {
                "must": [
                  {
                    "match_all": {}
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
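
Afterwards, you can re-run a trimmed-down version of your original search to confirm the documents are gone; hits.total should drop to 0 for that tag once the deletes are visible. A minimal sketch, reusing your index names and query string:

curl -XGET 'http://els-server:9200/logstash-2014.02.19,logstash-2014.02.18/_search?pretty' -d '{
  "size": 0,
  "query": {
    "query_string": {
      "query": "tags:\"a-tag-that-uniquely-matches-the-mistake\""
    }
  }
}'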