Question

I am using OpenTSDB, and while querying I am getting this error:

net.opentsdb.core.IllegalDataException: Found out of order or duplicate data: cell=Cell([-35, 87], [0, 0, 0, 0, 0, 8, -34, 65]), delta=3541, prev cell=Cell([-35, 87], [0, 0, 0, 0, 0, 12, -82, 106]), last_delta=3541, in row=[KeyValue(key=[0, 8, -96, 81, -7, -77, 16, 0, 0, 1, 0, -73, 83, 0, 0, 3, 0, 47, 57, 0, 0, 69, 0, 44, 99, 0, 0, 71, 0, 48, 79, 0, 0, 75, 0, 47, -53, 0, 0, 76, 0, 13, -24, 0, 0, 77, 0, 114, 14, 0, 0, 85, 0, -16, -50], family="t", qualifier="\xDDW", value=[0, 0, 0, 0, 0, 12, -82, 106], timestamp=1375323607530), KeyValue(key=[0, 8, -96, 81, -7, -77, 16, 0, 0, 1, 0, -73, 83, 0, 0, 3, 0, 47, 57, 0, 0, 69, 0, 44, 99, 0, 0, 71, 0, 48, 79, 0, 0, 75, 0, 47, -53, 0, 0, 76, 0, 13, -24, 0, 0, 77, 0, 114, 14, 0, 0, 85, 0, -16, -50], family="t", qualifier=[-35, 87, -35, -41, -34, 103, -32, 7, -32, -57], value=[0, 0, 0, 0, 0, 8, -34, 65, 0, 0, 0, 0, 0, 1, -122, -123, 0, 0, 0, 0, 0, 3, -22, 23, 0, 0, 0, 0, 0, 13, -10, -32, 0, 0, 0, 0, 0, 10, -27, 6, 0], timestamp=1375323057833)] -- run an fsck.

I have tried using fsck --fix, but it says no errors were found. Is there a way to: 1. resolve this other than removing the data points manually, and 2. understand what is happening and how to prevent it?

Thanks


Solution

Solution 1: to prevent this from happening in the first place, set the tsd.storage.fix_duplicates flag to true in your opentsdb.conf.
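For reference, the setting is a plain key/value line in opentsdb.conf (Java-properties style). With it enabled, TSD resolves duplicate timestamps at query and compaction time by keeping the most recently written value, instead of throwing the IllegalDataException above:

```
tsd.storage.fix_duplicates = true
```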


Solution 2: if you already have duplicate values written to HBase (the underlying datastore) and are unable to query OpenTSDB, use the fsck utility from inside opentsdb/build/:

Specific query:

 ./tsdb fsck --fix-all 1h-ago now sum <metric-name> tag1=val1

For a whole metric:

./tsdb fsck --threads=2 --fix-all --resolve-duplicates 15d-ago sum <metric name>

Full table: all the data in HBase's 'tsdb' table (the single table where OpenTSDB stores data points):

./tsdb fsck --full-scan --threads=8 --fix-all --resolve-duplicates --compact

The helpful fsck flags:

  • --fix-all - Sets all repair flags to attempt to fix all issues at once. Use with caution.
  • --compact - Compacts non-compacted rows during a repair.
  • --delete-bad-compacts - Removes columns that appear to be compacted but failed parsing. If a column parses properly but the final byte of the value is not set to 0 or 1, the column is left alone.
  • --resolve-duplicates - Enables duplicate data point resolution by deleting all but the latest or oldest data point. Also see --last-write-wins.
  • --last-write-wins - When set, deletes all but the most recently written data point when resolving duplicates. If the config value tsd.storage.fix_duplicates is set to true, the latest data point is kept regardless of this flag. Not set by default.
  • --full-scan - Scans the entire data table. Note: this can take a very long time to complete.
  • --threads <integer> - The number of threads to use when performing a full scan. The default is twice the number of CPU cores.

OTHER TIPS

OpenTSDB is a very particular time-series database backed by HBase. The data in the tsdb table must be in time/date order and must not contain duplicates. Out-of-order data points are usually caused by skewed system clocks on tcollector or system hosts. Duplicate data is usually caused by manual puts over the HTTP API or the TCP socket. Your exception shows the cell with qualifier [-35, 87] in duplicate. Are you manually submitting this data to TSDB or writing it to HBase directly?
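As a side note, you can decode the duplicated cell from the exception by hand. For second-resolution cells, the top 12 bits of the 2-byte column qualifier hold the delta in seconds from the hour-aligned row base time, and the low 4 bits hold value-format flags. A minimal sketch with bash arithmetic, using the byte values straight from the error message above:

```shell
# The duplicated qualifier bytes [-35, 87] are signed Java bytes;
# unsigned they are 0xDD 0x57, i.e. the 16-bit qualifier 0xDD57.
qualifier=$(( 0xDD57 ))
delta=$(( qualifier >> 4 ))   # seconds since the row's base time: 3541
flags=$(( qualifier & 0xF ))  # low 3 bits encode value length - 1 (7 -> 8 bytes)
echo "delta=${delta} flags=${flags}"

# The 4 bytes after the 3-byte metric UID in the row key,
# [81, -7, -77, 16] = 0x51 0xF9 0xB3 0x10, are the hour-aligned base timestamp.
base=$(( 0x51F9B310 ))
echo "base=${base} timestamp=$(( base + delta ))"
```

The delta of 3541 matches the delta and last_delta reported in the exception, which confirms the two cells describe the exact same second, with different values.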

To fix this you can use 'tsdb fsck' as you tried.

A 'tsdb fsck --fix' requires a time period, an aggregator, and a metric name. If --fix was not finding an error, you were probably not supplying a time period or metric name that covered the duplicate data.

For example:

/usr/local/opentsdb/build/tsdb fsck --fix 9d-ago sum http.hits --config /usr/local/opentsdb/opentsdb.conf

Having dealt with TSDB since version 1.0, before the many 'fsck' features were added in the summer of 2014, I figured out a handy hack to 'fsck' all data points. This shell script lists all metrics and then shells out to tsdb to fsck every data point of each metric:

#!/bin/bash
# List every UID name: take the second space-delimited field and strip
# the trailing colon to get the bare metric name.
list=$(/usr/local/opentsdb/build/tsdb uid grep '' --config /usr/local/opentsdb/opentsdb.conf | cut -d" " -f2 | cut -d":" -f1)
for i in $list
do
    # The trailing '&' runs one fsck per metric in parallel,
    # which can be heavy on a large install.
    echo "Fixing metric $i" && /usr/local/opentsdb/build/tsdb fsck --fix 9d-ago sum "$i" --config /usr/local/opentsdb/opentsdb.conf &
done
wait

In TSDB 2.1 performing a fsck is much easier. Unfortunately, as of 24AUG14 it is unreleased and is only available through a source-control checkout of the 'next' branch:

git clone https://github.com/OpenTSDB/opentsdb.git

cd opentsdb

git checkout next

bash ./build.sh

# Wait for it to compile

# To FSCK without altering the metrics

build/tsdb fsck --full-scan --threads=16

# To FSCK with resolving duplicate/to fix the metrics

build/tsdb fsck --full-scan --threads=16 --fix --resolve-duplicates --compact

Good luck!

I was not able to get fsck to fix my duplicates, but adding this to the config file and restarting OpenTSDB worked for me:

tsd.storage.fix_duplicates = true

Solution found here: https://github.com/OpenTSDB/opentsdb/issues/430

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow