what arguments should I pass to dbscan or optic in order to divid the data in a specific way

https://datascience.stackexchange.com/questions/60667

02-11-2019
|

Question

I have thousands of very similar data set that needs to be divided in diagonal way to two groups. for example: and

I tried to play with the argument of dbscan and optic as eps and minPoints and even metric and none of them helped me to divide the data properly to 2 groups. I only succeed to divide the data using dbscan if I remove the noise between these groups to make them a complete separate 2 groups, I did it using histogram

j = 1
hist, bin_edges = np.histogram(data, bins=500)
max_bin = np.where(np.amax(hist) == hist)[0][0]
max_noise = bin_edges[max_bin+j]
filtered_indicies = data > max_noise
data = data[filtered_indicies]

these lines remove noise from the data, between the groups and also around it when

j > 1

and that causing me to remove necessary data that I need to reprocess later.

so Im going back the my main question, how can I know which eps, minPoints or other argument of dbscan can help me divide this data properly? or is there maybe a better way then what I presented here above (histogram) to remove the noise between these groups without removing necessary data?

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange