Question

I have thousands of very similar data sets, each of which needs to be divided diagonally into two groups. For example: [example image 1] and [example image 2].

I tried tuning the arguments of DBSCAN and OPTICS (eps, minPoints, and even the metric), but none of the settings divided the data properly into two groups. DBSCAN only succeeds if I first remove the noise between the groups so that they become two completely separate groups. I did that using a histogram:

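import numpy as np

j = 1  # number of bins past the histogram peak to treat as noise
hist, bin_edges = np.histogram(data, bins=500)
max_bin = np.argmax(hist)  # index of the most populated bin (first one on ties)
max_noise = bin_edges[max_bin + j]  # threshold: the bin edge j bins past the peak
filtered_indices = data > max_noise  # boolean mask of values above the threshold
data = data[filtered_indices]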

These lines remove the noise between the groups, but when j > 1 they also remove points around the groups, so I end up discarding data that I need to reprocess later.
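To make the effect of j concrete, here is a minimal sketch of the same thresholding wrapped in a function and run on synthetic 1-D values. The helper name `threshold_above_peak` and the synthetic data are assumptions for illustration only; my real data looks different. It simply counts how many values survive for a few choices of j.

```python
import numpy as np

def threshold_above_peak(values, bins=500, j=1):
    """Return the bin edge that lies j bins past the most populated bin."""
    hist, bin_edges = np.histogram(values, bins=bins)
    max_bin = int(np.argmax(hist))
    # Clamp so that a large j cannot index past the last bin edge.
    edge_index = min(max_bin + j, len(bin_edges) - 1)
    return bin_edges[edge_index]

# Synthetic stand-in for the real data (an assumption): a dense noise peak
# near 0 plus a broader "signal" spread over higher values.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.05, 5000)
signal = rng.uniform(0.5, 3.0, 2000)
values = np.concatenate([noise, signal])

kept_counts = {}
for j in (1, 5, 20, 100):
    threshold = threshold_above_peak(values, j=j)
    kept_counts[j] = int(np.sum(values > threshold))
    print(f"j={j}: threshold={threshold:.3f}, kept {kept_counts[j]} of {values.size}")
```

Because the threshold only moves upward as j grows, every increase in j can only discard more points, which is exactly the trade-off described above.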

So I'm going back to my main question: how can I know which eps, minPoints, or other DBSCAN arguments will divide this data properly? Or is there a better way than the histogram approach above to remove the noise between the groups without removing data that I need?
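For reference, the kind of parameter sweep I mean can be sketched as follows. This assumes scikit-learn's `DBSCAN`, where minPoints is called `min_samples`; the helper `sweep_dbscan` and the two synthetic diagonal bands are hypothetical stand-ins, not my actual data. It reports how many clusters and how many noise points each (eps, min_samples) pair produces.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def sweep_dbscan(X, eps_values, min_samples_values):
    """Run DBSCAN for every (eps, min_samples) pair and summarize the labels."""
    results = {}
    for eps in eps_values:
        for min_samples in min_samples_values:
            labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
            # DBSCAN marks noise points with the label -1.
            n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
            n_noise = int(np.sum(labels == -1))
            results[(eps, min_samples)] = (n_clusters, n_noise)
    return results

# Synthetic stand-in (an assumption): two parallel diagonal bands.
rng = np.random.default_rng(0)
t_a = rng.uniform(0, 10, 300)
t_b = rng.uniform(0, 10, 300)
band_a = np.column_stack([t_a, t_a + rng.normal(0, 0.2, 300)])
band_b = np.column_stack([t_b, t_b + 3 + rng.normal(0, 0.2, 300)])
X = np.vstack([band_a, band_b])

results = sweep_dbscan(X, eps_values=[0.2, 0.5, 1.0], min_samples_values=[5, 10])
for (eps, min_samples), (n_clusters, n_noise) in results.items():
    print(f"eps={eps}, min_samples={min_samples}: {n_clusters} clusters, {n_noise} noise points")
```

A sweep like this shows which settings yield exactly two clusters, but it does not by itself say which settings are right once there is noise bridging the two groups.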

No correct solution

Licensed under: CC-BY-SA with attribution