Finding a boundary in a density plot

https://stackoverflow.com/questions/16307024

13-04-2022

Question

I am very new to machine learning, so I am open to suggestions as well. I read about something called minimax risk today and was wondering whether it applies to my case.

I have two datasets and want to find a vertical line (a boundary, to be more precise) such that the area under the left curve to the right of the line equals the area under the right curve to the left of the line. Is there a way to do this in R, i.e., to find the exact location at which to draw the vertical line?

I put up some sample data here that can be used to plot the following graph: https://gist.github.com/Legend/2f299c3b9ba94b9328b2

[Image: density plot of the two datasets]

Solution

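Suppose you use the density function to get a kernel density estimate for each response. Integrating each estimated density gives an estimated kernel CDF, and your question then becomes finding a value t such that 1 - cdf1(t) = cdf2(t). The left side is the area under the first curve to the right of t, and the right side is the area under the second curve to the left of t. This can be solved with a standard root-finding function: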

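# data: the sample data frame, with columns Type and Value
x1 <- subset(data, Type == 'Curve 1')$Value
x2 <- subset(data, Type == 'Curve 2')$Value

# Kernel density estimate for Curve 1, interpolated into a function
# that returns 0 outside the estimated range
pdf1 <- density(x1)
f1 <- approxfun(pdf1$x, pdf1$y, yleft = 0, yright = 0)
# Estimated CDF: area under f1 to the left of z
cdf1 <- function(z){
  integrate(f1, -Inf, z)$value
}

# Same construction for Curve 2
pdf2 <- density(x2)
f2 <- approxfun(pdf2$x, pdf2$y, yleft = 0, yright = 0)
cdf2 <- function(z){
  integrate(f2, -Inf, z)$value
}

# Zero exactly where the right tail of curve 1 equals the left tail of curve 2
Target <- function(t){
  1 - cdf1(t) - cdf2(t)
}

# Search for the root over the combined range of both samples
uniroot(Target, range(c(x1, x2)))$root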

R > uniroot(Target, range(c(x1, x2)))$root
[1] 0.06501821
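
The same idea can be tried without the gist data. The sketch below is a variant, not the code above: it uses simulated samples (x1 and x2 are made-up normal draws, not the asker's data), and it swaps integrate() for a cumulative trapezoid sum over the grid that density() returns, which avoids integrating a function that is zero over most of an infinite range. The helper name kde_cdf is hypothetical.

```r
# Simulated stand-ins for the two groups (assumption: NOT the gist data)
set.seed(42)
x1 <- rnorm(500, mean = 0, sd = 1)  # plays the role of the left curve
x2 <- rnorm(500, mean = 3, sd = 1)  # plays the role of the right curve

# Hypothetical helper: build an estimated CDF from a kernel density
# estimate by summing trapezoids over the density() grid
kde_cdf <- function(x) {
  d <- density(x)
  n <- length(d$x)
  # Area of each trapezoid between neighbouring grid points
  slices <- diff(d$x) * (d$y[-1] + d$y[-n]) / 2
  area <- c(0, cumsum(slices))
  area <- area / area[n]            # rescale so the CDF ends at exactly 1
  # Linear interpolation; 0 left of the grid, 1 right of it
  approxfun(d$x, area, yleft = 0, yright = 1)
}

cdf1 <- kde_cdf(x1)
cdf2 <- kde_cdf(x2)

# Right-tail area of curve 1 minus left-tail area of curve 2
target <- function(t) 1 - cdf1(t) - cdf2(t)

# target is positive at the far left and negative at the far right,
# so uniroot has a sign change to work with
root <- uniroot(target, range(c(x1, x2)), tol = 1e-8)$root
root
```

Because the two simulated populations are mirror images around 1.5, the root should land close to that value. To mark the boundary on a plot of the two densities, abline(v = root) draws the vertical line at that location.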

Licensed under: CC-BY-SA with attribution
