Question

I am analyzing data for a type of sensor my company makes. I want to quantify each sensor's health from three features using the following formula:

sensor health index = feature1 * A + feature2 * B + feature3 * C

We also need to pick a threshold: if the index exceeds the threshold, the sensor is considered bad.
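
For concreteness, with made-up numbers (the real coefficients and threshold are exactly what I am trying to find):

A, B, C, THRESHOLD = 2.0, 1.5, 0.5, 10.0   # placeholder values, not real ones
features = (3.0, 1.0, 4.0)                 # feature1, feature2, feature3
health_index = features[0]*A + features[1]*B + features[2]*C   # 6.0 + 1.5 + 2.0 = 9.5
is_bad = health_index > THRESHOLD          # False: this sensor passes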

We only have a legacy list showing that about 100 sensors are bad, but we now have data for more than 10,000 sensors. A sensor that is not on that 100-sensor list is NOT necessarily bad, so I guess linear regression methods don't work in this scenario.

The only way I can think of is brute-force fitting. A sketch in Python:

from itertools import product

# sensors: the full collection of 10,000+ sensor records
# known_bad_sensors: set of ids from the legacy list
results = {}   # (a, b, c, thold) -> score

for thold, a, b, c in product(range(1, 21), range(1, 11),
                              range(1, 11), range(1, 11)):
    bad_list = set()
    for sensor in sensors:
        health_index = (sensor.feature1 * a
                        + sensor.feature2 * b
                        + sensor.feature3 * c)
        if health_index > thold:
            bad_list.add(sensor.id)
    # score: fraction of the legacy bad list that this parameter
    # set flags (one reading of "percentage of common sensors")
    results[(a, b, c, thold)] = (len(bad_list & known_bad_sensors)
                                 / len(known_bad_sensors))

# the parameter tuple with the highest score is the best model
best_params = max(results, key=results.get)
print(best_params, results[best_params])

I really don't like this method since it uses five nested for loops, which is very inefficient. I wonder if there is a better way to do it, perhaps using an existing library such as scikit-learn?
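
One partial improvement I can see myself is vectorizing the grid search with NumPy broadcasting, which removes the explicit loops over sensors and coefficients. Here X is an (n_sensors, 3) array of the three features and is_known_bad is a boolean mask for the legacy-list sensors; both names are just my placeholders:

import numpy as np
from itertools import product

# all 1000 candidate (a, b, c) triples as rows of one array
coeffs = np.array(list(product(range(1, 11), repeat=3)))   # shape (1000, 3)

# health index of every sensor under every triple in one matrix product
index = X @ coeffs.T                                       # shape (n_sensors, 1000)

best_score, best_params = -1.0, None
for th in range(1, 21):
    flagged = index > th                     # which sensors each triple calls bad
    # fraction of the legacy bad list that each triple flags
    # (note: nothing here penalizes flagging sensors NOT on the list)
    scores = flagged[is_known_bad].mean(axis=0)
    k = scores.argmax()
    if scores[k] > best_score:
        best_score = scores[k]
        best_params = (*coeffs[k], th)

print(best_params, best_score)

But this is still the same brute-force search, just faster. I would still like to know whether there is a principled way to fit A, B, C and the threshold, given that only the bad labels are trustworthy.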

