Equipment failure prediction

https://datascience.stackexchange.com/questions/8225

16-10-2019

Question

I have a system that manages equipment. When a piece of equipment is faulty, it gets serviced. Imagine my dataset looks like this:

ID
Type
# of times serviced

Example data:

|ID| Type       | #serviced |
|1 | iphone     | 1         |
|2 | iphone     | 0         |
|3 | android    | 1         |
|4 | android    | 0         |
|5 | blackberry | 0         |

What I want to do is predict: "of all the equipment that has not been serviced, which is likely to be serviced?" (i.e., identify "at-risk" equipment).

The problem is that my training data will be the rows with #serviced > 0. Any row with #serviced = 0 is not frozen (its count can still change) and does not seem to be a valid candidate to include in training (i.e., when the equipment fails, it will be serviced, so the count will go up).

Is this a supervised or an unsupervised problem? (Supervised because I have serviced and not-serviced labels; unsupervised because I want to cluster the not-serviced equipment with the serviced equipment and thereby identify at-risk equipment.)
What data should I include in training?

Note:

The example is obviously simplified. In reality I have other features that describe the equipment.

Solution

You should include the time at which each phone was serviced and fit a survival model. These models are commonly used in reliability engineering as well as in studies of treatment efficacy. In reliability engineering it is very common to fit the data to a Weibull distribution. Even aircraft manufacturers consider the model to be reliable after calibrating with three to five data points. I can highly recommend the R package 'flexsurv'.

You cannot use ordinary linear or logistic regression, since some phones in your population will leave your observation period without ever being serviced. Survival models account for this kind of incomplete information, which is called censoring.
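To make the censoring idea concrete, here is a minimal sketch of a Weibull fit that keeps the never-serviced phones in the training data. It uses survreg from the survival package rather than flexsurv, and the data are synthetic: the sample size, the "true" shape and scale, and the data frame name phones are all assumptions made for illustration only.

```r
library(survival)

set.seed(1)
n <- 200

# Synthetic assumption: true failure times (months) follow a Weibull
# distribution with shape 1.5 and scale 24
failure_time <- rweibull(n, shape = 1.5, scale = 24)

# Each phone has been observed for a different length of time
months_observed <- runif(n, min = 1, max = 36)

phones <- data.frame(
  # We only see the earlier of "failed" and "end of observation"
  months_since_purchase = pmin(failure_time, months_observed),
  # 1 = serviced (failure observed), 0 = not serviced yet (right-censored)
  serviced = as.integer(failure_time <= months_observed)
)

# Intercept-only Weibull model; rows with serviced == 0 still contribute
# the information "survived at least this long"
fit <- survreg(Surv(months_since_purchase, serviced) ~ 1,
               data = phones, dist = "weibull")

# survreg uses a log-linear parameterisation, so convert back to the
# shape/scale convention used by rweibull
weibull_scale <- unname(exp(coef(fit)))
weibull_shape <- 1 / fit$scale
```

If the fit behaves as expected, weibull_shape and weibull_scale should land reasonably close to the values used to simulate the data, even though many phones were never serviced. Dropping the serviced == 0 rows instead would bias the estimated lifetimes downwards.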

Typically you would have the following data:

|ID| Type       | serviced | months_since_purchase |
|1 | iphone     | 1        | 12                    |
|2 | iphone     | 0        | 15                    |
|3 | android    | 1        | 2                     |
|4 | android    | 0        | 10                    |
|5 | blackberry | 0        | 5.5                   |

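With that data (in a data frame called phone_repairs) you could create the following model in R:

library(survival)
# One survival curve per phone type; rows sharing an ID are clustered
model <- survfit(Surv(months_since_purchase, serviced) ~ strata(Type) +
  cluster(ID), data = phone_repairs)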

In the formula Surv(months_since_purchase, serviced) ~ strata(Type) + cluster(ID), months_since_purchase is the time at which the observation was made, and serviced is 1 if the phone was serviced and 0 otherwise. strata(Type) makes sure that a separate survival curve is fitted for each phone type, and cluster(ID) makes sure that events relating to the same ID are treated as a cluster.
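As a rough sketch of how such a fit could be queried, the snippet below rebuilds the five-row example table and asks each curve for the estimated probability of not having been serviced by month 12. It is a simplification, not the exact call above: it uses the plain formula ~ Type (which also gives one Kaplan-Meier curve per type) and leaves out cluster(ID), since every ID appears only once in this table. The 12-month horizon and the output column names are arbitrary choices for illustration.

```r
library(survival)

# The example table from this answer
phone_repairs <- data.frame(
  ID = 1:5,
  Type = c("iphone", "iphone", "android", "android", "blackberry"),
  serviced = c(1, 0, 1, 0, 0),
  months_since_purchase = c(12, 15, 2, 10, 5.5)
)

# One Kaplan-Meier curve per phone type
model <- survfit(Surv(months_since_purchase, serviced) ~ Type,
                 data = phone_repairs)

# Survival estimate at month 12; extend = TRUE carries the last estimate
# forward for types whose follow-up ends before month 12
est <- summary(model, times = 12, extend = TRUE)

risk_by_type <- data.frame(
  type = est$strata,
  p_not_serviced_by_12_months = est$surv
)
print(risk_by_type)
```

With only five rows the estimates are far too coarse to act on, but the same calls apply unchanged to a realistic data set. Types with a low survival estimate are the ones whose unserviced units are most "at risk".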

You could extend this model with joint models, for example using the R package JM.

Other suggestions

This is a supervised learning problem. Type is a predictor, and #serviced is the target variable. The model is trained on the sample set you already have. My best guess is that no model will have substantial predictive ability with this data: Type alone is not enough.

Try including more factors (predictors) in the model, such as years_being_in_usage, equipment_model, have_been_in_service_before, and so on. The more predictors you have, the better the model you can train.
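One possible way to act on this while still handling equipment that has not been serviced yet is a Cox proportional hazards model with the extra predictors as covariates. This is a swapped-in technique, not something this answer prescribes. The sketch below runs on synthetic data: the sample size, the failure rates, the assumed effect of have_been_in_service_before, and the data frame name equipment are all made up for illustration.

```r
library(survival)

set.seed(42)
n <- 300

equipment <- data.frame(
  Type = factor(sample(c("iphone", "android", "blackberry"), n, replace = TRUE)),
  have_been_in_service_before = rbinom(n, 1, 0.3)
)

# Synthetic assumption: equipment that was in service before fails
# about twice as fast
rate <- 0.03 * ifelse(equipment$have_been_in_service_before == 1, 2, 1)
failure_time <- rexp(n, rate = rate)
months_observed <- runif(n, min = 1, max = 36)

equipment$months_since_purchase <- pmin(failure_time, months_observed)
equipment$serviced <- as.integer(failure_time <= months_observed)

# Cox model: serviced and not-yet-serviced rows are both used for training
cox_fit <- coxph(
  Surv(months_since_purchase, serviced) ~ Type + have_been_in_service_before,
  data = equipment
)

# Relative risk score for equipment that has not been serviced yet;
# a higher score means more at risk
not_serviced <- equipment[equipment$serviced == 0, ]
not_serviced$risk <- predict(cox_fit, newdata = not_serviced, type = "risk")

# Ten highest-risk items among the equipment not yet serviced
at_risk <- head(not_serviced[order(-not_serviced$risk), ], 10)
```

The risk column is only a relative ranking, so it is best used to order the unserviced equipment rather than read as a probability.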

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange