Equipment failure prediction
Question
I have a system that manages equipment. When a piece of equipment is faulty, it gets serviced. Imagine my dataset looks like this:
ID
Type
# of times serviced
Example data:
|ID| Type | #serviced |
|1 | iphone | 1 |
|2 | iphone | 0 |
|3 | android | 1 |
|4 | android | 0 |
|5 | blackberry | 0 |
What I want to predict is: "of all the equipment that has not been serviced, which units are likely to be serviced?" (i.e.) identify "at risk" equipment.
The problem is that my training data would be the records with #serviced > 0. Any record with #serviced = 0 is effectively unlabeled and does not seem to be a valid candidate to include in training. (i.e.) When a unit fails, it will be serviced, hence the count will go up.
- Is this a supervised or an unsupervised problem? (Supervised because I have serviced and not-serviced labels; unsupervised because I want to cluster not-serviced units with serviced ones and thereby identify at-risk equipment.)
- What data should I include in training?
Note:
The example is obviously simplified. In reality I have other features that describe the equipment.
Solution
You should include the time at which each phone was serviced, so that you can build a survival model. These models are commonly used in reliability engineering as well as in studies of treatment efficacy. In reliability engineering it is very common to fit the data to a Weibull distribution. Even aircraft manufacturers consider the model to be reliable after calibrating with three to five data points. I can highly recommend the R package 'flexsurv'.
You cannot use typical linear or logistic regressions since some phones in your population will leave your observation period without ever being serviced. Survival models allow for this sort of missing information (this is called censoring).
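As a rough illustration of the two points above (a Weibull fit and censoring), the following sketch simulates repair times and fits a Weibull model with survreg from the survival package. The simulated data, the "true" shape and scale, and all variable names are assumptions made for the example; flexsurv, recommended above, offers a comparable fit through flexsurvreg.

```r
library(survival)

set.seed(42)
n <- 500
# Hypothetical ground truth, used only to generate example data.
true_shape <- 1.5
true_scale <- 20   # months

time_to_service <- rweibull(n, shape = true_shape, scale = true_scale)
observed_for    <- runif(n, min = 6, max = 36)  # months each unit was observed

# A unit is censored when it is still unserviced at the end of its
# observation window: we only know that it lasted at least that long.
months   <- pmin(time_to_service, observed_for)
serviced <- as.numeric(time_to_service <= observed_for)

fit <- survreg(Surv(months, serviced) ~ 1, dist = "weibull")

# survreg uses a log-linear parameterisation; convert it to the usual
# Weibull shape and scale.
shape_hat <- 1 / fit$scale
scale_hat <- unname(exp(coef(fit)))

cat("censored units:", sum(serviced == 0), "of", n, "\n")
cat("estimated shape:", round(shape_hat, 2),
    " estimated scale:", round(scale_hat, 2), "\n")
```

The censored units still contribute to the fit: each one tells the model that the unit survived at least as long as it was observed.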
Typically you would have the following data:
|ID| Type | serviced | months_since_purchase |
|1 | iphone | 1 | 12 |
|2 | iphone | 0 | 15 |
|3 | android | 1 | 2 |
|4 | android | 0 | 10 |
|5 | blackberry | 0 | 5.5 |
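For reference, the table above could be entered in R roughly as follows. This is only a sketch: phone_repairs is the data frame name used in this answer's model call, and repair_times is a name introduced here for illustration.

```r
library(survival)

# Toy data copied from the table above.
phone_repairs <- data.frame(
  ID = 1:5,
  Type = c("iphone", "iphone", "android", "android", "blackberry"),
  serviced = c(1, 0, 1, 0, 0),
  months_since_purchase = c(12, 15, 2, 10, 5.5)
)

# Surv() pairs each time with its event indicator. Rows with serviced == 0
# are right-censored and are printed with a trailing "+".
repair_times <- with(phone_repairs, Surv(months_since_purchase, serviced))
print(repair_times)
```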
With that data you could create the following model in R:
library(survival)
# One survival curve per phone type; rows sharing an ID are treated as a cluster
model <- survfit(Surv(months_since_purchase, serviced) ~ strata(Type) +
                   cluster(ID), data = phone_repairs)
The survfit formula Surv(months_since_purchase, serviced) ~ strata(Type) + cluster(ID) indicates that months_since_purchase is the time at which an observation was made and that serviced is 1 if the phone was serviced and 0 otherwise. strata(Type) makes sure that a separate survival curve is estimated for each phone type, and cluster(ID) makes sure that events relating to the same ID are treated as a cluster.
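To turn a survival model into an "at risk" list for the units that have not been serviced yet, one option is to compute, for each such unit, the probability of being serviced within some horizon given that it has survived so far. The sketch below does this with a Weibull regression (survreg) rather than the survfit curves above, because a parametric model makes the conditional probability easy to write down. The simulated fleet, the per-type scales, the 6-month horizon and the helper weibull_surv are all assumptions made for illustration.

```r
library(survival)

set.seed(1)
n <- 600
# Simulated fleet; the per-type scales are made up for illustration.
type <- sample(c("android", "blackberry", "iphone"), n, replace = TRUE)
true_scale <- unname(c(android = 15, blackberry = 40, iphone = 25)[type])
time_to_service <- rweibull(n, shape = 1.5, scale = true_scale)
observed_for <- runif(n, min = 3, max = 30)

fleet <- data.frame(
  ID = seq_len(n),
  Type = type,
  months_since_purchase = pmin(time_to_service, observed_for),
  serviced = as.numeric(time_to_service <= observed_for)
)

fit <- survreg(Surv(months_since_purchase, serviced) ~ Type,
               data = fleet, dist = "weibull")
shape_hat <- 1 / fit$scale

# Weibull survival function S(t) = exp(-(t / scale)^shape).
weibull_surv <- function(t, scale) exp(-(t / scale)^shape_hat)

# Score only the units that have not been serviced yet.
at_risk <- fleet[fleet$serviced == 0, ]
scale_i <- exp(predict(fit, newdata = at_risk, type = "lp"))

horizon <- 6  # months ahead; an arbitrary choice
# P(serviced within `horizon` months | not serviced so far)
at_risk$p_service <- 1 -
  weibull_surv(at_risk$months_since_purchase + horizon, scale_i) /
  weibull_surv(at_risk$months_since_purchase, scale_i)

# Units with the highest conditional probability come first.
head(at_risk[order(-at_risk$p_service), c("ID", "Type", "p_service")])
```

Sorting by p_service gives a ranked list; where to cut it off depends on how many units you can afford to inspect.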
You could extend this model with joint models, for example using the R package JM.
Other suggestions
This is a supervised learning problem. Type is a predictor and #serviced is the target variable. The model is trained on the sample set you already have. My best guess is that no model will have substantial predictive ability on this data: Type alone is not enough.
Try including more factors (predictors) in the model, such as years_being_in_usage, equipment_model, have_been_in_service_before and so on. The more predictors you have, the better the model you can train.
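A minimal sketch of this supervised setup, assuming a binary serviced flag as the target and using logistic regression as one possible classifier (the suggestion above does not name a specific model). The data are simulated and the relationship between age and servicing is made up; note that, as the solution above points out, a plain classifier like this ignores censoring.

```r
set.seed(7)
n <- 400
# Simulated equipment records; years_being_in_usage follows the predictor
# name suggested above, the values are invented.
equipment <- data.frame(
  Type = sample(c("android", "blackberry", "iphone"), n, replace = TRUE),
  years_being_in_usage = round(runif(n, min = 0, max = 5), 1),
  stringsAsFactors = TRUE
)
# Made-up relationship: older units are more likely to have been serviced.
p_true <- plogis(-2 + 0.8 * equipment$years_being_in_usage)
equipment$serviced <- rbinom(n, size = 1, prob = p_true)

# Logistic regression: serviced (0/1) is the target, the rest are predictors.
clf <- glm(serviced ~ Type + years_being_in_usage,
           data = equipment, family = binomial)

# Score the units that have not been serviced and list the riskiest ones.
unserviced <- equipment[equipment$serviced == 0, ]
unserviced$risk <- predict(clf, newdata = unserviced, type = "response")
head(unserviced[order(-unserviced$risk), ])
```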