Best Approach to Forecasting a Numerical Value Based on Time Series and Categorical Data?

https://datascience.stackexchange.com/questions/76531

12-12-2020
Question

Consider a dataset of thousands of car repairs that have been performed. In the simplest terms, the columns to consider are the time of year when the car broke down (seasonal changes in demand for car repairs), the type of damage to the car (some damage takes longer to repair than other damage), and the type of car (some cars are more difficult to work on).

I am asking what the best fit would be for modelling data of this format, where you predict the repair time from time series and categorical inputs. Keep in mind that the data does not have a constant period.

Example Column Names:

datetime | Type of Damage to Car | Type of Car | Repair Time

Any suggestions?

Solution

So the question is about how to model the next repair date given the previous repairs.

If you have customer-specific data, where each repair is logged against a specific customer, then it would be a good idea to treat this as a time series problem, provided you have enough instances of customers repeatedly coming back. In that scenario, you can use something like an RNN (Recurrent Neural Network [https://towardsdatascience.com/understanding-rnn-and-lstm-f7cdf6dfc14e]) or its LSTM variant, whereby you feed in the date time and type of damage at every time step to get a single output date time.

If you have not logged a customer ID against the repairs, so you cannot tell which repairs were carried out for which customer, then you could use a standard (feed-forward) neural network for this. Here you would train the model on all the input data, with the goal of predicting the date time at which the next repair is likely to happen.
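To make the per-customer setup concrete, here is a minimal sketch of how a repair log might be turned into training examples for a sequence model. The tuple layout (customer_id, repair_date, damage_type) and the sample rows are assumptions made up for illustration; they are not from the question's dataset.

```python
from collections import defaultdict
from datetime import date

# Hypothetical repair log: (customer_id, repair_date, damage_type).
repairs = [
    ("c1", date(2020, 1, 10), "brakes"),
    ("c2", date(2020, 2, 1), "engine"),
    ("c1", date(2020, 3, 15), "engine"),
    ("c1", date(2020, 7, 1), "bodywork"),
]


def build_sequences(rows):
    """Group repairs by customer and pair each history prefix
    with the date of the repair that followed it."""
    by_customer = defaultdict(list)
    for customer_id, repair_date, damage in rows:
        by_customer[customer_id].append((repair_date, damage))

    examples = []
    for history in by_customer.values():
        history.sort(key=lambda step: step[0])  # chronological order
        # Every prefix of length >= 1 becomes an input sequence;
        # the target is the date of the next repair.
        for i in range(1, len(history)):
            examples.append((history[:i], history[i][0]))
    return examples


examples = build_sequences(repairs)
for sequence, next_date in examples:
    print(len(sequence), "step(s) ->", next_date)
```

Customers with only one logged repair produce no examples here, which is why this approach needs enough repeat visits to be worthwhile.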

In terms of data representation, you could represent the features as follows:

Datetime: you could encode this as the day of the year, so Jan 1 becomes 1 and Dec 31 becomes 365 (366 in a leap year). This can then be normalised (i.e. divided by 365) so the feature lies on roughly a 0 to 1 scale.
Categorical variables like type of damage, etc: normally we represent these as one-hot encoded vectors (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/).

To make the input for a given entry, we then concatenate these features into a single vector.
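The encoding described above could be sketched roughly like this, using only the Python standard library. The category vocabularies (DAMAGE_TYPES, CAR_TYPES) and the helper names are hypothetical placeholders; in practice you would build the vocabularies from the values that occur in your own data.

```python
from datetime import date

# Hypothetical category vocabularies; replace with the values in your data.
DAMAGE_TYPES = ["brakes", "engine", "bodywork"]
CAR_TYPES = ["sedan", "suv", "truck"]


def encode_day_of_year(d):
    """Day of year divided by 365 (Dec 31 of a leap year maps slightly above 1)."""
    return d.timetuple().tm_yday / 365.0


def one_hot(value, categories):
    """Vector with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(d, damage, car):
    """Concatenate the date feature and both one-hot vectors into one input vector."""
    return [encode_day_of_year(d)] + one_hot(damage, DAMAGE_TYPES) + one_hot(car, CAR_TYPES)


features = encode_row(date(2020, 1, 1), "engine", "suv")
print(features)
```

The resulting vector has one entry for the date plus one entry per category value, so its length grows with the number of distinct damage and car types.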

Good luck!

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange