Question

I built a neural network that is pre-trained on 180 days of data.

It filters credit-card fraud data every day, and one day's worth of new data comes in each day.

After the filtering, I want to re-train my AI model, but I want to use only the new one-day data (because training a neural network is really time-consuming).

My AI model is a 0 (not fraud) / 1 (fraud) classification model. I want to update my neural net with just 1/181 of the data, because the new data amounts to only one day.

How should I train the neural network? If I use just one day of data and run many epochs, it will overfit. With early stopping, training on one day of data does not seem sufficient.

I think my neural net may need memory, like an LSTM. What neural net design is best for my situation?


Solution

So, I believe the question you are asking is how to re-train the model on a single day's worth of data, rather than training on all 180 days' worth of data.

It definitely seems reasonable that you have seen overfitting; this can be due to the complexity of the model (related to the depth of the network).

So changing the model's architecture to suit this classification task is definitely a good idea. For this, without knowing your model architecture, I would look at the overall depth (i.e. the number of layers in the neural network).

To clarify, LSTMs (like RNNs [Recurrent Neural Networks] generally) are typically used for sequential data: each element $x_i \in \mathbf{x}$, where $1 \le i \le N$ ($N$ is the number of examples in the sequence), is fed into the model sequentially, producing an output either at each time step $i$ or once, later in the input sequence.
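For concreteness, here is a minimal PyTorch sketch of how an LSTM consumes a sequence (the feature and sequence sizes are hypothetical, chosen just to illustrate the shapes):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: each transaction is a 16-dim feature vector,
# and a sequence groups 30 consecutive transactions.
batch_size, seq_len, n_features, hidden_size = 8, 30, 16, 32

lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size, batch_first=True)

x = torch.randn(batch_size, seq_len, n_features)   # one batch of sequences
outputs, (h_n, c_n) = lstm(x)

# outputs: the hidden state at every time step i -> shape (8, 30, 32)
# h_n:     the hidden state after the final step -> shape (1, 8, 32),
#          used when you only want one output "later in the sequence"
```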

One suggestion off the top of my head: since an RNN is simply a normal NN with a recurrent hidden layer, we can use transfer learning (which you alluded to with your pre-training).

Transfer learning is essentially when we pre-train a model, usually on a large quantity of (generic) data, and then take this pre-trained model and alter the output section of the network's architecture to suit the classification task (in your case, adding a softmax layer to output a probability distribution over the 2 classes). We then train this altered model on different data (in your case, your single day's worth of data).
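As a rough sketch of that alteration in PyTorch (the layer sizes and base architecture are hypothetical, since I don't know your actual model):

```python
import torch.nn as nn

# Hypothetical pre-trained base: a plain feed-forward net on 16 features.
pretrained = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 2),              # original output layer
)

# Keep the hidden layers, replace the output section with a fresh head.
# (nn.CrossEntropyLoss applies the softmax implicitly, so the head
#  emits raw logits over the 2 classes.)
model = nn.Sequential(
    pretrained[:-1],               # pre-trained hidden layers
    nn.Linear(64, 2),              # new, randomly initialised head
)
```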

To prevent the model from losing generalisation performance, we typically freeze parameter updates within the pre-trained model's hidden layers and only allow the altered section of the model to update its parameters. This also has the added benefit of reducing the time taken to train the model.
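Continuing the sketch above, freezing the pre-trained section and letting only the new head update could look like this:

```python
import torch

# Freeze every parameter in the pre-trained hidden layers...
for param in model[0].parameters():
    param.requires_grad = False

# ...and hand the optimiser only the new head's parameters, so the
# frozen section is never updated (this also speeds up training).
optimizer = torch.optim.Adam(model[1].parameters(), lr=1e-3)
```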

So that would be my suggestion: first pre-train your model on the 180 days' worth of data, then alter the pre-trained model to suit this task of classifying a day's worth of data, and train the model while only updating parameters in the altered section.
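Putting it together, the daily re-train then reduces to a short fine-tuning loop over the new day's data (the tensors `day_x`/`day_y` below stand in for one day's labelled transactions and are assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

day_x = torch.randn(512, 16)             # one day's transactions (hypothetical)
day_y = torch.randint(0, 2, (512,))      # labels: 0 = not fraud, 1 = fraud

model.train()
for epoch in range(5):                   # few epochs: only the small head
    optimizer.zero_grad()                # trains, which limits overfitting
    loss = criterion(model(day_x), day_y)
    loss.backward()
    optimizer.step()
```

Because the frozen layers retain what was learned from the full 180 days, a handful of epochs on the small head should be enough, which also addresses your overfitting concern.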

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange