Question

I have a data set of accounts, and I need to group the accounts that belong to the same user, based on several features.

I'm thinking of using machine learning (though I'm new to this domain), because I know the group of each account in the training data set.
Example of training data:

account-id   Feature1    Feature2    class(Group)
1            T1          P4          Gr1
2            T2          P4          Gr1
3            T3          P2          Gr2

The problem arises at test time, when a new account arrives that belongs to a new group not seen in the training set.
Example of testing data:

account-id   Feature1   Feature2
4            T5         P5
5            T6         P5
6            T3         P2

The groups for the testing data should be as follows:

account-id   Feature1   Feature2   class(Group)
4            T5         P5         Gr3
5            T6         P5         Gr3
6            T3         P2         Gr2

Accounts 4 and 5 belong to a new group (Gr3) that was not seen in the training data.

My question is: how can I assign new data to a new class that was not defined during the learning phase? And which algorithm can I use to solve this issue?


Solution

I think you need to read about online learning, which refers to learning when new data is constantly being added. In these cases you need an algorithm that can update itself incrementally as new data arrives, i.e. without recalculating everything from scratch.

There are incremental versions of support vector machines (SVMs) and of neural networks. Bayesian networks can also be made to work incrementally.
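
To make the idea of incremental updating concrete, here is a minimal sketch in plain Python. It is not one of the algorithms named above but a much simpler relative: a categorical naive Bayes classifier that only keeps counts, so each new labeled example updates the model without retraining. The class name `IncrementalNaiveBayes`, the method names, and the smoothing parameter `alpha` are my own illustrative choices, and the data is the toy example from the question.

```python
from collections import defaultdict
import math


class IncrementalNaiveBayes:
    """Categorical naive Bayes that learns one labeled example at a time.

    Hypothetical illustration only; the names here are not from any library.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # Laplace smoothing strength
        self.class_counts = defaultdict(int)   # label -> examples seen
        # (feature index, label) -> {feature value: count}
        self.value_counts = defaultdict(lambda: defaultdict(int))
        self.vocab = defaultdict(set)          # feature index -> values seen
        self.total = 0

    def partial_fit(self, features, label):
        """Update the counts with one example; earlier data is not revisited."""
        self.class_counts[label] += 1
        self.total += 1
        for i, value in enumerate(features):
            self.value_counts[(i, label)][value] += 1
            self.vocab[i].add(value)

    def predict(self, features):
        """Return the known label with the highest smoothed log-posterior."""
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)
            for i, value in enumerate(features):
                count = self.value_counts[(i, label)].get(value, 0)
                # The extra +1 reserves probability mass for unseen values.
                denom = n + self.alpha * (len(self.vocab[i]) + 1)
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalNaiveBayes()
# Training data from the question.
model.partial_fit(("T1", "P4"), "Gr1")
model.partial_fit(("T2", "P4"), "Gr1")
model.partial_fit(("T3", "P2"), "Gr2")
print(model.predict(("T3", "P2")))   # account 6

# Later, a labeled account from a previously unknown group arrives.
model.partial_fit(("T5", "P5"), "Gr3")
print(model.predict(("T6", "P5")))   # account 5
```

Note the limitation of this sketch: the new group Gr3 becomes predictable only after at least one labeled example of it has been fed to `partial_fit`. The model will not invent a new group for unlabeled accounts on its own.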

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange