Question

I need to build an ML/NN model to classify a given string pattern. Sample training data is shown in the image. The input is the string in the column "Id Number", and I need to predict which class it belongs to in the column "Id Type".

Sample Data Set

How do I move forward in building a model for text classification? How do I convert the strings to numbers so that I can use an embedding in Keras?


Solution

From the sample you show here, it is not obvious that you need any fancy ML techniques. A simple rule-based approach might also give you great results. For instance:

  • If the string contains only digits, return B.
  • If the string starts with 5 letters, return A.
  • If the string starts with 2 letters followed by digits, return D.
  • And so on for the remaining classes.
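As a rough illustration, the rules listed above could be sketched with regular expressions. The function name `classify_id`, the exact patterns, and the `None` fallback are assumptions made for this example; only the three rules and their labels come from the list, and the real data may need different or additional rules.

```python
import re

# Each entry pairs a pattern with the class it implies.
# The patterns are one possible reading of the rules listed above.
RULES = [
    (re.compile(r"[0-9]+"), "B"),               # only digits
    (re.compile(r"[A-Za-z]{5}.*"), "A"),        # starts with 5 letters
    (re.compile(r"[A-Za-z]{2}[0-9]+.*"), "D"),  # 2 letters, then digits
]


def classify_id(id_number):
    """Return the first matching class label, or None if no rule applies."""
    s = id_number.strip()
    for pattern, label in RULES:
        if pattern.fullmatch(s):
            return label
    return None  # hypothetical fallback: no rule matched
```

Strings for which `classify_id` returns `None` are the ones the rules do not cover. Counting them on the training data is a quick way to judge whether a learned model is needed at all.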

That being said, to transform your strings into numbers, a simple approach (judging from your data) is to map each character to a value in the set {0, 1, ..., 36}: the character 0 maps to 0, 9 to 9, A to 10, Z to 35, and a NULL placeholder to 36. Since your strings do not all have the same length, padding the shorter ones with this placeholder makes all your final vectors the same size.
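A minimal sketch of that character mapping is shown below. The name `encode_id` and the `max_len` parameter are hypothetical. Upper-casing the input and raising an error on over-long strings or unknown characters are assumptions of this example, not part of the scheme described above.

```python
import string

# '0'..'9' map to 0..9 and 'A'..'Z' map to 10..35, as described above.
ALPHABET = string.digits + string.ascii_uppercase
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
PAD_INDEX = len(ALPHABET)  # 36, the NULL placeholder used for padding


def encode_id(id_number, max_len):
    """Encode a string as a fixed-length list of integers in 0..36."""
    s = id_number.strip().upper()  # assumption: case is not significant
    if len(s) > max_len:
        raise ValueError(f"{s!r} is longer than max_len={max_len}")
    codes = []
    for ch in s:
        if ch not in CHAR_TO_INDEX:
            raise ValueError(f"unsupported character {ch!r} in {s!r}")
        codes.append(CHAR_TO_INDEX[ch])
    # Pad on the right so every vector has exactly max_len entries.
    return codes + [PAD_INDEX] * (max_len - len(codes))
```

For example, `encode_id("AB12", 6)` yields `[10, 11, 1, 2, 36, 36]`. Vectors like this could then be fed to a Keras `Embedding` layer with `input_dim=37`, one index per symbol including the placeholder.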

Hope this helps!

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange