Question

There are several data sets for automobile manufacturers and models. Each contains several hundreds data entries like the following:

Mercedes GLK 350 W2

Prius Plug-in Hybrid Advanced Toyota

General Motors Buick Regal 2012 GS 2.4L

How to automatically divide the above entries into the manufacturers (e.g. Toyota ) and models (e.g. Prius Plug-in Hybrid Advanced) by using only those files?

Thanks in advance.

Was it helpful?

Solution

Machine Learning (ML) typically relies on training data which allows the ML logic to produce and validate a model of the underlying data. With this model, it is then in a position to infer the class of new data presented to it (in the classifier application, as the one at hand) or to infer the value of some variable (in the regression case, as would be, say, an ML application predicting the amount of rain a particular region will receive next month).

The situation presented in the question is a bit puzzling, at several levels.
Firstly, the number of automobile manufacturers is finite and relatively small. It would therefore be easy to manually make the list of these manufacturers and then simply use this lexicon to parse out the manufacturers from the model numbers, using plain string parsing techniques, i.e. no ML needed or even desired here. (alas the requirement that one would be using "...only those files" seems to preclude this option.
Secondly, one can think of a few patterns or heuristics that could be used to produce the desired classifier (tentatively a relatively weak one, as the patterns/heuristics that come to mind ATM seem relatively unreliable). Furthermore, such an approach is also not quite an ML approach in the common understanding of the word.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top