Labels are not given for a multiclass classification problem
06-12-2019
Question
This may be an odd question. In a multiclass classification problem, are the target outputs (labels) always determined in advance?
For example, I have a large data set with many features describing different city areas (population, population density, number of services, banks, and so on). Based on these features, I want to classify objects (houses, buildings) in these areas by whether they are near the city center or not, ending up with, say, 3 to 5 labels. However, I don't yet know how to determine these labels. Is there a specific approach for this? Has anyone had a similar problem? Any advice is appreciated.
P.S. Earlier, I calculated the distance between each object (e.g., a house) and the city center point from their latitude and longitude, and generated labels based on those distances. But this approach does not generalize to cities of different sizes.
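The distance-based labeling described in the P.S. might look roughly like the sketch below. It assumes the haversine (great-circle) formula for the latitude/longitude distance; the threshold values and label names (`THRESHOLDS_KM`, "center", "inner", and so on) are hypothetical choices for illustration, not taken from the question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical fixed cut-offs (km) mapped to labels; purely illustrative.
THRESHOLDS_KM = [(2.0, "center"), (5.0, "inner"), (10.0, "outer")]


def distance_label(obj_lat, obj_lon, center_lat, center_lon):
    """Label an object by its distance to the city center using fixed thresholds."""
    d = haversine_km(obj_lat, obj_lon, center_lat, center_lon)
    for limit, label in THRESHOLDS_KM:
        if d <= limit:
            return label
    return "periphery"
```

Because the cut-offs are absolute kilometres, an object 5 km from the center counts as "inner" in both a small town and a large metropolis, which is exactly why this scheme does not carry over between cities of different sizes.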
P.P.S. Should I instead use unsupervised learning methods? That is, run clustering to find clusters, analyze them to give each identified cluster a meaning, and then treat the task as a multiclass classification problem?
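The "analyze the clusters to give them meaning" step could start with a simple per-cluster profile: the mean of each feature within each cluster. The sketch below assumes each object is a dict of numeric features and that cluster ids come from some earlier clustering run; the feature names (`population_density`, `banks`) and the idea of ranking clusters by one feature are hypothetical illustrations.

```python
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Return {cluster_id: {feature: mean value}} for rows grouped by their cluster label."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for name, value in row.items():
            sums[label][name] += value
    return {
        label: {name: total / counts[label] for name, total in feats.items()}
        for label, feats in sums.items()
    }


def rank_clusters(profiles, feature):
    """Cluster ids ordered from highest to lowest mean of the given feature."""
    return sorted(profiles, key=lambda c: profiles[c][feature], reverse=True)


# Hypothetical objects and cluster ids from a prior clustering step.
rows = [
    {"population_density": 9000, "banks": 12},
    {"population_density": 8500, "banks": 10},
    {"population_density": 1200, "banks": 1},
    {"population_density": 1000, "banks": 2},
]
labels = [1, 1, 0, 0]
profiles = cluster_profiles(rows, labels)
order = rank_clusters(profiles, "population_density")
```

Here cluster 1 has the higher mean density, so one might (as an assumption) name it "near center"; once every cluster has a name, those names can serve as the target labels for a multiclass classifier.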
Solution
Your problem falls under "unsupervised" learning in machine learning. You do not have labeled training data, meaning that the correct labels for your data points are not yet known.
You can try different approaches to group or label your data set using the given features. You will probably have to check for yourself whether your model is "auto"-labeling your data sensibly.
- Clustering (e.g., k-means)
- Decision trees
- Autoencoders (neural networks)
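As a rough illustration of the first option, here is a minimal pure-Python k-means (Lloyd's algorithm) sketch. It assumes the features are numeric lists, standardizes them first so large-valued features such as population do not dominate the distances, and then assigns a new object to the nearest learned centroid, which is how the clusters can act as classes. The function names and sample values are hypothetical; in practice a library implementation such as scikit-learn's `KMeans` would usually be preferred.

```python
import math
import random


def standardize(X):
    """Z-score each column; returns (Z, column means, column stds)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    Z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]
    return Z, means, stds


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(Z, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(Z, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in Z]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(Z, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def assign(row, means, stds, centroids):
    """Classify a new object with the training scaling and the nearest centroid."""
    z = [(v - m) / s for v, m, s in zip(row, means, stds)]
    return nearest(z, centroids)


# Hypothetical [population_density, banks] values for six areas.
X = [[9000, 12], [8800, 11], [9200, 13], [1000, 1], [1100, 2], [900, 1]]
Z, means, stds = standardize(X)
labels, centroids = kmeans(Z, 2)
```

The number of clusters `k` must be chosen by you (for example 3 to 5, as in the question), and the resulting cluster ids carry no meaning by themselves; they still need to be inspected and named before they can be used as class labels.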