Question

This is probably a weird question. When you are dealing with a multiclass classification problem, are the target outputs/labels always determined in advance?

For example, I have a huge data set with many features describing different city areas (population, population density, number of services, banks and so on). I want to classify objects (houses, buildings) in these city areas, based on these features, by whether they are near the city center or not; let's say I want to end up with 3-5 labels. But I don't yet know how I should determine these labels. Is there a specific approach to solving this? Has anyone had a similar problem? Please advise.

P.S. Earlier I calculated the distance between an object (e.g. a house) and the city center point (based on latitude and longitude), and generated labels from those distances. But this approach does not generalize to cities of different sizes.

P.P.S. Should I perhaps use unsupervised learning methods? That is: run clustering to find the clusters, analyze them to give each identified cluster a meaning, and then solve the problem as a multiclass classification problem?


Solution

Your problem falls under "unsupervised" learning in machine learning. You do not have training data, meaning that data points with correctly specified labels are not known yet.

You can try different approaches to group/label your data set using the given features. You will probably have to check for yourself whether your model is "auto"-labeling your data correctly.

  • Clustering (k-Means)
  • Decision Trees
  • Auto-Encoding with NNs
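
As a rough illustration of the clustering option, here is a minimal sketch of the workflow from the question: cluster the areas, give the clusters a meaning, then assign new objects to a cluster. Everything specific in it is an assumption made for the example: the three feature columns, the made-up numbers, the label names, and the rule of ranking clusters by population density. It uses only the Python standard library with a deterministic farthest-first initialization; in practice you would more likely use a library implementation of k-means.

```python
import statistics


def fit_scaler(rows):
    """Per-feature mean and standard deviation (std of 0 is replaced by 1)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stdevs


def scale(row, means, stdevs):
    """Standardize one row so no single feature dominates the distance."""
    return [(v - m) / s for v, m, s in zip(row, means, stdevs)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, iterations=100):
    """Lloyd's algorithm with farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [statistics.fmean(c) for c in zip(*members)]
    return labels, centroids


# Hypothetical data: [population density, number of services, number of banks]
areas = [
    [12000, 85, 14], [11500, 90, 12], [13000, 80, 15],
    [6000, 40, 5], [5500, 35, 6], [6500, 45, 4],
    [1500, 8, 1], [1200, 5, 0], [1800, 10, 1],
]

means, stdevs = fit_scaler(areas)
scaled = [scale(row, means, stdevs) for row in areas]
labels, centroids = k_means(scaled, k=3)

# "Give meaning" to the clusters. Assumption: denser clusters are more central.
order = sorted(range(3), key=lambda i: centroids[i][0], reverse=True)
names = dict(zip(order, ["central", "intermediate", "peripheral"]))
named = [names[lab] for lab in labels]

# Label a new, unseen area by its nearest cluster centroid.
new_area = [7000, 50, 6]
predicted = names[nearest(scale(new_area, means, stdevs), centroids)]

print(named)
print(predicted)
```

Because the features are standardized within the data set, the resulting groups are relative to that data rather than to an absolute distance, which is one way to sidestep the city-size issue mentioned in the question. Whether the clusters actually correspond to "near the center" still has to be checked by hand.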


Licensed under: CC-BY-SA with attribution