Question

I am working on a biology-related dataset with over 300K features but only about 5K samples. I want my model to classify samples into many classes. For this problem, the class is age: each age, such as 10 or 35, would be its own class. So roughly 80 classes (ages 10 to 90) are needed.
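To make the scale concrete, here is a quick back-of-the-envelope calculation in Python. It uses only the approximate figures above. The even spread of samples across ages is an assumption made only for illustration, and the variable names are just placeholders.

```python
# Rough sizes taken from the description above (approximate, not exact counts).
n_samples = 5_000       # "about 5K samples"
n_features = 300_000    # "over 300K features"

# One class per integer age from 10 to 90 inclusive.
ages = range(10, 91)
n_classes = len(ages)

# Average samples per class, ASSUMING ages are evenly represented.
samples_per_class = n_samples / n_classes

# Number of features available for every single sample.
features_per_sample = n_features / n_samples

print(f"classes: {n_classes}")
print(f"average samples per class: {samples_per_class:.1f}")
print(f"features per sample: {features_per_sample:.0f}")
```

Under these assumptions, each class would have only a few dozen samples on average, while every sample carries far more features than there are samples in total.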

I already know that regularization is needed to shrink the number of features and prevent overfitting. What I don't know is whether such a dataset can be treated as a multiclass classification problem with this many classes. If I need more data, how many samples would be enough for the model to learn? Or is there a cleverer way to approach this problem?

No correct solution

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange