Question

I am working on a breast cancer dataset (http://kdd.org/kdd-cup/view/kdd-cup-2008). I need to classify the data using the C4.5 algorithm, after doing any necessary pre-processing.

One section of the report I have to write is titled "benchmark models", and I have no idea what that means. I googled the term, and it doesn't seem to be well defined in data mining. Any idea what it means?

Thanks!

Solution

Benchmarking is the process of comparing your results to those of existing methods. For example, you may compare against results published in another paper on the same dataset. If there is no obvious methodology to benchmark against, you can compare to the best naive solution (guessing the mean for regression, guessing the majority class for classification, etc.) or to a very simple model (a simple regression, k-nearest neighbors). If the field is well studied, you should probably benchmark against the current published state of the art (and possibly against human performance when relevant).
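As a rough illustration of the "best naive solution" idea, here is a minimal sketch in plain Python that compares a majority-class baseline with a model on the same test set. Everything in it is an assumption made for illustration: the toy data is invented, and `threshold_model` is a hypothetical stand-in for whatever classifier you actually train (C4.5 in your case), not a real C4.5 implementation.

```python
from collections import Counter


def majority_class_baseline(train_labels):
    """Return a predictor that always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda features: most_common_label


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    correct = sum(1 for x, y in zip(rows, labels) if predict(x) == y)
    return correct / len(labels)


# Invented toy data (one feature per row, imbalanced labels).
train_X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
train_y = [0, 0, 0, 0, 1]
test_X = [[0.15], [0.25], [0.35], [0.95], [0.05]]
test_y = [0, 0, 0, 1, 0]


def threshold_model(features):
    """Hypothetical stand-in for the trained classifier being evaluated."""
    return 1 if features[0] > 0.5 else 0


baseline = majority_class_baseline(train_y)
baseline_accuracy = accuracy(baseline, test_X, test_y)
model_accuracy = accuracy(threshold_model, test_X, test_y)

print(f"majority-class baseline accuracy: {baseline_accuracy:.2f}")
print(f"model accuracy:                   {model_accuracy:.2f}")
```

The number to report is the gap between the two. When one class dominates, the majority-class baseline can already score high on accuracy, so a model only looks good if it clearly beats that floor.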

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange