What's my target variable?

https://datascience.stackexchange.com/questions/74436

11-12-2020

Question

I am a beginner in data science. I have the "aids" dataset from the "mdhglm" package in R.

dataset = aids, info = Repeated Measures on AIDS Data

data("aids", package = "mdhglm")

What is my target variable here?
I am sorry if the question is too basic.

Solution

Your target variable is whatever you want to predict. For this particular dataset, logical choices are probably "death", "event", or "AZT". You'd typically want to use some kind of patient data to predict these outcomes. It wouldn't make much sense, for example, to try to build a model that predicts the treatment course from the death variable - although it's statistically feasible, you'd generally prefer a potentially causative relationship of predicting death from the treatment course. Death cannot possibly be causative of treatment course, since it always occurs after treatment is given.

Even if you could determine the treatment course from whether or not someone died, it's not going to be very useful from a practical, clinical perspective. There are few cases where you'd know whether someone died and didn't know their treatment course (but wanted to know); it'll be far more useful to have someone's current course of treatment, and predict whether they'll die. But in principle, your target can be any variable at all - whether the resulting model is useful, or meaningful, or implementable in practice will depend on what those values actually represent.
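
As a rough illustration of "predict death from the treatment course", here is a minimal sketch in base R. It does not use the real aids data: the data frame is simulated, the "drug" column and all factor levels are assumptions made for the example, and logistic regression is just one convenient model choice. The point is only that the target goes on the left-hand side of the formula and the features on the right.

```r
# Simulated stand-in for a patient table. The column names "death" and "AZT"
# follow the answer above; "drug" and all factor levels are assumed here.
set.seed(1)
n <- 200
patients <- data.frame(
  AZT  = factor(sample(c("intolerance", "failure"), n, replace = TRUE)),
  drug = factor(sample(c("ddC", "ddI"), n, replace = TRUE))
)

# Simulated outcome: 1 = died, 0 = survived (no real clinical meaning)
patients$death <- rbinom(n, size = 1, prob = 0.4)

# Target on the left of "~", features on the right
fit <- glm(death ~ AZT + drug, data = patients, family = binomial)

# Predicted probability of death for each patient
patients$p_death <- predict(fit, type = "response")
head(patients)
```

Swapping the formula to something like `AZT ~ death` would also fit, which is the "statistically feasible but not very useful" case described above.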

OTHER TIPS

Obtaining a dataset is an important part of defining an ML problem, but it's not the only one. Defining the problem typically involves the following steps:

Define the goal of the problem. Example: predict AZT level of tolerance among AIDS patients.
Obtain appropriate data for the problem.
Design the formal setting of the experiment:
- what kind of problem is it (e.g. classification)
- what is the target variable and what are the features in the data
- how to evaluate the quality of the results (performance measure, e.g. F1-score)
- what the experimental setup is: ML method(s), use of cross-validation, etc.
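
The formal setting above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data are simulated, the feature names `x1` and `x2` and the helper `f1_score` are hypothetical, and logistic regression with 5-fold cross-validation is one possible setup among many.

```r
# F1 score for binary labels coded 0/1 (1 = positive class)
f1_score <- function(truth, pred) {
  tp <- sum(pred == 1 & truth == 1)
  fp <- sum(pred == 1 & truth == 0)
  fn <- sum(pred == 0 & truth == 1)
  if (tp == 0) return(0)  # no true positives: define F1 as 0
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Simulated classification data: y is the target, x1 and x2 are the features
set.seed(42)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(1.5 * dat$x1 - dat$x2))

# Assign every row to one of k folds at random
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Train on k - 1 folds, evaluate on the held-out fold
scores <- sapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  fit   <- glm(y ~ x1 + x2, data = train, family = binomial)
  pred  <- as.integer(predict(fit, newdata = test, type = "response") > 0.5)
  f1_score(test$y, pred)
})

mean(scores)  # cross-validated F1 estimate
```

Each design decision in the list maps to one piece of the code: the problem type to the `binomial` family, the target and features to the formula, the performance measure to `f1_score`, and the experimental setup to the fold loop.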

Licensed under: CC-BY-SA with attribution
