Causal Inference vs. Active Learning?

https://datascience.stackexchange.com/questions/86143

17-12-2020

Question

Imagine we have several lists of features that change over time. Each row of a list corresponds to a sample (a change in space). I would like to know whether machine learning can determine the effect of each sample on another sample. For instance, the target value for sample "S" depends on the features of samples "S-4", "S-3", "S-2", "S-1", "S+1", "S+2" and "S+3". I have come across Active Learning and Causal Inference, but I am still not sure which of them would be useful for my aim. To elaborate, imagine the picture below:

The red line is the result for one year and the blue line is the result for the next year. We have a sufficient amount of these results, so data volume is not a problem. For the target marked by the red circle, and for the other samples, we have different features. I am looking for an algorithm that tells me whether group 1 or group 2 is affecting my target at the red-circle point. For this purpose, is it better to use Causal Inference or Active Learning?
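The dependency described in the question (the target of sample S depends on the features of S-4 to S+3) can be turned into an ordinary tabular design matrix by stacking each sample's neighbour features into one row. Below is a minimal NumPy sketch; the function name `neighbour_design_matrix` and the choice to simply drop samples near the boundaries are illustrative assumptions, not part of the original post.

```python
import numpy as np

# Offsets of the neighbouring samples that may influence sample S,
# taken from the question: S-4 ... S-1 and S+1 ... S+3 (S itself excluded).
OFFSETS = [-4, -3, -2, -1, 1, 2, 3]


def neighbour_design_matrix(features, offsets=OFFSETS):
    """Stack the features of each sample's neighbours into one row.

    features: array of shape (n_samples, n_features), rows ordered in space.
    Returns (X, idx): X has shape (n_valid, len(offsets) * n_features) and
    idx holds the position S of the target sample for every row of X.
    Samples whose neighbourhood would run past either end are dropped.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    lo, hi = -min(offsets), max(offsets)
    idx = np.arange(lo, n - hi)  # positions with a complete neighbourhood
    blocks = [features[idx + k] for k in offsets]
    return np.hstack(blocks), idx


# Toy example: 10 samples with 2 features each.
feats = np.arange(20).reshape(10, 2)
X, idx = neighbour_design_matrix(feats)
print(X.shape, idx)  # (3, 14) [4 5 6]
```

Each row of `X` can then be paired with the target of sample `idx[i]` and fed to any regression or causal model.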

Solution

This is how the problem can be formulated as a Causal Inference problem:

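Take group 1 as the control group and group 2 as the treatment group. A model is then fit on the observations ("S-4", "S-3", "S-2", "S-1", "S+1", "S+2", "S+3"). How many models are fit depends on the type of meta-learner: an S-learner fits a single model with the group indicator as an extra feature, a T-learner fits one model per group, and an X-learner fits one model per group and then a second stage on imputed treatment effects.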
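For the S-learner, the answer to "how many models" is one: the group indicator is just another input column, and the effect is read off by predicting each sample twice, once as control and once as treated. The sketch below assumes a plain least-squares base learner in place of the XGBoost model causalml would typically use; `fit_linear`, `s_learner_ate` and the synthetic data (a true effect of +2.0 for group 2) are hypothetical, for illustration only.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def s_learner_ate(X, treatment, y):
    """S-learner: ONE model trained on [features, treatment indicator]."""
    model = fit_linear(np.column_stack([X, treatment]), y)
    # Predict every sample twice: as if in control (0) and as if treated (1).
    mu0 = model(np.column_stack([X, np.zeros(len(X))]))
    mu1 = model(np.column_stack([X, np.ones(len(X))]))
    return np.mean(mu1 - mu0)


# Hypothetical data: treatment = 1 (group 2) shifts the target by +2.0
# relative to treatment = 0 (group 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
treatment = rng.integers(0, 2, size=500)
y = X @ np.array([1.0, -0.5, 0.3]) + 2.0 * treatment + rng.normal(scale=0.1, size=500)
print(s_learner_ate(X, treatment, y))
```

With a linear base learner the estimated effect is the same for every sample (it is the coefficient of the indicator column); a non-linear base learner would let it vary with the features.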

Basically, these learners model the target (S) as a function of the features, conditioned on whether the features come from the control group or the treatment group.
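The conditioning on the group is most literal in the T-learner, which fits one model to the control samples and a separate one to the treatment samples, then takes the difference of their predictions. A minimal sketch, again assuming a least-squares base learner; `fit_linear`, `t_learner_cate` and the synthetic data (true effect `1 + x0`, so it varies per sample) are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def t_learner_cate(X, treatment, y):
    """T-learner: one model per group, CATE(x) = mu1(x) - mu0(x)."""
    treatment = np.asarray(treatment)
    mu0 = fit_linear(X[treatment == 0], y[treatment == 0])  # group 1 (control)
    mu1 = fit_linear(X[treatment == 1], y[treatment == 1])  # group 2 (treatment)
    return mu1(X) - mu0(X)


# Hypothetical data: the true effect of group 2 is tau(x) = 1 + x0.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
treatment = rng.integers(0, 2, size=1000)
tau = 1.0 + X[:, 0]
y = X @ np.array([0.5, -1.0]) + tau * treatment + rng.normal(scale=0.1, size=1000)

cate = t_learner_cate(X, treatment, y)
print("mean CATE:", cate.mean(), "max error:", np.abs(cate - tau).max())
```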

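Once the learners are fit, a treatment effect is estimated. You can think of the treatment effect as the difference between the prediction for a sample as if it belonged to the treatment group and the prediction as if it belonged to the control group. The X-learner goes one step further and estimates each group's effect using the model trained on the opposite group.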
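The "opposite group" step can be sketched as follows: for treated samples the imputed effect is `D1 = y1 - mu0(X1)`, for control samples it is `D0 = mu1(X0) - y0`, and two effect models fit on these are blended as `tau(x) = g(x) * tau0(x) + (1 - g(x)) * tau1(x)`, where `g(x)` is the propensity of being treated. This is a simplified sketch, not causalml's implementation: it assumes least-squares base learners and a constant propensity equal to the share of treated samples (a real X-learner usually estimates `g(x)` with a classifier). `fit_linear`, `x_learner_cate` and the data are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def x_learner_cate(X, treatment, y):
    """Simplified X-learner with linear base learners."""
    treatment = np.asarray(treatment)
    X0, y0 = X[treatment == 0], y[treatment == 0]
    X1, y1 = X[treatment == 1], y[treatment == 1]
    # Stage 1: one outcome model per group (as in the T-learner).
    mu0 = fit_linear(X0, y0)
    mu1 = fit_linear(X1, y1)
    # Stage 2: impute each group's effect with the OPPOSITE group's model.
    d1 = y1 - mu0(X1)  # treated: observed minus predicted-as-control
    d0 = mu1(X0) - y0  # control: predicted-as-treated minus observed
    tau1 = fit_linear(X1, d1)
    tau0 = fit_linear(X0, d0)
    # Stage 3: blend with the propensity g. Assumption: g is constant here.
    g = treatment.mean()
    return g * tau0(X) + (1 - g) * tau1(X)


# Hypothetical data: the true effect of group 2 is tau(x) = 1 + x0.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
treatment = rng.integers(0, 2, size=1000)
tau = 1.0 + X[:, 0]
y = X @ np.array([0.5, -1.0]) + tau * treatment + rng.normal(scale=0.1, size=1000)

cate = x_learner_cate(X, treatment, y)
print("mean CATE:", cate.mean(), "max error:", np.abs(cate - tau).max())
```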

Here is sample code using Uber's CausalML library:

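from causalml.inference.meta import XGBTRegressor
from causalml.dataset import synthetic_data

# load_data() stands for your own data: y is the target, X the features and
# treatment a 0/1 array (0 = group 1 / control, 1 = group 2 / treatment).
# Here causalml's synthetic data generator is used in its place.
y, X, treatment, _, _, _ = synthetic_data(mode=1, n=1000, p=5, sigma=1.0)

# T-learner with XGBoost base models
xg = XGBTRegressor(random_state=42)
# estimate_ate returns the ATE and the lower/upper bounds of its confidence interval
te, lb, ub = xg.estimate_ate(X, treatment, y)
print('Average Treatment Effect (XGBoost): {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))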

Licensed under: CC-BY-SA with attribution
