Question

I am stuck on this problem. Suppose we have the following information.

CustomerID: 1, Date: 3/2/2018, Quantity: 3, Total: 390.78, Min: 130.26, Max: 130.26

Given a day, month and year, we want to predict the total sales accumulated through the last day of that month. I'm using Microsoft Azure Machine Learning Studio (I know a little bit of Python and R), and I modified the data a little. For a single day we could have the following data.

Date: 3/2/2018, TotalSalesToday: 4023.45, FirstToToday: 1322.92 TargetValue: 42611.27

where TargetValue is the total sales through the last day of the month (the value that we want to predict), FirstToToday is the sum of sales from the first day of the month up to that day, and TotalSalesToday is the total sales for that day. There are some columns that we can find or generate from the date, such as RemainingWorkDays, RemainingHolidays, RemainingNonWorkDays, etc. We could also add maybe 31 columns that tell the model the cumulative sales as of one day earlier, two days earlier, and so on.
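To make the column definitions concrete, here is a minimal pandas sketch of how they could be derived from raw transaction rows. The sample values are made up, the name `RemainingDays` is a hypothetical, simpler stand-in for RemainingWorkDays and the related columns (which would need a holiday calendar), and the sketch assumes FirstToToday includes the current day.

```python
import pandas as pd

# Hypothetical transaction rows; the values are made up for illustration.
tx = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2018-02-01", "2018-02-01", "2018-02-03", "2018-02-28", "2018-03-01"]
    ),
    "Total": [100.0, 50.0, 200.0, 25.0, 10.0],
})

# One row per calendar day that has sales.
daily = (
    tx.groupby("Date", as_index=False)["Total"].sum()
      .rename(columns={"Total": "TotalSalesToday"})
)

# Month key used to group days that belong to the same month.
daily["Month"] = daily["Date"].dt.to_period("M")

# Running sum within the month, including the current day (assumption).
daily["FirstToToday"] = daily.groupby("Month")["TotalSalesToday"].cumsum()

# Month total repeated on every row of that month: the value to predict.
daily["TargetValue"] = daily.groupby("Month")["TotalSalesToday"].transform("sum")

# Hypothetical calendar feature: days left in the month after this one.
daily["RemainingDays"] = daily["Date"].dt.days_in_month - daily["Date"].dt.day

print(daily)
```

Note that TargetValue is constant within a month by construction, which matters for how the data is split into training and test sets.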

I ran an experiment in Microsoft Azure Machine Learning Studio and it gives a coefficient of determination of 100% (I was using Boosted Decision Tree Regression and Tune Model Hyperparameters). I think that's because the model learns that within a given month the TargetValue doesn't change its value, so it does something like if(month == 2) PredictValue = 42611.27. What can I do? When testing the model, suppose that on the first day of a month we got TotalSalesToday: 1000000 and the model returns 50000. That is obviously not a coherent answer, since the predicted month total is smaller than the sales of that single day (1000000).
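One way to check this suspicion is to compare a random row split against a split that holds out whole months. The sketch below uses synthetic data and a toy "memorizer" that just looks up the target by month; it is a stand-in for the suspected behaviour, not what Boosted Decision Tree Regression actually does internally, and all names in it are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily sales: 12 months of 28 days each (made-up data).
rows = []
for month in range(1, 13):
    sales = rng.uniform(500, 1500, size=28)
    for day, s in enumerate(sales, start=1):
        rows.append({"Month": month, "Day": day, "TotalSalesToday": s})
df = pd.DataFrame(rows)
df["TargetValue"] = df.groupby("Month")["TotalSalesToday"].transform("sum")

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def memorizer_predict(train, test):
    """Return the target seen in training for the same month;
    fall back to the global training mean for unseen months."""
    lookup = train.groupby("Month")["TargetValue"].first()
    return test["Month"].map(lookup).fillna(train["TargetValue"].mean())

# Random row split: days of the same month land in both train and test.
shuffled = df.sample(frac=1.0, random_state=0)
train_rows, test_rows = shuffled.iloc[:250], shuffled.iloc[250:]
r2_random = r2(test_rows["TargetValue"], memorizer_predict(train_rows, test_rows))

# Month-wise split: the test months never appear in training.
train_m, test_m = df[df["Month"] <= 9], df[df["Month"] > 9]
r2_by_month = r2(test_m["TargetValue"], memorizer_predict(train_m, test_m))

print("random row split R^2:", r2_random)
print("held-out months R^2: ", r2_by_month)
```

If the real model scores near 100% on a random split but much worse when entire months are held out, the perfect score comes from memorizing the month rather than from learning how sales accumulate.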

Is there something I need to change in the data? What do we need in order to make the model give at least a coherent answer? Is there something I am missing?

Thanks in advance!! :)

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange