Question

I have a dataset of products, clients, price policy, discounts, quantities, and net sales. The business describes the task as "quantity vs. price". Looking at the dataset, I have made a few observations:

  1. Discounts: Discounts nullify the effect of any change in the price policy, so in the end net sales don't follow the price variation. I observe this for many client-product pairs.

  2. Seasonality: The variation in quantity for client-product pairs simply follows a seasonal pattern and is not driven by any of the time-series variables in the dataset. (I still need to verify this statistically; so far I have only done a visual check.)

At this point I don't see any logic behind how discounts are decided for the clients, so price changes have literally no effect on net sales.

How should I model this? Is this even a machine learning problem, given that there seems to be no causal relation between the variables? If not price vs. demand, what else can I propose to the business?

Edit:

  1. Product-client scatter plots of UNITARY_NET_SALES vs. QUANTITY

(image: scatter plots) The first column of the plot shows that products are demanded at the same quantities across varying net sales, so there is no price vs. demand effect here.

  2. Time-series plot for a particular product

(image: time-series plot)

Price_list and Discounts behave the same way: whenever the business increases prices, it increases discounts too, so the overall effect on net sales is nil. Quantity simply follows a seasonal pattern.

  3. Output of describe on the columns

(image: describe output)

75% of the Quantity values are less than 8 units!

Thank you!


Solution

I've tangled with modeling pricing systems over the last two years, and one of my key learnings applies here:

Available sales data is often a poor basis for straightforward prediction tasks, and the reason is fairly simple:

If you classify all prices (or transactions of a given product at a price x) as "accepted" or "not accepted" by the customer, you will realize that the data provided by your customer contains only instances of "accepted" prices.

Therefore a straightforward model y ~ x, with y = demand and x = price, is impossible because your y does not vary!

There are several ways around this, however. In my comment I mentioned that the discount is valuable information!

You have noticed that discounts do not seem to depend on any logical variables and look random. This isn't true!

Discounts in most organizations are very flexible and are often applied manually based on negotiations. This makes them a great indicator of our target y, "acceptance" / "non-acceptance".

Consider this:

Discount = Demand x Undiscounted_Price

That is, the discount reflects demand at the undiscounted price: large discounts indicate that demand at that price is low (or even negative), and small discounts indicate that demand is high.
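As a rough illustration of reading discounts as an acceptance signal, the sketch below computes a relative discount per transaction and averages it per client-product pair. The records, field order, and function names are all hypothetical (they are not taken from the asker's dataset), and a high average rate is only a hint of weak acceptance at list price, not proof.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transactions: (client, product, list_price, discount_amount).
# All values are made up for illustration.
transactions = [
    ("client_a", "prod_1", 100.0, 5.0),
    ("client_a", "prod_1", 110.0, 6.6),
    ("client_b", "prod_1", 100.0, 30.0),
    ("client_b", "prod_1", 110.0, 38.5),
]


def discount_rate(list_price, discount_amount):
    """Discount as a fraction of the undiscounted (list) price."""
    if list_price <= 0:
        raise ValueError("list_price must be positive")
    return discount_amount / list_price


def mean_discount_rate_by_pair(rows):
    """Average discount rate for every (client, product) pair."""
    rates = defaultdict(list)
    for client, product, list_price, discount_amount in rows:
        rates[(client, product)].append(discount_rate(list_price, discount_amount))
    return {pair: mean(values) for pair, values in rates.items()}


pair_rates = mean_discount_rate_by_pair(transactions)

# Pairs sorted from the largest to the smallest average discount rate.
# Under the answer's reasoning, pairs at the top accepted the list price least.
ranked = sorted(pair_rates, key=pair_rates.get, reverse=True)
for pair in ranked:
    print(pair, round(pair_rates[pair], 3))
```

Using a rate rather than the absolute discount keeps pairs with different list prices comparable.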

To truly uncover this relation you may need to model the codependent factors first and then remove them by training new models on the residuals.
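One way to picture the residual idea is the minimal sketch below. It assumes the only codependent factor is the calendar month (the seasonality mentioned in the question), removes it from both the discount rate and the quantity by subtracting per-month means, and then fits a slope on what is left. The data are synthetic and the helper names are hypothetical; a real analysis would likely need richer first-stage models.

```python
from collections import defaultdict
from statistics import mean


def residualize(values, groups):
    """Subtract each group's mean, i.e. regress on group dummies and keep residuals."""
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    group_mean = {g: mean(vs) for g, vs in by_group.items()}
    return [value - group_mean[group] for value, group in zip(values, groups)]


def ols_slope(x, y):
    """Slope of a one-variable least-squares fit of y on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Synthetic data: two years of monthly records for one client-product pair.
months = list(range(1, 13)) * 2
year1_shocks = [0.01, -0.01, 0.02, -0.02, 0.0, 0.01,
                -0.01, 0.02, -0.02, 0.0, 0.01, -0.01]
shocks = year1_shocks + [-s for s in year1_shocks]  # negotiation-driven discount shifts
peak = [1.0 if m in (11, 12) else 0.0 for m in months]  # assumed seasonal peak

# Assumption of this toy example: the season raises both series,
# while an extra discount goes together with lower quantity.
discount_rate = [0.10 + 0.02 * p + s for p, s in zip(peak, shocks)]
quantity = [10.0 + 5.0 * p - 50.0 * s for p, s in zip(peak, shocks)]

# The raw slope mixes the seasonal effect with the discount effect.
raw_slope = ols_slope(discount_rate, quantity)

# After removing the monthly means from both series, only the shocks remain.
resid_slope = ols_slope(residualize(discount_rate, months),
                        residualize(quantity, months))

print(raw_slope, resid_slope)
```

In this toy data the raw slope comes out positive because the seasonal peak lifts both series, while the slope on the residuals is close to the -50 built into the data. That sign flip is the reason for removing codependent factors before interpreting discounts.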

Edit:

An important thing to add, especially for the B2B domain: demand for a certain product is almost set in stone for a customer. Unlike consumers, companies do not buy surplus or refrain from purchasing because of prices.

What they do is switch suppliers! This means there is a really important unknown variable, "customer demand for product X". You do not want to model this variable, but you need it to model what you actually want:

Share of wallet, i.e. the percentage of the fixed customer demand that was satisfied by your company, with the goal of identifying the price that optimizes that percentage. This is helpful to keep in mind because it constrains the performance of any model: you never know whether the historic demand in your data already corresponds to 0% or 100% of the total demand and thus could not decrease or increase regardless of price.
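To make the constraint concrete, here is a small sketch of the share-of-wallet arithmetic. The total customer demand is not in the data, so `estimated_total_demand` is a purely hypothetical input (for example an outside estimate), and both function names are made up for this illustration.

```python
def share_of_wallet(quantity_bought_from_us, estimated_total_demand):
    """Fraction of the customer's (assumed fixed) demand that we supplied."""
    if estimated_total_demand <= 0:
        raise ValueError("estimated_total_demand must be positive")
    if quantity_bought_from_us < 0:
        raise ValueError("quantity_bought_from_us must be non-negative")
    # Cap at 1.0 in case the demand estimate is lower than what we sold.
    return min(1.0, quantity_bought_from_us / estimated_total_demand)


def headroom(share):
    """Return (possible_gain, possible_loss) of share, whatever the price does."""
    return 1.0 - share, share


# Hypothetical numbers: we sold 8 units and guess the customer needs 10 in total.
share = share_of_wallet(8, 10)
gain, loss = headroom(share)
print(share, gain, loss)
```

A customer already at a share of 1.0 has no possible gain, so a price cut cannot raise the quantity sold to them; at a share of 0.0 a price increase cannot lower it. Without some estimate of total demand these two cases cannot be told apart from the sales data alone.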

Licensed under: CC-BY-SA with attribution