Deep learning - rule generation

https://datascience.stackexchange.com/questions/12808

16-10-2019

Question

I would like to know whether there is any methodology in deep/machine learning that, given a set of input/output values, can derive the rule that produced them.

Let's say I generate training inputs and outputs using $y=x^2$:

i/p  |  o/p
 0       0
 2       4
 .       .
1000   1000000

It should then generate a rule like $y=x*x$.

Solution

One way of stating what you are looking for is to find a simple mathematical model to explain your data.

One thing about neural networks is that (once they have more than 2 layers, and enough neurons in total) they can in theory approximate any continuous function, no matter how complex. This is useful for machine learning, as the function we want to predict is often complex and cannot be expressed simply with a few operators. However, it is more or less the opposite of what you want - the neural network behaves like a "black box" and you don't get a simple function out, even if there is one driving the data.

You can try to fit a model (any model) to your data using very simple forms of regression, such as linear regression. This works for polynomials because the model only needs to be linear in its parameters, not in $x$. So if you are reasonably sure that your system follows a cubic equation $y = ax^3 + bx^2 + cx + d$ then you could create a table like this:

  bias   |    x   |   x*x   |   x*x*x   |     y
     1        0         0           0         0
     1        2         4           8         4
     1        3         9          27         9
     1        .         .           .         .
     1      100     10000     1000000     10000

and then run a linear regression optimiser on it (for example scikit-learn's SGD-based regressor). With the above data this should quickly tell you that $b=1$ and $a=c=d=0$. What it won't tell you is whether your model is the best possible or somehow "correct". You can scan for more possible formulae by creating more columns - any function of any combination of inputs (if there is more than one) that could be feasible.
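To make the table idea concrete, here is a minimal sketch that builds the bias, x, x*x and x*x*x columns and solves the least-squares problem exactly. It is an illustration under assumptions rather than the optimiser mentioned above: it uses the normal equations with Python's `fractions` module instead of SGD so the answer is exact, and the names `design_row`, `solve` and `fit_least_squares` are made up for this example. The inputs are restricted to 0..10 to keep the numbers small.

```python
from fractions import Fraction


def design_row(x):
    """One row of the table: bias, x, x*x, x*x*x."""
    x = Fraction(x)
    return [Fraction(1), x, x ** 2, x ** 3]


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fit_least_squares(xs, ys):
    """Least-squares fit via the normal equations (X^T X) w = X^T y."""
    X = [design_row(x) for x in xs]
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(k)]
    return solve(XtX, Xty)


# Training data generated by the "hidden" rule y = x^2.
xs = list(range(0, 11))
ys = [x * x for x in xs]

# Weights come back in column order: bias (d), x (c), x*x (b), x*x*x (a).
d, c, b, a = fit_least_squares(xs, ys)
print(f"a={a}, b={b}, c={c}, d={d}")
```

Because the data contain no noise and the true rule is one of the columns, the fit recovers it exactly. With real data you would more likely use a numerical library, and the coefficients would only be approximately 0 or 1.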

However, the more columns you add in this way, the more likely it is that you will find an incorrect, overfit solution that matches all your data using a clever combination of parameters, but which is not a good general predictor. To address this, you will need to add regularisation - a simple L1 or L2 regularisation of the parameters will do (in the scikit-learn regressor mentioned above, the penalty argument controls this), which will penalise large parameters and help you home in on a simple formula if there is one.
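As a rough illustration of what the penalty does, the sketch below fits the same kind of table with and without an L2 penalty and compares the size of the resulting parameters. Everything specific here is an assumption made for the example: the candidate columns (powers of x up to 4), the small hand-written "noise" values, the penalty strength `lam`, and the helper names. It solves the penalised normal equations $(X^T X + \lambda I) w = X^T y$ directly rather than using SGD, and for simplicity it penalises the bias term as well.

```python
from fractions import Fraction


def features(x, degree=4):
    """Candidate columns: bias, x, x^2, ..., x^degree (more than the data needs)."""
    x = Fraction(x)
    return [x ** p for p in range(degree + 1)]


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination (exact with Fractions)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fit(xs, ys, lam=0):
    """Least squares with an L2 penalty: solve (X^T X + lam * I) w = X^T y."""
    X = [features(x) for x in xs]
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0) for j in range(k)]
         for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return solve(A, rhs)


def squared_size(w):
    """Sum of squared parameters, the quantity an L2 penalty discourages."""
    return sum(v * v for v in w)


# y = x^2 plus a small, hand-written disturbance (made up for this example).
xs = list(range(-3, 4))
noise = [Fraction(n, 4) for n in (1, -1, 0, 1, -1, 0, 1)]
ys = [x * x + e for x, e in zip(xs, noise)]

plain = fit(xs, ys, lam=0)            # no regularisation
ridge = fit(xs, ys, lam=Fraction(1))  # L2-regularised

print("unregularised:", [round(float(v), 3) for v in plain], float(squared_size(plain)))
print("L2 penalty:   ", [round(float(v), 3) for v in ridge], float(squared_size(ridge)))
```

The penalised fit always ends up with a smaller overall parameter size than the unpenalised one. An L2 penalty shrinks parameters without usually making them exactly zero, whereas an L1 penalty tends to set some of them to exactly zero, which is often closer to what you want when hunting for a short formula.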

Licensed under: CC-BY-SA with attribution
