finding relationships between a lot of variables

https://stackoverflow.com/questions/21885593

13-10-2022

Question

I'll keep this simple for the sake of the question and explain my issue as plainly as I can.

I have a fairly large project I am working on with over 1000 variables. The user picks the pages they want to fill out, in whatever order they choose. Each page has about 200 variables in it.

The average user has about 10 pages that they fill out.

Here is the tricky part I am trying to solve. Many variables on each page are related to other variables on the same page but, more importantly, to variables on other pages that the user may decide to use as well. There is no one particular page that a user will definitely use; it just depends on their preference.

So relationships look something like

page1_address == page2_address == page3_address == page4_address == etc.

page1_total = page1_var1 + page1_var2;
page3_total = page1_total + page2_var1;

if (page6_var1 > 0 && page6_var2 < 10)
    page3_super = "something important";

So sometimes the relationship is simply that the variables are equal. But sometimes I want to express relationships between variables that are a bit more complicated. Obviously a rule can only be evaluated once all the variables it uses are present.

So my question is, is there a particular way that I should go about creating these relational rules between variables?

What is the appropriate way to perform these checks? For example, if a user fills out page4 first, I then need to auto-fill page1 with the related variables, such as the address. And what if a more complicated check could not run at first because some variables were missing, but now they are all present?

I'm not looking for a foolproof method; I know this has to be handled case by case. I am just looking for a direction to move in. Any guidance would be appreciated.

Solution

You can represent your data as N-ary trees. Or you can try decision trees, which may be more suitable for you. There are many great books, online courses, and tutorials on decision trees and on data mining with them. You need at least basic experience with statistics.
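To make the N-ary tree idea a bit more concrete, here is a minimal sketch in Python. It is only one possible reading of the suggestion, not a definitive design: the `Node` class, its `combine` callback, and the `evaluate` method are hypothetical names made up for this illustration. Each user-entered variable is a leaf, each derived variable is an inner node whose children are the variables it depends on, and a derived value stays `None` until every variable underneath it has been filled in.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One variable in an N-ary tree (hypothetical helper for illustration).

    A leaf holds a value entered by the user; an inner node derives its
    value from any number of child nodes via `combine`.
    """
    name: str
    children: List["Node"] = field(default_factory=list)
    combine: Optional[Callable[..., object]] = None
    value: object = None  # stays None until the user fills it in

    def evaluate(self) -> object:
        # Leaf: return the user-entered value (None if still missing).
        if not self.children:
            return self.value
        inputs = [child.evaluate() for child in self.children]
        # Defer the rule while any input below this node is missing.
        if any(v is None for v in inputs):
            return None
        return self.combine(*inputs)


# Leaves: variables the user types in (names taken from the question).
page1_var1 = Node("page1_var1")
page1_var2 = Node("page1_var2")
page2_var1 = Node("page2_var1")

# Inner nodes: derived variables and the variables they depend on.
page1_total = Node("page1_total", [page1_var1, page1_var2], lambda a, b: a + b)
page3_total = Node("page3_total", [page1_total, page2_var1], lambda a, b: a + b)

# The user fills out page 2 first, so page3_total cannot be computed yet.
page2_var1.value = 5
before = page3_total.evaluate()

# Later the user fills out page 1, and the deferred rule now resolves.
page1_var1.value = 10
page1_var2.value = 20
after = page3_total.evaluate()

print(before, after)
```

Because values are computed on demand from the leaves, the order in which pages are filled out does not matter. If one variable feeds several derived variables, the structure is strictly a directed acyclic graph rather than a tree, but the same traversal still applies.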

Edit (from comments below):

A very good book is Data Mining with Decision Trees: Theory and Applications by Lior Rokach. You can also try Data Mining: Practical Machine Learning Tools and Techniques, Third Edition. If you want to do complex analysis of your data, you will need to study machine learning, statistics, and related topics. If you are not yet familiar with them, the place to start is data structures and statistics.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow