Simple maths for an algorithm to search a table to find out whether one type of entry has a causal effect on another type?

StackOverflow https://stackoverflow.com/questions/22517487

Question

OK so I have a database table which is recording two different types of events which seem to be related to one another. It records the date and time the event occurred, which type of event it was, and finally a description of the event.

So four columns in the table -

Date, Time, Type and Description.

So for the two different types of record in the table, I want to find out whether one or more specific Type-1 events are having the effect of creating a specific Type-2 event.

There is definitely a time delay between a certain Type-1 event occurring, and it causing a certain Type-2 event. So I'm starting with the time delay as a variable, set at 3 hours.

Also, I am isolating the Type-2 event that I think is caused by a Type-1 event. Let's call it Type-2F, for example.

My initial thoughts were to do a first query on the table to list all the Type-1 events, and then do a second query just listing the occurrences of the Type-2F events.
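The two queries could be sketched roughly as follows. This is only an illustration using Python's built-in `sqlite3` module: the table name `events`, the lowercase column names, the `Type-1K`-style type labels stored in the `type` column, and the sample rows are all assumptions, since the question only names the four columns.

```python
import sqlite3

# Hypothetical schema: the question names the four columns but not the table,
# so "events" and the sample rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (date TEXT, time TEXT, type TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2014-03-01", "09:00:00", "Type-1K", "made-up sample row"),
        ("2014-03-01", "10:30:00", "Type-2F", "made-up sample row"),
        ("2014-03-01", "11:00:00", "Type-1B", "made-up sample row"),
        ("2014-03-02", "08:00:00", "Type-2A", "made-up sample row"),
    ],
)

# First query: every Type-1 event, oldest first.
type1_events = conn.execute(
    "SELECT date, time, type FROM events "
    "WHERE type LIKE 'Type-1%' ORDER BY date, time"
).fetchall()

# Second query: only the occurrences of the Type-2F event.
type2f_events = conn.execute(
    "SELECT date, time FROM events WHERE type = ? ORDER BY date, time",
    ("Type-2F",),
).fetchall()
```

Sorting both result sets by date and time is optional for the scoring itself, but it makes the later comparison easier to follow and to optimise.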

Then I would:

  1. Iterate to the first Type-2F event and note the date and time.
  2. Then iterate through all the Type-1 events one by one, comparing each one's date and time to this first Type-2F event. If the Type-1 event occurred within the 3 hours before the Type-2F event, add a score of +1 to that SPECIFIC Type-1 event.
  3. Then iterate to the second Type-2F event and repeat the process in point 2.
  4. Continue until all the Type-2F events have been iterated through (and, for each Type-2F event, all the Type-1 events iterated through in turn), with +1 scores assigned to specific Type-1 events.
  5. Then go back through the list of all the Type-1 events, look at all those events which did not receive a +1 score, and give them a -1 score, as they clearly did not have the effect of creating a Type-2F event within 3 hours.

Finally, I add up all the +1 and -1 scores for each specific type of Type-1 event. Assuming there are, say, 26 types of Type-1 event, with many occurrences of each in the table, I would end up with a scoreboard in which the highest numbers mark the types most likely to have caused the Type-2F event.

For example: Type-1K = +125 | Type-1B = +56 | Type-1Z = +13 | Type-1T = -35 etc...

So from this result I would take it that it is the Type-1K events which are most likely to be causing the Type-2F events (within a 3 hour limit).
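As a rough sketch, the scoring procedure described above might look like this in Python. The function name `score_type1_events`, the tuple layout, and the sample timestamps are hypothetical and chosen only for illustration. The sketch assumes that a Type-1 occurrence collects +1 for every Type-2F event it precedes within the window, which is one reading of the steps above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(hours=3)  # the time-delay variable from the question


def score_type1_events(type1_events, type2f_times, window=WINDOW):
    """Score Type-1 subtypes by how often they precede a Type-2F event.

    type1_events: list of (timestamp, subtype) tuples (hypothetical layout).
    type2f_times: list of timestamps of Type-2F events.
    Returns a dict mapping each subtype to its summed score.
    """
    # Steps 1-4: for each Type-2F event, give +1 to every Type-1
    # occurrence that happened within `window` before it.
    occurrence_scores = [0] * len(type1_events)
    for t2 in type2f_times:
        for i, (t1, _subtype) in enumerate(type1_events):
            if timedelta(0) <= t2 - t1 <= window:
                occurrence_scores[i] += 1

    # Step 5: occurrences that never received +1 get -1 instead,
    # then everything is summed per subtype.
    totals = defaultdict(int)
    for (_t1, subtype), score in zip(type1_events, occurrence_scores):
        totals[subtype] += score if score > 0 else -1
    return dict(totals)


# Made-up sample data, only to show the call.
type1_events = [
    (datetime(2014, 3, 1, 9, 0), "Type-1K"),
    (datetime(2014, 3, 1, 12, 0), "Type-1B"),
    (datetime(2014, 3, 2, 9, 0), "Type-1K"),
    (datetime(2014, 3, 2, 10, 0), "Type-1B"),
]
type2f_times = [
    datetime(2014, 3, 1, 10, 30),
    datetime(2014, 3, 2, 11, 30),
]

scores = score_type1_events(type1_events, type2f_times)
for subtype, total in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{subtype} = {total:+d}")
```

The nested loop compares every Type-2F event with every Type-1 event, exactly as the steps describe. For a large table, sorting both lists by time and scanning only the relevant window would avoid most of those comparisons.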

I know this is very simplistic maths, but does this sound like a reasonable approach?

Many thanks.


Solution

You are actually stumbling into an entire field of mathematics and science in which people earn their full-time living discerning the truth and likelihood behind specific subsets of these types of questions. In biological systems, for example, you would be looking for someone in biostatistics or bioinformatics. Depending on what you are trying to demonstrate and how much confidence you want to have in your answer, you may be completely missing important aspects of the inquiry, for example the distinction between a correlation, a relationship, and a causal relationship.

I'm not sure you are going to get a sufficiently insightful answer on a Stack Exchange site, but in any case, this is not the right site for a statistics question. You might try math.stackexchange.com.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow