Calculating relative frequencies in SQL

https://stackoverflow.com/questions/18166219

24-06-2022

Question

I am working on a tag recommendation system that takes an object's metadata strings (e.g. text descriptions) and splits them into 1-, 2- and 3-grams.

The data for this system is kept in 3 tables:

(1) the "object" table (i.e. the items being described),
(2) the "token" table, filled with all 1-, 2- and 3-grams found (examples below), and
(3) the "mapping" table, which maintains the associations between (1) and (2), along with a frequency count for each occurrence.

I can therefore build a result set with a LEFT JOIN that looks somewhat like this:

SELECT mapping.object_id, mapping.token_id, mapping.freq, token.token_size, token.token
FROM mapping LEFT JOIN
     token
     ON (mapping.token_id = token.id)
WHERE mapping.object_id = 1;

  object_id   token_id   freq   token_size   token
+-----------+----------+------+------------+--------------
  1           1          1      2            'a big'
  1           2          1      1            'a'
  1           3          1      1            'big'
  1           4          2      3            'a big slice'
  1           5          1      1            'slice'
  1           6          3      2            'big slice'
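To experiment with this, the three tables and the sample rows above can be reproduced in an in-memory SQLite database. This is a minimal sketch using Python's built-in `sqlite3` module; the column types, the constraints and the object table's `description` column are assumptions, since the question only names the columns used in its query.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database holding a hypothetical version of
    the object/token/mapping tables, filled with the sample rows for object 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Assumed schema: only the columns used in the query are known.
        CREATE TABLE object (
            id          INTEGER PRIMARY KEY,
            description TEXT                -- hypothetical column
        );
        CREATE TABLE token (
            id         INTEGER PRIMARY KEY,
            token_size INTEGER NOT NULL,    -- n-gram length: 1, 2 or 3
            token      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE mapping (
            object_id INTEGER NOT NULL REFERENCES object(id),
            token_id  INTEGER NOT NULL REFERENCES token(id),
            freq      INTEGER NOT NULL,     -- occurrences of the token in the object
            PRIMARY KEY (object_id, token_id)
        );
        """
    )
    conn.execute("INSERT INTO object (id, description) VALUES (1, 'example')")
    conn.executemany(
        "INSERT INTO token (id, token_size, token) VALUES (?, ?, ?)",
        [(1, 2, "a big"), (2, 1, "a"), (3, 1, "big"),
         (4, 3, "a big slice"), (5, 1, "slice"), (6, 2, "big slice")],
    )
    conn.executemany(
        "INSERT INTO mapping (object_id, token_id, freq) VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 2), (1, 5, 1), (1, 6, 3)],
    )
    return conn


conn = build_sample_db()
# The LEFT JOIN from the question, plus ORDER BY so the row order is stable.
rows = conn.execute(
    """
    SELECT mapping.object_id, mapping.token_id, mapping.freq,
           token.token_size, token.token
    FROM mapping LEFT JOIN token ON (mapping.token_id = token.id)
    WHERE mapping.object_id = 1
    ORDER BY mapping.token_id
    """
).fetchall()
for row in rows:
    print(row)
```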

Now I'd like to get the relative probability of each term within the context of a single object ID, so that I can sort the terms by probability and see which are most probable (e.g. ORDER BY rel_prob DESC LIMIT 25).

For each row, I'm envisioning an additional column holding freq divided by the sum of all freqs for that row's token_size. For 'a big', for instance, that would be 1/(1+3) = 0.25; for 'a', it's 1/(1+1+1) = 0.333, etc.
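The arithmetic being asked for can be checked outside the database first. This sketch hard-codes the six sample rows for object 1 and groups them by token_size in plain Python; it only illustrates the intended numbers, it is not how the SQL query computes them.

```python
from collections import defaultdict
from fractions import Fraction

# Sample rows from the table above: (token, token_size, freq), all for object 1.
rows = [
    ("a big", 2, 1),
    ("a", 1, 1),
    ("big", 1, 1),
    ("a big slice", 3, 2),
    ("slice", 1, 1),
    ("big slice", 2, 3),
]

# Denominator: total freq of all tokens sharing the same token_size.
totals = defaultdict(int)
for _token, size, freq in rows:
    totals[size] += freq

# rel_prob = freq / (sum of freqs with the same token_size), kept exact.
rel_prob = {token: Fraction(freq, totals[size]) for token, size, freq in rows}

# Equivalent of ORDER BY rel_prob DESC LIMIT 25 (ties keep their input order).
ranked = sorted(rel_prob.items(), key=lambda kv: kv[1], reverse=True)[:25]
for token, p in ranked:
    print(f"{token!r}: {p} ({float(p):.3f})")
```

With these rows the size-2 total is 4 and the size-1 total is 3, which gives the 0.25 and 0.333 values quoted in the question.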

I can't, for the life of me, figure out how to do this. Any help is greatly appreciated!

Solution

If I understand your problem correctly, this is the query you need:

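select
    m.object_id, m.token_id, m.freq,
    t.token_size, t.token,
    -- cast to decimal so the division is not integer division;
    -- the window sums freq over all tokens of the same size in the same object
    cast(m.freq as decimal(29, 10))
        / sum(m.freq) over (partition by t.token_size, m.object_id) as rel_prob
from mapping as m
    left outer join token as t on m.token_id = t.id
where m.object_id = 1
order by rel_prob desc;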

SQL Fiddle example

Hope that helps.
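The window-function query can be tried against the sample data with Python's built-in `sqlite3` module. This is a sketch assuming SQLite 3.25 or later (the first version with window functions) and a minimal, hypothetical two-table schema. One SQLite-specific change is needed: casting to `decimal(29, 10)` gives NUMERIC affinity in SQLite, which leaves integers as integers, so the division would truncate to 0; `CAST(... AS REAL)` is used instead. A secondary sort on `token_id` is added only to make the order of tied rows deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema holding the sample rows for object 1.
conn.executescript(
    """
    CREATE TABLE token (id INTEGER PRIMARY KEY, token_size INTEGER, token TEXT);
    CREATE TABLE mapping (object_id INTEGER, token_id INTEGER, freq INTEGER);
    INSERT INTO token VALUES
        (1, 2, 'a big'), (2, 1, 'a'), (3, 1, 'big'),
        (4, 3, 'a big slice'), (5, 1, 'slice'), (6, 2, 'big slice');
    INSERT INTO mapping VALUES
        (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 2), (1, 5, 1), (1, 6, 3);
    """
)

QUERY = """
SELECT m.object_id, m.token_id, m.freq, t.token_size, t.token,
       -- REAL instead of decimal(29, 10): SQLite would otherwise divide integers.
       CAST(m.freq AS REAL)
         / SUM(m.freq) OVER (PARTITION BY t.token_size, m.object_id) AS rel_prob
FROM mapping AS m
LEFT OUTER JOIN token AS t ON m.token_id = t.id
WHERE m.object_id = ?
ORDER BY rel_prob DESC, m.token_id
LIMIT 25
"""

rows = conn.execute(QUERY, (1,)).fetchall()
for row in rows:
    print(row)
```

Because the window is partitioned by `m.object_id` as well as `t.token_size`, the same expression stays correct if the `WHERE` clause is dropped and several objects are ranked at once.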

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow