Given a 2D lattice with coordinates $1 \leq x \leq c$ and $1 \leq y \leq d$, we define $f(x, y) = xy$.

We wish to find a boolean function $I(x,y)$ that determines in $O(1)$ time whether or not $(x,y)$ belongs to the set of points $S$ of size $k$ such that the sum $Z(S) = \sum_{(x,y) \in S} f(x,y)$ is smaller or equal to any other set of size $k$. One may use $O(c+d+k)$ time and space to construct $I$.

Is this possible? Is this a known problem (my search turned up nothing)? Can https://en.wikipedia.org/wiki/Divisor_summatory_function and its approximation help us?

Motivation: I work in NLP and am trying to find an optimal way of storing part of a word-word cooccurrence matrix in memory and part on disk. This matrix is very sparse. I'm making the simplifying assumption that the probability of two works co-occurring is proportional to their unigram frequencies. By ranking words in terms of frequency, we get the $c$ and $d$ terms. So the smaller the ranks $c$ and $d$, the more likely they are to co-occur, so this value should be stored in memory. Since there will be billions of lookups, $I$ needs to be fast.

Thanks!

没有正确的解决方案

许可以下: CC-BY-SA归因
不隶属于 cs.stackexchange
scroll top