complexity of constructing an inverted index list

https://stackoverflow.com/questions/12294304

30-06-2021

Question

Given n strings S1, S2, ..., Sn and an alphabet A={a_1,a_2,...,a_m}, assume that the characters in each string are all distinct. I want to create an inverted index with one list for each a_i (i=1,2,...,m). My inverted index has one special property: the characters in A are in a fixed sequential order, and if the list for a_i already includes a string (say S_2), then the lists for a_j (j=i+1,i+2,...,m) don't need to include S_2 any more. In short, every string appears in the inverted index exactly once. How can I build such an index quickly and efficiently? Is there a bound on the time complexity?

For example, A={a,b,e,g}, S1={abg}, S2={bg}, S3={gae}, S4={g}. Then my inverted-list should be:

a: S1,S3
b: S2     (S1 has already appeared under a, so we don't need to include it here)
e: 
g: S4

Solution

If I understand your question correctly, a straightforward solution is:

for each string in n strings
    find the "smallest" character in the string
    put the string in the list for the character

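The complexity is proportional to the total length of the strings, multiplied by a constant for the order test. Each character is examined once, so if one comparison takes constant time, the total is O(L), where L is the sum of the string lengths.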

If there is a simple way to test the order (e.g. the characters are in alphabetical order and all lower-case, so a plain < is enough), simply compare the characters directly. Otherwise, I suggest using a hash table that maps each character to its position in the order, and comparing those positions instead.
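
As a minimal sketch of the hash-table variant, here is how it might look in Python. The function name `build_inverted_index`, the `rank` table, and the choice to pass the strings as a name-to-text dict are assumptions made for illustration; they are not part of the question. Empty strings are skipped, which is also an assumption, since the question does not say how to treat them.

```python
def build_inverted_index(alphabet, strings):
    """Put each string in the list of its earliest character under `alphabet` order.

    alphabet: sequence of characters in the desired order (hypothetical input shape)
    strings:  dict mapping a string's name to its text (hypothetical input shape)
    """
    # Hash table: character -> position in the order, so each comparison is O(1).
    rank = {ch: i for i, ch in enumerate(alphabet)}

    # One (possibly empty) list per character of the alphabet.
    index = {ch: [] for ch in alphabet}

    for name, text in strings.items():
        if not text:
            continue  # assumption: an empty string belongs to no list
        # Single pass over the string to find its "smallest" character.
        first = min(text, key=rank.__getitem__)
        index[first].append(name)
    return index


alphabet = ["a", "b", "e", "g"]
strings = {"S1": "abg", "S2": "bg", "S3": "gae", "S4": "g"}

index = build_inverted_index(alphabet, strings)
for ch in alphabet:
    print(f"{ch}: {','.join(index[ch])}")
```

On the example from the question this gives S1 and S3 under a, S2 under b, nothing under e, and S4 under g. Building `rank` costs O(m) and the loop touches every character once, so the whole thing is O(m + L) for total string length L. If the order is just the natural character order, `min(text)` without a key would do the same job.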

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow