Storing an inverted index in MySQL
14-07-2021
Question
I am working on creating a very large inverted index of terms. Which of these two layouts would you suggest?
First
termId -> docId
a doc2[locations],doc5[locations],doc12[locations]
b doc5[locations],doc7[locations],doc4[locations]
Second
termId -> docId
a doc2[locations]
a doc5[locations]
a doc12[locations]
b doc5[locations]
b doc7[locations]
b doc4[locations]
P.S. Lucene is not an option.
Solution
The right table design depends on how you plan to use the data. If you plan to use strings like "doc2[locations],doc5[locations],doc12[locations]" as is, without any further postprocessing, then your First design is fine.
But if, as your question tacitly suggests, you may at times want to treat doc2[locations], doc5[locations], etc. as separate entities, then you should definitely use your Second design.
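As a rough sketch of what the Second design could look like as an actual table, the snippet below stores one row per (term, document) pair. The table and column names (`postings`, `term_id`, `doc_id`, `locations`) and the comma-separated encoding of locations are assumptions made for illustration, not part of the question. SQLite, via Python's built-in `sqlite3` module, is used only as a stand-in so the example runs on its own; the DDL would need minor adjustments for MySQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")

# Second design: one row per (term, document) pair.
# The composite primary key makes lookups by term_id an index scan.
conn.execute(
    """
    CREATE TABLE postings (
        term_id   TEXT NOT NULL,
        doc_id    TEXT NOT NULL,
        locations TEXT NOT NULL,
        PRIMARY KEY (term_id, doc_id)
    )
    """
)

# Hypothetical sample data mirroring the question's example.
rows = [
    ("a", "doc2", "1,9"),
    ("a", "doc5", "4"),
    ("a", "doc12", "7"),
    ("b", "doc5", "4"),
    ("b", "doc7", "2"),
    ("b", "doc4", "8"),
]
conn.executemany("INSERT INTO postings VALUES (?, ?, ?)", rows)


def docs_for_term(conn, term_id):
    """Return the doc ids that contain the given term, one per row."""
    cur = conn.execute(
        "SELECT doc_id FROM postings WHERE term_id = ?", (term_id,)
    )
    return [row[0] for row in cur]


docs_a = docs_for_term(conn, "a")
```

Keeping the locations in their own column, rather than inside the document identifier string, is a design choice of this sketch; it lets a single posting be changed without touching any other row.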
Here are some use cases that show why the Second design is better:
If you use First and ask for all docs with termId = a, you get back a string like doc2[locations],doc5[locations],doc12[locations], which you then have to split. If you use Second, you get each doc as a separate row. No splitting! The Second structure is more convenient.
Or, suppose at some point doc5[locations] changes and you need to update your table. If you use the First design, you'd have to use some relatively complicated MySQL string function to find and replace the substring in all rows that contain it. (Note that MySQL versions before 8.0 do not come with regex substitution built in; REGEXP_REPLACE() was added in 8.0.) If you use the Second design, updating is easy:
UPDATE table SET docId = "newdoc5[locations]" WHERE docId = "doc5[locations]"
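The two use cases above can be sketched side by side. This is only an illustration under stated assumptions: the table names (`first_design`, `second_design`), the column names, and the sample location values are hypothetical, and SQLite through Python's `sqlite3` module stands in for MySQL so that the example is self-contained. Unlike the UPDATE statement above, this sketch keeps locations in a separate column.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# First design: one row per term, all postings packed into one string.
conn.execute(
    "CREATE TABLE first_design (term_id TEXT PRIMARY KEY, postings TEXT)"
)
conn.executemany(
    "INSERT INTO first_design VALUES (?, ?)",
    [
        ("a", "doc2[1,9],doc5[4],doc12[7]"),
        ("b", "doc5[4],doc7[2],doc4[8]"),
    ],
)

# Second design: one row per (term, document) pair.
conn.execute(
    "CREATE TABLE second_design ("
    "term_id TEXT, doc_id TEXT, locations TEXT, "
    "PRIMARY KEY (term_id, doc_id))"
)
conn.executemany(
    "INSERT INTO second_design VALUES (?, ?, ?)",
    [
        ("a", "doc2", "1,9"), ("a", "doc5", "4"), ("a", "doc12", "7"),
        ("b", "doc5", "4"), ("b", "doc7", "2"), ("b", "doc4", "8"),
    ],
)

# Lookup with First: fetch the packed string, then parse it in the
# application. A plain split(",") would break on the commas inside the
# brackets, so a regular expression is needed.
packed = conn.execute(
    "SELECT postings FROM first_design WHERE term_id = ?", ("a",)
).fetchone()[0]
docs_first = [doc for doc, _locs in re.findall(r"(\w+)\[([^\]]*)\]", packed)]

# Lookup with Second: each document is already its own row.
docs_second = [
    row[0]
    for row in conn.execute(
        "SELECT doc_id FROM second_design WHERE term_id = ?", ("a",)
    )
]

# Update with Second: one statement changes doc5 for every term.
cur = conn.execute(
    "UPDATE second_design SET locations = ? WHERE doc_id = ?",
    ("5,6", "doc5"),
)
updated = cur.rowcount  # number of rows changed by the UPDATE
```

With the First layout, the same change would mean rewriting the packed string in every row that mentions doc5, either with string functions in SQL or by parsing and re-serializing in application code.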