Should we store the text and the tags of an HTML string in one column or in two separate columns?

StackOverflow https://stackoverflow.com/questions/16324414

  •  13-04-2022

Question

I want to store the following HTML string in a database: <b>This</b> is <i>my text</i>. This HTML string has two parts:

  • The Text: "This is my text"
  • The Tags: <b>, </b>, <i>, </i>

As I see it, there are two ways to store this HTML string.

  • Option 1: Store the text and the tags in a single column, so a TextTag column would contain <b>This</b> is <i>my text</i>.

  • Option 2: Store the text and the tags in two separate columns, so the Text column would contain "This is my text" and the Tag column would contain <b>x</b> x <i>x x</i>.

What are the advantages and disadvantages of options 1 and 2?

I think option 1 is easy to code, but it makes the search function harder: the tags should not be searched, so we would have to use some kind of regex in the query, which may slow it down.

Option 2, on the other hand, is harder to code, since we have to map the Tag column back onto the Text column, but searching is easy because no regex is needed. Option 2 also costs a bit more traffic than option 1, since the query carries the extra "x" placeholders. But I believe option 2 is safer than option 1, since the main field stores only text, not HTML.
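To make the mapping in option 2 concrete, here is a minimal Python sketch of splitting an HTML string into the Text value and the "x" mask, and of merging them back. The function names split_html and merge_html are made up for illustration. The sketch uses a naive regex for tags and assumes every tag sits on a word boundary; a tag in the middle of a word (such as <b>Th</b>is) would break the merge, which hints at why this option is harder to code.

```python
import re

# Naive tag pattern: assumes no ">" inside attribute values and no comments.
TAG = re.compile(r"<[^>]+>")
SPLIT = re.compile(r"(<[^>]+>)")


def split_html(html):
    """Return (plain_text, tag_mask) for a simple HTML fragment."""
    text_parts, mask_parts = [], []
    for part in SPLIT.split(html):
        if TAG.fullmatch(part):
            mask_parts.append(part)  # tags go into the mask unchanged
        else:
            text_parts.append(part)  # text goes into the Text column
            mask_parts.append(re.sub(r"\S+", "x", part))  # one "x" per word
    return "".join(text_parts), "".join(mask_parts)


def merge_html(text, mask):
    """Rebuild the HTML by filling each "x" placeholder with the next word."""
    words = iter(text.split())
    out = []
    for part in SPLIT.split(mask):
        if TAG.fullmatch(part):
            out.append(part)
        else:
            out.append(re.sub("x", lambda m: next(words), part))
    return "".join(out)


html = "<b>This</b> is <i>my text</i>"
text, mask = split_html(html)
print(text)                            # This is my text
print(mask)                            # <b>x</b> x <i>x x</i>
print(merge_html(text, mask) == html)  # True
```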

If you are a DB specialist, which option would you choose?


Solution

We are not DB specialists, but in our opinion you should: 1) extract the main text from the content using any HTML extraction library; 2) store the extracted text in the main field, the one you search on; 3) store the whole value with its HTML tags, i.e. <b>This</b> is <i>my text</i>, in a second field.

If you have to render the HTML later, this leaves you with no extra programming burden: you can simply search the main field and then return the matching HTML segment as it is.
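As a rough illustration of this approach, the sketch below uses Python's standard html.parser as the extraction library and an in-memory SQLite database. The table name snippet, the column names plain_text and html, and the helper functions are all assumptions made for the example, not part of the original answer.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snippet ("
    "id INTEGER PRIMARY KEY, plain_text TEXT NOT NULL, html TEXT NOT NULL)"
)


def save(conn, html):
    # Main field holds the extracted text; second field holds the full HTML.
    conn.execute(
        "INSERT INTO snippet (plain_text, html) VALUES (?, ?)",
        (extract_text(html), html),
    )


def search(conn, term):
    # Search only the plain text, return the stored HTML as it is.
    # Note: "%" and "_" inside term are not escaped in this sketch.
    rows = conn.execute(
        "SELECT html FROM snippet WHERE plain_text LIKE ?", (f"%{term}%",)
    )
    return [row[0] for row in rows]


save(conn, "<b>This</b> is <i>my text</i>")
print(search(conn, "This is"))
```

The phrase "This is" never appears contiguously in the raw HTML, so searching the HTML column directly would miss it, while a search for a tag name such as "<b>" finds nothing in the plain-text column.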

Thanks

OTHER TIPS

I think the best option is a modified version of your option 2: in column one you store the text part of your data, and in column two you store a string that you treat as a sequence of binary flags, where 1, Y, or X means TRUE and 0, N, or O means FALSE. That way you can store much more than just whether bold or italics should be applied.

For example, if your second column were 10 bytes long, you could store the formatting options for ten different items, such as bold, italics, underline, horizontal alignment (left, right, center), vertical alignment (top, middle, bottom), or anything else that you can encode in one character or one byte.
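One possible reading of this flag-string idea, sketched in Python: each position in the string stands for one formatting option, with "1" for TRUE and "0" for FALSE. The option names and their order in FORMAT_FLAGS are assumptions chosen for the example.

```python
# Hypothetical flag layout: one character per formatting option.
FORMAT_FLAGS = ("bold", "italic", "underline", "strikethrough")


def encode_flags(enabled):
    """Turn a set of option names into a fixed-width string of 1s and 0s."""
    unknown = set(enabled) - set(FORMAT_FLAGS)
    if unknown:
        raise ValueError(f"unknown formatting options: {sorted(unknown)}")
    return "".join("1" if name in enabled else "0" for name in FORMAT_FLAGS)


def decode_flags(flags):
    """Turn a stored flag string back into the set of enabled options."""
    return {name for name, bit in zip(FORMAT_FLAGS, flags) if bit == "1"}


stored = encode_flags({"bold", "underline"})
print(stored)                # 1010
print(decode_flags(stored))  # a set containing "bold" and "underline"
```

A flag string like this describes the stored value as a whole; on its own it does not record which words inside the text are bold or italic.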

To explain further, the original answer included a diagram here (image not available).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow