Question

I have a private messaging system for my users that I've created in php with a mysql backend. The system deletes old messages but generally holds over 500,000 messages. Currently all of the data is included in one table:

message_table
    message_id (int 11)
    message_from_id (int 11)
    message_to_id (int 11)
    message_timestamp (int 11)
    message_subject (varchar 50)
    message_text (text)

The majority of messages are very short so I'm considering changing the system to:

message_table
    message_id (int 11)
    message_from_id (int 11)
    message_to_id (int 11)
    message_timestamp (int 11)
    message_subject (varchar 50)
    message_short_body (varchar 50)
    message_text_id (int 11)

text_table
    text_id (int 11)
    text_body (text)

Then if a short message is entered it will be stored under 'message_short_body', and if it is longer it will be added to 'text_table' with the 'text_id' stored as 'message_text_id'. When messages are accessed I would then have something like:

SELECT * FROM message_table LEFT JOIN text_table ON text_table.text_id = message_table.message_text_id IF message_table.message_text_id != 0 WHERE message_table.message_to_id = $user_id

I added "IF message_table.message_text_id != 0" and don't know if something like that is possible.

As a general rule is it possible to tell if this would reduce the size of the database / speed up queries ?


Solution

I added "IF message_table.message_text_id != 0" and don't know if something like that is possible.

Unless there actually is a row with text_id = 0 in your text_table, there is no need to do this. Simply omit the IF and use the following query:

SELECT IFNULL(text_table.text_body, message_table.message_short_body) AS body,
       …
FROM message_table
LEFT JOIN text_table ON text_table.text_id = message_table.message_text_id
WHERE message_table.message_to_id = $user_id
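
To see the fallback in action, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in for your MySQL backend, the sample rows are made up, `fetch_bodies` is a hypothetical helper name, and the `?` placeholder takes the place of `$user_id`. The point is just that a message with message_text_id = 0 finds no partner row, so text_body comes back NULL and IFNULL falls back to the short body.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL backend (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text_table (
        text_id   INTEGER PRIMARY KEY,
        text_body TEXT
    );
    CREATE TABLE message_table (
        message_id         INTEGER PRIMARY KEY,
        message_to_id      INTEGER,
        message_short_body VARCHAR(50),
        message_text_id    INTEGER NOT NULL DEFAULT 0
    );
    -- Made-up sample data: one long message and one short message.
    INSERT INTO text_table VALUES (1, 'a long body stored in text_table');
    INSERT INTO message_table VALUES (1, 7, NULL, 1);
    INSERT INTO message_table VALUES (2, 7, 'short body', 0);
""")


def fetch_bodies(conn, user_id):
    """Return (message_id, body) pairs for one recipient."""
    cursor = conn.execute(
        """
        SELECT m.message_id,
               IFNULL(t.text_body, m.message_short_body) AS body
        FROM message_table AS m
        LEFT JOIN text_table AS t ON t.text_id = m.message_text_id
        WHERE m.message_to_id = ?
        ORDER BY m.message_id
        """,
        (user_id,),
    )
    return cursor.fetchall()


bodies = fetch_bodies(conn, 7)
print(bodies)
```

Message 1 gets its body from text_table, message 2 from message_short_body, and no row with text_id = 0 is needed for that to work.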

In terms of performance, it might be that the engine can optimize things more efficiently if you add your condition to the join conditions:

SELECT IFNULL(text_table.text_body, message_table.message_short_body) AS body,
       …
FROM message_table
LEFT JOIN text_table ON text_table.text_id = message_table.message_text_id
                    AND message_table.message_text_id != 0
WHERE message_table.message_to_id = $user_id

You could also try an approach using a subquery:

SELECT IF(message_text_id = 0, message_short_body, (
  SELECT text_table.text_body
  FROM text_table
  WHERE text_table.text_id = message_table.message_text_id)) AS body,
       …
FROM message_table
WHERE message_table.message_to_id = $user_id

This has the benefit of not executing the search in text_table if none is required, but the drawback of performing a separate subquery for each row with a long message. I would expect the two join-based queries above to be superior, but I'm not sure.

As a general rule is it possible to tell if this would reduce the size of the database / speed up queries ?

You'll have to benchmark, as it depends on the use case. If most of your queries retrieve data from the fields other than the text, then the smaller table will make those queries faster, yielding a performance gain. If, on the other hand, you usually want the body along with the rest of the message, then you'll likely end up with worse performance.

You should also use benchmarks to distinguish between the different alternatives described above.
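
As a rough sketch of what such a benchmark could look like, the following Python script times the three alternatives on synthetic data. Everything here is an assumption for illustration: SQLite stands in for MySQL, the data (20,000 messages, 50 users, every 10th one long) is made up, `benchmark` and `QUERIES` are hypothetical names, and the subquery variant uses CASE instead of MySQL's IF(). Timings from SQLite say nothing about your MySQL server, so treat this only as the shape of a harness to rerun against your real tables.

```python
import random
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE text_table (text_id INTEGER PRIMARY KEY, text_body TEXT);
    CREATE TABLE message_table (
        message_id         INTEGER PRIMARY KEY,
        message_to_id      INTEGER,
        message_short_body VARCHAR(50),
        message_text_id    INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX idx_message_to ON message_table (message_to_id);
""")

# Synthetic data (made up): 20,000 messages for 50 users, every 10th one long.
rng = random.Random(42)
for i in range(1, 20001):
    to_id = rng.randrange(50)
    if i % 10 == 0:
        conn.execute("INSERT INTO text_table VALUES (?, ?)", (i, "x" * 2000))
        conn.execute("INSERT INTO message_table VALUES (?, ?, NULL, ?)", (i, to_id, i))
    else:
        conn.execute("INSERT INTO message_table VALUES (?, ?, 'short', 0)", (i, to_id))

QUERIES = {
    "plain join": """
        SELECT m.message_id, IFNULL(t.text_body, m.message_short_body)
        FROM message_table m
        LEFT JOIN text_table t ON t.text_id = m.message_text_id
        WHERE m.message_to_id = ?""",
    "join with extra condition": """
        SELECT m.message_id, IFNULL(t.text_body, m.message_short_body)
        FROM message_table m
        LEFT JOIN text_table t ON t.text_id = m.message_text_id
                              AND m.message_text_id != 0
        WHERE m.message_to_id = ?""",
    "subquery": """
        SELECT m.message_id,
               CASE WHEN m.message_text_id = 0 THEN m.message_short_body
                    ELSE (SELECT t.text_body FROM text_table t
                          WHERE t.text_id = m.message_text_id)
               END
        FROM message_table m
        WHERE m.message_to_id = ?""",
}


def benchmark(conn, sql, repeats=5):
    """Run sql for user ids 0..repeats-1; return (avg seconds, last result)."""
    rows = []
    start = time.perf_counter()
    for user_id in range(repeats):
        rows = conn.execute(sql, (user_id,)).fetchall()
    return (time.perf_counter() - start) / repeats, rows


results = {name: benchmark(conn, sql) for name, sql in QUERIES.items()}
for name, (seconds, rows) in results.items():
    print(f"{name}: {seconds * 1000:.3f} ms per query, {len(rows)} rows")
```

On a real system you would also want to run each query many more times and with a cold and a warm cache, since a single pass mostly measures noise.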

In terms of size of the database, you'll likely see an increase: the storage requirements for the text data are about the same, but the indices for the extra table will cost you.

I guess if this were my schema, I'd drop the message_text_id and instead have the primary key of the text_table match that of the message_table. I.e. each key occurs either only in the message table or in both tables, and rows with the same key belong together. Whether or not the body is stored in the other table could be encoded by setting message_table.message_short_body to NULL for those messages.
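
A minimal sketch of that shared-key layout, again using SQLite from Python as a stand-in for MySQL. The helper names `store_message` and `load_bodies`, the `SHORT_LIMIT` constant (taken from the varchar 50 in the question), and the trimmed-down column list are my own assumptions for illustration, not part of your schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message_table (
        message_id         INTEGER PRIMARY KEY,
        message_to_id      INTEGER,
        message_short_body VARCHAR(50)  -- NULL means the body is in text_table
    );
    CREATE TABLE text_table (
        -- Same value as message_table.message_id, so no message_text_id column.
        text_id   INTEGER PRIMARY KEY REFERENCES message_table (message_id),
        text_body TEXT NOT NULL
    );
""")

SHORT_LIMIT = 50  # matches the varchar(50) of message_short_body


def store_message(conn, to_id, body):
    """Insert a message; long bodies go to text_table under the same key."""
    short_body = body if len(body) <= SHORT_LIMIT else None
    cursor = conn.execute(
        "INSERT INTO message_table (message_to_id, message_short_body) VALUES (?, ?)",
        (to_id, short_body),
    )
    message_id = cursor.lastrowid
    if short_body is None:
        conn.execute(
            "INSERT INTO text_table (text_id, text_body) VALUES (?, ?)",
            (message_id, body),
        )
    return message_id


def load_bodies(conn, to_id):
    """Return (message_id, body) pairs, preferring the short body when present."""
    cursor = conn.execute(
        """
        SELECT m.message_id, IFNULL(m.message_short_body, t.text_body) AS body
        FROM message_table m
        LEFT JOIN text_table t ON t.text_id = m.message_id
        WHERE m.message_to_id = ?
        ORDER BY m.message_id
        """,
        (to_id,),
    )
    return cursor.fetchall()


short_id = store_message(conn, 7, "hi")
long_id = store_message(conn, 7, "y" * 200)
print(load_bodies(conn, 7))
```

The join is now on the primary keys of both tables, and message_table carries one column less than in your proposed layout.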

OTHER TIPS

I added "IF message_table.message_text_id != 0" and don't know if something like that is possible.

The query you are looking for is like this:

SELECT
  IFNULL(t.text, m.short_text) AS text
  -- other columns may follow
FROM messages2 m
LEFT JOIN texts t on m.text_id = t.id
WHERE to_id = A_USER_ID

As a general rule is it possible to tell if this would reduce the size of the database / speed up queries ?

Yes, it is possible! One can at least just test it, and I've done that. I created a test scenario with a message table of 500,000 entries, where every 10th has a long text. The from_id and to_id of each message are picked at random from 50 users.

Part 1 : Speed

The second attempt, using a separate texts table, gives a big speed-up. The average query time for the first attempt was ~1.6 seconds; for the second it was only ~0.28 seconds!

To answer the question: Yes it is faster! :)

Part 2 : Database Size

As one might expect, the database grows slightly. The additional indexes on texts made my database about 10% larger.

Conclusion: Storing big texts in a separate table is a good idea. Going by my test, it cut query time by roughly 80% at the cost of about 10% more disk space.

Try this:

SELECT *, IFNULL(tt.text_body,  mt.message_short_body) textBody 
FROM message_table mt 
LEFT JOIN text_table tt ON tt.text_id = mt.message_text_id 
WHERE mt.message_to_id = $user_id;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow