I don't know where to begin: MySQL SELECT with inner SELECTs in iterations is too slow with bigger tables (4 tables on 2 databases)

dba.stackexchange https://dba.stackexchange.com/questions/264879

Question

The context is simple:

 A) A [discussion-thread.db1] has many [Posts.db1] 
 B) each Post has an [author.db2] and 1 to 10 [attachments.db1]

Additional information: the discussion, posting and attachment metadata are in one database; the details of the author (= user) are in another database, which may be physically located on the other side of the planet.

At the moment I have an "outer" SELECT over the postings and, for every posting, up to 11 (!) iterative inner SELECTs:

SELECT Posts WHERE thread-id = $thread_row_id => WHILE FETCH $posts_row {
  SELECT author.db2 WHERE id = $posts_row_userId

  -- run the attachment SELECT for each of $posts_row a1, a2 ... a10 that has a value
  -- (i.e. up to 10 iterations per posting):
  FOR n = 1 .. 10:
    SELECT attachments.db1 WHERE id = $posts_row_a{n}
}

This works for a mock-up, but with more postings, each with up to 10 attachments, it breaks down. Unfortunately, I do not know how to consolidate this into one query whose result set I can then iterate over.

I would break the problem into two aspects:

  1. the -up to 10- Attachments.db1 for every Posts.db1 entry
  2. the author.db2 for each of the Posts.db1 entries

To 1.) Do I need to make a JOIN for every single attachment? I have a fixed maximum of 10 attachments.id fields in the Posts.db1 table, so 10 little JOINs of the attachments.db1 table WHERE id = Posts.db1.a1 to Posts.db1.a10 could be done.

But to 2.) a) the postings, and even more so the authors, of a thread are of a manageable number; b) but the database is a different one. When running in docker swarm mode, it could be physically located on another continent:

=> Might it be a good idea to make an extra query to that database beforehand, get all the needed user data of the authors into a sub-array, and join them server-side?

---------- start big edit: adding the CREATE & SELECT queries------------

Background: it's a discussion board for project collaboration within single SCRUM teams (part of my remote scrum application), so the load is not too heavy. The discussion threads are task-related, so a maximum of 50 postings with 5 attachments each would be a realistic upper bound (a more realistic average is probably 10 postings with 3 attachments each per thread).

Info: discussionthreads contain postings - postings each contain a userprofile AND up to 10 attachments

I) CREATE TABLES (discussionthreads, postings & attachments are on the projectdatabase; userprofiles are on the metadatabase)

a) the discussion-threads (the base-query):

CREATE TABLE IF NOT EXISTS discussionthreads(
ID int(255) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
Pid int(255) NOT NULL DEFAULT '0',
topic varchar(255) NOT NULL DEFAULT 'not set',
dscrption varchar(255) NOT NULL DEFAULT 'not set',
closed int(3) NOT NULL DEFAULT '0',
created timestamp NOT NULL,
usrID int(8) NOT NULL DEFAULT '0',
timestamp varchar(255) NOT NULL DEFAULT 'not set',
value int(5) NOT NULL DEFAULT '0',
item int(8) NOT NULL DEFAULT '0',
sprintid int(4) NOT NULL DEFAULT '0',
cmplxty int(3) NOT NULL DEFAULT '0',
innovation int(3) NOT NULL DEFAULT '0',
riskstmnt varchar(2550) NOT NULL DEFAULT '0',
dependsup int(3) NOT NULL DEFAULT '0',
dependsdwn  int(3) NOT NULL DEFAULT '0',
depstmnt varchar(2550) NOT NULL DEFAULT '0',
status int(8) NOT NULL DEFAULT '0'
);

b) the postings (WHERE postings.tid = discussionthreads.ID):

CREATE TABLE IF NOT EXISTS postings(
ID  int(255) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
usrsID int(8) NOT NULL,
tid int(255) NOT NULL,
Pid int(255) NOT NULL,
activ int(2) NOT NULL DEFAULT '1',
reAW int(255) DEFAULT NULL,
project varchar(133) NOT NULL DEFAULT 'not set',
wbsnom varchar(133) NOT NULL DEFAULT 'not set',
wbsnr varchar(33) NOT NULL DEFAULT '0',
actnom varchar(133) NOT NULL DEFAULT 'not set',
actnr varchar(33) NOT NULL DEFAULT '0',
username varchar(133) NOT NULL,
email varchar(133) NOT NULL DEFAULT 'not set',
topic varchar(133) NOT NULL DEFAULT 'not set',
text text NOT NULL,
created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
qr text DEFAULT NULL,
im text NOT NULL,
ap text NOT NULL,
ss text DEFAULT NULL,
a1 int(2) DEFAULT NULL,
a2 int(2) DEFAULT NULL,
a3 int(2) DEFAULT NULL,
a4 int(2) DEFAULT NULL,
a5 int(2) DEFAULT NULL,
a6 int(2) DEFAULT NULL,
a7 int(2) DEFAULT NULL,
a8 int(2) DEFAULT NULL,
a9 int(2) DEFAULT NULL,
a10 int(2) DEFAULT NULL
);

c) the attachments (note: every posting can have up to 10 attachments assigned/JOINED via postings.a1, postings.a2 ... postings.a10)

*(WHERE attachments.id = postings.a1 / attachments.id = postings.a2 / .... / attachments.id = postings.a10)*

CREATE TABLE IF NOT EXISTS attachments(
id int(11) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
filestoredas varchar(255),
filelocation varchar(255),
fname varchar(255),
fdescription varchar(255),
ftype varchar(255),
fsize varchar(64),
fext varchar(64),
wbsid varchar(255),
wbsname varchar(255),
wbsdeliver varchar(255),
threadid varchar(255),
misc2 varchar(255),
miscn varchar(255),
timestored varchar(255) NOT NULL
); 

d) the user-profiles (it's the only table from another database)

CREATE TABLE IF NOT EXISTS userprofiles(
id  int(16) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
usrsID int(16),
username varchar(133) NOT NULL,
email varchar(255) NOT NULL,
image_type varchar(25) NOT NULL DEFAULT 'nopic',
image longblob,
image_size varchar(25) NOT NULL DEFAULT '''''',
avatar varchar(133) NOT NULL DEFAULT '''''',
image_type2 varchar(25) DEFAULT 'nopic',
image2 longblob,
image_size2 varchar(25) NOT NULL DEFAULT '''''',
avatar2 varchar(133) NOT NULL DEFAULT '''''',
fname varchar(255) NOT NULL DEFAULT 'myFirstName',
lname varchar(255) NOT NULL DEFAULT 'myLastName',
phone varchar(255) NOT NULL DEFAULT '+00 0000 000-00',
daytime varchar(32) NOT NULL DEFAULT 'daytime',
tzone varchar(64) NOT NULL DEFAULT 'timezone',
position varchar(666) NOT NULL DEFAULT 'MyRole MyPosition in this project',
skills text NOT NULL DEFAULT 'MySkills related to MyRole',
interestedin text NOT NULL DEFAULT 'MyInterests private or business',
comment text NOT NULL DEFAULT 'No Stereotypes, but myCulture, myPassion, myPreferences',    
linklist text NOT NULL DEFAULT 'Some links. For work, for fun, for sharing some interests...',
thorie_nearness varchar(11) NOT NULL DEFAULT '0',
thorie_risk varchar(11) NOT NULL DEFAULT '0'
);  

II) perform queries

The single queries (as mentioned: a) two databases; b) at the moment I simply iterate server-side, and within the iterations I run sub-queries):

A) from projectdatabase:

SELECT * FROM discussionthreads WHERE ID=~url-tid-param LIMIT 1

SELECT * FROM postings WHERE tid=discussionthreads.ID ORDER BY created

SELECT * FROM attachments WHERE id = postings.a1 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a2 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a3 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a4 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a5 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a6 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a7 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a8 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a9 LIMIT 1
SELECT * FROM attachments WHERE id = postings.a10 LIMIT 1

B) from metadatabase

SELECT * FROM userprofiles WHERE id = postings.usrsID LIMIT 1

edit2:

  • I am running it via docker / docker-compose - the Dockerfile for the databases:
FROM mariadb:10.4

Just FYI (maybe relevant): the containers for the web app are all built from a Dockerfile starting with:

FROM php:7.2-apache
...

Solution

Lesson 1: How to do a one-to-many relationship for your attachments.

Do not have an array of things (a1..a10) in a table. Instead, have a link from the other table back to the main table. More specifically:

CREATE TABLE attachments (
    ...
    post_id INT NOT NULL  -- for joining to postings via postings.ID
    ...

Indexes...

I assume each table has PRIMARY KEY(id) and id is an AUTO_INCREMENT. Or you have some column (or combination of columns) that comprises the PRIMARY KEY.

comment/answer from questioner: yes, that's right. id or ID is the auto-increment primary key in each table; no other columns are indexed.

You also need, in attachments, INDEX(post_id).

With those, you can go from a post to all its attachments:

       FROM posts AS p
       JOIN attachments AS a  ON a.post_id = p.id

and go the other direction, namely from an attachment to the post that it belongs to:

       FROM attachments AS a
       JOIN posts AS p  ON a.post_id = p.id
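
As a rough illustration of the one-to-many layout, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a stand-in for MariaDB so that it runs without a server. The trimmed-down posts and attachments tables, the index name and the sample rows are assumptions made for the example, not the questioner's real schema. It uses LEFT JOIN rather than the plain JOIN above so that posts without attachments still appear.

```python
import sqlite3

# SQLite stands in for MariaDB; tables are simplified, hypothetical versions.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, tid INTEGER NOT NULL, text TEXT NOT NULL);
    CREATE TABLE attachments (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,   -- link back to posts.id
        fname   TEXT NOT NULL
    );
    CREATE INDEX idx_attachments_post_id ON attachments (post_id);

    INSERT INTO posts VALUES (1, 7, 'first post'), (2, 7, 'second post'), (3, 7, 'third post');
    INSERT INTO attachments VALUES (10, 1, 'spec.pdf'), (11, 1, 'mockup.png'), (12, 2, 'notes.txt');
""")

# One query replaces the per-post attachment SELECTs for a whole thread.
rows = con.execute("""
    SELECT p.id, p.text, a.fname
      FROM posts AS p
      LEFT JOIN attachments AS a ON a.post_id = p.id
     WHERE p.tid = ?
     ORDER BY p.id, a.id
""", (7,)).fetchall()

for row in rows:
    print(row)
```

Each post appears once per attachment, so the application groups consecutive rows by post id while iterating.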

If, instead, an attachment can be associated with many posts, you need a many-to-many mapping table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table

comment/answer from questioner: yes, the latter is the case: attachments are added from a SharePoint-like project content management system, so each of these files can and will be used several times, in many posts and by many users...

Next...

Fix that relationship in your schema and start a new Q&A on some other topic. It is not proper to have more than one question at a time in this Forum. More specifically, Docker and Join are totally orthogonal to each other. I have addressed part of the JOIN question; I hope that getting you on the right track will help you focus on the next Question in that area. Meanwhile, if you have Docker questions (or other connectivity questions), start a separate Question for that.

comment/answer from questioner: I only wanted to share some context; I was not sure whether it is of any interest that the databases are running via docker. (It's all working fine, locally as well as deployed on a server somewhere in Cyprus; only this specific query is a problem.) My next approach might be to use an intermediate_table instead of the a1, a2 ... a10 columns to connect attachments with the posts (with an indexed postid column, because of the SELECT WHERE intermediate_table.postid = postings.ID) ...

So, for many-to-many:

CREATE TABLE PostsAttachments (
    post_id ...,
    att_id ...,
    PRIMARY KEY(post_id, att_id),
    INDEX(att_id, post_id)
    ) ENGINE = InnoDB;

and toss a1...a10. Do not keep att_id in the post table, nor post_id in the att table.

And it will take 2 JOINs to connect a post with all its attachments (or vice versa).
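
A minimal sketch of those two JOINs, again in Python with sqlite3 standing in for MariaDB. The simplified posts and attachments tables, the index name, the sample rows and the helper attachments_by_post are hypothetical names chosen for illustration.

```python
import sqlite3

# SQLite stands in for MariaDB; the schema is a simplified, hypothetical one.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, tid INTEGER NOT NULL);
    CREATE TABLE attachments (id INTEGER PRIMARY KEY, fname TEXT NOT NULL);
    CREATE TABLE PostsAttachments (
        post_id INTEGER NOT NULL,
        att_id  INTEGER NOT NULL,
        PRIMARY KEY (post_id, att_id)
    );
    CREATE INDEX idx_att_post ON PostsAttachments (att_id, post_id);

    INSERT INTO posts VALUES (1, 7), (2, 7);
    INSERT INTO attachments VALUES (10, 'spec.pdf'), (11, 'mockup.png');
    -- spec.pdf is shared by both posts
    INSERT INTO PostsAttachments VALUES (1, 10), (1, 11), (2, 10);
""")

def attachments_by_post(thread_id):
    """Return {post_id: [fname, ...]} for one thread using a single query."""
    query = """
        SELECT p.id, a.fname
          FROM posts AS p
          LEFT JOIN PostsAttachments AS pa ON pa.post_id = p.id
          LEFT JOIN attachments AS a ON a.id = pa.att_id
         WHERE p.tid = ?
         ORDER BY p.id, a.id
    """
    result = {}
    for post_id, fname in con.execute(query, (thread_id,)):
        names = result.setdefault(post_id, [])
        if fname is not None:   # posts without attachments keep an empty list
            names.append(fname)
    return result

grouped = attachments_by_post(7)
print(grouped)
```

With this layout the number of attachments per post is no longer capped at 10, and a file shared by several posts is stored only once.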

Use the syntax db1.post and db2.att and, depending on where you put this mapping table, the appropriate dbX to qualify it. Databases are a convenience to the schema designer; as long as both databases live on the same MySQL/MariaDB server, there is no impact on performance, etc.
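
To show what database-qualified names look like in a JOIN, here is a small sketch in Python with sqlite3. SQLite's ATTACH DATABASE is used only to imitate two databases on one server: "main" plays the role of the project database and "db2" the metadatabase. The cut-down postings and userprofiles tables and the sample rows are assumptions for the example.

```python
import sqlite3

# "main" imitates the project database, "db2" the metadatabase (both in memory).
con = sqlite3.connect(":memory:")
con.execute("ATTACH DATABASE ':memory:' AS db2")
con.executescript("""
    CREATE TABLE main.postings (ID INTEGER PRIMARY KEY, tid INTEGER, usrsID INTEGER, text TEXT);
    CREATE TABLE db2.userprofiles (id INTEGER PRIMARY KEY, username TEXT);

    INSERT INTO main.postings VALUES (1, 7, 100, 'hello'), (2, 7, 101, 'hi there');
    INSERT INTO db2.userprofiles VALUES (100, 'alice'), (101, 'bob');
""")

# The author lookup becomes part of the same query instead of one SELECT per post.
rows = con.execute("""
    SELECT p.ID, u.username, p.text
      FROM main.postings AS p
      JOIN db2.userprofiles AS u ON u.id = p.usrsID
     WHERE p.tid = ?
     ORDER BY p.ID
""", (7,)).fetchall()

for row in rows:
    print(row)
```

In MySQL/MariaDB the equivalent would qualify the tables with the real database names; listing only the needed columns also avoids pulling the longblob image columns of userprofiles for every post.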

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange