Question

I'm executing the query below from SQLite Expert Personal on a local DB that is about 6 MB in size. My machine is running Windows 7 with 4 GB of RAM on an Intel i5 CPU (3.1 GHz). I would expect this to complete within seconds because everything is local, but for some reason it executes in 277667 ms (around 4.5 minutes). Any ideas on why this takes so long for a relatively small data set? Let me know if you need any more information.

TABLE "userlist" contains about 7k records and 4 columns
TABLE "employeeinfo" contains about 30k records and 8 columns

Query:

CREATE TABLE join1 AS
SELECT a.appname AS APPNAME, a.appid AS APPID,  a.perm AS PERMS, a.holdflag AS HOLDFLAG, b.FirstName AS USERFIRST, b.LastName AS USERLAST, b.DeptName AS USERDEPT,
b.TermDate AS USERTERMDATE, b.logonid AS USERHRLOGON, b.empnum AS USEREMPNUM, b.persontype AS USERPERSONTYPE, b.mgrlogonid AS MGRHRLOGON
FROM userlist AS a
LEFT JOIN
employeeinfo AS b
ON a.appid LIKE b.logonid;

UPDATE: My execution time went from 4.5 minutes down to 110 milliseconds after doing the following:

  1. Imported my data again, with the join columns (a.appid, b.logonid) both all lowercase, so I could use '=' instead of LIKE
  2. Created an index on a.appid and b.logonid
  3. Increased the PRAGMA cache_size from 2000 (default) to 100000. I just read this only lasts for the current session. When I closed and reopened my DB, the cache_size was indeed back to 2000
  4. Increased the PRAGMA page_size from 1024 to 4096 (apparently this only makes a difference if you declare before creating your first table, can someone confirm?)
  5. Changed PRAGMA journal_mode from 'delete' to 'wal'. I then had to change this to 'memory' because the 'wal' type is not compatible with the version of Python I'm using (2.7.5).
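
For reference, steps 1 to 3 can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the tables are cut down to a few columns, the sample rows and the index names (idx_userlist_appid, idx_employeeinfo_logonid) are invented, and the helper build_join is hypothetical. Because cache_size is reset when the connection closes, the sketch sets it right after connecting.

```python
import sqlite3

def build_join(conn):
    """Set the cache size, index both join columns, then join with '='."""
    cur = conn.cursor()
    # cache_size applies to this connection only, so set it on every connect.
    cur.execute("PRAGMA cache_size = 100000")
    # Index names are made up for this sketch.
    cur.execute("CREATE INDEX IF NOT EXISTS idx_userlist_appid ON userlist(appid)")
    cur.execute("CREATE INDEX IF NOT EXISTS idx_employeeinfo_logonid ON employeeinfo(logonid)")
    # Same shape as the original query, with fewer columns and '=' instead of LIKE.
    cur.execute("""
        CREATE TABLE join1 AS
        SELECT a.appname AS APPNAME, a.appid AS APPID, b.FirstName AS USERFIRST
        FROM userlist AS a
        LEFT JOIN employeeinfo AS b ON a.appid = b.logonid
    """)
    conn.commit()
    return cur.execute(
        "SELECT APPNAME, APPID, USERFIRST FROM join1 ORDER BY APPID"
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userlist (appname TEXT, appid TEXT, perm TEXT, holdflag TEXT);
    CREATE TABLE employeeinfo (FirstName TEXT, LastName TEXT, logonid TEXT);
    -- Invented sample data, already lower-cased as in step 1.
    INSERT INTO userlist VALUES ('app1', 'jsmith', 'rw', 'N'), ('app1', 'nobody', 'r', 'N');
    INSERT INTO employeeinfo VALUES ('John', 'Smith', 'jsmith');
""")
rows = build_join(conn)
cache_size = conn.execute("PRAGMA cache_size").fetchone()[0]
print(rows, cache_size)
```

The unmatched 'nobody' row is kept with a NULL first name, which is the behavior the LEFT JOIN in the original query asks for.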

Solution

The expression a.appid LIKE b.logonid cannot be optimized, not even with a case-insensitive index on appid. Therefore, the database has to check every record in userlist against every record in employeeinfo, so there are 7K × 30K = 210M comparisons.
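
One way to check this on your own data is to ask SQLite for its query plan. The sketch below uses Python's built-in sqlite3 module with cut-down versions of the two tables; the index name idx_employeeinfo_logonid and the helper query_plan are invented for illustration. It compares the plan for the LIKE join with the plan for the same join written with =.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userlist (appname TEXT, appid TEXT);
    CREATE TABLE employeeinfo (FirstName TEXT, logonid TEXT);
    -- Hypothetical index name; any index on logonid would do.
    CREATE INDEX idx_employeeinfo_logonid ON employeeinfo(logonid);
""")

def query_plan(conn, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

JOIN = """
    SELECT a.appname, b.FirstName
    FROM userlist AS a
    LEFT JOIN employeeinfo AS b ON a.appid {op} b.logonid
"""

like_plan = query_plan(conn, JOIN.format(op="LIKE"))
eq_plan = query_plan(conn, JOIN.format(op="="))

print("LIKE:", like_plan)  # expected: only SCAN steps, despite the index
print("=:   ", eq_plan)    # expected: a SEARCH on employeeinfo via the index
```

The exact wording of the plan differs between SQLite versions, but the LIKE variant should list only SCAN steps, while the = variant should show a SEARCH that uses the index.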

You should ensure that the strings in these tables have a canonical capitalization so that you can use a plain = comparison. Alternatively, create an additional column where you store a lower-case version of the string.
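
A minimal sketch of the additional-column approach, again using Python's sqlite3 module. The column names appid_lc and logonid_lc, the index name, and the sample rows are assumptions made up for this example; the tables are reduced to two columns each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userlist (appname TEXT, appid TEXT);
    CREATE TABLE employeeinfo (FirstName TEXT, logonid TEXT);
    -- Invented sample data with mixed capitalization.
    INSERT INTO userlist VALUES ('payroll', 'JSmith'), ('payroll', 'ghost');
    INSERT INTO employeeinfo VALUES ('John', 'jsmith');

    -- Extra columns holding a lower-case copy of each join key.
    ALTER TABLE userlist ADD COLUMN appid_lc TEXT;
    ALTER TABLE employeeinfo ADD COLUMN logonid_lc TEXT;
    UPDATE userlist SET appid_lc = lower(appid);
    UPDATE employeeinfo SET logonid_lc = lower(logonid);

    -- Index the lower-case key on the table being looked up.
    CREATE INDEX idx_employeeinfo_logonid_lc ON employeeinfo(logonid_lc);
""")

# Join on the canonical columns with a plain '=' comparison.
rows = conn.execute("""
    SELECT a.appid, b.FirstName
    FROM userlist AS a
    LEFT JOIN employeeinfo AS b ON a.appid_lc = b.logonid_lc
    ORDER BY a.appid
""").fetchall()
print(rows)
```

The original values stay untouched, so the output can still show 'JSmith' as stored. Two caveats: the lower-case copies have to be refreshed whenever rows are inserted or changed, and SQLite's built-in lower() only folds ASCII letters by default.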

Other tips

You're not specifying wildcards so don't use LIKE in your ON condition - change it to =, and make sure that employeeinfo.logonid is indexed.

Licensed under: CC-BY-SA with attribution