Why is my tsv index not being used?

https://dba.stackexchange.com/questions/116549

29-09-2020

Problem

I'm trying to get the PostgreSQL full-text search facility working.

I have two tables: one I just created for testing, and the real one I want to be able to search.

Test table:

webarchive=# \d test_sites
                            Table "public.test_sites"
   Column    |   Type   |                        Modifiers
-------------+----------+---------------------------------------------------------
 id          | integer  | not null default nextval('test_sites_id_seq'::regclass)
 content     | text     |
 tsv_content | tsvector |
Indexes:
    "test_sites_pkey" PRIMARY KEY, btree (id)
    "idx_test_web_pages_content" gin (tsv_content)
Triggers:
    web_pages_testing_content_change_trigger AFTER INSERT OR UPDATE ON test_sites FOR EACH ROW EXECUTE PROCEDURE web_pages_testing_content_update_func()

The "real" table:

webarchive=# \d web_pages
                                      Table "public.web_pages"
    Column    |            Type             |                       Modifiers
--------------+-----------------------------+--------------------------------------------------------
 id           | integer                     | not null default nextval('web_pages_id_seq'::regclass)
 state        | dlstate_enum                | not null
 errno        | integer                     |
 url          | text                        | not null
 starturl     | text                        | not null
 netloc       | text                        | not null
 file         | integer                     |
 priority     | integer                     | not null
 distance     | integer                     | not null
 is_text      | boolean                     |
 limit_netloc | boolean                     |
 title        | citext                      |
 mimetype     | text                        |
 type         | itemtype_enum               |
 raw_content  | text                        |
 content      | text                        |
 fetchtime    | timestamp without time zone |
 addtime      | timestamp without time zone |
 tsv_content  | tsvector                    |
Indexes:
    "web_pages_pkey" PRIMARY KEY, btree (id)
    "ix_web_pages_url" UNIQUE, btree (url)
    "idx_web_pages_content" gin (tsv_content)
    "idx_web_pages_title" gin (to_tsvector('english'::regconfig, title::text))
    "ix_web_pages_distance" btree (distance)
    "ix_web_pages_distance_filtered" btree (priority) WHERE state = 'new'::dlstate_enum AND distance < 1000000
    "ix_web_pages_priority" btree (priority)
    "ix_web_pages_type" btree (type)
    "ix_web_pages_url_ops" btree (url text_pattern_ops)
Foreign-key constraints:
    "web_pages_file_fkey" FOREIGN KEY (file) REFERENCES web_files(id)
Triggers:
    web_pages_content_change_trigger AFTER INSERT OR UPDATE ON web_pages FOR EACH ROW EXECUTE PROCEDURE web_pages_content_update_func()

Extra columns aside, both tables have a content column and a tsv_content column with a gin() index on it. A trigger updates the tsv_content column whenever the content column is modified.
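The body of web_pages_content_update_func() is not shown here. As a minimal sketch of the general pattern, assuming a hypothetical table demo_pages with the same two columns, PostgreSQL's built-in tsvector_update_trigger function can keep the tsvector column in sync. Note that this sketch uses a BEFORE trigger, unlike the AFTER trigger listed above, because the built-in function modifies the row before it is stored.

```sql
-- Hypothetical demo table; not the schema from the question.
CREATE TABLE demo_pages (
    id          serial PRIMARY KEY,
    content     text,
    tsv_content tsvector
);

CREATE INDEX idx_demo_pages_tsv ON demo_pages USING gin (tsv_content);

-- Built-in trigger function: recomputes tsv_content from content
-- with the 'english' configuration on every INSERT or UPDATE.
CREATE TRIGGER demo_pages_tsv_update
    BEFORE INSERT OR UPDATE ON demo_pages
    FOR EACH ROW
    EXECUTE PROCEDURE tsvector_update_trigger(tsv_content, 'pg_catalog.english', content);

-- Quick check that the trigger fills the column.
INSERT INTO demo_pages (content) VALUES ('testing the full text search');
SELECT id, tsv_content FROM demo_pages;
```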

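Note that the other gin index works fine. I actually started out with a gin (to_tsvector('english'::regconfig, content::text)) index directly on the content column instead of a second column, but after waiting for that index to rebuild a few times during testing, I decided to use a separate column to pre-store the tsvector value.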
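For comparison, here is a sketch of the expression-index variant, using a hypothetical table demo_docs. With this approach the query has to repeat the indexed expression, including the explicit 'english' configuration, for the planner to match it to the index. On a tiny or empty table the planner may still prefer a sequential scan.

```sql
-- Hypothetical table for illustration only.
CREATE TABLE demo_docs (
    id      serial PRIMARY KEY,
    content text
);

-- Expression index: the tsvector is computed at index time, not stored in a column.
CREATE INDEX idx_demo_docs_content
    ON demo_docs USING gin (to_tsvector('english'::regconfig, content));

-- The WHERE clause uses the same expression as the index definition.
EXPLAIN
SELECT id
FROM demo_docs
WHERE to_tsvector('english'::regconfig, content) @@ to_tsquery('english', 'testing');
```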

Running a query against the test table uses the index, as I would expect:

webarchive=# EXPLAIN ANALYZE SELECT
    test_sites.id,
    test_sites.content,
    ts_rank_cd(test_sites.tsv_content, to_tsquery($$testing$$)) AS ts_rank_cd_1
FROM
    test_sites
WHERE
    test_sites.tsv_content @@ to_tsquery($$testing$$);
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on test_sites  (cost=16.45..114.96 rows=25 width=669) (actual time=0.175..3.720 rows=143 loops=1)
   Recheck Cond: (tsv_content @@ to_tsquery('testing'::text))
   Heap Blocks: exact=117
   ->  Bitmap Index Scan on idx_test_web_pages_content  (cost=0.00..16.44 rows=25 width=0) (actual time=0.109..0.109 rows=143 loops=1)
         Index Cond: (tsv_content @@ to_tsquery('testing'::text))
 Planning time: 0.414 ms
 Execution time: 3.800 ms
(7 rows)

However, the same query against the real table never seems to result in anything but a plain old sequential scan:

webarchive=# EXPLAIN ANALYZE SELECT
       web_pages.id,
       web_pages.content,
       ts_rank_cd(web_pages.tsv_content, to_tsquery($$testing$$)) AS ts_rank_cd_1
   FROM
       web_pages
   WHERE
       web_pages.tsv_content @@ to_tsquery($$testing$$);
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Seq Scan on web_pages  (cost=0.00..4406819.80 rows=19751 width=505) (actual time=0.343..142325.954 rows=134949 loops=1)
   Filter: (tsv_content @@ to_tsquery('testing'::text))
   Rows Removed by Filter: 12764373
 Planning time: 0.436 ms
 Execution time: 142341.489 ms
(5 rows)

I have increased my work_mem to 3GB to see if that was the issue, and it is not.
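A minimal sketch of trying a larger work_mem, assuming the change is made per session rather than in postgresql.conf (the question does not say which):

```sql
-- Affects only the current session.
SET work_mem = '3GB';
SHOW work_mem;

-- ... re-run the EXPLAIN ANALYZE query here ...

-- Return to the configured default.
RESET work_mem;
```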

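Additionally, it should be noted that these are fairly large tables: ~150GB of text across 4M rows (with 8M additional rows where content/tsv_content is NULL).

The test_sites table has 1/1000th of the rows of web_pages, because it is a bit prohibitive to experiment when every query takes multiple minutes.

I'm using PostgreSQL 9.5 (yes, I compiled it myself; I wanted ON CONFLICT). There doesn't seem to be a tag for that yet.

I've read through the open issues for 9.5, and I can't see this being a result of any of them.

Fresh from a complete rebuild of the index, the problem still persists: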

webarchive=# ANALYZE web_pages ;
ANALYZE
webarchive=# EXPLAIN ANALYZE SELECT
    web_pages.id,
    web_pages.content,
    ts_rank_cd(web_pages.tsv_content, to_tsquery($$testing$$)) AS ts_rank_cd_1
FROM
    web_pages
WHERE
    web_pages.tsv_content @@ to_tsquery($$testing$$);
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on web_pages  (cost=10000000000.00..10005252343.30 rows=25109 width=561) (actual time=7.114..146444.168 rows=134949 loops=1)
   Filter: (tsv_content @@ to_tsquery('testing'::text))
   Rows Removed by Filter: 13137318
 Planning time: 0.521 ms
 Execution time: 146465.188 ms
(5 rows)

Note that I had literally just ANALYZEd, and seqscan is disabled.
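The startup cost of 10000000000.00 in the plan above is the signature of enable_seqscan being off: the planner does not remove sequential scans, it adds a very large cost penalty to them. A sequential scan that still wins under that penalty suggests the planner found no other usable path at all, rather than one that was merely more expensive. A sketch of the session-level toggle, with a trimmed-down version of the query:

```sql
-- Discourage sequential scans for the current session only.
SET enable_seqscan = off;

EXPLAIN
SELECT web_pages.id
FROM web_pages
WHERE web_pages.tsv_content @@ to_tsquery('testing');

-- Restore the default afterwards.
RESET enable_seqscan;
```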

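Solution

Well, I spent a bit of time freeing up some extra space on the disk that holds the DB, by moving some other databases off to another SSD.

I then ran a VACUUM ANALYZE across the entire database, and now it has apparently noticed that I have the index.

I had previously both analyzed and vacuumed just this table, but apparently it somehow made a difference to do it database-wide rather than on the specific table. Go figure.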

webarchive=# EXPLAIN ANALYZE SELECT
    web_pages.id,
    web_pages.content
FROM
    web_pages
WHERE
    web_pages.tsv_content @@ to_tsquery($$testing$$);
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on web_pages  (cost=1185.79..93687.30 rows=23941 width=189) (actual time=41.448..152.108 rows=134949 loops=1)
   Recheck Cond: (tsv_content @@ to_tsquery('testing'::text))
   Heap Blocks: exact=105166
   ->  Bitmap Index Scan on idx_web_pages_content  (cost=0.00..1179.81 rows=23941 width=0) (actual time=24.940..24.940 rows=134996 loops=1)
         Index Cond: (tsv_content @@ to_tsquery('testing'::text))
 Planning time: 0.452 ms
 Execution time: 154.942 ms
(7 rows)
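As a sketch of the step described above, plus two catalog queries that can help confirm the maintenance actually ran and what the planner now believes about the table and index. The diagnostic queries are an addition for illustration, not something from the original answer.

```sql
-- Database-wide: no table name given.
VACUUM ANALYZE;

-- When was the table last vacuumed/analyzed, and how many dead rows remain?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'web_pages';

-- The planner's stored size estimates for the table and its GIN index.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('web_pages', 'idx_web_pages_content');
```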

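I also took the opportunity to run a VACUUM FULL; now that I have enough space for the processing. There has been a fair bit of row churn in the table as I experimented during development, and I'd like to consolidate any file fragmentation that resulted from it.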
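A sketch for checking how much space the rewrite reclaims. It assumes the command is aimed at web_pages only, whereas the answer ran it without a table name. VACUUM FULL rewrites the table into a new file, so it needs free disk space for the copy and holds an ACCESS EXCLUSIVE lock on the table while it runs.

```sql
-- Size before the rewrite.
SELECT pg_size_pretty(pg_total_relation_size('web_pages')) AS total_size,
       pg_size_pretty(pg_relation_size('web_pages'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('web_pages'))        AS index_size;

-- Rewrites the table and rebuilds its indexes; blocks reads and writes meanwhile.
VACUUM FULL web_pages;

-- Run the size query again afterwards to compare.
```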

License: CC-BY-SA with attribution
