Question

We are developing a prototype for a BIG data product. We have almost 2 billion records. We have used PostgresSQL 9.5 as a back-end and Python as front-end.

We are using a 16*2.4 GHz processors with 160 GB RAM on Amazon servers.

Our bench mark for query results are 10 seconds maximum, although simple count queries on tables with index are taking approximately 30 minutes to 1 Hour .

To overcome our performance issues I required changes in configuration file:

max_connections = 20
shared_buffers = 14GB
effective_cache_size = 42GB
work_mem = 367001kB
maintenance_work_mem = 2GB
checkpoint_segments = 128
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500

I created the monthly partition on sale_date column. Even though performance are still bad. I was reading some articles and found out that PostgreSQL uses a single CPU for query processing from a single connection.

To use all the CPUs for query processing we can use sharding, and pg_shard is the opensource component created by cistusdata for this exact purpose.

I have installed the citusdb as explained on multi-node-setup-page.

When I execute CREATE EXTENSION pg_shard;, I get this error:

ERROR: could not open extension control file "/usr/share/postgresql/9.5/extension/pg_shard.control": No such file or directory

I manually copied the pg_shard.control file in the extension folder and I then started getting this error:

pg_shard--1.2.sql not found.

Any help is appreciated. How can I solve this problem?

Was it helpful?

Solution

It looks like pg_shard files are not installed. You will need to build pg_shard from sources and install it using make install within pg_shard source folder.

Please be aware that pg_shard has reached end of life, and it is included in recently released (Citus 5.0).

You can also get Citus from PGDG if you are on RPM based system.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top