Slow insert speed in PostgreSQL memory tablespace
05-10-2019
Question
I have a requirement to store records into a database at a rate of 10,000 records/sec (with indexing on a few fields). Each record has 25 columns. I am doing a batch insert of 100,000 records in one transaction block. To improve the insertion rate, I changed the tablespace from disk to RAM. With that I am able to achieve only 5,000 inserts per second.
I have also done the following tuning in the postgres config:
- Indexes : no
- fsync : false
- logging : disabled
Other information:
- Tablespace : RAM
- Number of columns in one row : 25 (mostly integers)
- CPU : 4 core, 2.5 GHz
- RAM : 48 GB
I am wondering why a single insert query takes around 0.2 ms on average when the database is not writing anything to disk (since I am using a RAM-based tablespace). Is there something I am doing wrong?
Help appreciated.
Prashant
Solution
Fast Data Loading
- Translate your data to CSV.
- Create a temporary table (as you noted, without indexes).
- Execute a COPY command:
\COPY schema.temp_table FROM /tmp/data.csv WITH CSV
- Insert the data into the non-temporary table.
- Create indexes.
- Set appropriate statistics.
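Put together, the steps above might look like this (the table, file path, and index names here are placeholders, not from the original answer):

```sql
-- 1. Stage the CSV in an index-free temporary table with the same shape
--    as the target table.
CREATE TEMPORARY TABLE temp_measurement (LIKE climate.measurement);
\COPY temp_measurement FROM '/tmp/data.csv' WITH CSV

-- 2. Move the rows into the permanent table in one set-based insert.
INSERT INTO climate.measurement SELECT * FROM temp_measurement;

-- 3. Only now create indexes and refresh the planner statistics.
CREATE INDEX measurement_taken_idx ON climate.measurement (taken);
ANALYZE climate.measurement;
```

Building the index once after the load is far cheaper than maintaining it row by row during 100,000 inserts.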
Further Recommendations
For large volumes of data:
- Split the data into child tables.
- Insert the data in the order of the column that most of the SELECT statements will use. In other words, try to align the physical model with the logical model.
- Adjust your configuration settings.
- Create a CLUSTER index (most important column on the left). For example:
CREATE UNIQUE INDEX measurement_001_stc_index
  ON climate.measurement_001
  USING btree (station_id, taken, category_id);
ALTER TABLE climate.measurement_001
  CLUSTER ON measurement_001_stc_index;
Configuration Settings
On a machine with 4GB of RAM, I did the following...
Kernel Configuration
Tell the kernel that it's okay for programs to use gobs of shared memory:
sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf
PostgreSQL Configuration
- Edit /etc/postgresql/8.4/main/postgresql.conf and set:
shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10
- Tweak the values as necessary and as suitable for your environment. You will probably have to change them later for read/write optimization.
- Restart PostgreSQL.
Child Tables
For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).
Master Table
CREATE TABLE climate.measurement
(
id bigserial NOT NULL,
taken date NOT NULL,
station_id integer NOT NULL,
amount numeric(8,2) NOT NULL,
flag character varying(1) NOT NULL,
category_id smallint NOT NULL,
CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
Child Table
CREATE TABLE climate.measurement_001
(
  -- Inherited from table climate.measurement: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
  -- Inherited from table climate.measurement: taken date NOT NULL,
  -- Inherited from table climate.measurement: station_id integer NOT NULL,
  -- Inherited from table climate.measurement: amount numeric(8,2) NOT NULL,
  -- Inherited from table climate.measurement: flag character varying(1) NOT NULL,
  -- Inherited from table climate.measurement: category_id smallint NOT NULL,
CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
OIDS=FALSE
);
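With inheritance-based partitioning like this (the pre-declarative-partitioning approach of PostgreSQL 8.x), inserts against the master table are usually routed to the correct child with a trigger. A minimal sketch for the single-child case above (the function and trigger names are illustrative):

```sql
-- Route rows inserted into the master table to the matching child.
CREATE OR REPLACE FUNCTION climate.measurement_insert()
RETURNS trigger AS $$
BEGIN
  IF NEW.category_id = 1 THEN
    INSERT INTO climate.measurement_001 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'No child table for category_id %', NEW.category_id;
  END IF;
  RETURN NULL;  -- the row is stored in the child, not the master
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_trigger
  BEFORE INSERT ON climate.measurement
  FOR EACH ROW EXECUTE PROCEDURE climate.measurement_insert();
```

For bulk loading it is cheaper still to COPY directly into the relevant child table and bypass the trigger entirely.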
Table Statistics
Bump up the table stats for the important columns:
ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;
Don't forget to VACUUM and ANALYZE afterwards.
OTHER TIPS
Are you doing your inserts as a series of
INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...
or as one multiple-row insert:
INSERT INTO tablename (...) VALUES (...),(...),(...);
The second one will be significantly faster on 100k rows.
source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/
Did you also place the xlog (the WAL segments) on your RAM drive? If not, you're still writing to disk. And what about the settings for wal_buffers, checkpoint_segments, etc.? You have to try to fit all 100,000 records (your single transaction) into your wal_buffers. Increasing this parameter might cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.
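If the WAL must stay on disk, a few postgresql.conf settings reduce how often it forces I/O. The values below are illustrative for 8.x-era PostgreSQL, not a tuned recommendation:

```
wal_buffers = 16MB          # default in 8.x was only 64kB
checkpoint_segments = 32    # fewer, larger checkpoints during bulk loads
synchronous_commit = off    # don't wait for the WAL flush at COMMIT
```

Note that synchronous_commit = off risks losing the most recent transactions on a crash, but unlike fsync = off it cannot corrupt the database.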
I suggest you use COPY instead of INSERT.
You should also fine-tune your postgresql.conf file.
Read more about it at http://wiki.postgresql.org/wiki/Performance_Optimization