Slow insert speed in PostgreSQL memory tablespace
05-10-2019
Question
I have a requirement to store records into a database at a rate of 10,000 records/sec (with indexing on a few fields). Each record has 25 columns. I am doing a batch insert of 100,000 records in one transaction block. To improve the insertion rate, I changed the tablespace from disk to RAM. With that I am able to achieve only 5,000 inserts per second.
I have also done the following tuning in the postgres config:
- Indexes : no
- fsync : false
- logging : disabled
Other information:
- Tablespace : RAM
- Number of columns in one row : 25 (mostly integers)
- CPU : 4 core, 2.5 GHz
- RAM : 48 GB
I am wondering why a single insert query takes around 0.2 ms on average when the database is not writing anything to disk (since I am using a RAM-based tablespace). Is there something I am doing wrong?
Help appreciated.
Prashant
Solution
Fast Data Loading
- Translate your data to CSV.
- Create a temporary table (as you noted, without indexes).
- Execute a COPY command:
\COPY schema.temp_table FROM /tmp/data.csv WITH CSV
- Insert the data into the non-temporary table.
- Create indexes.
- Set appropriate statistics.
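Put together, the steps above might look like this (the table, file path, and index names here are placeholders, not from the original answer):

```sql
-- 1. Stage the CSV in an index-free temporary table with the same shape
--    as the target table.
CREATE TEMPORARY TABLE temp_measurement (LIKE climate.measurement);
\COPY temp_measurement FROM '/tmp/data.csv' WITH CSV

-- 2. Move the rows into the permanent table in one set-based insert.
INSERT INTO climate.measurement SELECT * FROM temp_measurement;

-- 3. Only now create indexes and refresh the planner statistics.
CREATE INDEX measurement_taken_idx ON climate.measurement (taken);
ANALYZE climate.measurement;
```

Building the index once after the load is far cheaper than maintaining it row by row during 100,000 inserts.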
Further Recommendations
For large volumes of data:
- Split the data into child tables.
- Insert the data in the order of the column that most of the SELECT statements will use. In other words, try to align the physical model with the logical model.
- Adjust your configuration settings.
- Create a CLUSTER index (most important column on the left). For example:
CREATE UNIQUE INDEX measurement_001_stc_index
  ON climate.measurement_001
  USING btree (station_id, taken, category_id);
ALTER TABLE climate.measurement_001
  CLUSTER ON measurement_001_stc_index;
Configuration Settings
On a machine with 4GB of RAM, I did the following...
Kernel Configuration
Tell the kernel that it's okay for programs to use gobs of shared memory:
sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf
PostgreSQL Configuration
- Edit /etc/postgresql/8.4/main/postgresql.conf and set:
shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10
- Tweak the values as necessary and as suitable for your environment. You will probably have to change them later for read/write optimization.
- Restart PostgreSQL.
Child Tables
For example, let's say you have data based on weather, divided into different categories. Rather than having a single monstrous table, divide it into several tables (one per category).
Master Table
CREATE TABLE climate.measurement
(
id bigserial NOT NULL,
taken date NOT NULL,
station_id integer NOT NULL,
amount numeric(8,2) NOT NULL,
flag character varying(1) NOT NULL,
category_id smallint NOT NULL,
CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
Child Table
CREATE TABLE climate.measurement_001
(
  -- Inherited from table climate.measurement: id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
  -- Inherited from table climate.measurement: taken date NOT NULL,
  -- Inherited from table climate.measurement: station_id integer NOT NULL,
  -- Inherited from table climate.measurement: amount numeric(8,2) NOT NULL,
  -- Inherited from table climate.measurement: flag character varying(1) NOT NULL,
  -- Inherited from table climate.measurement: category_id smallint NOT NULL,
CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
OIDS=FALSE
);
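With inheritance-based partitioning like this (the pre-declarative-partitioning approach of PostgreSQL 8.x), inserts against the master table are usually routed to the correct child with a trigger. A minimal sketch for the single-child case above (the function and trigger names are illustrative):

```sql
-- Route rows inserted into the master table to the matching child.
CREATE OR REPLACE FUNCTION climate.measurement_insert()
RETURNS trigger AS $$
BEGIN
  IF NEW.category_id = 1 THEN
    INSERT INTO climate.measurement_001 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'No child table for category_id %', NEW.category_id;
  END IF;
  RETURN NULL;  -- the row is stored in the child, not the master
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_trigger
  BEFORE INSERT ON climate.measurement
  FOR EACH ROW EXECUTE PROCEDURE climate.measurement_insert();
```

For bulk loading it is cheaper still to COPY directly into the relevant child table and bypass the trigger entirely.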
Table Statistics
Bump up the table stats for the important columns:
ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;
Don't forget to VACUUM and ANALYZE afterwards.
OTHER TIPS
Are you doing your inserts as a series of
INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...
or as one multiple-row insert:
INSERT INTO tablename (...) VALUES (...),(...),(...);
The second one will be significantly faster on 100k rows.
source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/
Did you also place the xlog (the WAL segments) on your RAM drive? If not, you're still writing to disk. And what about the settings for wal_buffers, checkpoint_segments, etc.? You have to try to fit all 100,000 records (your single transaction) into your wal_buffers. Increasing this parameter might cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.
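If the WAL must stay on disk, a few postgresql.conf settings reduce how often it forces I/O. The values below are illustrative for 8.x-era PostgreSQL, not a tuned recommendation:

```
wal_buffers = 16MB          # default in 8.x was only 64kB
checkpoint_segments = 32    # fewer, larger checkpoints during bulk loads
synchronous_commit = off    # don't wait for the WAL flush at COMMIT
```

Note that synchronous_commit = off risks losing the most recent transactions on a crash, but unlike fsync = off it cannot corrupt the database.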
I suggest you use COPY instead of INSERT.
You should also fine-tune your postgresql.conf file.
Read more about it at http://wiki.postgresql.org/wiki/Performance_Optimization