Question

I have a table that keeps measurements of latencies between nodes running MPI tasks in a large cluster. The table looks like this:

CREATE TABLE latency(
    from_rank int,
    to_rank   int,
    from_host varchar(20),
    to_host   varchar(20),
    from_cpu  varchar(20),
    to_cpu    varchar(20),
    latency   float8);

CREATE INDEX ON latency(from_host, to_host);

Now, after a large experiment, I have collected over 500 million rows of data. Querying these data is painfully slow; below is an example of a SELECT COUNT(*):

psql (9.4devel)
Type "help" for help.

routing=# \timing 
Timing is on.
routing=# SELECT COUNT(*) FROM latency;
   count   
-----------
 522190848
(1 row)

Time: 759462.969 ms
routing=# SELECT COUNT(*) FROM latency;
   count   
-----------
 522190848
(1 row)

Time: 96775.036 ms
routing=# SELECT COUNT(*) FROM latency;
   count   
-----------
 522190848
(1 row)

Time: 97708.132 ms
routing=#

I am running both the PostgreSQL server and the client on the same machine, which has 4 Xeon E7-4870s (40 cores/80 threads in total) and 1 TB of RAM. The effect of Linux file caching is obvious: the first query took nearly 13 minutes, while the subsequent ones took about 1.6 minutes.
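For reference, the cache effect can also be confirmed from inside PostgreSQL with EXPLAIN (ANALYZE, BUFFERS) (a sketch; "shared read" counts pages fetched from the kernel or disk, "shared hit" counts pages already in shared_buffers):

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM latency;
-- A cold run should report mostly "shared read=..." pages on the seq scan,
-- while a warm run should report mostly "shared hit=..." pages.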

Is there anything I can do to make the query run faster? 1.5 minutes isn't exactly responsive.
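For completeness: if only an approximate count were acceptable, I know the planner's statistics could be read directly (a sketch; the estimate is only as fresh as the last ANALYZE or autovacuum run):

SELECT reltuples::bigint AS approximate_count
FROM pg_class
WHERE relname = 'latency';

But I am mainly interested in making exact queries like the one above faster.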

Thanks.

