Postgres: surprising performance of updates using a cursor

https://stackoverflow.com/questions/4776127

23-10-2019

Question

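Consider the following two Python code examples, which achieve the same result but show a significant and surprising difference in performance.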

import time

import psycopg2

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur = conn.cursor('cursor_unique_name')  # named (server-side) cursor
cur2 = conn.cursor()                     # ordinary cursor used for the UPDATEs

# time.clock() was removed in Python 3.8; perf_counter() is used instead
startTime = time.perf_counter()
cur.execute("SELECT * FROM test FOR UPDATE")
print("Finished: SELECT * FROM test for update;: " + str(time.perf_counter() - startTime))
for i in range(100000):
    cur.fetchone()  # advance the server-side cursor by one row
    cur2.execute("UPDATE test SET num = num + 1 WHERE CURRENT OF cursor_unique_name")
print("Finished: update starting commit: " + str(time.perf_counter() - startTime))
conn.commit()
print("Finished: update : " + str(time.perf_counter() - startTime))

cur2.close()
conn.close()

and:

import time

import psycopg2

conn = psycopg2.connect("dbname=mydatabase user=postgres")
cur2 = conn.cursor()

# time.clock() was removed in Python 3.8; perf_counter() is used instead
startTime = time.perf_counter()
for i in range(1, 100001):  # serial ids run from 1 to 100000
    # pass the id as a query parameter instead of concatenating it into the SQL
    cur2.execute("UPDATE test SET num = num + 1 WHERE id = %s", (i,))
print("Finished: update starting commit: " + str(time.perf_counter() - startTime))
conn.commit()
print("Finished: update : " + str(time.perf_counter() - startTime))

cur2.close()
conn.close()

The create statement for the test table is:

CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar);

The table contains 100,000 rows, and VACUUM ANALYZE test; has already been run.
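For anyone who wants to reproduce the setup, here is a minimal sketch, assuming the same connection string as in the question. The filler values for num and data are hypothetical (the question does not say what the rows contain), setup_test_table is a hypothetical helper, and the DROP TABLE makes it destructive. VACUUM cannot run inside a transaction block, which is why the connection is switched to autocommit.

```python
SETUP_SQL = [
    "DROP TABLE IF EXISTS test",  # destructive: removes any existing table named test
    "CREATE TABLE test (id serial PRIMARY KEY, num integer, data varchar)",
    # hypothetical filler rows; the serial default assigns ids 1..100000
    "INSERT INTO test (num, data) "
    "SELECT 0, 'row ' || g FROM generate_series(1, 100000) AS g",
    "VACUUM ANALYZE test",
]


def setup_test_table(dsn: str = "dbname=mydatabase user=postgres") -> None:
    """Run SETUP_SQL against the database named in dsn (needs psycopg2 and a server)."""
    import psycopg2  # imported here so SETUP_SQL can be inspected without the driver

    conn = psycopg2.connect(dsn)
    # VACUUM cannot run inside a transaction block, so use autocommit mode
    conn.autocommit = True
    try:
        with conn.cursor() as cur:
            for stmt in SETUP_SQL:
                cur.execute(stmt)
    finally:
        conn.close()
```

Calling setup_test_table() once before running either example gives a freshly analyzed table.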

I consistently get the following results over several attempts.

First code example:

Finished: SELECT * FROM test for update;: 0.00609304950429
Finished: update starting commit: 37.3272754429
Finished: update : 37.4449708474

Second code example:

Finished: update starting commit: 24.574401185
Finished committing: 24.7331461431

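This is very surprising to me, because I expected exactly the opposite: according to this answer, an update using a cursor should be considerably faster.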

Solution

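I don't think the test is balanced: your first code fetches data from the cursor and then does the update, whereas the second code blindly updates by ID without fetching any data. I assume the first code sequence translates into a FETCH command followed by an UPDATE, so that is two client/server command round trips instead of one.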
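The round-trip argument can be made concrete without a database. The sketch below uses a hypothetical CountingCursor that only records the statements each loop would send; it assumes, as the answer does, that every fetchone() on a psycopg2 named cursor becomes its own FETCH on the server.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CountingCursor:
    """Hypothetical stand-in for a DB-API cursor: it only records the SQL it
    would send to the server; no real database is involved."""
    sent: List[str] = field(default_factory=list)

    def execute(self, sql: str, params=None) -> None:
        self.sent.append(sql)


def cursor_loop(n_rows: int) -> int:
    """Mimic the first example: one FETCH plus one UPDATE per row."""
    cur = CountingCursor()
    for _ in range(n_rows):
        # assumed: a named cursor's fetchone() issues one FETCH per call
        cur.execute("FETCH FORWARD 1 FROM cursor_unique_name")
        cur.execute("UPDATE test SET num = num + 1 WHERE CURRENT OF cursor_unique_name")
    return len(cur.sent)


def blind_update_loop(n_rows: int) -> int:
    """Mimic the second example: a single UPDATE ... WHERE id = %s per row."""
    cur = CountingCursor()
    for i in range(1, n_rows + 1):
        cur.execute("UPDATE test SET num = num + 1 WHERE id = %s", (i,))
    return len(cur.sent)


print(cursor_loop(100_000), blind_update_loop(100_000))  # 200000 100000
```

If each statement costs roughly one network round trip, the first example pays that latency twice as often as the second, before any difference in how the rows are located.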

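(Also, the first code starts by locking every row in the table, which pulls the whole table into the buffer cache. Thinking about it, though, I doubt this actually affects performance, but you didn't mention it.)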

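For a simple table, I don't think there will be much difference between updating by ctid (which I believe is how WHERE CURRENT OF ... works) and updating by primary key: the primary-key update costs an extra index lookup, but unless the index is huge, that is not much of a slowdown.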
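One way to test this suggestion is a balanced benchmark in which both variants fetch every row through the same server-side cursor and differ only in the UPDATE's WHERE clause. This is a sketch under the question's setup (table test, cursor name cursor_unique_name); run_variant and update_sql are hypothetical helpers, not part of psycopg2.

```python
import time

CURSOR_NAME = "cursor_unique_name"


def update_sql(by_pkey: bool) -> str:
    """Return the per-row UPDATE: by primary key, or by the cursor's current row."""
    if by_pkey:
        return "UPDATE test SET num = num + 1 WHERE id = %s"
    return f"UPDATE test SET num = num + 1 WHERE CURRENT OF {CURSOR_NAME}"


def run_variant(conn, by_pkey: bool) -> float:
    """Fetch every row of test through a server-side cursor and update it.

    Both variants send the same FETCHes, so any timing difference comes
    from WHERE CURRENT OF versus WHERE id = ...
    """
    fetch_cur = conn.cursor(CURSOR_NAME)  # a named cursor is a server-side cursor
    upd_cur = conn.cursor()
    sql = update_sql(by_pkey)
    start = time.perf_counter()
    fetch_cur.execute("SELECT id FROM test FOR UPDATE")
    while True:
        row = fetch_cur.fetchone()  # one FETCH per row, as in the question
        if row is None:
            break
        if by_pkey:
            upd_cur.execute(sql, (row[0],))
        else:
            upd_cur.execute(sql)
    # close both cursors while the transaction is still open, then commit
    upd_cur.close()
    fetch_cur.close()
    conn.commit()
    return time.perf_counter() - start
```

With conn = psycopg2.connect("dbname=mydatabase user=postgres"), comparing run_variant(conn, by_pkey=False) against run_variant(conn, by_pkey=True) isolates the cost of the lookup itself. Each call commits, so num grows by one per call.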

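When updating 100,000 rows like this, I suspect most of the time goes into generating the extra tuples and inserting or appending them to the table, rather than into locating the previous tuple to mark it as deleted.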
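That an UPDATE writes a new tuple version, rather than changing the old one in place, can be observed through the system column ctid, which gives a row version's physical (block, offset) location. A minimal sketch, assuming the question's test table; parse_ctid and ctid_moves_on_update are hypothetical helpers, and ctid is cast to text so the driver hands back a plain string such as '(0,1)'.

```python
from typing import Tuple


def parse_ctid(text: str) -> Tuple[int, int]:
    """Turn a ctid rendered as text, e.g. '(0,1)', into (block, offset)."""
    block, offset = text.strip("()").split(",")
    return int(block), int(offset)


def ctid_moves_on_update(conn, row_id: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return the row's (block, offset) before and after one UPDATE.

    The transaction is rolled back, so the visible data is left unchanged.
    """
    with conn.cursor() as cur:
        cur.execute("SELECT ctid::text FROM test WHERE id = %s", (row_id,))
        before = parse_ctid(cur.fetchone()[0])
        cur.execute("UPDATE test SET num = num + 1 WHERE id = %s", (row_id,))
        # within the same transaction, the row now resolves to the new version
        cur.execute("SELECT ctid::text FROM test WHERE id = %s", (row_id,))
        after = parse_ctid(cur.fetchone()[0])
    conn.rollback()  # the new version becomes dead; the old one stays visible
    return before, after
```

Expect before != after: the UPDATE added a new tuple, which is the per-row work the answer suspects dominates the timing.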

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow