Question

I currently have a PostgreSQL slave node that is presenting problems that don't arise on the master node. They seem to be related to the node sync (replication) process.

Complete stacktrace:

org.postgresql.util.PSQLException: ERROR: canceling statement due to conflict with recovery
  Detail: User query might have needed to see row versions that must be removed.
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2103)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1836)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:512)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
    at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:82)
    at org.apache.tomcat.dbcp.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:82)
    at cl.waypoint.mailer.reportes.BasicReport.getSingleColumn(BasicReport.java:542)
    at cl.waypoint.mailer.reportes.BasicReport.getSingleColumn(BasicReport.java:518)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes.updateIgnicion(StatusSemanalClientes.java:448)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes.access$2(StatusSemanalClientes.java:447)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes$TempAndDoorLocator.call(StatusSemanalClientes.java:414)
    at cl.waypoint.mailer.reportes.StatusSemanalClientes$TempAndDoorLocator.call(StatusSemanalClientes.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

DB Version:

PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit

OS Version:

Distributor ID: Debian
Description:    Debian GNU/Linux 8.6 (jessie)
Release:    8.6
Codename:   jessie

The question is: how can I debug the actual cause of the problem and/or solve/avoid it? Don't hesitate to ask for more info if needed to pinpoint the problem, right now just don't know what else to provide.

PS: The error does NOT appear if the offending query is repeated, so it's not strictly related to the query itself, but perhaps to a transient condition during a table update/sync.

Solution

What you see is a common experience for anyone running longer queries on a hot standby. There are a couple of possible causes; the PostgreSQL documentation on Hot Standby describes them as follows:

There are also additional types of conflict that can occur with Hot Standby. These conflicts are hard conflicts in the sense that queries might need to be canceled and, in some cases, sessions disconnected to resolve them. The user is provided with several ways to handle these conflicts. Conflict cases include:

  • Access Exclusive locks taken on the primary server, including both explicit LOCK commands and various DDL actions, conflict with table accesses in standby queries.

  • Dropping a tablespace on the primary conflicts with standby queries using that tablespace for temporary work files.
  • Dropping a database on the primary conflicts with sessions connected to that database on the standby.
  • Application of a vacuum cleanup record from WAL conflicts with standby transactions whose snapshots can still "see" any of the rows to be removed.
  • Application of a vacuum cleanup record from WAL conflicts with queries accessing the target page on the standby, whether or not the data to be removed is visible.

Which of the above reasons will cancel a given query is hard to predict, though some occur much more often than others. The behavior is also configurable; for the available options, read the rest of the Hot Standby documentation page.
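To narrow down which kind of conflict is actually hitting the standby (part of the debugging the question asks about), the pg_stat_database_conflicts view keeps per-database counters of queries canceled for each conflict type; it exists since 9.1, so it is available on 9.4. A quick check, run on the standby:

    -- Run on the standby: counters of canceled queries, broken down by conflict type
    SELECT datname,
           confl_tablespace,   -- dropped tablespace on the primary
           confl_lock,         -- ACCESS EXCLUSIVE lock taken on the primary
           confl_snapshot,     -- vacuum removed rows a standby snapshot could still see
           confl_bufferpin,    -- vacuum cleanup of a page a standby query had pinned
           confl_deadlock      -- standby query canceled to resolve a deadlock
    FROM pg_stat_database_conflicts
    ORDER BY datname;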

The most important option is hot_standby_feedback, which helps against the most common cause (VACUUM removing recently-dead rows). It has its own costs (it can cause table bloat on the primary), but the outcome is usually still better than running the same queries on the master.
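A minimal sketch of the relevant settings in postgresql.conf on the standby; hot_standby_feedback is the option discussed above, while the max_standby_*_delay values shown here are only illustrative examples that trade extra replication lag for letting standby queries finish:

    # postgresql.conf on the standby (delay values are examples, not recommendations)
    hot_standby = on
    hot_standby_feedback = on            # tell the primary which rows standby queries still need
    max_standby_streaming_delay = 120s   # how long WAL replay may wait before canceling queries
    max_standby_archive_delay = 120s     # same, for WAL restored from the archive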

Other tips

We experienced the same thing a couple of times, in a setup where writes went to the master and reads went to the slave. The errors above started appearing when we read from the slave while, at the same instant, some other process removed that data due to a functional constraint. To resolve such issues we added our own "read your own writes" rule: whenever an application thread needs to read back what it has just written to the database, that particular thread reads from the master instead of the slave. The flip side is a slight compromise in read scalability. We moved to the slave (standby) only those reads where we were sure that subsequent transactions would not change the data. There are more complex workarounds, such as changing the transaction isolation level to repeatable read, but the application code gets complex and hard to maintain. HTH.
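A rough sketch of that "read your own writes" routing in Java, matching the JDBC stack trace above; the ReadYourWritesRouter class and the two DataSource pools are hypothetical names for illustration, not something from the original setup:

    import java.sql.Connection;
    import java.sql.SQLException;
    import javax.sql.DataSource;

    // Routes reads to the standby, except when the current thread has just written:
    // that thread keeps reading from the master so it can see its own writes.
    public class ReadYourWritesRouter {
        private final DataSource master;   // hypothetical pool pointing at the primary
        private final DataSource standby;  // hypothetical pool pointing at the hot standby

        // Marks threads that wrote recently and must therefore read from the master.
        private final ThreadLocal<Boolean> wroteInThisThread = new ThreadLocal<Boolean>() {
            @Override protected Boolean initialValue() { return Boolean.FALSE; }
        };

        public ReadYourWritesRouter(DataSource master, DataSource standby) {
            this.master = master;
            this.standby = standby;
        }

        // Use for INSERT/UPDATE/DELETE; remembers that this thread has written.
        public Connection connectionForWrite() throws SQLException {
            wroteInThisThread.set(Boolean.TRUE);
            return master.getConnection();
        }

        // Use for SELECT; falls back to the master if this thread wrote recently.
        public Connection connectionForRead() throws SQLException {
            return wroteInThisThread.get() ? master.getConnection() : standby.getConnection();
        }

        // Call when the thread no longer needs to see its own writes (e.g. end of a request).
        public void reset() {
            wroteInThisThread.set(Boolean.FALSE);
        }
    }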

Licensed under: CC-BY-SA with attribution