systemctl startup script gives “WSREP: Failed to recover position” - manual startup works

https://dba.stackexchange.com/questions/134762

01-10-2020

Question

Our Galera cluster crashed. It has three nodes running RHEL 7.

I can start one node manually with this command:

# sudo -u mysql /usr/libexec/mysqld --wsrep-cluster-address='gcomm://'
160408 13:36:24 [Warning] Could not increase number of max_open_files to more than 1024 (request: 16171)
/usr/libexec/mysqld: Query cache is disabled (resize or similar command in progress); repeat this command later

Versions:

# rpm -qa | grep maria
mariadb-galera-server-5.5.41-2.el7ost.x86_64
mariadb-libs-5.5.41-2.el7_0.x86_64
mariadb-5.5.41-2.el7_0.x86_64
mariadb-galera-common-5.5.41-2.el7ost.x86_64

The log shows a successful startup:

160408 13:36:24 [Note] WSREP: Service thread queue flushed.
160408 13:36:24 [Note] WSREP: Assign initial position for certification: 26805, protocol version: -1
160408 13:36:24 [Note] WSREP: wsrep_sst_grab()
160408 13:36:24 [Note] WSREP: Start replication
160408 13:36:24 [Note] WSREP: Setting initial position to 7c5a3689-fccd-11e5-9960-a65e0f1c364a:26805
160408 13:36:24 [Note] WSREP: protonet asio version 0
160408 13:36:24 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
160408 13:36:24 [Note] WSREP: backend: asio
160408 13:36:24 [Note] WSREP: GMCast version 0
160408 13:36:24 [Note] WSREP: (1f7e4c1c-fd7e-11e5-b462-ba7c9cb2c37e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
160408 13:36:24 [Note] WSREP: (1f7e4c1c-fd7e-11e5-b462-ba7c9cb2c37e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
160408 13:36:24 [Note] WSREP: EVS version 0
160408 13:36:24 [Note] WSREP: PC version 0
160408 13:36:24 [Note] WSREP: gcomm: connecting to group 'galera_cluster', peer ''
160408 13:36:24 [Note] WSREP: Node 1f7e4c1c-fd7e-11e5-b462-ba7c9cb2c37e state prim
160408 13:36:24 [Note] WSREP: view(view_id(PRIM,1f7e4c1c-fd7e-11e5-b462-ba7c9cb2c37e,1) memb {
        1f7e4c1c-fd7e-11e5-b462-ba7c9cb2c37e,0
} joined {
} left {
} partitioned {
})
160408 13:36:24 [Note] WSREP: gcomm: connected
160408 13:36:24 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
160408 13:36:24 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
160408 13:36:24 [Note] WSREP: Opened channel 'galera_cluster'
160408 13:36:24 [Note] WSREP: Waiting for SST to complete.
160408 13:36:24 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
160408 13:36:24 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 1f7ec4c4-fd7e-11e5-b432-235d75b9c06b
160408 13:36:24 [Note] WSREP: STATE EXCHANGE: sent state msg: 1f7ec4c4-fd7e-11e5-b432-235d75b9c06b
160408 13:36:24 [Note] WSREP: STATE EXCHANGE: got state msg: 1f7ec4c4-fd7e-11e5-b432-235d75b9c06b from 0 (galera-root-mgmt-zone.local)
160408 13:36:24 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 0,
        members    = 1/1 (joined/total),
        act_id     = 26805,
        last_appl. = -1,
        protocols  = 0/5/3 (gcs/repl/appl),
        group UUID = 7c5a3689-fccd-11e5-9960-a65e0f1c364a
160408 13:36:24 [Note] WSREP: Flow-control interval: [16, 16]
160408 13:36:24 [Note] WSREP: Restored state OPEN -> JOINED (26805)
160408 13:36:24 [Note] WSREP: Member 0.0 (galera-root-mgmt-zone.local) synced with group.
160408 13:36:24 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 26805)
160408 13:36:24 [Note] WSREP: New cluster view: global state: 7c5a3689-fccd-11e5-9960-a65e0f1c364a:26805, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
160408 13:36:24 [Note] WSREP: SST complete, seqno: 26805
160408 13:36:24 InnoDB: The InnoDB memory heap is disabled
160408 13:36:24 InnoDB: Mutexes and rw_locks use GCC atomic builtins
160408 13:36:24 InnoDB: Compressed tables use zlib 1.2.7
160408 13:36:24 InnoDB: Using Linux native AIO
160408 13:36:24 InnoDB: Initializing buffer pool, size = 128.0M
160408 13:36:24 InnoDB: Completed initialization of buffer pool
160408 13:36:24 InnoDB: highest supported file format is Barracuda.
160408 13:36:24  InnoDB: Waiting for the background threads to start
160408 13:36:25 Percona XtraDB (http://www.percona.com) 5.5.40-MariaDB-36.1 started; log sequence number 330123185258
160408 13:36:25 [Note] Plugin 'FEEDBACK' is disabled.
160408 13:36:25 [Warning] Failed to setup SSL
160408 13:36:25 [Warning] SSL error: SSL_CTX_set_default_verify_paths failed
160408 13:36:25 [Note] Server socket created on IP: '0.0.0.0'.
160408 13:36:25 [Note] Event Scheduler: Loaded 0 events
160408 13:36:25 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.41-MariaDB-wsrep'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server, wsrep_25.11.r4026
160408 13:36:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
160408 13:36:25 [Note] WSREP: REPL Protocols: 5 (3, 1)
160408 13:36:25 [Note] WSREP: Service thread queue flushed.
160408 13:36:25 [Note] WSREP: Assign initial position for certification: 26805, protocol version: 3
160408 13:36:25 [Note] WSREP: Service thread queue flushed.
160408 13:36:25 [Note] WSREP: Synchronized with group, ready for connections
160408 13:36:25 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

All data is available and SQL queries are working on all databases and tables.

After that, I stopped MariaDB with mysqladmin -u root -p shutdown.

Now I want to start it with systemd, because Puppet expects the service to be started by systemd.

But systemctl start mariadb does not work. The log shows:

160408 13:32:16 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
160408 13:32:16 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/data/wsrep_recovery.y9g49q' --pid-file='/var/lib/mysql/data/galera-root-mgmt-zone.local-recover.pid'
160408 13:32:16 [Warning] Could not increase number of max_open_files to more than 1024 (request: 16171)
/usr/libexec/mysqld: Query cache is disabled (resize or similar command in progress); repeat this command later
160408 13:32:18 mysqld_safe WSREP: Failed to recover position:
''
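The log shows mysqld_safe running a position-recovery pass with --log_error pointing at a temporary file inside the datadir, then reporting an empty position. A minimal sketch of the read-back half of that step is below. It assumes the recovery log contains a line of the form "WSREP: Recovered position: <uuid>:<seqno>"; the function names are hypothetical and this is not mysqld_safe's actual code. It shows how an empty or unwritable recovery file turns into the empty position '' seen above.

```shell
#!/bin/sh
# Hypothetical helper, loosely modeled on what mysqld_safe does after
# "Running position recovery": read the recovered position back from the
# recovery log. Assumes matching lines look like:
#   ... [Note] WSREP: Recovered position: <cluster-uuid>:<seqno>
extract_recovered_position() {
  log_file=$1
  # Strip everything up to the marker on matching lines; keep the last match.
  # A missing or empty file (e.g. nothing could be written) yields "".
  sed -n 's/.*WSREP: Recovered position:[[:space:]]*//p' "$log_file" 2>/dev/null | tail -n 1
}

# Hypothetical wrapper: print the position, or fail when nothing was recovered.
recover_or_fail() {
  pos=$(extract_recovered_position "$1")
  if [ -z "$pos" ]; then
    echo "no recovered position found in $1 (recovered value: '$pos')" >&2
    return 1
  fi
  echo "$pos"
}
```

If the datadir has no free space, the recovery file ends up empty or missing, so recover_or_fail returns 1 with an empty value, which matches the Failed to recover position: '' symptom.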

Solution

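The datadir (/var/lib/mysql/data) was 100% full. After extending the volume, systemctl start mariadb worked fine.

When started through systemd, mysqld_safe first runs a position-recovery pass that writes its log to a temporary file in the datadir (the wsrep_recovery.* file in the log above) and then reads the recovered position back from that file. With no space left on the volume, that step came back empty, hence Failed to recover position: ''. The manual start presumably worked because running mysqld directly with --wsrep-cluster-address='gcomm://' skips mysqld_safe's recovery step.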
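Because the root cause was simply a full datadir, a free-space check before starting the service would have surfaced it immediately. A minimal sketch, assuming a POSIX df and an arbitrary 95% threshold; check_datadir_space is a hypothetical helper, not part of MariaDB, mysqld_safe, or the mariadb systemd unit.

```shell
#!/bin/sh
# Hypothetical pre-flight check: fail if the filesystem holding the datadir
# is at or above a usage threshold (percent).
check_datadir_space() {
  datadir=${1:-/var/lib/mysql/data}   # datadir used in this setup
  threshold=${2:-95}                  # assumed threshold; tune as needed
  # df -P prints one line per filesystem; column 5 is usage like "42%".
  used=$(df -P "$datadir" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  if [ -z "$used" ]; then
    echo "could not determine usage for $datadir" >&2
    return 2
  fi
  if [ "$used" -ge "$threshold" ]; then
    echo "$datadir is ${used}% full; free space before starting mariadb" >&2
    return 1
  fi
  echo "$datadir is ${used}% full"
}
```

Usage: check_datadir_space /var/lib/mysql/data 95 && systemctl start mariadb. Running df at the time of the failure would have shown the same thing by hand.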

Licensed under: CC-BY-SA with attribution