Percona mysqld recebeu sinal 11

https://dba.stackexchange.com/questions/105459

26-09-2020
|

Pergunta

Pesquisei e pesquisei no StackExchange, mas não consigo encontrar nada relevante para esse problema específico (existem outros problemas de erro 11, mas não para isso, pelo que posso dizer).

Temos um servidor que não está realmente fazendo muitas leituras ou gravações.O servidor é uma VM com 16 núcleos e 16 GB de RAM.

O MySQL trava após algum tempo desconhecido e o log de erros mostra:

09:17:14 UTC - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=131072
max_used_connections=21
max_threads=502
thread_count=12
connection_count=12
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1131343 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
hread pointer: 0x3329180
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f4a0043ee70 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x7cbffc]
/usr/sbin/mysqld(handle_fatal_signal+0x4a1)[0x699251]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f4a3278b340]
/lib/x86_64-linux-gnu/libc.so.6(+0x9818e)[0x7f4a3114b18e]
/usr/sbin/mysqld(_ZN8Protocol14net_store_dataEPKhm+0x47)[0x52e3b7]
/usr/sbin/mysqld[0x52fea4]
/usr/sbin/mysqld(_ZN8Protocol24send_result_set_metadataEP4ListI4ItemEj+0x52c)[0x52fbbc]
/usr/sbin/mysqld(_Z18mysqld_list_fieldsP3THDP10TABLE_LISTPKc+0x22a)[0x5e8d2a]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x2498)[0x5a6988]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x13a)[0x643a4a]
/usr/sbin/mysqld(handle_one_connection+0x4a)[0x643b3a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f4a32783182]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f49d00078a0): is an invalid pointer
Connection ID (thread ID): 83475
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
150629  9:17:14 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150629  9:17:14 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150629  9:17:14 [Note] Plugin 'FEDERATED' is disabled.
150629  9:17:14 InnoDB: The InnoDB memory heap is disabled
150629  9:17:14 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150629  9:17:14 InnoDB: Compressed tables use zlib 1.2.3
150629  9:17:14 InnoDB: Using Linux native AIO
150629  9:17:14 InnoDB: Initializing buffer pool, size = 500.0M
150629  9:17:14 InnoDB: Completed initialization of buffer pool
150629  9:17:14 InnoDB: highest supported file format is Barracuda.
InnoDB: Log scan progressed past the checkpoint lsn 15732651240
150629  9:17:14  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 15737893888
InnoDB: Doing recovery: scanned up to log sequence number 15743136768
InnoDB: Doing recovery: scanned up to log sequence number 15748379648
InnoDB: Doing recovery: scanned up to log sequence number 15752740649
InnoDB: Transaction 986C1E8 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 986C300
150629  9:17:15  InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
InnoDB: Apply batch completed
InnoDB: Last MySQL binlog file position 0 89114508, file name /var/lib/mysql/mysql-bin.000057
InnoDB: Starting in background the rollback of uncommitted transactions
150629  9:17:16  InnoDB: Rollback of non-prepared transactions completed
150629  9:17:16  InnoDB: Waiting for the background threads to start
150629  9:17:17 Percona XtraDB (http://www.percona.com) 5.5.43-37.2 started; log sequence number 15752740649
150629  9:17:17 [Note] Recovering after a crash using /var/lib/mysql/mysql-bin
150629  9:17:17 [Note] Starting crash recovery...
150629  9:17:17  InnoDB: Starting recovery for XA transactions...
150629  9:17:17  InnoDB: Transaction 986C1E8 in prepared state after recovery
150629  9:17:17  InnoDB: Transaction contains changes to 1 rows
150629  9:17:17  InnoDB: 1 transactions in prepared state after recovery
150629  9:17:17 [Note] Found 1 prepared transaction(s) in InnoDB
150629  9:17:17 [Note] Crash recovery finished.
150629  9:17:17 [Warning] 'user' entry 'root@host' ignored in --skip-name-resolve mode.
150629  9:17:17 [Warning] 'proxies_priv' entry '@ root@host' ignored in --skip-name-resolve mode.
150629  9:17:17 [Note] Event Scheduler: Loaded 0 events
150629  9:17:17 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.5.43-37.2-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  Percona Server (GPL), Release 37.2, Revision 6202fee

Eu não acho que isso esteja relacionado ao Percona porque ele também faz isso com o MySQL normal (carreguei o Percona para ver se isso resolveria o problema).

O general.log mostra apenas o seguinte durante esse período:

150629  9:17:13 76946 Query     set autocommit=1
150629  9:17:14 76946 Query     UPDATE `api_dictionaryitem` SET `name` = '', `group_id` = 4, `count` = 2248844, `last_occurence` = NULL WHERE `api_dictionaryitem`.`id` = 8
150629  9:17:17     1 Connect   debian-sys-maint@localhost on
150629  9:17:18     4 Quit
150629  9:17:22    36 Connect   user@localhost on mydb

O banco de dados inteiro tem apenas 5,3 MB

Passei horas tentando descobrir isso, mas está além de mim neste momento.

EDITAR:Sim, eu sei que a ocorrência está escrita incorretamente, está na fila de tickets para correção, mas é consistente em todo o código no momento, portanto não é a causa.

EDITAR 2:Adicionando MySQL Config - que foi gerado usando a ferramenta de geração do Percona.Eu estava usando isso como uma configuração "segura".

# Generated by Percona Configuration Wizard (http://tools.percona.com/) version REL5-20120208

[mysql]

# CLIENT #
port                           = 3306
socket                         = /var/run/mysqld/mysqld.sock

[mysqld]

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/run/mysqld/mysqld.sock
pid-file                       = /var/run/mysqld/mysqld.pid

# MyISAM #
key-buffer-size                = 32M
myisam-recover                 = FORCE,BACKUP

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000
skip-name-resolve
sql-mode                       = STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
sysdate-is-now                 = 1
innodb                         = FORCE
innodb-strict-mode             = 1

# DATA STORAGE #
datadir                        = /var/lib/mysql/

# BINARY LOGGING #
log-bin                        = /var/lib/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
max-connections                = 500
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 1024
table-open-cache               = 2048

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 64M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 500M

# LOGGING #
log-error                      = /var/log/mysql/error.log
slow-query-log                 = 1
slow-query-log-file            = /var/log/mysql/slow.log
log=/var/log/mysql/general.log

# Autoextend
innodb_data_file_path           = ibdata1:128M;ibdata2:10M:autoextend

Solução

O que me chama a atenção é a Consulta registrada no log geral

150629  9:17:14 76946 Query     UPDATE `api_dictionaryitem` SET `name` = '', `group_id` = 4, `count` = 2248844, `last_occurence` = NULL WHERE `api_dictionaryitem`.`id` = 8

quando você recebeu o sinal 11

09:17:14 UTC - mysqld got signal 11 ;

Eu suspeito que você encontrou algum bug estranho ao executar a tabela api_dictionaryitem Neste caso.Por que ?

Veja a resposta ao sinal 11 que você recebeu, que irei detalhar

Pode ser porque você atingiu um inseto.
Também é possível que este binário ou uma das bibliotecas às quais ele foi vinculado esteja corrompida, construída incorretamente ou configurada incorretamente.
Este erro também pode ser causada por hardware com defeito.
Faremos o possível para obter algumas informações que, esperamos, ajudem a diagnosticar o problema, mas como já travamos, algo está definitivamente errado e isso pode falhar.

Vamos ver o que o mysqld estava fazendo antes de recorrer a um rastreamento de pilha

stack_bottom = 7f4a0043ee70 thread_stack 0x30000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0x7cbffc]
/usr/sbin/mysqld(handle_fatal_signal+0x4a1)[0x699251]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f4a3278b340]
/lib/x86_64-linux-gnu/libc.so.6(+0x9818e)[0x7f4a3114b18e]
/usr/sbin/mysqld(_ZN8Protocol14net_store_dataEPKhm+0x47)[0x52e3b7]
/usr/sbin/mysqld[0x52fea4]
/usr/sbin/mysqld(_ZN8Protocol24send_result_set_metadataEP4ListI4ItemEj+0x52c)[0x52fbbc]
/usr/sbin/mysqld(_Z18mysqld_list_fieldsP3THDP10TABLE_LISTPKc+0x22a)[0x5e8d2a]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x2498)[0x5a6988]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x13a)[0x643a4a]
/usr/sbin/mysqld(handle_one_connection+0x4a)[0x643b3a]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f4a32783182]

Veja a linha que diz

/usr/sbin/mysqld(_ZN8Protocol14net_store_dataEPKhm+0x47)[0x52e3b7]

A consulta no momento do impacto foi

UPDATE `api_dictionaryitem` SET
`name` = '',
`group_id` = 4,
`count` = 2248844,
`last_occurence` = NULL
WHERE `api_dictionaryitem`.`id` = 8

O que net_store_data estava fazendo naquele momento?

O livro Understanding MySQL Internals diz algo sobre net_store_data e valores NULL nas páginas 74,75 sob o título Respostas do servidor subtítulo Campos de dados

MySQL Internals

Os campos de dados são componentes críticos em muitos pacotes de resposta do servidor.Um campo de dados consiste em uma sequência de especificador de comprimento seguida pelo valor real dos dados.A sequência do especificador de comprimento pode ser entendida estudando net_store_length() ...

O que é interessante é que net_store_data() chama net_store_length() na linha 482 em protocol.cc.Esse código se parece com isso

{
  to=net_store_length_fast(to,length);
  memcpy(to,from,length);
  return to+length;
}

Na linha 489, outra versão de net_store_data() se parece com isto:

{
  char buff[20];
  uint length=(uint) (int10_to_str(from,buff,10)-buff);
  to=net_store_length_fast(to,length);
  memcpy(to,buff,length);
  return to+length;
}

Na linha 498, outra versão de net_store_data() se parece com isto:

{
  char buff[22];
  uint length=(uint) (longlong10_to_str(from,buff,10)-buff);
  to=net_store_length_fast(to,length);
  memcpy(to,buff,length);
  return to+length;
}

Esses trechos de código podem ser encontrados em SourceCodeBrowser

O código para net_store_length

00450 {
00451   uchar *packet=(uchar*) pkg;
00452   if (length < 251)
00453   {
00454     *packet=(uchar) length;
00455     return (char*) packet+1;
00456   }
00457   *packet++=252;
00458   int2store(packet,(uint) length);
00459   return (char*) packet+2;
00460 }

pode ser encontrado aqui

Como net_store_data é o ponto único de falha no rastreamento de pilha, o que há no SQL que teria desencadeado tal falha?

Olhe novamente para o SQL

UPDATE `api_dictionaryitem` SET
`name` = '',
`group_id` = 4,
`count` = 2248844,
`last_occurence` = NULL
WHERE `api_dictionaryitem`.`id` = 8

A única coisa que me parece fora do lugar é o

`last_occurence` = NULL

Veja novamente o código de net_store_length.Observe a linha

00452   if (length < 251)

O que há de tão especial no número 251?De acordo com o livro Compreendendo os componentes internos do MySQL, o último parágrafo da página 74 diz:

Pode-se perguntar por que o comprimento de 1 byte é limitado a 251, quando o primeiro valor reservado no net_store_length() é 252.O código 251 tem um significado especial.Indica que não há valor de comprimento ou dados após o código, e o valor do campo é SQL NULL.

Ah!!!O valor NULL está sendo maltratado em net_store_length() ao lidar com o last_occurence campo.Se o last_occurence é DATETIME ou TIMESTAMP, o uso de NULL provavelmente não será permitido.

Talvez mudando

`last_occurence` = NULL

para um valor de data falso, mas legítimo

`last_occurence` = '1970-01-01 00:00:00'

provavelmente acertaria esse problema na cabeça.

A primeira resposta do rastreamento de pilha estava certa: This could be because you hit a bug.

Por favor leia o Documentação do MySQL sobre como usar um rastreamento de pilha.

ATUALIZAÇÃO 06/07/2015 10:47 EST

Anteriormente você comentou sobre a definição da tabela

mysql> desc api_dictionaryitem;
+----------------+--------------+------+-----+---------+----------------+
| Field          | Type         | Null | Key | Default | Extra          |
+----------------+--------------+------+-----+---------+----------------+
| id             | int(11)      | NO   | PRI | NULL    | auto_increment |
| name           | varchar(255) | NO   |     | NULL    |                |
| count          | bigint(20)   | YES  |     | NULL    |                |
| last_occurence | bigint(20)   | YES  |     | NULL    |                |
| group_id       | int(11)      | NO   | MUL | NULL    |                |
+----------------+--------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

Você mencionou isso last_occurence é BIGINT.Já que é BIGINT, que tem 8 bytes e está sendo definido como NULL, pense no que isso faz.

O primeiro byte possui 0xFB (251) seguido por 7 bytes de material indefinido.O código para net_store_length Eu mostrei em minha resposta que há algo para número inteiro começando na linha 458

00450 {
00451   uchar *packet=(uchar*) pkg;
00452   if (length < 251)
00453   {
00454     *packet=(uchar) length;
00455     return (char*) packet+1;
00456   }
00457   *packet++=252;
00458   int2store(packet,(uint) length);
00459   return (char*) packet+2;
00460 }

A função é chamada int2store.Eu não acho que ele possa lidar com um BIGINT corretamente, já que os 7 bytes após o sinalizador NULL podem ser qualquer coisa.Tenha em mente que o sinal 11 é uma falha de segmentação.Neste caso, é devido ao processo de bytes desconhecidos ou espúrios.

Licenciado em: CC-BY-SA com atribuição

Não afiliado a dba.stackexchange