Question

I need suggestions on how to diagnose this issue. I'm running two Magento 2 stores on two separate Amazon EC2 Ubuntu 16.04.7 micro instances. Both ran Magento 2.3.5 successfully for a long time, and I have now updated to 2.4.2 and PHP 7.4. Since the update, they crash the whole EC2 instance 2-3 times per day. Both of them have the same issue. Sometimes I can see the CPU spike to 100% before the crash. I'm analysing traffic with GoAccess to check for suspicious bots. There isn't a huge amount of traffic: 3,500 hits per day, 95 unique visitors, 18 MB of data. From crawlers/bots I get 1,200 hits per day from 343 unique bots, and 20 MB of data is transferred.

I'm using 2 GB of swap (1.5 GB free normally) and 1 GB of RAM (500 MB free during normal use).

They have around 3 GB of free disk space, which is not exhausted when the crash happens.

Here are my PHP settings. /etc/php/7.4/cli/php.ini:

post_max_size = 128M
memory_limit = 2048M
max_execution_time = 18000
max_input_time = 6000
zlib.output_compression = On
session.gc_maxlifetime = 1440000
display_errors = Off

/etc/php/7.4/fpm/php.ini:

max_execution_time = 18000
max_input_time = 6000
memory_limit = 1024M
date.timezone = Europe/Stockholm 
session.gc_maxlifetime = 1440000
display_errors = Off

The last entry in the Magento system.log is:

13:04: main.INFO: Consumer "async.operations.all" skipped as required connection "amqp" is not configured. Unknown connection name amqp [] []

14:05: EC2 monitoring loses contact with the server. 14:11: Nginx stops logging; the last entries are mostly HTTP 499. The final log message is someone trying to exploit magmi.ini (a Magento 1 vulnerability?):

"GET /en/sales/guest/form/magmi/conf/magmi.ini HTTP/1.1" 301

So, what is the most likely issue here? I'm also going to try to set up CloudWatch monitoring for RAM.
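Something like the following CloudWatch agent config should cover it. This is only a sketch based on the agent docs; the `mem` and `swap` plugins and the `mem_used_percent` / `swap_used_percent` measurements are the agent's standard ones, but the file location and details should be verified against AWS's current documentation:

    {
      "metrics": {
        "metrics_collected": {
          "mem": {
            "measurement": ["mem_used_percent"],
            "metrics_collection_interval": 60
          },
          "swap": {
            "measurement": ["swap_used_percent"]
          }
        }
      }
    }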


Solution 3

I have made a number of changes and now have a stable Magento 2.4.2.

  1. Blocked common SQL-injection terms in my nginx conf. This also got rid of them in the "Top search" section of the Magento admin. So nice!

    # Match catalogsearch URLs and drop any request whose query string
    # contains common SQL injection patterns. 444 makes nginx close the
    # connection without sending a response.
    location ~* ^(.*).*(catalogsearch)(.*)$ {
          if ( $args ~* ^(.*).*(CONCAT|ELSE|UNION|SELECT|THEN.*CH*R)(.*) ) { return 444; }
          if ( $args ~* ^(.*).*(SELECT.*SLEEP)(.*) ) { return 444; }
          if ( $args ~* ^(.*).*(ORDER.*BY)(.*) ) { return 444; }
          if ( $args ~* ^(.*).*('A=0)(.*) ) { return 444; }

          # Clean requests fall through to Magento as usual.
          try_files $uri $uri/ /index.php$is_args$args;
    }
    
  2. Installed GoAccess to see which were the most common "Not found" URLs. There were many attempts to exploit known security holes there. Obviously these are not users I would like to have around my servers at all, so out they go (see the one-liner after this list).

  3. Installed fail2ban to instantly ban all IPs attempting the SQL injections and exploits above (a filter sketch follows this list).

  4. Installed CrowdSec to ban the IPs of bad spiders, SSH brute-force attempts, flood attempts and rogue user IPs from its community IP database. Both CrowdSec and fail2ban ban the IPs in the firewall/iptables, so at a really low level.

  5. Changed the PHP CLI settings back to lower levels so that I wouldn't get a lot of stuck PHP tasks:

        memory_limit = 256M
        max_execution_time = 30
        max_input_time = 60
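For item 2, the GoAccess report can be generated with a one-liner like this (the log path and output location are whatever applies to your setup):

    goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/report.html

For item 3, here is a minimal sketch of a fail2ban filter that piggybacks on the 444 responses produced by the nginx rules above. The filter name nginx-sqli, the log path and the ban time are assumptions to adapt. /etc/fail2ban/filter.d/nginx-sqli.conf:

    [Definition]
    # Match any access-log line where nginx answered with status 444
    failregex = ^<HOST> .* "(GET|POST) .*" 444
    ignoreregex =

And the matching jail in /etc/fail2ban/jail.local:

    [nginx-sqli]
    enabled  = true
    port     = http,https
    filter   = nginx-sqli
    logpath  = /var/log/nginx/access.log
    maxretry = 1
    bantime  = 86400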

Despite all of this I still got a crash every second day or so. I then saw a Linux syslog message warning that PHP-FPM's pm.max_children was set too low (5). I raised it to pm.max_children = 10 in /etc/php/7.4/fpm/pool.d/www.conf, and I haven't had any crashes in a week since.
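For reference, the relevant part of the pool config after the change. Apart from pm.max_children, the pm values shown are the stock ones shipped in www.conf, so yours may differ:

    ; /etc/php/7.4/fpm/pool.d/www.conf
    pm = dynamic
    pm.max_children = 10
    pm.start_servers = 2
    pm.min_spare_servers = 1
    pm.max_spare_servers = 3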

Also, I now have much more visibility into the suspicious activity going on around my servers.

OTHER TIPS

The 'Consumer "async.operations.all" skipped as required connection "amqp" is not configured. Unknown connection name amqp' message is not the cause of the crash. It appears because you haven't set up a connection to RabbitMQ, so the 'async.operations.all' queue cannot be processed without a valid connection; but it should not, in any case, result in a system crash.
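If you do want that consumer to run at some point, the connection is declared in app/etc/env.php. A minimal sketch, assuming a local RabbitMQ instance; the host and credentials below are placeholders:

    'queue' => [
        'amqp' => [
            'host' => 'localhost',
            'port' => '5672',
            'user' => 'guest',
            'password' => 'guest',
            'virtualhost' => '/'
        ]
    ],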

To debug this, I would suggest installing New Relic to monitor the system and see which errors or events are logged at the moment the system goes down, then start the 'investigation' from what you find there.

This may not be the reason, but you allow a CLI process to use up to 2 GB of memory on a micro instance, and an FPM process up to 1 GB. A rogue or long-running process, or a handful of concurrent visitors, can therefore quickly exhaust the server's resources: a micro instance has only 1 GB of memory in total, and it also has to run all the services plus the OS.

Also remember that each FPM worker has the same limit, and each worker serves one visitor for the duration of a single request. With memory_limit = 1024M you have effectively allowed a single FPM worker to claim all the memory on the system, and a pool of 10 workers could in theory demand ten times what the instance has.

I second the New Relic monitoring. You should also upsize your server to get more headroom. With New Relic you can see your average memory usage per worker and per request, so you can fine-tune the PHP configuration (maybe your store only needs 128 MB, or maybe it really needs the full 1 GB).
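As a rough illustration of how to fit the pool inside 1 GB (the overhead and per-worker figures below are assumptions; measure your own with ps or New Relic), a common sizing heuristic is pm.max_children ≈ (total RAM − OS/services overhead) / average worker size:

    ; Hypothetical sizing for a 1 GB instance:
    ;   (1024 MB - 400 MB for OS, MySQL and nginx) / ~120 MB per worker ≈ 5 workers
    pm.max_children = 5
    ; And cap each request well below total RAM instead of 1024M:
    php_admin_value[memory_limit] = 256M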

Licensed under: CC-BY-SA with attribution
Not affiliated with magento.stackexchange