Question

I have an autoscaling cloudformation that I think I have set up to replace failed instances based on StatusCheckFailed_Instance. I want to test this. Can I test this by terminating one of the EC2 instances? Thanks!

Was it helpful?

Solution

Instance Status Check might fail for one of the following reasons:

Memory Errors

  • Out of memory: kill process
  • ERROR: mmu_update failed (Memory management update failed)

Device Errors

  • I/O error (Block device failure)
  • IO ERROR: neither local nor remote disk (Broken distributed block device)

Kernel Errors

  • request_module: runaway loop modprobe (Looping legacy kernel modprobe on older Linux versions)
  • "FATAL: kernel too old" and "fsck: No such file or directory while trying to open /dev" (Kernel and AMI mismatch)
  • "FATAL: Could not load /lib/modules" or "BusyBox" (Missing kernel modules)
  • ERROR Invalid kernel (EC2 incompatible kernel)

File System Errors

  • request_module: runaway loop modprobe (Looping legacy kernel modprobe on older Linux versions)
  • fsck: No such file or directory while trying to open... (File system not found)
  • General error mounting filesystems (Failed mount)
  • VFS: Unable to mount root fs on unknown-block (Root filesystem mismatch)
  • Error: Unable to determine major/minor number of root device... (Root file system/device mismatch)
  • XENBUS: Device with no driver...
  • ... days without being checked, check forced (File system check required)
  • fsck died with exit status... (Missing device)

Operating System Errors

  • GRUB prompt (grubdom>)
  • Bringing up interface eth0: Device eth0 has different MAC address than expected, ignoring. (Hard-coded MAC address)
  • Unable to load SELinux Policy. Machine is in enforcing mode. Halting now. (SELinux misconfiguration)
  • XENBUS: Timeout connecting to devices (Xenbus timeout)

It seems to me that #1 is the easiest to implement to fail on demand. You can add web hook or launch a shell script with the delay to start some process which will result in OutOfMemory failure, to confirm that your autoscaling configuration works as configured.

Terminating the instance will not help to test your configuration, as when you gracefully terminating the instance, it is removed from the pool of available instances, and the check will not be performed.

More details on status checks can be found here: Troubleshooting Instances with Failed Status Checks

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top