EC2Rescue to troubleshoot operating-system-level issues

Please Note: This article is part of our historical archive. Because it was published a while ago, some of the information, links, or context may now be outdated.

EC2Rescue is a tool to troubleshoot operating-system-level issues on Amazon EC2 Linux instances.

Here, at Bobcares, we assist our customers with several AWS queries as part of our AWS Support Services.

Today, let us see how to use EC2Rescue to diagnose and troubleshoot problems.

EC2Rescue

With EC2Rescue, we can correct operating-system-level issues.

It can also collect advanced logs, system utilization reports, and configuration files for further analysis.

EC2Rescue addresses the following for Linux:

Collect system utilization reports.
Collect logs and details.
Detect system problems.
Automatically remediate system problems.

EC2Rescue to troubleshoot operating-system-level issues

Moving ahead, let us see how to troubleshoot an unreachable Amazon EC2 Linux instance.

To do so, our Support Techs recommend the steps below:

1. Initially, we launch a new Amazon EC2 instance in the virtual private cloud (VPC) using the same Amazon Machine Image (AMI) and in the same Availability Zone as the impaired instance.

The new instance becomes the rescue instance.

Another option is to use an existing instance that we can access if it uses the same AMI and is in the same Availability Zone as the impaired instance.

2. Then, with the impaired instance stopped, we detach the Amazon Elastic Block Store (Amazon EBS) root volume (/dev/xvda or /dev/sda1) from it.

3. We then attach the EBS volume as a secondary device (/dev/sdf) to the rescue instance.
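Steps 2 and 3 can also be carried out with the AWS CLI instead of the console. The following is only a sketch: it assumes the AWS CLI is installed and configured, that the impaired instance has to be stopped before its root volume can be detached, and the instance and volume IDs are hypothetical placeholders. The helper names (run, move_root_volume) are made up for illustration. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: move the root volume of an impaired instance to a rescue instance.
# All IDs below are hypothetical placeholders; replace them with real values.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

move_root_volume() {
  local impaired_id="$1" rescue_id="$2" volume_id="$3"
  # Assumption: the impaired instance is stopped before the detach.
  run aws ec2 stop-instances --instance-ids "$impaired_id" &&
  run aws ec2 wait instance-stopped --instance-ids "$impaired_id" &&
  run aws ec2 detach-volume --volume-id "$volume_id" &&
  run aws ec2 wait volume-available --volume-ids "$volume_id" &&
  # Attach as a secondary device on the rescue instance (step 3).
  run aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$rescue_id" --device /dev/sdf
}

# Placeholder IDs for illustration only.
move_root_volume i-0123456789abcdef0 i-0fedcba9876543210 vol-0123456789abcdef0
```

Running it with DRY_RUN=0 would execute the commands for real, stopping at the first one that fails.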

4. Next, we connect to the rescue instance via SSH.

5. Here, we create a mount point directory (/rescue) for the volume we attached to the rescue instance.

$ sudo mkdir /rescue

6. We mount the volume at the above directory.

$ sudo mount /dev/xvdf1 /rescue

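We can use the lsblk command to view the available disk devices along with their mount points. The volume attached as /dev/sdf may show up under a different name, such as /dev/xvdf or, on Nitro-based instances, an NVMe device like /dev/nvme1n1, so we adjust the device name in the mount command accordingly.

If the mount fails, we check the kernel messages with dmesg | tail. If they report a conflicting (duplicate) UUID, which can happen with XFS file systems when the rescue instance was launched from the same AMI, we mount the volume with the -o nouuid option.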
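The check for a conflicting UUID can be sketched as a small helper. This is only an illustration: the function name mount_options_for is made up, the sample log line is an illustrative example rather than captured output, and the sketch assumes the kernel message contains the words "duplicate UUID". It prints the mount command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: choose mount options based on a kernel log excerpt.
# Assumption: a UUID conflict shows up as a line containing "duplicate UUID".
mount_options_for() {
  local kernel_log="$1"
  if printf '%s\n' "$kernel_log" | grep -qi 'duplicate UUID'; then
    echo "-o nouuid"
  else
    echo ""
  fi
}

# Illustrative log line; on a real system this would come from: dmesg | tail
sample_log="XFS (xvdf1): Filesystem has duplicate UUID - can't mount"
opts="$(mount_options_for "$sample_log")"

# Only print the resulting command so nothing is mounted by accident.
echo "sudo mount $opts /dev/xvdf1 /rescue"
```

On a live rescue instance, the sample line would be replaced with the real output of dmesg | tail after a failed mount.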

7. Now we change the root directory (chroot) to the new volume:

$ sudo -i
# for i in proc sys dev run; do mount --bind /$i /rescue/$i ; done
# chroot /rescue

8. After that, we download and install the EC2Rescue Tool for Linux on the offline Linux root volume:

$ curl -O https://s3.amazonaws.com/ec2rescuelinux/ec2rl.tgz
$ tar -xvf ec2rl.tgz

9. We verify the installation by displaying the help text:

$ cd ec2rl-<version_number>
$ ./ec2rl help

10. We then run EC2Rescue for Linux with sudo and no further options, which runs all modules:

$ sudo ./ec2rl run

11. The results are written under /var/tmp/ec2rl:

$ cat /var/tmp/ec2rl/<logfile_location>/Main.log
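When the tool has been run several times, it helps to find the log of the latest run. The sketch below assumes that each run writes its Main.log into its own subdirectory of /var/tmp/ec2rl and picks the most recently modified one. The function name latest_main_log is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the path of Main.log in the newest run directory.
# Assumption: every run creates its own subdirectory under the base directory.
latest_main_log() {
  local base_dir="${1:-/var/tmp/ec2rl}" dir newest=""
  for dir in "$base_dir"/*/; do
    [ -d "$dir" ] || continue          # skip the unexpanded glob if nothing matches
    if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
      newest="$dir"                    # keep the most recently modified directory
    fi
  done
  [ -n "$newest" ] || return 1         # no run directories found
  printf '%s\n' "${newest%/}/Main.log"
}

if log_path="$(latest_main_log)"; then
  echo "Latest log: $log_path"
else
  echo "No EC2Rescue run directories found"
fi
```

The printed path can then be passed to cat or less as in step 11.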

12. After analyzing the results, we enable remediation for the supported modules:

$ sudo ./ec2rl run --remediate

13. Once done, we exit from chroot, then unmount the bind mounts created in step 7 along with the secondary device:

# exit
# umount /rescue/{proc,sys,dev,run,}

If the unmount isn’t successful, we stop or reboot the rescue instance to enable a clean unmount.
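Before resorting to a stop or reboot, it can be worth checking what is still mounted under /rescue, since leftover bind mounts keep the volume busy. The following sketch lists such mount points deepest first, which is the order they would need to be unmounted in. The function name mounts_under is hypothetical, and the sketch assumes a Linux-style mounts table such as /proc/mounts with no spaces in the mount paths.

```shell
#!/usr/bin/env bash
# Sketch: list mount points at or below a directory, deepest first.
# Assumption: the table has the /proc/mounts layout (mount point in column 2).
mounts_under() {
  local root="$1" table="${2:-/proc/mounts}"
  awk -v root="$root" \
    '$2 == root || index($2, root "/") == 1 { print $2 }' "$table" |
    LC_ALL=C sort -r
}

# Only print the unmount commands; run them manually after reviewing the list.
if [ -r /proc/mounts ]; then
  mounts_under /rescue | while read -r mount_point; do
    echo "sudo umount $mount_point"
  done
fi
```

Reverse sorting places child mounts such as /rescue/proc ahead of /rescue itself, so the parent is unmounted last.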

14. Then we detach the secondary volume (/dev/sdf) from the rescue instance and attach it to the original instance as the root volume, using its original device name (/dev/xvda or /dev/sda1).

15. Finally, we start the EC2 instance and verify that it is responsive.
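Steps 14 and 15 can likewise be sketched with the AWS CLI. As before, this is a hedged example: it assumes a configured AWS CLI, the IDs are hypothetical placeholders, the helper names (run, return_root_volume) are made up, and the commands are only printed unless DRY_RUN=0 is set. Waiting for the status checks to pass is used here as a rough stand-in for "responsive"; an SSH login is still the real confirmation.

```shell
#!/usr/bin/env bash
# Sketch: return the repaired volume to the original instance and start it.
# All IDs below are hypothetical placeholders; replace them with real values.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

return_root_volume() {
  # root_device should match the original root device name of the instance.
  local volume_id="$1" original_id="$2" root_device="${3:-/dev/xvda}"
  run aws ec2 detach-volume --volume-id "$volume_id" &&
  run aws ec2 wait volume-available --volume-ids "$volume_id" &&
  run aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$original_id" --device "$root_device" &&
  run aws ec2 start-instances --instance-ids "$original_id" &&
  # Blocks until the instance status checks report "ok".
  run aws ec2 wait instance-status-ok --instance-ids "$original_id"
}

# Placeholder IDs for illustration only.
return_root_volume vol-0123456789abcdef0 i-0123456789abcdef0 /dev/xvda
```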

[Stuck with the steps? Feel free to contact us anytime]

Conclusion

In short, we saw how our Support Techs use EC2Rescue to correct operating-system-level issues.
