Save Your Linux Machine From Certain Death

Category: IT Technology · Published: 5 years ago

Recovering your root password and more

Apr 30 · 7 min read

Photo by Regine Tholen on Unsplash

Troubleshooting damaged systems is an essential skill of every SysAdmin, SRE, or DevOps engineer. Every one of us runs into OS-related issues from time to time and it’s better to be prepared when things go terribly wrong.

It’s especially beneficial to be able to identify and act on the issue quickly to prevent any significant damage. To help with that, this article goes over a few common problems that you might encounter, as well as ways to gather information, troubleshoot, and solve them.

Note: This article uses RHEL 8 / CentOS, but most of the examples and concepts below carry over to other Linux distributions.

Recovering Root Password

What if you lose the root password and you don't have access to a privileged user? If you still have access to the machine, then there is a way to solve this inconvenient situation.

First, start by rebooting the machine. When the machine starts, hit any key to stop the automatic boot and access the boot menu.

In the boot menu, hit e to edit the boot options. Using the arrow keys, move to the line starting with linux and append rd.break to it. This stops the boot process early, inside the initramfs, before control is handed over to the real root filesystem.

Optionally, you can also append enforcing=0 to put SELinux into permissive mode for this boot. Next, hit CTRL+X to let the machine boot.

After a few seconds of booting, you should get a shell. At this point, the system's root filesystem is mounted read-only at /sysroot.

So, to change anything in the system, like the root password, we need to make the filesystem read-write. We can do that by running mount -o remount,rw /sysroot.

The next thing we need to do is enter the root jail using chroot /sysroot, which changes the root of the filesystem to /sysroot instead of /. This is required so that any further commands we run operate on the installed system under /sysroot rather than on the temporary boot environment. Now we can change the root password using passwd.

If you added enforcing=0 to the boot options, you can now hit CTRL+D (or type exit) twice, once to leave the chroot and once to leave the rescue shell, and let the system fully boot. In that case, run restorecon /etc/shadow after logging in to repair the file's context. If not, run touch /.autorelabel before exiting to trigger an SELinux relabel of the system.

This is needed because changing the password leaves /etc/shadow with an incorrect SELinux security context. Therefore, we need to relabel the whole filesystem during the next boot (this can take some time, depending on the size of the filesystem).
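The recovery steps above can be sketched as a short sequence. This is a minimal sketch, assuming you are at the rd.break prompt with the real root mounted at /sysroot; the function name reset_root_password is hypothetical and only serves to make the order explicit. At the rescue prompt you would normally type the commands one at a time.

```shell
# Sketch of the rd.break recovery steps, in order.
# reset_root_password is a hypothetical name, not a standard tool.
reset_root_password() {
    # 1. The real root is mounted read-only at /sysroot; make it writable.
    mount -o remount,rw /sysroot || return 1

    # 2. Run passwd inside the real root so it edits the installed system's
    #    password database (prompts for the new password).
    chroot /sysroot passwd root || return 1

    # 3. Ask SELinux to relabel the filesystem on the next boot.
    touch /sysroot/.autorelabel
}

# Usage at the rescue prompt: reset_root_password, then exit to continue booting.
```

Creating /sysroot/.autorelabel from outside the chroot is equivalent to running touch /.autorelabel inside it.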

As an alternative solution, you could also use the systemd debug shell. This can be done, again, by accessing GRUB during boot and appending systemd.debug-shell instead of rd.break.

When you let the system boot with this option, you will end up at a normal login prompt, which isn’t very helpful. If you, however, switch to terminal 9 using CTRL+ALT+F9, you will find the debug shell running with full root permissions and no password prompt.

Here, you can change the password normally. At this point, you can switch back to the normal terminal (CTRL+ALT+F1) and log in.

You shouldn’t forget to stop the debug shell though, as it leaves a passwordless root shell open to anyone with console access. You can do that by running systemctl stop debug-shell.service (you can still switch back to terminal 9, but the shell there will have been killed and will be unresponsive).
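As a reminder that the debug shell may still be open, one could check whether the running system was booted with the flag. The following is a minimal sketch; cmdline_has_debug_shell is a hypothetical helper, and accepting the underscore spelling as well is an assumption about how systemd normalizes option names. It treats the option as enabled regardless of any value attached to it.

```shell
# Return 0 if a kernel command line string mentions the systemd debug shell
# option. cmdline_has_debug_shell is a hypothetical helper name.
cmdline_has_debug_shell() {
    for arg in $1; do
        case "$arg" in
            systemd.debug-shell|systemd.debug-shell=*) return 0 ;;
            # Underscore spelling: assumed to be accepted as an equivalent.
            systemd.debug_shell|systemd.debug_shell=*) return 0 ;;
        esac
    done
    return 1
}

# Usage on a live system:
#   if cmdline_has_debug_shell "$(cat /proc/cmdline)"; then
#       systemctl stop debug-shell.service
#   fi
```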

Fixing Unmountable Filesystems

Creating new partitions, creating filesystems, mounting filesystems, etc. are common tasks for most SysAdmins.

But, even though these are basic tasks, it’s easy to make a mistake that may render your system unbootable. Let’s see how you can solve problems related to unmountable filesystems.

As with the previous solutions, we start by rebooting the machine, accessing the boot menu, and editing the boot entry, this time appending systemd.unit=emergency.target. This tells your system to boot into the emergency target instead of the default one (multi-user or graphical).

When the system boots and we get the shell, we log in as root and again remount the root filesystem read-write using mount -o remount,rw /. Now we can try mounting all filesystems by running mount -a.

If there is a problem with mounting a specific filesystem, you might see an error message like mount: /wrong-mount: mount point does not exist. or mount: /wrong-mount: special device /dev/sdb1 does not exist. These kinds of issues need to be fixed inside /etc/fstab.

After fixing the issue in /etc/fstab, run systemctl daemon-reload so that systemd picks up the changes. Now, run mount -a again. If the issue was indeed fixed, you should see no error (no news is good news). You can now exit using CTRL+D and let the system boot normally.

Aside from a mistyped device or mount point name, you might also encounter issues with VDO (Virtual Data Optimizer) or Stratis volumes, which require an extra mount option in /etc/fstab: x-systemd.requires=vdo.service or x-systemd.requires=stratisd.service, respectively. Without it, the system won’t boot properly.

Another common and easily fixable mistake is a missing quote when using UUID="..." to specify the device (use syntax highlighting when editing /etc/fstab; it can save you a lot of problems).
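The mistakes described above (too few fields, a missing quote, a mount point that does not exist) can be caught before rebooting with a small check. The following is a minimal sketch of such a check; lint_fstab is a hypothetical helper built on simple heuristics, not a replacement for a real validator. On systems whose util-linux provides it, findmnt --verify performs a more thorough check.

```shell
# Lint an fstab-style file for a few common mistakes.
# lint_fstab is a hypothetical helper; it prints one line per problem
# and returns non-zero if it found any.
lint_fstab() {
    file=$1
    status=0
    lineno=0
    while IFS= read -r line || [ -n "$line" ]; do
        lineno=$((lineno + 1))
        # Split the line into whitespace-separated fields (globbing disabled).
        set -f
        set -- $line
        set +f
        # Skip blank lines and comments.
        if [ $# -eq 0 ]; then continue; fi
        case "$1" in
            '#'*) continue ;;
        esac
        # Unbalanced double quotes, e.g. UUID="1234 without the closing quote.
        quotes=$(printf '%s' "$line" | tr -cd '"' | wc -c)
        if [ $((quotes % 2)) -ne 0 ]; then
            echo "line $lineno: unbalanced double quote"
            status=1
        fi
        if [ $# -lt 4 ]; then
            echo "line $lineno: expected at least 4 fields, found $#"
            status=1
            continue
        fi
        # The second field is the mount point; swap entries use none or swap.
        case "$2" in
            none|swap) ;;
            *)
                if [ ! -d "$2" ]; then
                    echo "line $lineno: mount point $2 does not exist"
                    status=1
                fi
                ;;
        esac
    done < "$file"
    return $status
}

# Usage: lint_fstab /etc/fstab && mount -a
```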

Troubleshooting SELinux Problems

This one is not a life and death kind of a situation, but it can cause a lot of problems, so it’s beneficial to be able to identify it quickly when it happens.

It’s important to realize that most of the time, SELinux is doing its job correctly. But it might just happen that you are trying to achieve something SELinux doesn’t expect.

Some of the problems you might encounter may include issues with incorrect file context, for example, after moving a file from one place to another. Sometimes the issue might be with overly restrictive policies (SELinux booleans) or blocked service ports.

One can troubleshoot all of these problems by first temporarily switching SELinux to permissive (non-enforcing) mode using setenforce 0 and retrying the action that wasn’t working previously.

If the problem was fixed by switching SELinux to permissive mode, then we know that the problem was caused by an SELinux violation.

Now, if we switch SELinux back to enforcing mode using setenforce 1, we can try to analyze and fix the violation.
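The toggle-and-retry check can be wrapped so that enforcing mode is always switched back on afterwards. This is a minimal sketch, assuming it runs as root on a system where getenforce and setenforce are available; retry_permissive is a hypothetical helper name.

```shell
# Run a command with SELinux temporarily in permissive mode, then switch
# enforcing back on. retry_permissive is a hypothetical helper name.
retry_permissive() {
    previous=$(getenforce)        # prints Enforcing, Permissive or Disabled
    if [ "$previous" != "Enforcing" ]; then
        echo "SELinux is not enforcing ($previous); nothing to test" >&2
        return 2
    fi
    setenforce 0 || return 1
    "$@"                          # the action that failed under enforcing mode
    result=$?
    setenforce 1                  # restore enforcing mode in every case
    return $result
}

# Usage (as root): retry_permissive systemctl restart httpd
```

If the command succeeds here but fails in enforcing mode, an SELinux denial is the likely cause.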

First, install setroubleshoot-server using yum -y install setroubleshoot-server. This troubleshooting service watches the audit events written to /var/log/audit/audit.log and sends summary messages to /var/log/messages.

Next, to find these messages, run grep sealert /var/log/messages, which lists the alerts raised for recent violations.

As an example here, I configured httpd to run on port 8012, which SELinux blocks because that port is not among the ports allowed for the web server. If we were not aware of this, then it would be quite hard to find the root cause of this issue.

The messages in /var/log/messages can help with that. Each one contains a description of the SELinux violation as well as a command of the form sealert -l <alert ID> that can help us troubleshoot further, so let’s run it.

This produces a full report of what caused the violation, including a suggested (not necessarily the most appropriate) fix.

If you have some experience with SELinux, you might realize that the most appropriate way to fix this issue is to add the port to the SELinux port type used by the web server (http_port_t). This can be done by running semanage port -a -t http_port_t -p tcp 8012.
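Before adding a port, it can be useful to check whether it is already assigned to the type. The following is a minimal sketch that reads the rows printed by semanage port -l (type, protocol, then a comma-separated port list that may contain ranges); port_is_labeled is a hypothetical helper name.

```shell
# Read `semanage port -l` rows on stdin and succeed if PORT is already
# assigned to TYPE for PROTO. port_is_labeled is a hypothetical helper name.
port_is_labeled() {
    awk -v t="$1" -v pr="$2" -v p="$3" '
        $1 == t && $2 == pr {
            # Fields 3..NF hold the ports, separated by ", "; an entry may
            # be a single port (8443) or a range (5900-5983).
            for (i = 3; i <= NF; i++) {
                entry = $i
                sub(/,$/, "", entry)
                n = split(entry, range, "-")
                if (n == 1 && entry + 0 == p + 0) found = 1
                if (n == 2 && p + 0 >= range[1] + 0 && p + 0 <= range[2] + 0) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (as root): add the port only if it is missing.
#   semanage port -l | port_is_labeled http_port_t tcp 8012 ||
#       semanage port -a -t http_port_t -p tcp 8012
```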

This pattern of replicating the violation, looking for sealert messages in /var/log/messages, viewing the report, and analyzing it can be applied to any SELinux violation/problem, not just the one example above.

Alternatively, you can also search directly in /var/log/audit/audit.log using ausearch. The specific command you would want to run is ausearch -m AVC -ts recent, which shows all denials from the last ten minutes.

The output contains the same information, but in a raw and less user-friendly form.
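To make the raw records easier to scan, they can be condensed to one line per denial. This is a minimal sketch, assuming the usual key=value layout of AVC records (comm=, scontext=, tcontext=, tclass=, and the denied permission between braces); avc_summary is a hypothetical helper name.

```shell
# Condense raw AVC records (from ausearch or audit.log) read on stdin into
# one line per denial. avc_summary is a hypothetical helper name.
avc_summary() {
    awk '
        /avc:/ && /denied/ {
            perm = ""; comm = ""; scon = ""; tcon = ""; tclass = ""
            # The denied permission sits between braces: { name_bind }
            s = index($0, "{ "); e = index($0, " }")
            if (s && e > s) perm = substr($0, s + 2, e - s - 2)
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^comm=/)     { comm = $i;   sub(/^comm=/, "", comm); gsub(/"/, "", comm) }
                if ($i ~ /^scontext=/) { scon = $i;   sub(/^scontext=/, "", scon) }
                if ($i ~ /^tcontext=/) { tcon = $i;   sub(/^tcontext=/, "", tcon) }
                if ($i ~ /^tclass=/)   { tclass = $i; sub(/^tclass=/, "", tclass) }
            }
            printf "%s: %s denied on %s (%s -> %s)\n", comm, perm, tclass, scon, tcon
        }
    '
}

# Usage (as root): ausearch -m AVC -ts recent | avc_summary
```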

Getting Logs From a Crashing System

By default, logs stored in /run/log/journal are not persisted across system reboots. That might become a problem if you need to debug logs on a crashing system.

To preserve journal logs, we need to modify /etc/systemd/journald.conf, more specifically the Storage parameter in its [Journal] section.

By uncommenting Storage and changing it to persistent, we tell systemd to store all logs in /var/log/journal. Aside from this change, we also need to run systemctl restart systemd-journald to make sure that the change takes effect.
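The edit described above can also be scripted. The following is a minimal sketch; enable_persistent_journal is a hypothetical helper name, and the fallback that appends the line assumes [Journal] is the only section in the file.

```shell
# Set Storage=persistent in a journald.conf-style file, whether the existing
# line is commented out or not. enable_persistent_journal is a hypothetical name.
enable_persistent_journal() {
    conf=${1:-/etc/systemd/journald.conf}
    tmp=$(mktemp) || return 1
    if grep -q '^[#[:space:]]*Storage=' "$conf"; then
        sed 's/^[#[:space:]]*Storage=.*/Storage=persistent/' "$conf" > "$tmp"
    else
        # No Storage line at all: append one (assumes a single [Journal] section).
        { cat "$conf"; echo 'Storage=persistent'; } > "$tmp"
    fi
    # Write back in place so the file keeps its owner, mode and SELinux context.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}

# Usage (as root):
#   enable_persistent_journal
#   systemctl restart systemd-journald   # restart so the new setting is read
```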

Even though this change will persist logs on your system, it won’t keep all of them forever. By default, journald is configured to not exceed 10% of the filesystem or leave the system with less than 15% of free space.

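Now, to actually inspect the previously stored logs, first switch to the root user and run journalctl --list-boots. This lists the recorded boots, one per line, each with a relative offset (0 for the current boot, -1 for the one before it, and so on), a boot ID, and the time span it covers.

Based on the dates and times, choose the boot whose logs you want to see and pass its offset to journalctl with -b, together with -p to filter by priority. For example, -b -2 -p err selects messages of level err or higher from the boot with offset -2.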
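Put together as a full command, such a query might look like the sketch below. boot_errors is a hypothetical wrapper name; the defaults (current boot, priority err) are assumptions chosen for illustration.

```shell
# Show messages of a given priority (and more severe) from one boot.
# boot_errors is a hypothetical helper; offset 0 is the current boot,
# -1 the previous one, and so on.
boot_errors() {
    offset=${1:-0}
    priority=${2:-err}
    journalctl --no-pager -b "$offset" -p "$priority"
}

# Usage (as root): boot_errors -2 err
```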

If the logs above are not enough to troubleshoot your issues, then there are other log files to check:

/var/log/messages
/var/log/boot.log

Alternatively, if you can’t boot your machine normally, then you can boot into emergency.target as shown above and view logs there in the same way.

Conclusion

There is a lot more that can go wrong with a Linux machine than what I have shown in the sections above. These examples/approaches, however, can be applied to a variety of other problems that you might encounter.

Also, not all of them are life-and-death situations, but it’s always preferable to be able to solve them quickly, especially if the problematic machine is a production system.

Solving most of these issues depends on getting the right information and being able to restore previous configurations. Therefore, it’s crucial to always store logs and to back up critical files on your system before modifying them.

This article was originally posted at martinheinz.dev
