(I am writing this to pay back to the Web, because it helped me to solve a problem and my post might help someone else.)
Two weeks ago a disk of my Hetzner root server was slowly degrading. I asked for an exchange, got the go from Hetzner support, removed the disk from RAID1. Hetzner installed a new disk and booted up. I added the disk to the RAID, resync, done.
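A quick way to confirm that the resync has really finished is to look at /proc/mdstat. The following is only a sketch: the helper name degraded_arrays is made up for this post, and it simply assumes the usual mdstat layout, where each array's status line ends in a bracket such as [UU] and an underscore marks a missing or still rebuilding member.

```shell
# Sketch: print the names of md arrays that are degraded or still rebuilding.
# Reads mdstat-formatted text from the file given as $1 (default: /proc/mdstat).
# degraded_arrays is a hypothetical helper, not part of mdadm.
degraded_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        # Remember the array name from lines like "md1 : active raid1 ..."
        /^md[0-9]+ :/        { name = $1 }
        # A member bracket containing "_" (e.g. [U_]) means a member is missing
        /\[[U_]*_[U_]*\]/    { if (name != "") print name }
    ' "$mdstat_file"
}
```

If the function prints nothing, every array listed in the file shows all members as up.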
[Screenshot: Request Rescue-System]
Here is what I did to fix it:
Check if it boots into the rescue system
Log in to the Hetzner management site (https://robot.your-server.de/) and enable the rescue system. Then request an automatic hardware reset. Connect to the server and log in as root. If this works, there is no hardware problem and no persistent routing problem.
[Screenshot: Request Hardware-Reset]
I need to know what happens on the console during boot
I asked Hetzner support to attach a LARA remote console. The console shows the local screen output and accepts keyboard input. It is even possible to reconfigure the BIOS. Request a hardware reset and watch the console. I see the memory test, the scan for devices, 2 disks found (fine), then "Booting from local disk"...
[Screenshot: Remote console Java applet]
Check the file system and then re-install the bootloader
Request the rescue system and then a hardware reset. Connect to the rescue system. Is it possible to mount the RAID?
# mount /dev/md1 /mnt
Yes. Do a file system check (first umount the file system):
# umount /mnt
# fsck /dev/md1
This shows some (many) errors. I fixed them by holding down the "y" key. The auto-repair option of fsck (-y) would have done the same with less typing.
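When fsck runs unattended with -y, its exit status is the only summary you get. The status is a sum of bit flags documented in the fsck man page. Here is a small sketch that decodes it; the function name describe_fsck_status is my own invention.

```shell
# Sketch: turn an fsck exit status into readable text.
# The bit values are the ones documented in fsck(8);
# describe_fsck_status itself is a hypothetical helper.
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    result=""
    for entry in "1:errors corrected" "2:system should be rebooted" \
                 "4:errors left uncorrected" "8:operational error" \
                 "16:usage or syntax error" "32:canceled by user request" \
                 "128:shared-library error"; do
        bit="${entry%%:*}"      # part before the first colon
        text="${entry#*:}"      # part after the first colon
        if [ $(( $status & $bit )) -ne 0 ]; then
            result="${result:+$result; }$text"
        fi
    done
    echo "$result"
}

# Possible use in the rescue system (device name as in this post):
#   fsck -y /dev/md1; describe_fsck_status $?
```

A status that includes "errors left uncorrected" would be a reason to run fsck again before touching the bootloader.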
Re-Install the bootloader
This is a Hetzner installimage setup, so there should be a grub bootloader. Check for the /boot/grub/ folder. Mount the RAID again.
# mount /dev/md1 /mnt
# ls /mnt/boot/grub
It is there, so there is a good chance that the bootloader is grub, not lilo. Now re-install grub on the disk. Actually on both disks, so that the server can still boot if one of them is missing.
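The "look into /boot/grub" check can be made a bit more specific, because GRUB legacy, GRUB 2 and lilo leave different files behind. The sketch below is only a heuristic based on commonly used file names (stage1 and menu.lst for GRUB legacy, grub.cfg for GRUB 2, /etc/lilo.conf for lilo); the function name detect_bootloader is hypothetical, and some distributions use other paths.

```shell
# Sketch: guess the bootloader from files under a mounted root file system.
# $1 is the mount point (default: /mnt). Heuristic only; paths are the
# common defaults and may differ between distributions.
detect_bootloader() {
    root="${1:-/mnt}"
    if [ -f "$root/boot/grub/grub.cfg" ]; then
        echo "grub2"
    elif [ -f "$root/boot/grub/stage1" ] || [ -f "$root/boot/grub/menu.lst" ]; then
        echo "grub-legacy"
    elif [ -f "$root/etc/lilo.conf" ]; then
        echo "lilo"
    else
        echo "unknown"
    fi
}
```

The grub shell commands used further down (find, root, setup) belong to GRUB legacy, so they only apply if this prints grub-legacy.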
Make a chroot environment:
# mount /dev/md1 /mnt
# mount --bind /dev /mnt/dev
# mount -t proc proc /mnt/proc
# mount -t sysfs sysfs /mnt/sys
# chroot /mnt
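The mounts for the chroot are easy to get wrong, and they should be undone in reverse order before the final reboot. Here is a sketch that wraps both directions in functions. The names run, enter_chroot_mounts and leave_chroot_mounts are made up, and the DRY_RUN switch is my own addition so the commands can be reviewed before anything is executed.

```shell
# Sketch: set up and tear down the mounts needed for a chroot.
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = device holding the root file system, $2 = mount point
enter_chroot_mounts() {
    device="$1"
    target="$2"
    run mount "$device" "$target"
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$target/$fs"
    done
}

# $1 = mount point; unmount in reverse order, the root file system last
leave_chroot_mounts() {
    target="$1"
    for fs in sys proc dev; do
        run umount "$target/$fs"
    done
    run umount "$target"
}

# Possible use (device and mount point as in this post):
#   enter_chroot_mounts /dev/md1 /mnt
#   chroot /mnt
#   ... work inside the chroot, then exit ...
#   leave_chroot_mounts /mnt
```

Bind mounts are used for all three directories here to keep the loop simple; mounting fresh proc and sysfs instances would work as well.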
Grub:
# grub
Look for the file stage1 to find the boot partitions
grub> find /boot/grub/stage1
(hd0,1)
(hd1,1)
Install the bootloader on both disks. Each disk is mapped to hd0 in turn, because whichever disk the BIOS boots from is hd0 from the point of view of the bootloader at boot time.
grub> device (hd0) /dev/sda
device (hd0) /dev/sda
grub> root (hd0,1)
root (hd0,1)
Filesystem type is ext2fs, partition type 0xfd
grub> setup (hd0)
setup (hd0)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 17 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+17 p (hd0,1)...
succeeded
Done.
The same for the other disk
grub> device (hd0) /dev/sdb
grub> root (hd0,1)
grub> setup (hd0)
grub> quit
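The same grub commands can be fed to the GRUB legacy shell non-interactively through its --batch option, which helps when several disks need the same treatment. This is a sketch only: the function name grub_setup_commands is hypothetical, and it uses (hd0,N) as root for every disk, in line with the idea that each disk is hd0 when it is the one being booted.

```shell
# Sketch: print the GRUB legacy shell commands that install the bootloader
# on one disk. $1 = disk device, $2 = GRUB partition index (counted from 0)
# of the partition that holds /boot/grub.
grub_setup_commands() {
    disk="$1"
    part="$2"
    printf 'device (hd0) %s\n' "$disk"   # map this disk to hd0
    printf 'root (hd0,%s)\n' "$part"     # partition with the grub files
    printf 'setup (hd0)\n'               # write stage1 to the MBR of hd0
    printf 'quit\n'
}

# Possible use inside the chroot (disk names and index as in this post):
#   for d in /dev/sda /dev/sdb; do
#       grub_setup_commands "$d" 1 | grub --batch
#   done
```

Printing the commands first and piping them in a second step makes it possible to read exactly what grub will be told before any boot sector is written.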
And reboot - works.
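(Maybe my mistake was not re-installing the bootloader after the disk swap. I expected the RAID1 resync to make both disks identical at the sector level. That assumption does not hold here: the arrays are built from partitions (note the partition type 0xfd above), so the resync copies only the contents of those partitions, not the master boot record or the sectors right behind it where grub embeds its first stages. The fsck errors may also indicate that the boot sector was affected by the disk problems, who knows. These things happen, especially to part-time admins.)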
_happy_grubbing()