My server in our datacenter was running low on disk space, so I finally took the plunge and upgraded the RAID drives.
The drives are configured in RAID-1, where two drives read and write in sync. Should one drive fail, the mirror becomes the active copy, and when you replace the failed drive the array resyncs onto it. So to extend disk space, the plan was: swap out one disk, let the array resync, then swap out the other and repeat. Simple, no? Sadly not.
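On paper, each swap is just a handful of mdadm commands. Here is a sketch of what that per-disk cycle looks like; the array name /dev/md0 and the device names are placeholders I've assumed for illustration, not my actual layout:

```shell
# Mark the outgoing disk's RAID member as failed and remove it
# (device names /dev/md0 and /dev/sda1 are assumed placeholders)
mdadm --manage /dev/md0 --fail /dev/sda1
mdadm --manage /dev/md0 --remove /dev/sda1

# ...power down, physically swap the disk, boot back up...

# Add the replacement's RAID partition back into the mirror
mdadm --manage /dev/md0 --add /dev/sda1

# Watch the resync progress before touching the second disk
cat /proc/mdstat
```

The key discipline is waiting for /proc/mdstat to show the resync is fully complete before pulling the second disk, otherwise you're briefly running with no redundant copy at all.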
After shutting down the server and replacing the first disk, I kicked off the sync process. All seemingly went well, but after shutting down and replacing the second drive, the server would not start. It turns out that even with RAID, the EFI boot partition does not get synced: the mirror only covers the RAID member partitions, not the rest of the disk. So I had to put the original drive back in just to boot up.

Since the new drive had already been added to the RAID array, it first had to be removed before I could repartition it with both an EFI partition and a Linux RAID partition. After resyncing, I used grub-install to put the bootloader on the new drive's EFI partition.

Another key task was updating /etc/fstab, which controls which drive gets mounted where, including the boot mounts. It turns out there was a misconfiguration there too. Ideally you should mount by UUID: when a drive is replaced, its device ID can change, leading to missing mounts and boot errors, which is exactly what happened to me.
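For anyone facing the same recovery, the steps above roughly translate into the following. This is a sketch, not a copy-paste script: the device names (/dev/nvme0n1, /dev/md0), partition sizes, and the example UUID are all assumed placeholders for your own values:

```shell
# 1. Recreate the partition layout on the replacement disk
#    (device names and the 512M ESP size are assumed placeholders)
sgdisk -n 1:0:+512M -t 1:ef00 /dev/nvme0n1    # EFI System Partition
sgdisk -n 2:0:0     -t 2:fd00 /dev/nvme0n1    # Linux RAID member
mkfs.vfat -F32 /dev/nvme0n1p1

# 2. Add the RAID partition into the mirror and let it resync
mdadm --manage /dev/md0 --add /dev/nvme0n1p2
cat /proc/mdstat

# 3. Install the bootloader onto the new disk's EFI partition
mount /dev/nvme0n1p1 /boot/efi
grub-install --target=x86_64-efi --efi-directory=/boot/efi

# 4. Mount by filesystem UUID in /etc/fstab, not by device path
blkid /dev/nvme0n1p1
# /etc/fstab entry (UUID below is a made-up example):
#   UUID=ABCD-1234  /boot/efi  vfat  umask=0077  0  1
```

Because fstab references the UUID rather than /dev/nvme0n1p1 directly, the mount survives device names being reshuffled after a hardware swap.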
Finally, after one last shutdown to upgrade the remaining NVMe drive, followed by repartitioning and resyncing, all was well.
What started in the afternoon eventually took 4 hours to resolve!
The moral of the story? Always schedule your hardware upgrades earlier in the day and make a large coffee. Alternatively, one could just go cloud native and not have to worry about any of this. A few clicks et voilà, disk space upgraded!