Server Maintenance and Upgrades

The file server has been running for years without issue, and it still is. The other day I decided to run a disk test on the drives in the RAID array, and to have enough ports in the system I had to add a RAID card. These are the PERC cards found in Dell servers that can be crossflashed into IT mode, which lets them present plain SATA drives to the OS. When you get one you flash the firmware from DOS, of all things, but it is super simple. Then you buy one or two breakout cables, each with four SATA ends, for the drives. I say one or two because the card has two ports, each taking a four-drive cable, so one card handles eight drives total. A second card would get you to sixteen, and other cards support even more. The motherboard has six SATA connectors, so adding the card brings the total to fourteen drives; add another card with one cable and that's eighteen, which some people run. In my case I have six onboard plus the card with one cable, bringing it up to ten drives total.

Today I pulled the cheapo SATA card and swapped it for one of these Dell PERC cards. While the server was shut down I looked over the hardware and noticed a few other things: I didn't have all the memory slots filled, and I had a second gigabit NIC installed.

To test, I first inserted the new Dell RAID card, leaving everything else intact. I booted the server, SSH'd in, and checked whether the system saw the newly added card. It did. Next I went to the router and removed the server's assigned IP address, which happened to be bound to the add-on NIC, reassigned that address to the onboard NIC, and set that NIC up for Wake-on-LAN. Then I moved the drive cables from the old cheapo four-port SATA card to the Dell RAID card and did a test boot. Everything worked.
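Checking for the card from the SSH session can be as simple as the following sketch. The exact vendor strings vary, and the grep pattern here is an assumption; a crossflashed PERC typically shows up as an LSI/Broadcom SAS controller.

```shell
# See whether the kernel enumerated the new HBA. The vendor string
# varies by card, so the pattern below is a loose guess.
lspci -nn 2>/dev/null | grep -iE 'sas|raid' || echo "controller not listed"

# Confirm the attached disks show up as plain block devices:
lsblk -d -o NAME,SIZE,MODEL 2>/dev/null || true
```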

I then brought it back down, grabbed another 4GB stick of RAM, and filled the last memory slot. Instead of booting straight into Linux I decided to do a quick memory test: I pulled the previously existing RAM, leaving only the new stick, booted to the GRUB menu, and chose the memory test. It ran with no errors, so I shut the server back down, reinserted the sticks I'd removed, booted into Linux, and checked that everything was there and working. It was.
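Once back in Linux, one quick way to confirm the new stick is being counted is to check the kernel's memory total (the four-stick, 4GB-each figure is my setup, not a general rule):

```shell
# MemTotal should reflect all four 4GB sticks, minus whatever the
# hardware and kernel reserve for themselves.
grep MemTotal /proc/meminfo
```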

As I said above, the reason I did all this was that I'd decided to run a disk test on all the drives to ensure they were in good working order. I generally do this by logging in over SSH and running gnome-disks. The problems I encountered were twofold. First, due to the cheap nature of the old four-port SATA card, I couldn't run the disk test at all: SMART drive tests weren't available through it. I need to monitor these drives because they hold important data. If one begins to fail I need to replace it quickly, and if I can't test them I can't determine their health.

The second problem was that after SSHing in and running gnome-disks from the terminal, which I do all the time, it came up with an error about the display. I'd hit this same issue the other day running similar tests remotely for a client. I found that it didn't happen when run as a regular user, but it did error out when run as root using sudo.

I found some info online about exporting a variable at the command line before running the program. That command is below:

export XAUTHORITY=$HOME/.Xauthority

After running that I could sudo gnome-disks and begin the tests.

All the drives passed.

I did notice the power supply fan was beginning to show wear, so I replaced the 650W unit with an 850W semi-modular one.