March 23, 2017

Muscle Memory

Muscle memory is really a bitch, and it can produce some very bad results. You work hard to get something done quickly, solving a problem you aren’t familiar with, you find the solution, and then you put that solution in place using muscle memory instead of thinking through the actual issue. Here’s what happened. Man was I perplexed, and when I discovered what I’d actually done, was I ever pissed at myself.

My main website is self-hosted, and I use letsencrypt to provide the certificates. Every so often you get a message that a certificate is about to expire and needs to be renewed. I handle this with a script that runs as root via a cronjob every two and a half months. Not a big deal, except I was getting emails telling me that my certificates were expiring anyway. That shouldn’t have happened, since the script should have taken care of it.
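For context, the setup is along these lines (a minimal sketch; the script name, paths, and exact certbot invocation are illustrative, not my actual script). Cron can’t express “every 2.5 months” directly, so a monthly run is the common approximation; the renew command only acts when a certificate is actually close to expiry:

    #!/bin/bash
    # /root/renew-certs.sh  (illustrative name and paths)
    # Renew any letsencrypt certificates that are close to expiry,
    # then reload apache so it picks up the new ones.
    certbot renew --quiet && systemctl reload apache2

    # root crontab entry (illustrative): 3:15 am on the 1st of each month
    # 15 3 1 * * /root/renew-certs.sh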

I decided to go in and run the script manually as root to see why. I got a message that one of the sites the certificate covers was missing. Here’s the thing: that domain name doesn’t actually serve its own pages; it just redirects to my main site. So, in my wisdom, at some point I had decided to delete the folder and the apache sites-available configuration file for that website, hoping the redirect could do it on its own. Nope, as letsencrypt let me know.
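The conf for a domain like that is basically nothing but the redirect; something like this (the file and domain names below are placeholders, not my real ones):

    # /etc/apache2/sites-available/redirected-site.conf  (illustrative names)
    <VirtualHost *:80>
        ServerName redirected-site.example
        # The webroot I had deleted; it needs to exist again (see below)
        DocumentRoot /var/www/html/redirected-site.example
        # Every request bounces to the main site
        Redirect permanent / https://main-site.example/
    </VirtualHost>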

I then decided to put it back into place by recreating /var/www/html/<site in question>. After that I copied my main site’s .conf and -le-ssl.conf files to new files named for the site in question. Here’s where muscle memory kicked in. I edited the files, making what I thought were the correct changes, but I failed to realize I was editing the wrong .conf files: I had changed the main site’s files instead, and as a result all the site served was an “Index of /” listing. So, in my haste to get the certificate processed properly by letsencrypt, I hurriedly edited the wrong files and failed to check my work. Well, maybe I did check the site, but browsers like Firefox cache a lot of content rather than re-fetching pages, and even hitting refresh doesn’t always work; I’ve had to exit Firefox and even reboot the computer. Anyway, I think I simply failed to check. So, for at least a few days, maybe more than a week, my site was unavailable.
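The steps themselves were simple, which makes the mistake all the more annoying. Roughly this (placeholder names again):

    # Recreate the deleted webroot
    sudo mkdir -p /var/www/html/redirected-site.example
    # Copy the main site's vhost files as a starting point
    cd /etc/apache2/sites-available
    sudo cp main-site.conf redirected-site.example.conf
    sudo cp main-site-le-ssl.conf redirected-site.example-le-ssl.conf
    # The critical part: edit the NEW copies, not main-site.conf,
    # which is exactly what I got wrong.
    sudo nano redirected-site.example.conf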

I didn’t really know what the problem was, so I decided to backtrack through all the complicated settings and options. For instance, since it’s a wordpress site, I checked the files to make sure they were all there. It bothered me that my daily backups might be capturing damage: I back up the website early every morning, along with the mysql database associated with it, so if this was file corruption, my archives were being built from corrupt files. I checked /var/www/html/<main site> and everything looked OK. I tried to get into the admin login; no luck. I checked my other sites, which appeared to be working, and they pretty much confirmed my files were good to go.
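That backup is nothing fancy; the shape of it is roughly this (paths, database name, and credentials here are placeholders, not my real ones):

    #!/bin/bash
    # Illustrative nightly backup, run from cron in the early morning.
    STAMP=$(date +%F)
    # Archive the web root
    tar -czf /backups/site-$STAMP.tar.gz /var/www/html
    # Dump the wordpress database alongside it
    mysqldump --user=wpuser --password=secret wordpress_db \
        > /backups/wordpress_db-$STAMP.sql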

My next thought was to reboot the server, but that didn’t solve the issue. After that I triggered a file system check on the next reboot with sudo touch /forcefsck. That didn’t resolve it either. I’m not sure why, but I rebooted a total of 3 times.
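For anyone unfamiliar with that trick, the flag file is all it takes (on systems whose init still honors /forcefsck):

    # Request a full filesystem check on the next boot, then reboot
    sudo touch /forcefsck
    sudo reboot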

Progressing from that, I decided that maybe the mysql database was corrupt. I generally don’t play around with database administration these days; maybe I should. I checked my main site’s configuration file (wp-config.php, since it’s wordpress) and found the proper database name and password. Then I loaded mysql at the command prompt and couldn’t remember how to open the database I needed. A quick google search showed me what I should have known: USE <database>; (BTW, I kept forgetting to end the line with the semicolon.) I opened the database and told it to list the tables (had to look that up too). Everything looked fine. Then I thought I should check the database’s integrity; didn’t know how to do that either, so back to google. Found it. Wow, I have a lot of databases in there, and a lot of tables. The consistency check told me everything was fine.
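For future me, the whole session boils down to something like this (user, password, and database/table names are placeholders):

    # Connect using the credentials from the site's config file
    mysql -u wpuser -p

    -- inside the mysql prompt (and don't forget the semicolons):
    USE wordpress_db;
    SHOW TABLES;
    -- check a table's integrity; repeat per table as needed
    CHECK TABLE wp_posts;

    # or, back at the shell, check everything in one shot:
    mysqlcheck -u wpuser -p --all-databases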

At this point I was ready to shut down the server, pull the boot drive, and run a file system check on another computer. That didn’t happen; I decided to save it as the very last resort if I couldn’t resolve things any other way.

My next thought was that my pfsense router software had just been updated and maybe it was the culprit, though I kept coming back to the idea that the ssl certificate was the problem. So I disabled the port forward in the firewall and tried to access the server internally using its local IP address. Same issue. I tried google chrome as opposed to firefox. Same thing. I tried my iMac and my MacBook Pro. Same issue. I rebooted my main workstation. Same issue.

That’s when I decided to check whether the apache2 configuration files were properly formatted. I went through the various files, starting with ports.conf; I used that some time ago but later abandoned it in favor of the proper way, which is a configuration file for each site in the /etc/apache2/sites-available folder. I viewed the contents of the main site’s configuration files and right away noticed that while trying to resolve the letsencrypt renewal for the secondary, redirected site, I had edited the wrong configuration files. I quickly corrected that, recreated the necessary configuration files for the redirected site, edited those properly, and then restarted apache2: sudo systemctl restart apache2
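For next time, two quick checks would have shortened this hunt considerably (standard apache2ctl options on Debian/Ubuntu):

    # Validate the config syntax (won't catch a valid-but-wrong edit like mine)
    sudo apache2ctl configtest
    # Dump the parsed virtual host settings; this shows which conf file
    # actually serves each site name, which would have exposed my mistake
    sudo apache2ctl -S
    # After fixing the right files, restart apache
    sudo systemctl restart apache2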

BINGO! That was it. Wow, what a royal screw-up, and one that could have persisted for a long time if I hadn’t decided to check the site just to see if it was up. I’ve now committed to looking at the sites every morning when I get in, as part of my routine. Hopefully this debacle will teach me a lesson and get me to always double-check my changes before walking away for the day.