My webservers have been quite stable, running for probably 3-4 months without a reboot. I keep 2 copies of the disk on the main server, but since I will be going to China soon, I thought I would create a third copy and put it in a third server as a further backup. If both servers go down, the third one can be rebooted with the correct IP quickly.
I use gmirror to duplicate the master disk as a backup, but somehow if I stop gmirror before rebooting the machine, the slave is not bootable. So I always reboot, take one drive out, and swap another one in. The risk is that if I take the wrong drive out (the master), the new drive may automatically start gmirror (since it was not stopped the previous round) and destroy the good copy in a few seconds. Gmirror copies bit by bit, so overwriting only a few bits on the drive is enough to render it unbootable.
This is what happened today. The main server has 8 SATA ports, and one drive (which I thought was the slave) was on sata6. But unix saw it as ad0, not ad6! The machine booted from this one instead of the drive on sata4. After a while I realized the machine had booted from the wrong drive and rebooted again (I should have issued “gmirror forget gm0” to disable gmirror). I made sure the master was on sata4 and the slave on sata6; this was still before I realized that this motherboard confuses the SATA numbering. I rebooted with the other good copy, and it was destroyed in seconds… Now I had lost two main drives! I should have booted with a single drive (the good one) first… or tried a different machine.
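In hindsight, listing the mirror's components before pulling any drive might have shown that the disk was known as ad0 and not ad6. Below is a minimal sketch of such a check. It assumes the text is laid out the way `gmirror status` normally prints it (mirror name, status, then one component per line with its state in parentheses). The helper names `parse_gmirror_status` and `all_active` are hypothetical, made up for this illustration.

```python
import re

# Matches a trailing component such as "ad4 (ACTIVE)" or "ad6 (SYNCHRONIZING, 23%)".
COMPONENT = re.compile(r"(\S+)\s+\(([^)]*)\)\s*$")

def parse_gmirror_status(text):
    """Turn `gmirror status`-style text into {mirror: [(component, state), ...]}.

    Assumption: each mirror line starts with "mirror/<name>" and carries the
    first component; further components follow on their own indented lines.
    """
    mirrors = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("mirror/"):
            current = stripped.split()[0]
            mirrors[current] = []
        match = COMPONENT.search(stripped)
        if match and current is not None:
            mirrors[current].append((match.group(1), match.group(2)))
    return mirrors

def all_active(mirrors, name):
    """True only if the mirror exists and every component reports ACTIVE."""
    components = mirrors.get(name, [])
    return bool(components) and all(state == "ACTIVE" for _, state in components)
```

On the real machine the text could come from `subprocess.run(["gmirror", "status"], capture_output=True, text=True).stdout`. The point is to read the actual device names (ad0, ad4, ad6, ...) off the output rather than trusting the port labels on the motherboard.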
Luckily most data (WordPress posts, both HTML files and MySQL data) is backed up daily and automatically transferred (through rsync) to the backup server. I had to use a backup server disk on the main server. Unluckily, I had not backed up some files, e.g. httpd.conf! I had to create a new one… restore the web files… try to remember what else was missing… This took most of the day.
Setting up rsync again took me hours… it simply refused to work! Finally I created a new user and it worked in a few minutes… something is wrong with my regular user name.
What a horrible mistake! Actually, I made the same mistake twice!
Trying to get a third copy, I ended up destroying two good ones!
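The main and secondary servers had both been very stable all along, so I could have just skipped the backup. But yesterday I made a third copy of the secondary server's disk, and it went smoothly; done in 5 minutes.
I was worried that something might go wrong while I am back in China, so I wanted to back up the main server too. Right now it has only 2 copies, both in the same machine.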
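I saw a piece of tape on one HD and assumed it was the slave, so I swapped another HD in. The machine booted from the swapped-in drive and wrecked the good backup (gmirror started automatically and copied bit by bit onto the good disk; it takes only a few seconds, and then the good disk is gone and no longer bootable).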
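I looked over the machine and put the other good disk in, on SATA port 4, with the disk that was to receive the backup on port 6, but the good one got overwritten again! Only at the end did I discover the cause: on most computers the lowest-numbered SATA port boots and becomes the master, but this one is odd! The drive was clearly on SATA6, yet unix recognized it as ad0, not ad6, and that caused the disaster! I had no other backup; only the web pages plus the MySQL database are automatically backed up to the secondary server every night. Why had I never noticed this before? And why, after the first disk was destroyed, did I try again? Switching to a different machine would have avoided the whole thing.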
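But I had no up-to-date copy of httpd.conf! I spent several hours rewriting it. Fortunately the web pages all came back. In the end, though, I found that the photos uploaded in 2011 were gone (I had only uploaded those from one conference, luckily). The HTTP logs are all gone too (so there are no access stats for this year). What else is missing? I have not found out yet.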
From now on a lot of config files need to be backed up regularly, such as /etc/rc.conf, httpd.conf, the named DB files, and so on. Otherwise it will be the death of me.
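A regular config backup could be as small as the sketch below, which packs a list of files and directories into a dated tarball that the nightly rsync can then carry to the backup server. Only /etc/rc.conf is a path given in this post; the Apache and named locations in `CONFIG_PATHS`, the function name `backup_configs`, and the destination directory are assumptions to adjust for the actual machine.

```python
import tarfile
import time
from pathlib import Path

# Hypothetical list: the Apache and named paths are guesses, not taken from
# the server described here. Check them before relying on this.
CONFIG_PATHS = [
    "/etc/rc.conf",
    "/usr/local/etc/apache22/httpd.conf",
    "/etc/namedb",
]

def backup_configs(paths, dest_dir):
    """Pack every existing path into a dated .tar.gz.

    Returns (archive_path, missing), where `missing` lists the paths that
    did not exist, so a wrong guess shows up instead of being silently skipped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / ("configs-" + stamp + ".tar.gz")
    missing = []
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in paths:
            if Path(path).exists():
                # Store without the leading "/" so extraction stays relative.
                tar.add(str(path), arcname=str(path).lstrip("/"))
            else:
                missing.append(str(path))
    return archive, missing
```

Calling `backup_configs(CONFIG_PATHS, "/var/backups/configs")` from a daily cron job (the destination is again only an example) would leave one archive per run. Directories such as the named one are added recursively.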