Many computer systems are set up with RAID to help guard against data loss or the service interruption that a hard disk crash can cause. In theory, it is a great idea. In practice, there are several details that, if ignored, can make the whole exercise useless and create a false sense of safety.
Like a Backup, RAID must be tested.
This is something I did when setting up my own RAID. I set up the system with data, and then disconnected a drive and deleted the partition. I then checked to see that I got a warning, and went through the rebuild procedure. I also saw that the system would boot and run in a degraded state.
The system I just mentioned was a Linux-based software RAID. That is my preference. It is hardware neutral. It sends an email alert when the array is degraded, and shows a visual warning at boot. Boot-time warnings are of little value, since a highly paid tech is unlikely to sit and watch system notices scroll by during boot. System notices in the desktop tray also go unnoticed.
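As a rough illustration of what a degraded-state check looks at, here is a minimal Python sketch that scans text in the format of Linux's /proc/mdstat for arrays with a missing member. The function name and the sample text are my own for illustration; on a real system the alerting itself is normally left to mdadm's monitor mode rather than a hand-rolled script.

```python
import re

# Member-status field on an array's status line, e.g. "[UU]" for a
# healthy two-disk mirror or "[U_]" when one member is missing.
STATUS_RE = re.compile(r"\[([U_]+)\]")
# Start of an array entry, e.g. "md0 : active raid1 sda1[0] sdb1[1]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:")

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status field shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
            current = None
    return degraded

# Hypothetical sample in /proc/mdstat format: md0 healthy, md1 degraded.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      488279488 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live machine you would pass in the contents of /proc/mdstat instead of the sample. The point of the test in the previous section stands: pull a drive and confirm that whatever alerting you rely on actually fires.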
The reason I raise the issue of testing a RAID system is that I was recently called upon to help someone with an erratically performing computer system running Windows 7. It turns out that the erratic behavior was because the hard drive was failing. That shouldn’t have been too bad of a problem, since the last time the system was fixed, the owner had the shop put the system disk in a RAID configuration. Everything should be safe, right?
Wrong. It turns out that even though there were two disks on the RAID controller, the mirror disk was never really added to the RAID array. The system was never mirrored. I couldn’t restore the system from the mirror because it never got even a single bit of data. Tabula rasa. I’m happy to report that all the important data was on the many other disks attached to that system, and not on the system disk. This user escaped data loss, this time.
What About Catastrophe?
I mean big catastrophe? Even if you have RAID, you still need offsite backup. Flood, fire, theft, earthquake, twister, building collapse, or computer malware – all of these things can claim your data. Offsite storage is important!
Preferably, you want something with versioning. You want a backup system where you can go back in time to restore a file that was deleted or mangled in some way. When we did tape backups (and some may still do so), you would rotate the tapes in the backup set, keeping each for several weeks at a time. You would also keep monthly and quarterly snapshot tapes, and sometimes a yearly tape in long-term archive.
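The same rotation idea carries over to dated backup sets on disk or online. Below is a minimal Python sketch of a retention rule: keep everything recent, the first backup of each month for a year, and the first backup of each year indefinitely. The function name, the 28-day and 365-day windows, and the omission of a quarterly tier are my own simplifications, not a standard.

```python
from datetime import date, timedelta

def backups_to_keep(backup_dates, today, recent_days=28, monthly_days=365):
    """Pick which dated backups to retain under a simple rotation.

    Keeps:
      - every backup taken within the last `recent_days` days
      - the earliest backup of each month, if within `monthly_days` days
      - the earliest backup of each year, indefinitely
    """
    keep = set()
    seen_months = set()
    seen_years = set()
    for d in sorted(backup_dates):
        age = (today - d).days
        if age < 0:
            continue  # ignore backups dated in the future
        if age <= recent_days:
            keep.add(d)
        month_key = (d.year, d.month)
        if month_key not in seen_months:
            seen_months.add(month_key)
            if age <= monthly_days:
                keep.add(d)
        if d.year not in seen_years:
            seen_years.add(d.year)
            keep.add(d)
    return sorted(keep)

if __name__ == "__main__":
    today = date(2024, 6, 30)
    start = date(2023, 1, 1)
    # Hypothetical history: one backup per day since the start date.
    history = [start + timedelta(days=n) for n in range((today - start).days + 1)]
    for kept in backups_to_keep(history, today):
        print(kept)
```

Anything not returned by the function is a candidate for deletion or tape reuse. A quarterly tier could be added the same way as the monthly one.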
The price reduction in online storage makes tape backups obsolete in many cases. Tapes had some disadvantages. They were hardware dependent – you needed the right drive to restore them, so you kept a spare drive with your disaster recovery kit. Switching tapes was a chore that required diligence and a trusted person. The tapes themselves posed a data security risk and had to be kept secure.
Some people tried using USB hard drives. These were bulky, and susceptible to damage and loss. In addition, the USB interface was slow, and backups of large data stores could take a very long time. Data transfer over USB maxed out at about 9 GB / hour. Backups could take many hours, and you didn’t have good RTO (recovery time objective) and RPO (recovery point objective) because of the slow data transfers. Nevertheless, a lot of business owners saw them as good enough because the drives were cheap. What they didn’t factor in were the risks and the costs of all the handling.
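To put the transfer rate in perspective, the arithmetic is simple enough to sketch in Python. The 450 GB data store below is a made-up figure for illustration; the 9 GB / hour rate is the one quoted above.

```python
def transfer_hours(data_gb, rate_gb_per_hour):
    """Estimate how many hours it takes to move data_gb at a steady rate."""
    if rate_gb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_gb / rate_gb_per_hour

if __name__ == "__main__":
    # Hypothetical 450 GB data store at the 9 GB / hour quoted above.
    hours = transfer_hours(450, 9)
    print(f"{hours:.1f} hours, or about {hours / 24:.1f} days")
```

A full restore over the same link takes just as long as the backup, so this number is also a floor on your recovery time.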
Online Backup Better But Watch Out For …
Cheap online data storage is nice. It can remove a lot of the worries we had with backup tapes. Data can be encrypted before transfer. Backup can be continuous too. But continuous backup schemes are usually just offsite mirroring. That’s great for guarding against physical catastrophe such as fire, flood, etc. What it generally does not guard against is file deletion or file corruption, because the deletion or corruption is mirrored too.
True online backup solutions generally cost more than mirroring (file sync) solutions. They retain data sets for weeks. They reduce bandwidth (network usage) by transferring only the changed blocks of files rather than whole files. Therefore they are a safer solution than the file sync / mirroring solutions, but they cannot attain the RPO that file sync can. You may lose a few hours of work with this type of backup solution, and open files may pose an issue unless the software is able to do shadow copy.
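For the curious, here is a minimal Python sketch of the idea behind block-level change detection: hash a file in fixed-size blocks and compare against the hashes from the previous run. The 4096-byte block size and the function names are assumptions of mine, and real backup products use their own (often more sophisticated) methods, so treat this only as an illustration of why a small edit to a large file need not mean re-uploading the whole file.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen for illustration

def block_hashes(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 hex digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]

def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Return indexes of blocks in `new` that are new or differ from `old`.

    Only compares blocks at the same offset; it does not detect data that
    merely shifted position, and it does not report truncation.
    """
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        idx
        for idx, digest in enumerate(new_hashes)
        if idx >= len(old_hashes) or old_hashes[idx] != digest
    ]

if __name__ == "__main__":
    original = bytes(BLOCK_SIZE * 4)   # four blocks of zeros
    edited = bytearray(original)
    edited[5000] = 1                   # touch one byte in the second block
    print(changed_blocks(original, bytes(edited)))
```

In this sketch only the blocks listed would need to be sent, instead of all four.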
How To Lower RTO In Catastrophe
Online backup services require time to get your data back out. For large stores, that can be a while. Some offer next-day disk delivery. You can, however, get a file server that is hardened against catastrophe. This leaves you with salvageable data disks should fire or flood ravage your premises.
In addition, these run a bulletproof operating system – Linux. IOSafe has file servers (more than just file servers, actually) that are hardened against fire and flood. They have RAID level 1, and several other features that a small business can benefit from. They require very little maintenance, unlike a Microsoft server. IOSafe NAS units are also a lot less costly than a Microsoft server. Combined with an offsite backup service, this can represent the best centralized storage option for a business.
Evaluate the importance of your data. Evaluate how soon you need to regain access to that data. Evaluate how many hours of work / data you can afford to lose. Evaluate what down time will mean to your customers.
Based upon your assessment, decide what you need for backup. You may need to combine several of the solutions mentioned to get good coverage. You may also need to make some compromises and assume some risk if the coverage is too costly for your ideal solution.
Whatever you decide upon, find out and document how to recover from catastrophe. Store that documentation in a safe place off site. Test that those systems work – that you get failure notices, and that you can actually do a data restore.
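One simple way to check a test restore is to compare checksums of the restored files against the originals. The Python sketch below does that for two directory trees; the function names are my own, and it assumes you still have the original files available to compare against, which is true for a drill but not for a real disaster.

```python
import hashlib
from pathlib import Path

def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(source_dir, restored_dir):
    """Return relative paths that are missing from, or differ in, restored_dir."""
    source_dir, restored_dir = Path(source_dir), Path(restored_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = restored_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems
```

An empty list means every source file came back intact. Anything else is a reason to fix the backup before you need it.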
Periodically (yearly) review your data recovery systems and data value. Adjust your safety systems accordingly.