The Single Point of Failure
As I started to write my blog post for last Friday, I found my RAID array was dead. Despite my best intentions, I fell victim to the single point of failure.
It started off as a curiosity. I came home one evening to find that my Dropbox account didn't sync. Turned out to be due to a loss of communication between my iMac and my RAID system. A flick of a switch and I was back in business, and I chalked it up to some random power fluctuation.
That was my first mistake. The system is behind a UPS and nothing else seemed out of place. Even if the power had gone out, the UPS would have kept things chugging along.
Troubleshooting the Problem
Later, I experienced the communication problem between my RAID array and my iMac a few more times. Within two days, my RAID array simply would not operate. I thought of the most likely problems and the worst-case problems.
Of all the problems to have, the nicest one would be due to a bad cable. Maybe the eSATA port I had installed in my iMac by OWC had gone bad. I tried switching to using a FireWire cable and the system worked fine – for about two minutes. Then it failed again. So much for the easy problem.
The next step was to contact the RAID enclosure vendor, Oyen Digital. I didn't want to tinker around with this thing. Instead, I'd rather just buy another enclosure and move my disks into it. Every time it initially started working, I could see everything was in its place. There were no warnings of drive failures, so I believed my data was safe. The problem was with the enclosure itself.
The support tech suggested that it could be due to a power supply failure. A replacement was only $7.95, so I bought one and had it shipped overnight. Shipping was $30, but well worth it if this resolved my problem.
You'd think my next step would be to buy another enclosure, but it's not that simple. Most RAID enclosures are proprietary to an extent because of the encoding chip inside. The company that manufactured my RAID enclosure has since upgraded its product and uses a different chip. I cannot buy another enclosure, from this vendor or any other, that will work with the disks in my array. I have to send it back for repair, which means that I'll be out of service for at least a week considering shipping times, perhaps longer.
The Single Point of Failure Strikes Again
I've spent most of my life in the Information Technology business. That means I know how to protect data. Of course, it also means that I suffer from Vocational Irony – a professional who is unable to help himself, much like how the cobbler's children have no shoes.
- I know that power outages can cause damage, so I have a UPS
- I know that data gets corrupted, so I keep backups
- I know that disks fail, so I have a RAID system
The problem is that you can only mitigate so much risk before you end up spending all of your money trying to prevent the inevitable. I know that a RAID system is not invulnerable, but I believed that one of the disks inside the array would fail before the enclosure itself failed. That's where I was wrong. The enclosure, of course, is the single point of failure in my system. It's the Achilles' heel. If it fails, all else fails.
Yes, I have backups of my photos. Most of them. I'm not so sure about some of the most recent images, but they're still on my CF cards. That means I would only have lost my edits and metadata, but the images are safely backed up somewhere.
Backups are never complete, though. My RAID system also holds a plethora of other data. Music, eBooks and movies are among the most prevalent. Much of this I can get from iTunes, but not all of it. Then there are programs that I've bought, specific videos and other training from individual sources. Again, I can download some of that, but not all of it.
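One way to take the guesswork out of "most of them" is to compare the working copy against the backup and list whatever is missing. The following is only a minimal sketch, not the setup described in this post: the function name `files_missing_from_backup` is made up for illustration, and it assumes the backup mirrors the source folder layout. It treats a file as backed up when a file with the same relative path and the same size exists on the backup drive, which is a quick check and not a guarantee that the contents match.

```python
from pathlib import Path


def files_missing_from_backup(source_root, backup_root):
    """List files under source_root that have no same-sized copy under backup_root.

    Hypothetical helper: assumes the backup mirrors the source directory layout.
    Returns paths relative to source_root.
    """
    source_root = Path(source_root)
    backup_root = Path(backup_root)
    missing = []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue  # skip directories
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        # Flag the file if the backup copy is absent or differs in size.
        if not dst.is_file() or dst.stat().st_size != src.stat().st_size:
            missing.append(rel)
    return missing
```

Pointing it at a photo folder and its backup copy (both paths would be your own) returns the list of files that still need to be copied. Comparing checksums instead of sizes would be more thorough, at the cost of reading every file.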
I didn't spend the money on a duplicate system for my backup. It was always something I was going to do, but I hadn't done it yet. The fact is that this stuff costs money and I spent the last year laid off from my previous employer and working to make ends meet, so the backup system I wanted was a luxury. At least it seemed so at the time. It seems like a necessity right now.
It's Not Enough to Have a Backup
Having a backup, whether partial or complete, isn't enough. You can't run your system (in most cases) from a backup. You need to be able to restore that backup somewhere. Until I get my RAID enclosure repaired or replaced, I have nowhere to restore my backups.
The resolution to this problem is to spend money.
First, I need to get a backup system large enough to hold all of my data files. The current system of a couple of terabyte drives is both inadequate and inelegant. Since my RAID array offers 6 terabytes of usable storage, I'll get another 6 terabyte system for backup even though I'm not using all of that capacity. In fact, I may get a larger backup system because I foresee my data storage needs growing as I buy more digital entertainment and because I'm now shooting with a D800 that creates 36-megapixel files.
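The sizing question comes down to simple arithmetic: what is stored today, plus expected growth, plus some breathing room. Here is a rough sketch of that calculation. Every number in it is a placeholder and not a measurement from my system, and the function name `backup_capacity_needed_tb` and the 25% headroom default are assumptions made for illustration.

```python
def backup_capacity_needed_tb(current_data_tb, growth_tb_per_year, years, headroom=0.25):
    """Estimate backup capacity (in TB) needed to cover data for the given number of years.

    All inputs are assumptions you supply: the data stored today, the expected
    yearly growth, the planning horizon, and the fraction of spare capacity to keep.
    """
    projected_tb = current_data_tb + growth_tb_per_year * years
    return projected_tb * (1 + headroom)


# Placeholder figures: 4 TB stored today, growing 0.5 TB per year, planning 3 years out.
needed_tb = backup_capacity_needed_tb(4.0, 0.5, 3)
print(f"Plan for roughly {needed_tb:.1f} TB of backup capacity")
```

With those placeholder figures the estimate already exceeds 6 terabytes, which is the kind of result that would argue for buying the larger backup system up front.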
Second, I need to return my RAID enclosure to Oyen Digital for repair. Of course, the warranty was good for two years and that ended last November. Nothing of mine ever seems to break down while it's under warranty. I've no idea how much that will cost. Considering that the enclosure itself sells for under $300, I'm hoping the repair will be under that price.
Finally, I need to buy a new storage system. This one is really going to cost me money. Repairing my old enclosure just buys me time. It's already proven that RAID systems can and do fail. I'm not sure if Oyen Digital will be able to repair it in another couple of years, as technology moves forward and the parts necessary to repair it may become unavailable. Sustaining a technological product beyond its end of life is unwise.
Ideally, I'd like to have a SAN. If money were no object, that's what I'd buy for my home as I'm also in the process of purchasing this technology in my day job. Realistically, I just can't afford it. I know I'm going to buy something else that will also have a single point of failure and ultimately die.
Dealing with Setbacks
If there's a moral to this story, it's that problems arise despite your best preparations and intentions. All you can do is accept the problems and deal with them. Although I'm bothered by the enclosure failure, I also realize that my data isn't gone. Repairing the enclosure is the next step, but not the last possibility. If the folks at Oyen Digital find there is some reason they cannot repair my enclosure (which I think is unlikely at this stage), then there are data recovery folks who could get my information and move it to another drive. Now that would be costly, but it's yet another option.
As I've shared this story on social media over the past couple of days, the inevitable happened. People crawled out of the woodwork to tell me all the things I did wrong, why I should never buy a proprietary RAID system, and other stuff like that.
I ignore those people.
First, telling someone what they did wrong isn't really helpful in a situation like this one. It's more likely those folks just like to use another person's misery to make themselves feel superior. Besides, I already know what I did wrong and I accepted the risks associated with it.
Second, just try to avoid getting a proprietary RAID system these days. People warn about Drobo as a proprietary system, but so are the other enclosures on the market. By that, I mean that the drives cannot be yanked out of one vendor's enclosure and put into another vendor's enclosure as if nothing had happened. It all boils down to the chip used to encode them, and those chips vary. There are some folks who will build their RAID out of Linux systems, but that's just a geeked-out conundrum of its own. Most people don't want to build such a system. It takes up quite a bit more space, runs much slower and uses more power. It isn't better. It's just a different kind of problem in itself.
Finally, there's risk in everything. We all have different ways of dealing with it. I'm actually still happy that I used this RAID enclosure because it still saved my data from complete loss. Getting access to it may be cumbersome right now, but my data isn't blown away. That little thought tells me that I didn't really do everything wrong. In a week or so, I'll be chugging along just fine after spending money on some new disk system.
There is still light at the end of the tunnel.