John Parkinson: Spread Data Backup Around Your Network

Over the 20-plus years I’ve been using computers, I’ve experienced only three hard drive failures, which is not bad considering how many hours of hard drive time I’ve logged. Still, each of those failures was a major event, even though I back up my data regularly and keep redundant copies of every critical file. It’s not that I ever lost anything permanently; it’s just that retrieving files was time-consuming and costly. On several occasions I even had to effectively rebuild my hard drive after a major software failure, something that happens if you habitually test beta applications. So I’m always on the lookout for a better way to protect my data, and I’ve tried just about everything… or so I thought.

Too Big for Online Backup
My current “working set” totals about 30 GB (that doesn’t count the 2 TB of images and other content residing on network-attached storage connected to my home network), and it’s moderately volatile: about 10 percent of my files change each week, and the total grows by about 500 MB a month. The biggest culprit, of course, is my Outlook personal folders (.pst) file, which contains all my saved e-mail and attachments; that’s about 2 GB and growing.

This just about rules out the use of online backup services. I’ve tried them in the past, but the costs and bandwidth requirements for such a large amount of frequently changing data make them prohibitive now. My current solution is a weekly backup to a network drive and a daily incremental backup to a portable USB drive powered by my laptop. That works fine, unless I forget to leave all my equipment powered on, or my laptop drive gets fried. Fortunately, the laptop will boot from a USB stick, so I have a workaround if the drive fails while I’m traveling (at the cost of a second copy of Windows, of course). But it was a big job to set up this “automation,” and it’s still not truly automatic. Some files can’t be copied if they’re “in use,” so I have to shut down certain applications (Outlook again) during backup. And recovery (that is, copying files back to a new hard drive), should it ever come to that, would require at least an hour.

The Best Laid Plans
All this is why most users, at home and at work, fail to back up their data regularly, despite good intentions, and why corporate IT support people dread “failed hard drive” incidents. Lost or stolen laptops present even tougher challenges: lose your laptop and, unless you’ve backed up to another drive somewhere, your data is gone for good. What’s more, most encryption schemes are relatively easy to break, so even “protected” data isn’t safe if it falls into the wrong hands.

What’s the solution? Is there some affordable product or service that will protect data while it’s on a vulnerable platform; make backups without user interaction or performance impact; and ensure fast, easy recovery if necessary? Maybe, just maybe, there finally is.

Back in the early days of client/server computing, someone somewhere (I think it was at Berkeley but I can’t find any documentation) suggested a strangely simple backup scheme called LOCSS (Lots of Copies Saves Stuff). The idea was that by keeping several copies of every file scattered across the network, there’d always be a good copy available somewhere. It was ahead of its time (it required a good directory system, plenty of bandwidth and cheap storage), but it was great in theory. It worked at the file level, so when part of a file changed you had to update every copy of the entire file. But you could extend the principle by chopping a file into fragments, giving the fragments unique but meaningless identifiers, then scattering the fragments and updating only those that had changed. In practice this would involve some computing cost for large, constantly changing files, but overall it would yield a good performance tradeoff, especially as desktop compute cycles became more plentiful and less expensive.
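
To make the fragment idea concrete, here is a minimal Python sketch of my own; it is not how any particular product does it. The fixed 4-byte fragment size, the use of random UUIDs as the “meaningless identifiers,” the SHA-256 digests and every function name are assumptions chosen for illustration.

```python
import hashlib
import uuid

FRAGMENT_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow (assumption)


def split(data: bytes, size: int = FRAGMENT_SIZE) -> list[bytes]:
    """Chop a file's bytes into fixed-size fragments (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_directory(data: bytes) -> list[dict]:
    """One directory entry per fragment: a random ID plus a content digest."""
    return [
        {"id": uuid.uuid4().hex, "digest": hashlib.sha256(frag).hexdigest()}
        for frag in split(data)
    ]


def changed_positions(directory: list[dict], new_data: bytes) -> list[int]:
    """Positions whose content no longer matches what the directory recorded."""
    changed = []
    for pos, frag in enumerate(split(new_data)):
        digest = hashlib.sha256(frag).hexdigest()
        if pos >= len(directory) or directory[pos]["digest"] != digest:
            changed.append(pos)
    return changed


old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBBBXXXXDDDD"
directory = build_directory(old)
print(changed_positions(directory, new))  # [2]
```

Only the third fragment needs to be re-sent to the machines holding its copies. Note that fixed-size fragments handle in-place edits well but not insertions, which shift every later fragment, and this sketch does not handle a file that shrinks; a real system would need to deal with both.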

Now imagine encrypting each fragment as you stored it. Because the file would be scattered in so many places, it would be hard to get all the pieces back together without using the directory engine, and even if you did (in theory, you could use file system metadata), you still couldn’t read the contents without the decryption engine and keys.

And now suppose you could do all this 100 percent reliably. Could you find a place to store all the copies of all the fragments you’ve created? Turns out the answer is yes.

The median corporate desktop configuration these days is 120 GB heading toward 200 GB, but most corporate desktops use less than 20 GB for operating system, applications, local storage, swap file, the works. So we have all this unused disk space on the network, which is generally at least a 10-Mbps Ethernet connection and most often 100-Mbps Fast Ethernet. You can easily afford to store three or four copies of every file there, even if you don’t bother with on-the-fly fragment compression.
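
The arithmetic behind that claim can be checked in a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions: every desktop is identical (120 GB disk, 20 GB used), the full 20 GB is treated as data to protect (an upper bound), and the transfer times are best-case figures that ignore protocol overhead and competing traffic. The function names are mine.

```python
def spare_gb(disk_gb: float, used_gb: float) -> float:
    """Disk space left over on one desktop after its own files."""
    return disk_gb - used_gb


def extra_copies_that_fit(disk_gb: float, used_gb: float) -> int:
    """How many full copies of a desktop's data fit in the spare space,
    assuming every desktop on the network looks the same."""
    return int(spare_gb(disk_gb, used_gb) // used_gb)


def transfer_minutes(size_gb: float, link_mbps: float) -> float:
    """Best-case minutes to move size_gb over a link
    (decimal units: 1 GB = 8000 megabits)."""
    return size_gb * 8000 / link_mbps / 60


print(extra_copies_that_fit(120, 20))       # 5
print(round(transfer_minutes(20, 100), 1))  # 26.7
print(round(transfer_minutes(20, 10), 1))   # 266.7
```

So three or four copies fit with room to spare. The initial seeding of a full 20 GB would take roughly half an hour per copy on Fast Ethernet and several hours on a 10-Mbps link, which is why updating only changed fragments matters after the first pass.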

Objections, Anyone?

Desktops are added and removed from the network unpredictably. No problem. Your new system will use the surviving copies to reconstruct any fragments that go missing. You’ll have to be a bit more careful with the directories, but even they can be distributed as encrypted fragments, so you’re unlikely to lose a whole directory.
It won’t work for laptops. Actually, it will. As long as the overhead of resynchronizing changed fragments isn’t overwhelming, you can sync up each laptop every time it shows up on the network; you just can’t use it for storage of other people’s files. And if it’s a laptop that stays connected to the network, you can just treat it as if it were a desktop.
Too big a performance hit on the network, desktop and/or file server. That depends. On an overloaded shared Ethernet network segment you could see a traffic increase of several percentage points, but in most cases the performance impact is less than three percent, and often less than one percent. You won’t even notice that unless you’re already running at your limit (and most corporate networks aren’t).
Too complicated. Not at all. In fact, after you make a few simple decisions (how much storage to dedicate, how many copies to keep), it’s entirely invisible to the user. Just like part of the file system.
Too expensive. Not as expensive as lost data and the potential damage that can result from it (think security breaches, compliance failures, lost productivity and the like), nor as costly as all those conventional backup and recovery systems everyone buys but nobody uses.
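
The answer to the first objection, rebuilding fragments from the copies that remain when a desktop disappears, can be sketched as follows. This is an illustrative Python sketch, not a description of any product’s internals: the target of three copies, the machine names and the `repair` function are all assumptions.

```python
TARGET_COPIES = 3  # how many copies of each fragment we want (assumption)


def repair(placement: dict[str, set[str]], online: set[str],
           target: int = TARGET_COPIES) -> dict[str, set[str]]:
    """Return a new fragment-to-machines map after dropping machines that
    have left the network and re-copying fragments that fell below target."""
    repaired = {}
    for frag_id, holders in placement.items():
        alive = holders & online
        if not alive:
            # Every copy is gone, so there is nothing left to copy from.
            repaired[frag_id] = set()
            continue
        # Machines that are online and do not already hold this fragment.
        candidates = sorted(online - alive)
        needed = max(0, target - len(alive))
        repaired[frag_id] = alive | set(candidates[:needed])
    return repaired


placement = {
    "f1": {"pc1", "pc2", "pc3"},
    "f2": {"pc2", "pc3", "pc4"},
}
online = {"pc1", "pc3", "pc4", "pc5"}  # pc2 has been removed from the network

for frag_id, holders in repair(placement, online).items():
    print(frag_id, sorted(holders))
```

Both fragments are back to three copies after `pc2` disappears. The sketch picks new holders alphabetically, which would pile data onto the same few machines; a real system would presumably spread the load according to free space.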

Just When You Thought It Was Safe…
So what’s the downside? This data backup solution, called SANware, comes from a tiny company struggling to get the word out about its product and approach. I think it’s worth a look. So go tell your support team or your favorite VAR to check it out at www.revstor.com. And tell Russ I sent you.

John Parkinson has been a business and technology consultant for more than 20 years.
