Overcoming the Data Glut
We deal with a lot of data daily--collecting it from a bevy of sources in thousands of formats; standardizing it all into a common format and cleaning it up as much as possible; updating databases and archives; and so on.
Not surprising for an information commerce business, but also not without its challenges.
When you move such massive volumes of data, you need a lot of bandwidth in both wide-area and local-area networks. And you need a lot of storage. That's where the challenges start.
We have demanding data-performance requirements, complex data life cycles and complex retention requirements, so many of the usual strategies don't work well for us.
When I started looking at the data management challenge a couple of years ago, we were in a race between the growth in the data volumes to be stored and the declining cost of storage at various performance levels. It was clearly a race we were in danger of losing.
If we continued on the path we were taking, we'd spend our entire capital budget on storage within a few years. Something had to give.
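The race can be put in rough numbers. The sketch below is an illustration only: the starting capacity, price per terabyte, growth rate and rate of price decline are made-up assumptions, and the function name is hypothetical. It shows that when stored volume grows faster than price per terabyte falls, the cost of holding the data rises every year.

```python
def project_storage_spend(capacity_tb, cost_per_tb, growth_rate, cost_decline, years):
    """Project the cost of holding all data at each year's prices.

    All inputs are illustrative assumptions:
      growth_rate  -- yearly growth in stored volume (0.6 = 60%)
      cost_decline -- yearly drop in price per TB (0.25 = 25%)
    Returns a list of (year, capacity_tb, cost_per_tb, spend) tuples.
    """
    rows = []
    for year in range(years + 1):
        rows.append((year, capacity_tb, cost_per_tb, capacity_tb * cost_per_tb))
        capacity_tb *= 1 + growth_rate    # data keeps growing
        cost_per_tb *= 1 - cost_decline   # storage keeps getting cheaper
    return rows


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB at $1,000 per TB.
    for year, tb, price, spend in project_storage_spend(100.0, 1000.0, 0.6, 0.25, 5):
        print(f"year {year}: {tb:8.1f} TB at ${price:7.2f}/TB = ${spend:12,.0f}")
```

With these assumed rates the yearly multiplier is 1.6 x 0.75 = 1.2, so spend climbs 20 percent a year even though unit prices keep falling. The race is lost whenever that multiplier stays above 1.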
We looked for a strategy that would allow us to store and protect our data, but at a reasonable cost. We aren't done yet--we may never be "done"--but we've made progress. Here's what worked--and some issues yet to be resolved.
Limiting the data that sits on the highest-performing and most expensive storage is essential, but it isn't easy. When you've been used to having fast access to your data, it's hard to learn to think ahead and schedule recall from lower-performing storage systems for those few occasions when high performance matters. Better-defined management classes and retention periods have helped, but the hardest part has been changing habits--and then keeping up the new storage management disciplines over time.
I was surprised at how much data gathering and analysis was required to get our new management and retention policies in place. And I wasn't ready for the amount of backsliding that occurs if we don't both automate the enforcement of those policies and regularly report on how well they are being followed.
But we are making progress--we will eliminate nearly 50 percent of our Tier-1 storage requirements this year and make significant progress on Tier 2 and Tier 3. These are critical gains. They will save us enough capital to invest in better tools and monitoring capabilities, things we now know we will need to succeed.
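As a minimal sketch of what automated enforcement and regular reporting might look like, the Python below checks each dataset against a policy table and reports the share that comply. Everything in it is hypothetical: the management class names, the idle and retention limits, and the Dataset fields are assumptions made for illustration, not a description of any particular storage product.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    mgmt_class: str          # hypothetical management class label
    tier: int                # 1 = fastest and most expensive
    days_since_access: int
    age_days: int


# Hypothetical policy table: how long idle data may stay on Tier 1,
# and how long data in each class is retained at all.
POLICY = {
    "hot-feed": {"max_idle_days_tier1": 30, "retention_days": 365},
    "archive": {"max_idle_days_tier1": 0, "retention_days": 2555},
}


def check(ds):
    """Return 'expire', 'demote' or 'ok' for one dataset."""
    rule = POLICY[ds.mgmt_class]
    if ds.age_days > rule["retention_days"]:
        return "expire"    # past its retention period
    if ds.tier == 1 and ds.days_since_access > rule["max_idle_days_tier1"]:
        return "demote"    # idle too long for the top tier
    return "ok"


def compliance_report(datasets):
    """Return (findings by dataset name, fraction of datasets that comply)."""
    findings = {ds.name: check(ds) for ds in datasets}
    compliant = sum(1 for result in findings.values() if result == "ok")
    rate = compliant / len(datasets) if datasets else 1.0
    return findings, rate
```

Running a check like this on a schedule, and publishing the compliance rate, is one way to catch backsliding before it fills Tier 1 again.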
We've made less progress on the life cycle of the platforms. Every four years we have to replace the arrays; if we keep them longer, the drive failure rate increases dramatically.
Even if the drives didn't fail, data movement speeds improve enough over a four-year period that I would want to swap out the storage array (which I usually can't upgrade) to take advantage of the faster speeds.
So a refresh every four years is going to take place. What we don't like is (a) how much data we have to move (our largest arrays hold over 400 TB) and (b) that at every refresh we have to repurchase the software that runs the arrays and the tools that manage them.
It's as if every time I replace a server I have to re-buy the OS license, rather than transfer it. We are no longer willing to do that, leading to some interesting conversations with our vendors.
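To give a sense of scale for the data movement at each refresh, here is a back-of-the-envelope calculation in Python. The 400 TB figure comes from the array size mentioned above; the sustained throughput values are assumptions chosen for illustration, and real migrations rarely hold a steady rate.

```python
def transfer_days(terabytes, gigabytes_per_second):
    """Days needed to copy the data at a constant sustained rate.

    Uses decimal units (1 TB = 1,000 GB). The throughput is an
    assumption; it is not a measured figure.
    """
    seconds = terabytes * 1000 / gigabytes_per_second
    return seconds / 86400  # 86,400 seconds in a day


if __name__ == "__main__":
    for rate in (0.5, 1.0, 2.0):  # hypothetical sustained GB/s
        print(f"400 TB at {rate} GB/s: {transfer_days(400, rate):.1f} days")
```

At an assumed 1 GB/s sustained, 400 TB takes 400,000 seconds, or a little over four and a half days of uninterrupted copying, before any verification or cutover work.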
By the end of 2009 we will be making some critical decisions about the direction our storage platforms will take for the next five years. It's going to be interesting.