Easing Hardware Lifecycle Management
You have probably seen the trade ads and promotional materials from Sun Microsystems for Project Blackbox, its data-center-in-a-shipping-container concept. I'm intrigued by this strange idea that you can modularize many of the elements of a data center. Prompted by an idea from my boss, I've been playing with a variant of this modular concept for a couple of months.
You can pack a lot of technology into a 40-foot shipping container: That's enough space for 700 blade servers or around 9,000 terabytes of storage. Even with some spares and failover capacity built in, you can do a decent amount of work with a few containers.
I'm not thinking of actually using shipping containers to house our infrastructure, but the modular approach got me thinking. One of our business-as-usual operational challenges is the replacement cycle for our infrastructure technology. Hardware is a capital asset and gets depreciated over a number of years before being replaced. Because technology changes so fast, the depreciation period is short compared with that of longer-lived assets like buildings and power and cooling plants.
We, in effect, rebuild a significant fraction of our infrastructure annually. Even with an efficient planning and provisioning process, new equipment gets located all over the data center, and the commissioning and decommissioning process carries risks of unintended service interruptions. I'd like to make this easier and more reliable, so here's what I'm thinking.
Suppose I modularize my infrastructure by age of equipment. In the first year, I put in pod No. 1 all the equipment I plan to use for the next three years; in the second year, new equipment (also expected to be used for three years) goes in pod No. 2; in the third year, a third batch of newly acquired equipment goes in pod No. 3.
In year four, when the equipment in pod No. 1 is fully depreciated, I take it out of service and replace it with new equipment, for which I should pay less for the same capability or get more capability for the same cost. In year five, I replace the contents of pod No. 2, and so on.
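To make the rotation concrete, here is a rough sketch in Python of which pod gets refreshed each year and how old the equipment in the other pods is at that point. It assumes exactly three pods and a three-year equipment life, as described above; the function names pod_to_refresh and equipment_age are hypothetical and exist only for this illustration.

```python
from typing import Optional


def pod_to_refresh(year: int, num_pods: int = 3) -> int:
    """Return the 1-based pod number that receives new equipment in `year`.

    Assumes one pod is filled (or refilled) per year, in order.
    """
    if year < 1:
        raise ValueError("year must be 1 or greater")
    return (year - 1) % num_pods + 1


def equipment_age(pod: int, year: int, num_pods: int = 3) -> Optional[int]:
    """Return how many years ago the equipment in `pod` was installed.

    Returns None if the pod has not been filled yet in `year`.
    A result of 0 means the pod is being filled or refreshed this year.
    """
    if not 1 <= pod <= num_pods:
        raise ValueError("pod must be between 1 and num_pods")
    if year < pod:
        return None  # pod N is first filled in year N
    return (year - pod) % num_pods


if __name__ == "__main__":
    # Print the plan for the first six years of the cycle.
    for year in range(1, 7):
        active = pod_to_refresh(year)
        ages = {pod: equipment_age(pod, year) for pod in (1, 2, 3)}
        print(f"year {year}: work on pod No. {active}, ages by pod: {ages}")
```

In any given year, only the pod returned by pod_to_refresh sees physical or provisioning work; the other pods hold equipment that is one or two years old and are left alone.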
If I can make this work, most of the risky physical activity (connecting, disconnecting and moving equipment) and risky logical activity (provisioning and configuration) will be in just one pod, and none of this activity will touch the "in use" production infrastructure. In most cases, I won't ever have to move equipment within a pod or between pods.
There are some challenges. First, I need a lot of discipline around capacity planning, because I don't want to make many changes to the other pods. Second, I need a lot of standardization within each pod, so I can get as much leverage as possible from my engineering teams and have an appropriate mix of compute, store and connect technologies at each age.
Third, I need to invest in interconnection capability, so workloads can be moved from pod to pod as the equipment refresh cycles proceed. And fourth, I need to think hard about actual physical layouts within a pod.
These are not simple problems, but I can manage them more easily with a modular approach than with the existing situation in our data center. It also would simplify rolling upgrades if I decide to switch vendors or change architectures. I'm getting really close to my ideal of balancing stability with agility.
I'm going to start trying this soon. Anyone out there ahead of me?