Allied Building Products Rethinks DR Strategy
A major building products distributor uses Hurricane Sandy as a starting point for developing a robust disaster-recovery strategy and infrastructure.
By Samuel Greengard
A year after Hurricane Sandy slammed the East Coast of the United States, leaving behind somewhere in the neighborhood of $68 billion in damage, organizations are continuing to rethink and reshape the way they approach disaster recovery (DR). Allied Building Products Corp., a distributor of building materials, including roofing, siding, insulation, wallboard, windows and doors, is among the businesses that have stormed ahead with a more robust DR strategy.
The company, which boasts $2 billion in annual revenues and operates nearly 200 locations across the U.S., found its East Rutherford, N.J., data center under about four feet of water after the hurricane struck on October 29, 2012. "All of our networking and communications systems were contained in the facility, including physical servers, storage subsystems and other equipment," says Scott Fischer, information technology director for Allied. The firm lost data center operations at about 10 p.m. that night. Adding to the company's misery, a secondary data center holding replicated data was also knocked offline because it too was located in the storm zone.
The event had serious consequences. During the outage, the lack of the ERP system meant that sites around the country couldn't look up inventory in the warehouse or at another store. As a result, there was no way to provide customers with an expected delivery time. "Although the Website wasn't impacted and we don't do a lot of e-commerce, employees and customers were affected in other ways," Fischer explains. For example, employees had to handwrite orders and tickets and submit them later.
Fischer immediately began to piece together a plan for resuming operations. He and other top IT officials at Allied assembled at a SunGard facility in Philadelphia that held the organization's backup data. There, they began bringing enterprise systems back online and restoring data, based on tiers, to physical and virtual servers. At the peak of the restore process, the IT group worked around the clock and managed half a dozen restore operations simultaneously. The team spent approximately 40 hours getting the systems back up and running. By the evening of October 31, the company was once again able to function, but it took nearly four months to complete a new data center.
Fischer says Allied learned a number of lessons from Hurricane Sandy. "We had adopted the attitude that a disaster of this magnitude wouldn't occur—and so we didn't take the advice to build out a more robust disaster recovery infrastructure," Fischer says. "SunGard restored everything that was in our contract, but we discovered that we didn't necessarily have all the equipment and backups in place that we needed. Some applications and data weren't covered."
That led to a complete review of the DR environment and the addition of new virtual servers and storage devices. Allied remapped the way it handles storage tasks and how it uses local storage. The review also pointed to a need for additional services, including SunGard's managed recovery process. "We now have everything in our contract that would allow us to recover 100 percent of our applications," Fischer notes. In addition, Allied relocated its secondary data center, which had been 10 miles away, to Scottsdale, Ariz. "There is almost no chance that we will be adversely impacted in the event of a future regional disaster," he points out.
In the end, Fischer says that the disaster was a "relatively small bump" in the road and that, on the upside, it led to a far better disaster recovery strategy and infrastructure. Allied Building Products is now equipped to navigate a disaster or data center failure with minimal downtime or adversity. "Disaster recovery is not something to take lightly," says Fischer. "We learned that having systems down for 40 hours is not acceptable in today's electronic and highly transactional environment. Systems and data must continue running."