David Morris, Vice President of Product and Global Marketing at FalconStor, had been thinking a lot about the future of storage. He wondered what it would take to retain data for more than a century. After all, most electronic media die long before that.
It quickly became apparent that the existing system-centric approach would demand copying the information to new systems 10 to 25 times over the data retention lifecycle, roughly one migration for every four-to-ten-year hardware refresh cycle. Each migration opens the door to human error and to data, operating system, and hardware incompatibilities.
Further challenges include data accessibility and application availability. Will the application or database needed to access the data still exist in 50 years? Probably not. Even if the application vendor is still in business in 50 or 100 years, the application and its architecture will almost certainly have changed enough to be incompatible with 50-year-old data, let alone 100-year-old data.
“This thought experiment leads us to understand that we need a tandem data and application strategy to access data in the future,” said Morris.
This approach stands in stark contrast to the current model, in which application architectures, coding languages, and development techniques evolve continuously. Further, data and applications sit on top of operating systems and hardware that themselves change rapidly. Stored data and applications are thus stuck with OS and hardware dependencies that quickly slide into obsolescence, and applications and data inevitably become inaccessible within a relatively short time.
Data needs to come first
These arguments point to the need for a new architecture for storing and archiving data, one that avoids the drawbacks of the system-centric model. Morris suggested inverting the problem and approaching it from the opposite direction: instead of a system-centric approach, he advocates a data-centric one. Data must take priority over hardware and operating systems, because its value and extended lifecycle far exceed those of any storage system or platform.
One technology that offers promise in this regard is Linux containers. Containers have already proven their value in the DevOps space: they package virtually any application into distributable units, deploy those units at scale into any environment, and streamline the workflows and responsiveness of agile software organizations.
Containers are traditionally stateless runtime environments. Add persistence to their existing capabilities, however, and the resulting stateful containers could become data storage containers, not just application execution containers.
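As a concrete illustration of that move, here is a minimal sketch using the Docker SDK for Python (docker-py) against a local Docker daemon; the alpine image and the archive-data volume name are illustrative choices, not anything FalconStor prescribes. A named volume gives an otherwise stateless container a place to keep data that outlives it:

```python
# A minimal sketch, assuming the Docker SDK for Python ("pip install docker")
# and a local Docker daemon. Image and volume names are illustrative.
import docker

client = docker.from_env()

# Create a named volume; it persists independently of any container.
client.volumes.create(name="archive-data")

# A stateless, throwaway container writes a record into the volume...
client.containers.run(
    "alpine",
    command=["sh", "-c", "echo 'record 001' >> /archive/records.txt"],
    volumes={"archive-data": {"bind": "/archive", "mode": "rw"}},
    remove=True,
)

# ...and a second, completely independent container reads it back.
output = client.containers.run(
    "alpine",
    command=["cat", "/archive/records.txt"],
    volumes={"archive-data": {"bind": "/archive", "mode": "ro"}},
    remove=True,
)
print(output.decode())  # record 001
```

The data survives both containers; that independence from any single container's lifecycle is the statefulness described above.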
“Leveraging Linux-based containers to deliver a data-centric approach breaks traditional data storage limitations and enables enterprises to store archive data across data centers, public, private, hybrid, and multi-cloud data storage environments along with the data’s application,” said Morris.
These containers would apply virtualization at the application layer rather than the system layer, allowing them to disaggregate data storage and applications from the system-level components and operating system. The result would be a persistent, long-term data preservation container that is platform-agnostic, heterogeneous, and portable. The big open question is ensuring that these containers remain compatible with future technology advances.
From the storage perspective, the container runtime environment allows applications to be stored in tandem with their data. Keeping both within the same container preserves future access to the application and the data alike, rather than archiving the data alone, only for it to become inaccessible as applications change.
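To make that tandem idea concrete, here is a second minimal sketch under the same assumptions (docker-py, a local Docker daemon; the image and repository names are hypothetical). It snapshots a stopped container, application runtime and data together, into a single self-contained image that can later be restored as one unit:

```python
# A minimal sketch under the same assumptions as above (docker-py, local
# Docker daemon); image and repository names are hypothetical.
import docker

client = docker.from_env()

# Run the "application" so it produces data inside its own filesystem
# (plain Python stands in for a real archival application here).
container = client.containers.run(
    "python:3.12-slim",
    command=[
        "python", "-c",
        "import os; os.makedirs('/data', exist_ok=True); "
        "open('/data/report.txt', 'w').write('archived record')",
    ],
    detach=True,
)
container.wait()

# Snapshot the stopped container, runtime and data together, into a
# single self-contained image that could be pushed to any registry.
container.commit(repository="archive/report-app", tag="2025-01")
container.remove()

# Later (potentially much later), restore the unit and read the data back.
output = client.containers.run(
    "archive/report-app:2025-01",
    command=["cat", "/data/report.txt"],
    remove=True,
)
print(output.decode())  # archived record
```

Committing a container this way is a blunt instrument compared with a purpose-built preservation format, but it captures the core property Morris describes: the data and the software needed to read it travel together.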