Databases Are the Weak Point in Big Data Projects
By Karen A. Frenkel
“Modern” database algorithms are still rooted in 1970s technology, creating a need for updated mathematics that can handle the scale and performance demands of big data.
MySQL was built in 1995, when the fastest Intel processor was the Pentium Pro. It needs an updated architecture to take advantage of modern hardware.
Database deployment images are configuration-dependent, causing an explosion of packages. Businesses need databases that observe and adapt to any physical, virtual, or cloud instance without a separate installation package for each platform.
DBAs are stuck on a time-consuming tuning treadmill because systems are inflexible, creating a need for databases that self-optimize online as data workloads ebb and flow.
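To make that idea concrete, here is a minimal sketch of an online self-tuning loop, assuming a hypothetical BufferPoolTuner that is not drawn from any real product: it grows or shrinks a buffer-pool size target based on the cache-hit ratio observed in each monitoring interval.

```python
# Illustrative sketch: an online tuner that adjusts a buffer-pool size
# target from observed cache-hit ratios. The class and method names are
# hypothetical, not from any real database.

class BufferPoolTuner:
    def __init__(self, size_mb=1024, min_mb=256, max_mb=16384,
                 low_watermark=0.90, high_watermark=0.99, step=0.10):
        self.size_mb = size_mb
        self.min_mb = min_mb
        self.max_mb = max_mb
        self.low = low_watermark     # below this hit ratio, grow the pool
        self.high = high_watermark   # above this hit ratio, shrink the pool
        self.step = step             # fractional resize per adjustment

    def observe(self, hits, misses):
        """Feed one monitoring interval's counters; returns the new target size."""
        total = hits + misses
        if total == 0:
            return self.size_mb
        ratio = hits / total
        if ratio < self.low:
            # Workload is missing the cache too often: grow, up to the cap.
            self.size_mb = min(self.max_mb, int(self.size_mb * (1 + self.step)))
        elif ratio > self.high:
            # Cache is oversized for the current workload: give memory back.
            self.size_mb = max(self.min_mb, int(self.size_mb * (1 - self.step)))
        return self.size_mb


tuner = BufferPoolTuner()
print(tuner.observe(hits=8_500, misses=1_500))   # hit ratio 0.85 -> grows to 1126 MB
print(tuner.observe(hits=9_990, misses=10))      # hit ratio 0.999 -> shrinks
```

Real self-tuning engines adjust many more knobs than a single pool size, but the shape of the feedback loop, observe, compare, adjust, is the same.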
Databases typically handle only a single type of workload. Instead, enable the database to host multiple workloads (ingest, transactions, analytics) concurrently without destroying performance.
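One ingredient in hosting mixed workloads side by side is resource isolation, so that a long analytics scan cannot starve ingest or transactions. The sketch below is a generic illustration using separate, capped thread pools per workload class; the pool names and worker limits are invented for the example.

```python
# Generic sketch of workload isolation: each workload class gets its own
# capped thread pool, so analytics queries cannot consume every worker
# that ingest and transactions need. Pool sizes here are arbitrary.

from concurrent.futures import ThreadPoolExecutor

pools = {
    "ingest":       ThreadPoolExecutor(max_workers=4),
    "transactions": ThreadPoolExecutor(max_workers=8),
    "analytics":    ThreadPoolExecutor(max_workers=2),   # capped hardest
}

def submit(workload_class, fn, *args):
    """Route work to the pool reserved for its workload class."""
    return pools[workload_class].submit(fn, *args)

# Hypothetical usage: a long scan and a point lookup run concurrently,
# each drawing only on its own class's workers.
scan = submit("analytics", sum, range(10_000_000))
lookup = submit("transactions", lambda: {"id": 42}["id"])
print(lookup.result(), scan.result())
```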
A fixed algorithm choice allows only a single set of behaviors regardless of the workload. Today’s databases must offer multiple algorithms that can be switched on the fly based on workload requirements.
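As a toy illustration of switching algorithms on the fly, the sketch below (built around an invented AdaptiveStore class) watches its own read/write mix and flips between an update-in-place write path and a batched, log-structured-style write path; real engines switch between far more elaborate structures.

```python
# Toy sketch of on-the-fly algorithm switching: the store batches writes
# (log-structured style) under write-heavy load and applies them
# immediately (update-in-place style) under read-heavy load.
# All names here are hypothetical.

class AdaptiveStore:
    def __init__(self, batch_threshold=64):
        self.data = {}            # the "on disk" table (just a dict here)
        self.write_buffer = {}    # pending writes when in batched mode
        self.reads = 0
        self.writes = 0
        self.batch_threshold = batch_threshold

    def _write_heavy(self):
        total = self.reads + self.writes
        return total >= 100 and self.writes / total > 0.5

    def put(self, key, value):
        self.writes += 1
        if self._write_heavy():
            # Write-optimized path: absorb updates in memory, flush later.
            self.write_buffer[key] = value
            if len(self.write_buffer) >= self.batch_threshold:
                self.flush()
        else:
            # Read-optimized path: keep the main structure always current.
            self.data[key] = value

    def get(self, key):
        self.reads += 1
        # Pending writes take precedence over the flushed table.
        if key in self.write_buffer:
            return self.write_buffer[key]
        return self.data.get(key)

    def flush(self):
        self.data.update(self.write_buffer)
        self.write_buffer.clear()


store = AdaptiveStore()
for i in range(200):          # write-heavy phase: later updates get batched
    store.put(f"k{i}", i)
store.flush()
print(store.get("k7"))        # -> 7
```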
Compression is needed to maximize storage capacity and minimize costs, but current architectures perform it on the critical I/O path, sacrificing performance and scale. Architectures are needed that offload the compression overhead onto separate CPU threads so the impact on performance is minimal.
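That offloading pattern can be sketched with Python's standard library; this is an illustration of the idea, not any vendor's implementation. Because CPython's zlib releases the GIL while compressing sizable buffers, a thread pool genuinely overlaps compression with foreground work.

```python
# Illustrative sketch: the foreground path hands pages to background
# threads for compression instead of compressing inline on the write path.

import zlib
from concurrent.futures import ThreadPoolExecutor

compressor_pool = ThreadPoolExecutor(max_workers=4)

def compress_page(page: bytes) -> bytes:
    return zlib.compress(page, 6)

def write_page(page: bytes):
    # Foreground thread: submit and return immediately, so the caller is
    # not stalled behind the CPU cost of compression.
    return compressor_pool.submit(compress_page, page)

# Hypothetical usage: 8 x 1 MiB pages compressed off the write path.
pages = [bytes([i]) * (1 << 20) for i in range(8)]
futures = [write_page(p) for p in pages]
compressed = [f.result() for f in futures]
print(sum(len(c) for c in compressed), "compressed bytes")
```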
Due to algorithm limitations, databases eventually run off the “performance cliff” despite the best possible configurations. It’s time for databases to show predictable performance at scale and allow for orderly capacity planning.
Database scaling hits a brick wall. Make scaling to billions routine with state-of-the-art algorithms that reduce IOPS through intelligent caching, eliminating unneeded reads and writes.
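How caching eliminates unneeded reads and writes is easiest to see in a small write-back cache: repeated reads of a hot key are served from memory, and repeated writes to the same key collapse into a single eventual disk write. The sketch below is generic and not modeled on any particular product.

```python
# Generic sketch of a write-back LRU cache in front of a slow store:
# repeated reads are served from memory and repeated writes to the same
# key are coalesced into a single flush, cutting IOPS on both paths.

from collections import OrderedDict

class WriteBackCache:
    def __init__(self, backend: dict, capacity=1000):
        self.backend = backend            # stands in for the disk
        self.capacity = capacity
        self.cache = OrderedDict()        # key -> value, in LRU order
        self.dirty = set()                # keys modified but not yet flushed
        self.disk_reads = 0
        self.disk_writes = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)   # refresh LRU position; no disk read
            return self.cache[key]
        self.disk_reads += 1
        value = self.backend.get(key)
        self._insert(key, value)
        return value

    def put(self, key, value):
        self._insert(key, value)
        self.dirty.add(key)               # defer the disk write

    def _insert(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            old_key, old_val = self.cache.popitem(last=False)
            if old_key in self.dirty:     # write back only if modified
                self.backend[old_key] = old_val
                self.dirty.discard(old_key)
                self.disk_writes += 1

    def flush(self):
        for key in list(self.dirty):
            self.backend[key] = self.cache[key]
            self.disk_writes += 1
        self.dirty.clear()


disk = {"hot": 0}
cache = WriteBackCache(disk, capacity=10)
for i in range(1000):
    cache.get("hot")                      # 1 disk read, then 999 cache hits
    cache.put("hot", i)                   # 1000 logical writes
cache.flush()
print(cache.disk_reads, cache.disk_writes)  # -> 1 1
```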
When multiple issues coalesce in a single infrastructure, data performance becomes unpredictable. With a more streamlined infrastructure that eliminates production, backup, and geo-location silos, companies can avoid platform fragmentation.
The ETL process slows business analytics and reduces the depth of insights because of the trade-off between data size and processing time. Next-generation databases need to run analytics in place, against the same production platform and data set, without ETLs.
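The contrast with ETL is simple to show: instead of exporting operational data into a separate warehouse, the analytical query runs directly against the production data set. The sketch below uses SQLite only as a stand-in for a production database, and the orders table and its columns are invented for the example.

```python
# Stand-in example: an aggregate query runs in place against the
# operational store, with no export/transform/load step in between.

import sqlite3

conn = sqlite3.connect(":memory:")          # pretend this is production
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 120.0), ("east", 80.0), ("west", 200.0)],
)

# Analytics in place: the same connection and data set answer the
# analytical question, so the insight reflects live data, not a stale copy.
for region, total in conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
):
    print(region, total)                    # e.g. east 200.0 / west 200.0
```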
Databases use antiquated techniques to process data, creating fragile environments. Today’s businesses need databases that decouple in-memory structures from on-disk structures to ensure database integrity.
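One long-standing way to keep the in-memory structure and the on-disk structure separate while preserving integrity is a write-ahead log: every change is made durable on disk before memory is touched, so the memory image can always be rebuilt after a crash. The sketch below is a minimal, generic illustration, not any particular engine's recovery code.

```python
# Minimal write-ahead-log sketch: the durable log on disk and the
# in-memory table are separate structures; the log is written first,
# so the memory image can always be rebuilt after a crash.

import json, os

class MiniStore:
    def __init__(self, log_path="wal.log"):
        self.log_path = log_path
        self.table = {}                      # in-memory structure
        self._recover()                      # rebuild memory from the log

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        with open(self.log_path, "a") as log:
            log.write(record + "\n")
            log.flush()
            os.fsync(log.fileno())           # durable before memory changes
        self.table[key] = value              # only now update memory

    def get(self, key):
        return self.table.get(key)

    def _recover(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as log:
            for line in log:
                record = json.loads(line)
                self.table[record["key"]] = record["value"]


store = MiniStore("demo_wal.log")
store.put("balance", 100)
# Even if the process dies here, a fresh MiniStore("demo_wal.log")
# replays the log and still sees the balance.
print(MiniStore("demo_wal.log").get("balance"))   # -> 100
```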