Why Separating Good Data From Bad Is a Challenge
Good data doesn’t always equate to good quality data, as data may be accurate but irrelevant to the business’ particular need for it.
By D.P. Morrissey
Data may be among the most important assets a business has, yet databases are often treated as an afterthought. An organization’s data is all too often relegated to some forgotten corner of the data center, collecting the cyber-equivalent of dust and cobwebs.
When that data is finally recognized by business and IT leaders as having potential value, it’s often in a complete state of disarray, and its value gets overshadowed by the sheer number of work hours needed to uncover what is useful and what is irrelevant.
Cleaning up databases can be a time-consuming undertaking, but in an age of data as a business driver, it’s a necessary step to staying relevant. It’s a major project, but spring cleaning your databases today could lead to a bounty of data in the future. Yet not all data is equal, so you need to understand the difference between good and bad data before dressing down your dirty data.
Good data is defined as relevant and accurate data that leads business decision makers to the specific insights they need to make critical decisions or execute key transactions or processes, according to Rex Ahlstrom, a data expert and Chief Strategy Officer at BackOffice Associates with more than 25 years’ experience as an IT leader.
"As data falls into both structured (within corporate systems) and unstructured (residing in data lakes or other sources) categories, companies must know how to evaluate both types to ensure it is usable and applicable for what they are trying to accomplish," he said. "Additionally, it’s important to understand how to best evaluate data that the enterprise owns vs. third party data, such as social media data."
The Hunt for Quality Data
“Good data doesn’t always equate to good quality data, as data may be accurate but irrelevant to the business’ particular need for it,” said Ahlstrom. “The hallmark of good data is that it empowers stakeholders to address the decision at hand—adding quality on top of that foundational layer produces the best result with the insights needed to make the right decisions for the business.”
"Bad data can be described as data that drives business decision makers in the wrong direction for making critical decisions. By using data, you’re either trying to arrive at an informed decision or run a specific business process. If it’s failing, it’s likely because you have the wrong data, poor data quality or you may not be looking in the right place for the data," he said.
“For example, if a pharmaceutical company requires a refrigerated truck to transport specific drugs and the data communicating this requirement from an ERP system was incorrect, that bad data costs the company billions of dollars in product recalls, FDA compliance issues and a host of other headaches,” Ahlstrom said.
Setting data standards is critical in measuring the quality of your organization’s data, he said. Understanding relevancy is just as important: the data being evaluated must actually bear on the decision being made or the transaction being executed.
"In the case of third party data from external sources, much of this information is aggregated from the Internet in a manner beyond an organization’s control. In this case, it is important to do due diligence to confirm that the data is coming from a valid, trusted source before basing major decisions on it."
Once data standards are in place and an organization has confirmed it has relevant data to work with, the next step is to apply techniques to cleanse the data and understand where anomalies exist. Then, businesses can apply governance processes to address issues and improve the data so decisions are made based on good data as a whole, Ahlstrom said.
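As a rough illustration of that sequence (standards first, then a pass that flags anomalies for governance follow-up), here is a minimal Python sketch. It is not a method described by Ahlstrom or BackOffice Associates. The field names, the temperature range and the `Rule` and `find_anomalies` helpers are all hypothetical, chosen to echo the refrigerated-shipment example above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """One data standard: a field plus a check its value must pass."""
    field: str
    description: str
    check: Callable[[Any], bool]


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical standards for a shipment record (assumed, not from the article).
RULES = [
    Rule("drug_id", "must be a non-empty string",
         lambda v: isinstance(v, str) and v.strip() != ""),
    Rule("requires_refrigeration", "must be True or False",
         lambda v: isinstance(v, bool)),
    Rule("max_temp_c", "must be a number between -80 and 40",
         lambda v: _is_number(v) and -80 <= v <= 40),
]


def find_anomalies(records, rules=RULES):
    """Return (record index, field, problem) for every failed rule."""
    anomalies = []
    for index, record in enumerate(records):
        for rule in rules:
            # A missing field comes back as None and fails its check.
            if not rule.check(record.get(rule.field)):
                anomalies.append((index, rule.field, rule.description))
    return anomalies


shipments = [
    {"drug_id": "RX-100", "requires_refrigeration": True, "max_temp_c": 8},
    {"drug_id": "", "requires_refrigeration": "yes", "max_temp_c": 8},
    {"drug_id": "RX-102", "requires_refrigeration": False},
]

for row, field, problem in find_anomalies(shipments):
    print(f"record {row}: {field} {problem}")
```

The flagged rows are the input to the governance step: someone who owns the data decides whether to correct, enrich or discard each record, and the rule list is revised as the standards evolve.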
As the process of cleaning data continues, Ahlstrom said, be sure to look at this data from others’ perspectives:
* You can't protect what you don't know exists. To determine what data is sensitive or critical, don't just look at your applications from an IT point of view.
* Engage with others to assess needs from differing perspectives: business operations, customers, regulators/auditors and shareholders. Keep this list updated because it evolves.
Despite efforts to sanitize and make data useful, more than one-third of collected data is considered useless. If the trend of indiscriminately storing data continues, it will cost businesses $3.3 trillion per year, according to a study from research firm Vanson Bourne.
"Understanding and acknowledging that a data hoarding culture exists is a first step in addressing the problem," said Ben Gibson, CMO of Veritas, which commissioned the study.
Although more organizations recognize the problem, most do not know which data to start evaluating, what risk it may contain or where its value lies, he said. "Once they have visibility into that environment, they can make decisions faster, with more confidence, and bring in other business stakeholders to move forward with a well-conceived plan."
Real-Time Access to Data
"Before now, companies had to use their IT departments for data-related requests, which involved requesting complex reports from technical staff and took long amounts of time to produce. By the time the business users received the reports, the information was likely out of date," Ahlstrom said.
Technology has come a long way in providing direct, real-time access to data. Big data sets can now be analyzed with in-memory computing, Hadoop and data lakes, he said. Companies are able to leverage the scale and power of cloud computing platforms to dig into data and analyze it for its deeper value.
"The science of data has made huge strides with industry experts performing research on it and developing advanced algorithms to derive better value from data. All of these areas are contributing greatly to the advancement of the data market, and there is so much on the horizon," he said.
CIOs have an incredibly important role in ensuring data strategy turns into action, which ultimately turns into revenue, cost savings and the ability to make decisions faster and better than the competition.
"Data is of such critical importance to organizations that they can no longer manage it in siloed, one-off approaches that are based on individual departmental needs," according to an Experian Data Quality report. "Data needs to be managed by a central owner within the business … To meet the increasing demands of consumers, businesses have to improve the people, processes and technology around data management across their organization. They need to eliminate silos and accurately assess data challenges."
Easier said than done.