The Emergence of the Citizen Data Scientist

By Guest Author  |  Posted 09-07-2016

By becoming citizen data scientists, business users are more invested in the data discovery process as they’re given the tools for performing deep analysis.

By George Whitby

Business intelligence (BI) is changing. It used to be enough to deliver information in the form of reports, dashboards and visualizations. Over the last few years, however, a new kind of BI consumer has emerged: the citizen data scientist. This is a business user who demands more than just KPIs and data dumps into Excel, and who expects the latest analytical technologies to be available at their fingertips.

A data scientist is an employee who specializes in extracting knowledge and insight from (often big) data. Typically this is quite a niche and specialized role, performed by only a few people within a business. Citizen data scientists, on the other hand, are traditional business users who are being given the tools to perform this deep analysis themselves.

Why has this happened? One reason is that the traditional data scientist role requires a combination of statistical knowledge, an intimate understanding of business processes and a high level of technical ability. Few people combine all of these skills, which makes good data scientists rare and expensive.

On top of this, BI and analytics are becoming more prevalent. Whether it is in your online bank account, sleep tracker app or Fitbit, BI and data analysis are now commonplace in our lives.

This means BI consumers now expect robust reporting as a minimum. They aren’t afraid to explore the data more deeply to gain a more meaningful understanding themselves. This everyday exposure also demonstrates the value of analytics in their own lives, which in turn has put analytics on the agenda in the boardroom.

Citizen Data Scientists Produce Meaningful Analytics

As a result, data discovery tools are proliferating fast. Products like SAP’s Predictive Analytics are tailored toward enabling traditional end users to do meaningful analytics. They provide an intuitive UI, but sit on advanced in-memory databases. This means that not only do users have potent front-end tools, but also the computing power behind them to enable large and complicated queries.

Additionally, big data is now being meaningfully brought into data analysis engines. Tools like SAP HANA Vora make it possible to query Hadoop ‘big data’ clusters from existing tools. This democratizes big data: users can now answer their own questions themselves.

Imagine you’re an area manager, figuring out which of your customers to target in the next campaign. With SAP Predictive Analytics, you can perform clustering analysis on your customers, using statistical modelling to divide them into groups with similar characteristics. This is particularly useful when you have several analytical options at your disposal, and when the ideal method of grouping customers may not be immediately obvious and may change by market:

  • You could look at emotional loyalty, weighing the number of purchases against length of ownership.
  • You could compare revenue against frequency of purchase to group your customers by profitability.
  • You could group customers by customer lifecycle, measuring how long they have owned their current product against the average ownership length of their products, enabling you to pinpoint those most likely to be looking for their next purchase.
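To make the idea of clustering concrete, here is a minimal sketch of the second option above, grouping customers by revenue and purchase frequency. It is written in plain Python rather than in SAP Predictive Analytics, and it uses a basic k-means routine as a stand-in for whatever statistical model the tool applies. The customer names, figures and function names are all invented for illustration.

```python
from math import dist
from statistics import mean

# Hypothetical customers: (annual revenue, purchases per year). Invented values.
CUSTOMERS = {
    "A": (1200.0, 2.0),
    "B": (1500.0, 3.0),
    "C": (9000.0, 20.0),
    "D": (9500.0, 24.0),
    "E": (1100.0, 1.0),
    "F": (8700.0, 18.0),
}


def scale(points):
    """Rescale each feature to [0, 1] so revenue does not dominate the distance."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [
        tuple((value - low) / span for value, low, span in zip(point, lows, spans))
        for point in points
    ]


def k_means(points, k, iterations=20):
    """Basic k-means, seeded with the first k points so results are repeatable."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(point, centroids[c]))
            for point in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(mean(axis) for axis in zip(*members))
    return labels


def segment(customers, k=2):
    """Return a mapping of customer name to cluster label."""
    names = list(customers)
    labels = k_means(scale([customers[name] for name in names]), k)
    return dict(zip(names, labels))


if __name__ == "__main__":
    for name, label in segment(CUSTOMERS).items():
        print(name, "-> group", label)
```

Swapping in different columns, such as number of purchases and length of ownership, would give the loyalty grouping from the first option without changing the routine itself, which is the kind of quick comparison the area manager is making.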

What’s key is that you, as the area manager, are able to run the analysis and compare the results on your own. You can chop and change the analytical models in an intuitive tool, without needing to email IT to modify the model.

The Benefits of Data Science

Giving end users this ability provides some powerful advantages:

One Version of the Truth: A shared repository means that users have access to the same definitions, business rules and KPIs, providing a consistent view of the world. Now, when there is a debate, you can be sure it is about the interpretation of the data rather than the data itself.

Corporate Security: Another advantage of running analysis on your enterprise system is that you get to exploit your existing IT defenses. This ensures that sensitive data sets aren’t being downloaded to laptops but remain securely behind your firewall.

Ownership of the Results: Perhaps most importantly, by becoming the data scientist, the business users become invested in the data discovery process. This leads to them owning the results, meaning they will champion not only the insight they have found, but the data science tool they used to discover it.

So what does this all mean? Citizen data science is on the agenda in 2016 and for the foreseeable future. Expect it to be right up there in conference headlines, just as we have seen ‘big data’, ‘simple’ and ‘Internet of Things’ in recent years.

Unlike many recent business fads, though, this one has some real meat behind it. Whether or not the ‘citizen data science’ label sticks around, it reflects a real and imminent change in BI. User empowerment is here to stay.

George Whitby is a BI and Analytics Consultant at Bluefin, a Mindtree company.



 
