The Importance of Evaluating Data Maturity

Gauging your company's data maturity prior to doing deep technical work can save a lot of time and frustration down the road

The first step in operationalizing data is setting the right goals and asking the right questions. While aligning on goals can be deceptively difficult, the concept itself is often straightforward. However, many companies tend to neglect the seemingly ambiguous next step: understanding the viability of their data.

Knowing how ready a business's data infrastructure is before starting deep technical work helps determine its:

Ability to define mission-critical key metrics
Ability to extract insights
Ability to produce predictive models that drive better decisions

Consider the following scenario:

A company uses a price forecasting model so that its sales team can improve margin. After some time, however, the pipelines that source the model's data are changed. Stakeholders must then decide whether rebuilding the model on the new infrastructure is worth the projected returns.

Note that the business problem here is well defined (i.e., improve sales margin) and that the viability of the data infrastructure directly impacts sales performance. In this business case, evaluating data maturity is key to the success of the sales initiative.

How can we evaluate data maturity?

There are multiple aspects of data maturity to consider, including runtime considerations, feature engineering, and integration of sources. The biggest issues when it comes to data maturity and viability are usually related to data quality.

In evaluating the data, we want to address the following issues:

Quantify data quality with metrics: How much faith can the business have in the accuracy of its data?

Take a business with multiple siloed data sources. After analysts comb through many data sources to get a general picture of what’s available for use, stakeholders need a way to quantify the business readiness of the available data.

Clarify unexpected caveats: Are there data quality questions that need to be asked?

A naïve inspection of the data may miss special conditions set by external vendors. A standardized method for raising questions about these conditions should be applied initially to save hours of fumbling down the road.

Prioritize: Which parts of the data do I want to spend my limited time looking at?

Prioritization of data initiatives is often complicated by complex architecture and seemingly arbitrary configuration of pipelines. Part of a good data offering is to convert a messy landscape into a concise map with clear directions.

Data Quality

Simple initial data quality checks include:

Anomaly detection for numerical data

Flag rows whose values fall more than three standard deviations from the mean (for time series, this could be a rolling mean). Most data sets contain some anomalous rows. Sometimes the anomalies are expected, and the business simply has a service provider that “just does things a certain way”. Other times, anomalies need to be flagged or removed. Either way, decisions on anomalous data are best made at the data source, before further processing is done down the pipeline.
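
As a minimal sketch of this check, assuming the data is loaded into pandas, the function below flags values that sit more than three standard deviations from the mean. The function name `flag_anomalies`, the `unit_price` column, and the sample values are hypothetical and only serve the illustration.

```python
import pandas as pd


def flag_anomalies(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return a boolean mask: True where a value is more than
    `threshold` standard deviations away from the series mean."""
    mean = series.mean()
    std = series.std()  # sample standard deviation (ddof=1)
    if pd.isna(std) or std == 0:
        # No spread in the data, so nothing can be flagged
        return pd.Series(False, index=series.index)
    z_scores = (series - mean) / std
    return z_scores.abs() > threshold


# Hypothetical example: 19 ordinary prices and one extreme value
prices = pd.Series([100, 102, 98, 101, 99] * 4, dtype="float64", name="unit_price")
prices.iloc[-1] = 1000.0

mask = flag_anomalies(prices)
anomalies = prices[mask]
print(anomalies)
```

For a time series, the global mean and standard deviation could be swapped for `series.rolling(window).mean()` and `series.rolling(window).std()`, with the window size chosen to suit the data. Missing values are never flagged by this sketch, because a comparison against NaN evaluates to False.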

Cardinality

Evaluate whether a non-numerical feature has too many unique values relative to the number of instances. Measures may eventually be taken to reduce the unique count of the feature, or the feature may be dropped. Checking cardinality is a quick way to gauge the viability of a feature early on, which in turn clarifies the viability of future data science offerings.
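
One way to sketch this check in pandas is to compare the unique count of each non-numerical column with the row count. The `max_ratio` cutoff of 0.5, the function name, and the sample columns are assumptions made for illustration; a sensible cutoff depends on the data set and the intended model.

```python
import pandas as pd


def cardinality_report(df: pd.DataFrame, max_ratio: float = 0.5) -> pd.DataFrame:
    """Summarize unique-value counts for every non-numerical column."""
    non_numeric = df.select_dtypes(exclude="number")
    report = pd.DataFrame({
        "unique_values": non_numeric.nunique(),
        "rows": len(df),
    })
    report["unique_ratio"] = report["unique_values"] / report["rows"]
    # Flag features whose unique count is large relative to the row count
    report["high_cardinality"] = report["unique_ratio"] > max_ratio
    return report


# Hypothetical example data
orders = pd.DataFrame({
    "order_id": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "region": ["west", "west", "east", "east", "west", "east"],
    "amount": [10.0, 12.5, 9.0, 11.0, 10.5, 13.0],
})

report = cardinality_report(orders)
print(report)
```

Here `order_id` is unique on every row and is flagged, while `region` has only two values across six rows and is not.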

Text answer quality

Frequently, there are multiple versions of the same value for a text feature: e.g., ‘fixed\shared’ and ‘fixed shared’ both represent the same value but have different formats. A custom data mapping may be devised to clean this kind of data if the meaning of the text is clear. How repairable the text answers are may vary from vendor to vendor, and ultimately affects the viability of data solutions.
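
A custom mapping of this kind might look like the sketch below, which first normalizes separators and casing and then looks the result up in a mapping table. The `CANONICAL` dictionary, the canonical label `fixed_shared`, and the function name are hypothetical; in practice the mapping would be agreed with whoever owns the data.

```python
import re

# Hypothetical mapping from normalized text to an agreed canonical label
CANONICAL = {
    "fixed shared": "fixed_shared",
}


def clean_label(value: str) -> str:
    """Normalize separators and casing, then apply the custom mapping."""
    # Treat backslashes, slashes, underscores, hyphens, and whitespace
    # as one interchangeable separator
    key = re.sub(r"[\\/_\-\s]+", " ", value.strip().lower()).strip()
    # Values without a mapping entry are returned in normalized form
    return CANONICAL.get(key, key)


raw_values = ["fixed\\shared", "fixed shared", "Fixed  Shared ", "variable"]
cleaned = [clean_label(v) for v in raw_values]
print(cleaned)
```

Values that do not appear in the mapping pass through in normalized form, which makes leftover variants easy to spot and add to the mapping later.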

Report missing data

Empty cells in tabular data are a common occurrence, and a decision needs to be made on how rows with empty cells are handled. Are any particular variables required? A bird’s-eye view of the populated row count per feature is an immediate way to help evaluate the quality of data before proceeding further.
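
The bird's-eye view described above could be produced with a short pandas summary like the following. The function name and sample columns are hypothetical, and the sketch assumes empty cells were loaded as missing values (NaN or None) rather than as empty strings.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count populated and missing rows for each feature."""
    report = pd.DataFrame({
        "populated_rows": df.notna().sum(),
        "missing_rows": df.isna().sum(),
    })
    report["missing_pct"] = (report["missing_rows"] / len(df) * 100).round(1)
    # Worst-affected features first
    return report.sort_values("missing_pct", ascending=False)


# Hypothetical example data with gaps
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "region": ["west", None, "east", None],
    "revenue": [10.0, None, 30.0, 40.0],
})

summary = missing_report(customers)
print(summary)
```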

Feature Evaluation

While simple data quality evaluations are critical, a deeper evaluation of data maturity should also include feature inventory and feature analysis:

Does the company have the sources necessary to get dependent and independent variables?

How robust is the company’s data architecture, and how viable are data engineering offerings? Does the infrastructure support the extraction of critical dependent and independent variables?
What dependent variable reflects the business case directly (e.g., churn metrics, revenue)? How is its data quality? Part of the evaluation is helping the business define its dependent variables.
What variables influence the dependent variables? Which independent variables matter most (e.g., user metrics for customer experience)? How is the data quality of the independent variables?
Does the data structure need to be changed, and can it be done in a scalable way? For example, if a data source has a variable number of columns (e.g., one per month) that need to be melted, this might not be simple to scale for the business requirement.
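
To make the melting example concrete, the sketch below reshapes a wide table with one column per month into a long table with one row per account and month. The column names (`account`, `month`, `revenue`) and the values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: the number of month columns grows over time
wide = pd.DataFrame({
    "account": ["A", "B"],
    "2023-01": [100, 80],
    "2023-02": [110, 85],
})

# Only the identifier columns are named explicitly; every other column is
# treated as a month, so newly added month columns are picked up automatically
id_columns = ["account"]
long = wide.melt(id_vars=id_columns, var_name="month", value_name="revenue")
print(long)
```

Naming only the identifier columns keeps the reshape stable as months are added, but it also means any unexpected non-month column would be melted as well, which is one reason this step may not scale without validation.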

Outcome

What would an ideal outcome of a data viability evaluation look like?

A “yes” or “no” on viability, and on whether the offering has a return on investment (ROI) worth pursuing

Stakeholders often wonder “can we do it?” when it comes to the business problem at hand. Perhaps significant resources have already been invested in the data infrastructure. How does the remaining required effort stand in relation to the ROI?

A sizing of the ROI

A good high-level view includes scope. In addition to a specific number, the major components that go into costs, revenue, and money saved should be sized for the potential initiative. For example, stakeholders will appreciate a report on the costs and projected returns of a server migration, broken down by component.

Qualitative list of top pain points in data quality

Many business problems are going to have unique challenges that don’t fit cleanly into a universal framework. Generally, an investigation into the configuration and sourcing of available data should identify data quality bottlenecks, which helps keep the roadmap to success clear and actionable.

Data engineering and data quality requirements and suggestions

After an initial diagnosis of pain points, actions may be recommended as prerequisites for the business initiative. These may include quality-of-life changes such as database configuration suggestions. Having a clear description of specifications helps data teams form plans with confidence.

Data quality metrics organized by feature and data source

Data quality metrics around completeness, variance, and anomalies assist in the prioritization of data sources. Metrics serve as strong arguments for business decisions.
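
As a rough sketch of how such metrics could be organized, the function below builds one table indexed by data source and feature, with completeness, variance, and an anomaly count for each. The source names (`crm`, `billing`), the column names, and the three-standard-deviation threshold are assumptions for illustration.

```python
import pandas as pd


def quality_metrics(sources: dict[str, pd.DataFrame], threshold: float = 3.0) -> pd.DataFrame:
    """Build a table of data quality metrics indexed by (source, feature)."""
    rows = []
    for source_name, df in sources.items():
        for column in df.columns:
            col = df[column]
            row = {
                "source": source_name,
                "feature": column,
                "completeness": col.notna().mean(),  # share of populated rows
                "variance": float("nan"),            # left empty for non-numerical features
                "anomalies": 0,
            }
            if pd.api.types.is_numeric_dtype(col):
                std = col.std()
                row["variance"] = col.var()
                if std > 0:
                    z_scores = (col - col.mean()) / std
                    row["anomalies"] = int((z_scores.abs() > threshold).sum())
            rows.append(row)
    return pd.DataFrame(rows).set_index(["source", "feature"])


# Hypothetical example sources
sources = {
    "crm": pd.DataFrame({
        "region": ["west", None, "east", "east"],
        "revenue": [10.0, 20.0, 30.0, 40.0],
    }),
    "billing": pd.DataFrame({"amount": [5.0, None, 7.0, 9.0]}),
}

metrics = quality_metrics(sources)
print(metrics)
```

Sorting this table by completeness or anomaly count gives a simple, defensible ordering of which sources and features to examine first.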

Feature inventory categorized by dependent and independent variables

Before initiatives and data sourcing can be prioritized in relation to each other, a map of the data landscape in list form helps prevent personnel from overlooking crucial information.

Next Steps

If the data is considered viable for the business problem at hand, and the ROI is worthwhile, the next step is to consider methods to engineer the data to specifically solve the identified problem.

Not every effort will end with the best-case scenario where the data fits viability criteria. No matter the case, the company will have learned more about its data and the issues at hand and will have a more organized technical roadmap for future initiatives.

Curious about what data science can do for your company? Contact us today or visit our Analytics & Insights page to learn more about our services.
