DataOps and Data Quality

Kevin Kautz
7 min read · Sep 7, 2019

DataOps is a culture, not a process. To repeat a quotation from one of my earlier posts, Building a DataOps Team,

Culture is knowledge that is transmitted to individuals and across time, that can be taught and learned, and that is distinctive to groups.
Nicholas Christakis

When we speak of a culture, we necessarily draw attention to a group of people. In the case of a DataOps culture, the people are those who construct and operate data pipelines and data analytics applications, especially when the data, and the insights derived from it, are the products that your organization sells to others.

If you monetize data and insights, you need to encourage your teams to develop a DataOps culture.

Start with a Broad DataOps Team

Culture is not produced by organizational structure. But organizational structure can kill culture. Be careful at the start. The DataOps culture absolutely requires a view that is broader than design & engineering.

DataOps Can Learn from DevOps

Let’s look at DevOps for a moment. DevOps is an active and valuable movement, and we can learn from its successes and failures. DataOps inherits some cultural values from DevOps, so it’s worth a quick comparison.

Many organizations fail in the first attempts to implement DevOps because they only engage design and engineering. This leads to Test-First and CI/CD and then … generally, it completely stops, right there. That’s a DevOps failure. You’ve used DevOps to improve Dev but not Ops. Test-First and CI/CD are great behaviors, but they’re only engineering behaviors. You’re not successful with DevOps until you also address production operations, monitoring in production, the ability to tune performance and adjust scalability “live” in production, and so much more.

DataOps Must Extend Beyond Engineering

DataOps also needs a broader perspective. If you attempt to introduce a DataOps culture by engaging only design and engineering teams, you’re missing the point. DataOps needs to be a shared culture across teams. The whole point is to establish shared knowledge in a larger group with diverse perspectives and needs.

If your engineers do not know when your data is good or bad, expand your team to include those who do. I repeat (because so many miss this point), please do not start working on DataOps with only a group of software engineers, not even if their titles have recently changed to “data engineer”. You need people who deal with production data every day and who handle client inquiries and recover from data quality failures in production.

If possible, also include the wide-view product managers who deal with value propositions and funding and market viability and go-to-market and A-B testing and intentional product obsolescence. Be careful with this one. If your organization uses “product owner” as a role, that’s not what we are talking about. Product owners sit within the design & engineering teams as the voice of those outside; they live in the engineering glass house, even if their role encourages them to look out of those glass walls. Wide-view product managers, on the other hand, bring additional perspectives far beyond those of engineering and operations. It is those additional perspectives that we are looking for.

Just as with the introduction of DevOps culture, the shared DataOps culture can span product management, design, engineering, deployment, production monitoring, operations, and direct client support. DataOps is not intended for engineers alone but for a larger and more diverse group of people who are not dominated or directed by engineering concerns.

First DataOps Goal? Data Quality Automation

It’s not the only place to start, but it’s a good one. You may already have what you need to make an immediate impact. This is a good choice if you want to demonstrate added value with some quick wins.

Notice, please, that I did not say to start with Data Quality. I said to start with the automation of it.

And now, we immediately encounter a culture collision. We have to talk about the word “process”, and not everyone will understand it in the same way.

What Does the Word “Process” Mean?

Automation, by definition, is the automation of a process that was carried out by people before it was automated. Operations teams love to talk about process measurements and process improvements, and to speak of success & failure in terms of processes that succeed or fail.

Software engineering teams, on the other hand, have learned a disdain for “process”. When they use this word, they don’t mean the same thing as operations teams do. There are good reasons for engineers to push back against unwieldy practices that interfere with design & engineering success. Process is a code word for many engineering teams, and it means “non-value-added activities that prevent me from doing value-added stuff”.

Sigh.

If you’re talking about “value add”, you’re talking about process. That’s a principle from Lean Manufacturing. If you speak of “adding value” as a way to fight against “process”, you’ve lost your way. You’ve forgotten what Lean is. (We could write a book on the ways in which agile methodologies have lost their way. But that’s an incendiary assertion for another day.)

This culture collision over the meaning of the word “process” is one of the reasons why I emphasize that Lean Manufacturing is one of the foundations for the DataOps culture. “Adding value” and “process waste” have very clear meanings in Lean Manufacturing. Every agile methodology in software engineering teams relies on concepts borrowed from Lean. If we return to that foundation, we can align to a common understanding of terminology. Lean Manufacturing is very process-oriented. The word “process” can become the point of overlap where both engineering and operations share the desire to remove waste from process.

But we will save the conversation about Lean Manufacturing principles for another time.

Add Automation to Human Behaviors

For now, here’s the place to start. We need to learn how data quality is measured in production operations, and focus on how to automate the human behaviors that define data quality, test for data quality, and remediate data quality failures.

It’s not hard to define data quality in a production operations process. You already do this. You may not recognize it for what it is.

Data Quality for Inputs

When data arrives as input to a software application or component, you will already have: (1) an expected schema, (2) an expected format, (3) an expected interface or location, (4) an expected size, (5) an expected time of arrival, and (6) an expected range of values in each column or field. Please feel free to extend “schema” and “format” to include flat-file, relational, JSON, and any number of other varieties.
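To make that concrete, those six expectations can be captured as plain metadata. A minimal sketch in Python follows; the dataset name, fields, and thresholds are all hypothetical.

```python
# A hypothetical metadata record describing expectations (1)-(6)
# for one incoming daily file. Names and thresholds are illustrative.
INPUT_EXPECTATIONS = {
    "dataset": "daily_trades",
    "location": "s3://incoming/trades/",  # (3) expected interface or location
    "format": "csv",                      # (2) expected format
    "schema": {                           # (1) expected schema
        "trade_id": "string",
        "price": "float",
        "traded_at": "timestamp",
    },
    "min_rows": 10_000,                   # (4) expected size
    "arrival_deadline_utc": "06:00",      # (5) expected time of arrival
    "value_ranges": {                     # (6) expected range of values
        "price": {"min": 0.0, "max": 1_000_000.0},
    },
}
```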

Tests for each of (1) through (6) can be automated. It’s not hard to imagine. You’re likely already checking each of these, but it may be the operations team who is doing it.

The point is to automate the above checks. Define just enough metadata so that you can write the rules for success and failure based on metadata. The automated code that runs the tests (in production) should be generic enough to use the metadata to test for expected conditions.
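Here is one way such a generic, metadata-driven runner might look, as a sketch. It assumes the metadata record above and a pandas DataFrame already loaded from the expected location; checks (2) and (3) are presumed to have been applied while loading.

```python
def run_input_checks(expectations, df, arrived_at):
    """Run metadata-driven checks against an arriving dataset.

    `expectations` is a record like INPUT_EXPECTATIONS above; `df` is
    assumed to be a pandas DataFrame already loaded from the expected
    location; `arrived_at` is the observed arrival time (a datetime).
    Returns a list of (check_name, passed, detail) tuples.
    """
    results = []

    # (1) schema: every expected column must be present
    missing = set(expectations["schema"]) - set(df.columns)
    results.append(("schema", not missing, f"missing columns: {sorted(missing)}"))

    # (4) size: row count must meet the expected minimum
    results.append(("size", len(df) >= expectations["min_rows"], f"rows={len(df)}"))

    # (5) arrival: the data must land before the daily deadline
    hh, mm = map(int, expectations["arrival_deadline_utc"].split(":"))
    on_time = (arrived_at.hour, arrived_at.minute) <= (hh, mm)
    results.append(("arrival", on_time, f"arrived_at={arrived_at.isoformat()}"))

    # (6) value ranges: every value must fall inside the declared bounds
    for col, bounds in expectations["value_ranges"].items():
        in_range = df[col].between(bounds["min"], bounds["max"]).all()
        results.append((f"range:{col}", bool(in_range), f"bounds={bounds}"))

    return results
```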

Did the schema or format change from what was expected? Did data fail to arrive when and where it was expected? Did the size or content of the data surprise you?

Capture the results of your data input tests. Collect these results in a queryable data structure so you can monitor trends. Eventually, you’ll get more sophisticated and you’ll want to write cooler tests such as “number of rows in a weekly incoming file should be the same or greater than prior weeks”, or “the percentage of null values in this column should vary no more than 10% from the trend line of the percentage of null values from the prior three months”. Cool tests require trending of data profiling statistics.
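As a sketch of that second test: treating the 10% as percentage points for simplicity, with a plain trailing average standing in for the trend line, and the history assumed to come from your queryable store of past results.

```python
def null_pct_within_trend(history, current_pct, tolerance=10.0):
    """Check that the current null percentage stays within `tolerance`
    percentage points of the trailing average.

    `history` is assumed to hold the per-run null percentages for one
    column over the prior three months, pulled from the queryable
    store of captured test results.
    """
    baseline = sum(history) / len(history)  # trailing average as the trend line
    return abs(current_pct - baseline) <= tolerance

# Example: 2.5% nulls today vs. roughly 2% over the prior months passes.
assert null_pct_within_trend([1.8, 2.1, 2.0, 1.9], current_pct=2.5)
```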

And of course, automate the alerts when tests fail. Automation does not remove the people from the conversation. It simply takes the drudgery out of the process. It runs the set of tests, collects the results, and notifies the humans when something does not match expectations.
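The notification step itself can stay tiny. A sketch, with the result store and the alert channel passed in as callables so nothing here is tied to a particular tool:

```python
def alert_on_failures(results, record_result, notify):
    """Persist every result, then notify a human only on failure.

    `results` comes from a runner like the sketch above. `record_result`
    writes to the queryable results store; `notify` is whatever channel
    the team already uses (email, chat webhook, pager). Both are
    callables supplied by the caller, so this sketch stays generic.
    """
    for check_name, passed, detail in results:
        record_result(check_name, passed, detail)
        if not passed:
            notify(f"Data quality check failed: {check_name} ({detail})")
```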

Data Quality for Outputs

As you begin to recognize that data quality tests can be pushed upstream (at least within your own organization’s pipelines), you’ll realize that you also need to be a good citizen toward your downstream data consumers. If your downstream partners are going to push their data quality requirements upstream to you, then you need to test your outputs against their stated expectations. You should test both inputs and outputs.

With outputs, you’ll need to know something more interesting. You need to know what you published last time, and the time before. And some data profiling statistics about data that you published. And trend lines. And … oh … yeah … you can write tests that succeed or fail on the outputs that you produce, so that you know immediately if this is going to set off alarms in the downstream processes. To speak in manufacturing terms, you need to ensure that what you’re producing falls within the specifications and precision measurements that are needed downstream.
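A minimal sketch of such an output check, assuming you retain a dict of profiling statistics from each publication run; the field names and thresholds are illustrative, not a prescribed spec.

```python
def output_within_spec(current_stats, prior_stats, max_row_drop_pct=5.0):
    """Compare this publication's profile against the previous one.

    `current_stats` and `prior_stats` are assumed to be dicts of
    profiling statistics (row counts, column lists, and so on)
    retained from each publication run.
    """
    checks = {}

    # Row count should not shrink sharply between publications.
    drop_pct = 100.0 * (prior_stats["rows"] - current_stats["rows"]) / prior_stats["rows"]
    checks["row_count_stable"] = drop_pct <= max_row_drop_pct

    # Columns promised to downstream consumers must still be present.
    checks["columns_stable"] = set(prior_stats["columns"]) <= set(current_stats["columns"])

    return checks
```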

This begins to get more interesting than the tests for inputs. We’re now talking about testing for “fitness for use”. You have to know what the “use” is. How will downstream applications use the data that you’re producing? If you get good at this, you’ll find that this concept also applies when data is going to leave your organization and go to your clients and business partners. And that will start to have “teeth” — legal, contractual, regulatory compliance. Data quality measures can demonstrate compliance.

Here are a few common “fitness for use” tests and remediation behaviors: (1) referential integrity in outputs, (2) consistency across time, (3) restatements of prior published data, (4) freshness, and (5) tagging of data that has special regulatory or contractual meaning.
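Two of these, (1) referential integrity and (4) freshness, sketched below with hypothetical column names and thresholds:

```python
def check_referential_integrity(orders, customers):
    """(1) Every foreign key in the output must resolve to a parent row.

    `orders` and `customers` are assumed to be pandas DataFrames (or
    similar mappings); the column names are illustrative.
    """
    orphans = set(orders["customer_id"]) - set(customers["customer_id"])
    return len(orphans) == 0, sorted(orphans)

def check_freshness(latest_timestamp, now, max_age_hours=24):
    """(4) Published data must be no older than the agreed freshness window.

    Both arguments are datetimes; the 24-hour window is an assumption.
    """
    age_hours = (now - latest_timestamp).total_seconds() / 3600
    return age_hours <= max_age_hours
```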

Once again, humans in your operations teams are likely to already have ways to handle each of these tests and remediation behaviors. Your goal is to collect these behaviors and to start to automate them.

Summary

The DataOps culture requires a common understanding, a common language, and shared knowledge between people of different skills and responsibilities. Engaging an extended team (or teams) with your DataOps cultural transition is essential to success. A good first goal is to identify existing human behaviors related to data quality and to automate them.

There’s more to the DataOps culture than the automation of data quality behaviors. But data quality automation is absolutely essential to DataOps because it demonstrates that the extended team shares the cultural value of data quality, and actually does something about it.
