Adirondacks, 2020. AJKS. Used by permission.

Let’s take a deeper look at the industry trend toward virtual data lakes.

Why do we need a data lake?

A data lake provides a consolidated view and single access point for analytical workloads such as dashboards & reporting for operations, and exploratory analysis & data science research. It expands access to data while restricting access to transactional systems to enforce security and to protect the compute resources of operational systems.

It protects & secures operations and it enables analytics.

Copying the data from operational systems to populate a data lake adds cost and introduces delays. Forcing data into a single storage technology narrows the available tools…


2020 Adirondacks lakeshore. Photo by AKS. Used by permission.

XP for Data Assets

Which metaphor is best suited to elevate data to first-class citizenship in modern organizations? I suggest that we borrow terminology from software development life cycles (SDLCs) and, to be more specific, from Extreme Programming (XP).

My audience is organizations that develop software. They understand SDLCs and, with help, can see the value in XP. But the choice to use the software metaphor when talking about data is an attempt at broader applicability than metaphors such as data lakes and data warehouses.

So let’s approach data as the representation of human behavior through the lens of human creativity. …


Scotland, 2019. C. Brady. Used by permission.

As Software-as-a-Service (SaaS) grows in each industry, in a world where everything is cloud-hosted, there are some intriguing variations on the themes of multitenant software with data isolation.

Take the time to read the Medium post from Palantir, where they state clearly that they are not a data company but a software company. This is from “Palantir is not a data company (Palantir Explained, #1)”:

We build digital infrastructure for data-driven operations and decision making. Our products serve as the connective tissue between an organisation’s data, its analytics capabilities, and operational execution.

The company publishes and continually upgrades software in…


Forest Thoughts. A.K.S. Used by permission.

I thought it would be easy. Briefly confer with academic experts, snatch an insight from linguistics, and return triumphantly with a new principle to guide us in representing data in business computing.

The insight was that information needs context. That’s pretty obvious, right?

And computing practitioners frequently cross to a new domain, gain a foundational understanding, and carry the seed to a garden in another climate to see how it fares. So this should be easy, right?

Spoiler alert: The seed rarely sprouts. Beyond the native climate, without natural pollinators, facing new pathogens, most transplants wither before maturity.

It just…


View from Mount Colden, Adirondacks, 2020, A.J.K.S. Used by permission.

Data warehouses and data lakes in the wrong hands will hoard data, hide data, and hold it back. Data governance that limits data use reduces the value of data to your organization. Data governance that enables innovation, encourages experimentation, and guides decision-making increases its value.

It comes down to this. Data in use has value; data at rest does not.

Incorrect Risks Will Bias Results

One approach to data governance states that the only data risk is poor data quality that leads to decisions made with incorrect data. …


Used by permission. https://www.etsy.com/shop/bradyface

Information loss destroys value, increases waste, and reduces the effectiveness of collaboration. Stop throwing away context and you’ll uncover insights in what you already possess.

This is one of a series of topics on DataOps. Read my earlier posts to get a better understanding of how to apply lean manufacturing to organizations where data is the product that you create and sell to your clients. That’s my working definition of DataOps.

Sections to follow discuss how we lose or destroy context and information about our data as we misplace, mishandle, and outright mangle our raw materials, work in process, and…


By permission of A.K.S.

In the deep dark past — that is, a few decades ago — manufacturers transitioned to lean enterprises. Following shortly after, software organizations championed various flavors of lean software development. In the present, digital companies will recognize their deep debts to lean principles. It is time to learn how those apply to data operations.

I write about DataOps as a culture and a collection of practices for organizations that use data as the raw materials and produce data as the deliverable that clients pay for. Data analytics, for example, can be understood as a manufacturing process. …


Used by permission of A.K.S.

DataOps is a culture, not a process. To repeat a quotation from one of my earlier posts, “Building a DataOps Team”:

Culture is knowledge that is transmitted to individuals and across time, that can be taught and learned, and that is distinctive to groups.
Nicholas Christakis

When we speak of a culture, we necessarily draw attention to a group of people. …


Used by permission of A.K.S.

DataOps, you know I love you. But you act like a teenager trying to find yourself. Calm down. It’s not that hard. You just need a bit of context and a backstory as you set off to make your mark. Choosing a direction might help, instead of running in all directions at once.

Yeah, sure, you might take it upon yourself to immediately become your father. But, c’mon. Huge bureaucratic data governance is a well-paying job, especially at banks and insurance companies, but this is not really you. …


“Market Mornings” by permission of A.K.S.

Data companies are coming to a consensus on what we mean by DataOps. In short, we seek to bring the benefits of a DevOps culture to organizations with a core competency in data analytics. I’ll clarify this in a moment, but I want to draw your attention to the teams that engage in this work.

If you have a modern software engineering team, you are familiar with the concepts of continuous build, continuous integration, and continuous deployment. …

Kevin Kautz

Professional focus on data engineering, data architecture and data governance wherever data is valued.
