I thought it would be easy. Briefly confer with academic experts, snatch an insight from linguistics, and return triumphantly with a new principle to guide us in representing data in business computing.
The insight was that information needs context. That’s pretty obvious, right?
And computing practitioners frequently cross to a new domain, gain a foundational understanding, and carry the seed to a garden in another climate to see how it fares. So this should be easy, right?
Spoiler alert: The seed rarely sprouts. Beyond the native climate, without natural pollinators, facing new pathogens, most transplants wither before maturity.
It just seems obvious to me that computing systems need to represent semantics (pragmatics, to be precise), to reason about context and causality, and to treat data as a record of the behaviors of people in a complex discourse rather than as transactions recorded in a business ledger.
I haven’t given up. I continue to try. But so far, I have more failures than successes in attempts to represent context in data pipelines and data analytics.
Please note that if you don’t like reading about failure modes, and you prefer to jump ahead to the best practice recommendations, you can find them at the end of this article.
Failures of Context
Many software development projects fail. A few fail spectacularly, but most simply fall short: they return little on the investment, unlock no new revenue, and have unclear value propositions. We accept incremental benefits and move on, and we stop investing because we lose interest and see no reason to continue.
Why, though, do most projects fail to keep the excitement alive?
When they succeed, software projects create sustainable communities of users who choose this software over the alternatives, who are happy with their choice, and who encourage others to follow or to join.
When projects fail, it is because users cannot see themselves using the software. They don’t see the point. They find other ways to address their needs, or they leave the needs unaddressed if that is less painful than reshaping their world to fit an ill-designed system.
Software projects fail when they do not fit the context. Can we do a better job of representing context in our systems? Can data include context as well as transactions? Can we give software systems a better sense of themselves and where they fit into a knowledge domain or business context?
I believe that many software failures and many data failures are failures of context. Let’s look at some of the ways that this plays out.
High-Fidelity Fails to Elicit Context
Prototyping is a phase within design thinking, and low-fidelity prototypes are the most effective. They invite users to engage in the design process, to explain their context, to work actively with you, and to take greater ownership of the result. They lead to greater learning and to a shared understanding of the unsolved problems.
High-fidelity prototypes look finished. They may sway you to buy before you’re sure how the product will actually work in your situation. They look good, and you’re tempted to stop struggling with your people, process, and technology problems and to leap into a rosy future where you can start over in a fresh new context.
High-fidelity prototypes prevent conversations about the real world context that the users face. They prevent knowledge transfer from clients to developers. They avoid the hard work of creating a common language and a common understanding of the context in which the software will be of use.
This is an example of failing to elicit and to value business context within product management. The better path in product management is to use low-fidelity prototypes, avoid writing requirements, and continuously engage business users and software engineers in open collaboration for as long as it takes to get it right for this unique business situation.
Process Complexity Hides Context
Some of us love to look at the big picture. Others love the details, narrowing their focus and avoiding the distraction of anything outside their control. The inability to bridge these two views is where process complexity leads to context failures.
When a skilled business analyst guides an organization to exactly the right key performance indicator (KPI), that measure is meaningful to those whose only concern is the day-to-day details, and it is also meaningful to those who make decisions based on a long-range understanding of many interlocking processes.
Good KPIs are rare. More often, the measures of the operations teams are too narrow to help decision-makers. And the oversimplified KPIs that analysts provide to executives lead to decisions that unfortunately do not accurately reflect reality on the ground.
To understand an operations process, we must drill down from the big picture to the detail, and then zoom back up from today’s reality to the projections we use to plan for what has not yet happened or is not yet possible.
Value stream mapping, a tool that comes from lean process engineering in manufacturing, is one of the best tools to create a common understanding of context that works at both the level of operations and the level of planning.
Data Models Don’t Capture Context
The most unsettling lesson that I’ve learned about data modeling is how it destroys context. As we finish modeling and start development, context is actively discarded until all we have left is a set of tables in a database and no memory of the knowledge that led to that design.
The biggest loss happens when we decide that we need this data and we do not need that data. We choose to represent only certain facts and only certain measures. The information that we do not select is far greater than what we do. What is not shown in a data structure is truly “out of sight and out of mind”. No one asks about data that is never captured.
Loss of context also happens when behaviors and business relationships are encoded in data elements. The decision to store a quantity as an integer prevents us from ever learning that half or a quarter of something was used. The choice to limit a classifier field to a fixed set of values forces operations to pick from the list, squeezing a messy reality into an artificial taxonomy, and this leads to incorrect decisions because we never knew about the messy reality.
Mishandled data types or field lengths across a data pipeline mean that data values are silently discarded or replaced to satisfy the representation rules of databases or comma-delimited flat files. Many such transformations are accidental and unintentional. If we do not profile raw data before it is transformed, we may never know.
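These losses rarely announce themselves; they hide in an innocent-looking load step. The sketch below uses invented column names and an invented allowed-values list to show two of the losses described above: an integer column that drops fractional usage, and a classifier that collapses a messy real-world status into a catch-all bucket.

```python
# Hypothetical pipeline load step; names and rules are invented for illustration.
raw_rows = [
    {"qty_used": 2.5, "status": "waiting on vendor"},
    {"qty_used": 0.25, "status": "ok"},
]

ALLOWED_STATUS = {"ok", "failed"}  # classifier limited to a fixed list

def load(row):
    return {
        "qty_used": int(row["qty_used"]),  # integer column silently drops the fraction
        # anything outside the fixed list collapses into "other"
        "status": row["status"] if row["status"] in ALLOWED_STATUS else "other",
    }

loaded = [load(r) for r in raw_rows]
# 2.5 units become 2; 0.25 units become 0; "waiting on vendor" becomes "other".
```

No error is raised at any point, which is precisely the problem: the downstream analyst sees well-formed rows and has no way to ask about the context that was thrown away.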
Developers and analysts see only the data that we chose to represent, and only in a format constrained by time, space, and layout. So much information is lost along the way and is no longer visible.
What is a better way?
Start with different tools to represent the ontology before you attempt to represent the data. Identify real-world behaviors of humans and organizations and transactions and intentions and regulations and risks. Show how the interplay of these can alter what is significant to decision-making.
Use your ontology to find the right questions. Describe the business context in narratives, in personas, in how people change priorities as they make decisions based on risks that they are aware of.
Work backward from software capabilities — to real questions that clients have — to the measures and representations that will help them make decisions — to the data sources and data representations. Let the ontology guide you to what is significant. And then change your data models and your data pipelines so that the right data makes it through in the right shape.
And profile your data. Profile the raw data before you change it. Profile intermediate data as transformations and standardizations happen. Profile the final production data. Compare those profiles to learn what your pipelines are doing to your data. Compare those over time to learn what you’re missing as the real world changes.
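As one way to put that advice into practice, here is a minimal profiling sketch (field names and data are invented). Profiling the same field before and after a standardization step makes the transformation's effects visible: the distinct-value count and the null count change, and the comparison tells you what the pipeline did to the data.

```python
from collections import Counter

def profile(rows, field):
    """Minimal profile of one field: row count, nulls, distinct values, top values."""
    values = [r.get(field) for r in rows]
    return {
        "count": len(values),
        "nulls": sum(v is None for v in values),
        "distinct": len(set(values)),
        "top": Counter(values).most_common(3),
    }

# Invented example: a region field before and after standardization.
raw = [{"region": "north-east"}, {"region": "NE"}, {"region": None}]
clean = [{"region": "NE"}, {"region": "NE"}, {"region": "NE"}]

before, after = profile(raw, "region"), profile(clean, "region")
# Comparing the two profiles shows that three distinct spellings collapsed
# into one, and that a missing value was silently filled in.
```

A real pipeline would compute profiles like this at each stage and diff them over time, but even this toy version surfaces changes that would otherwise pass unnoticed.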
Lost Expertise Destroys Context
Sometimes we don’t lose expertise; we just don’t value it. If an organization does not continually increase trust between teams, and rebuild trust where it has been broken, we learn to forge ahead with our existing understanding of context, never calibrating it against what others understand that we do not.
Sometimes we hoard expertise and do not expose it. This is not always for lack of trust. It may be caused by the narrow focus of one team who never knew that another team needed to understand what they do.
And of course, sometimes people leave the organization, and their voice and their knowledge are no longer available. If we did not sufficiently value what they knew, or never learned what they knew, those who remain cannot know what it is that they do not know.
There is no easy path to capturing expertise and making it visible so that it becomes shared knowledge. We must actively assign tasks to capture, present, and preserve knowledge. It helps greatly if we have built a culture of trust, one that values the contributions of others.
Best Practices in Context Preservation
Pulling it all together, the way to preserve context is as follows.
- Use design thinking to learn context by engaging directly with clients.
- Apply techniques from lean process engineering to identify value creation and to create a common language between operations and planning.
- Model the ontology to find what is significant enough to capture in data.
- Curate, disseminate, and trust the accumulated knowledge of experts.
If your software development teams learn the context, treat important information with respect, share values with clients and decision-makers, and trust the experts, your projects will lead to high engagement and to enthusiastic users.