Kevin Kautz
Feb 9, 2024

While it is true that a dataframe is missing context, so is any matrix. SQL, which you seem to dislike, was designed primarily as a calculus for matrices, and as such it gives more context than dataframes do. In this regard, SQL is better than dataframes. In essence, the relational data model is the current standard, not the dataframes that we use to pass along portions of data gathered from a relational data model.

What I suggest is that you don't want to improve dataframes. You actually want to improve the relational model and then let your dataframes hold references to it. That's the way to add semantic context and retain it.
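
As a rough illustration of that suggestion, here is a minimal sketch using Python's built-in sqlite3 module. The `TableRef` class, the `load` helper, and the `customer`/`purchase` tables are hypothetical names invented for this example, not part of any library. The idea is only that the in-memory rows keep a pointer back to the table they came from, so column types and foreign keys can be looked up from the relational model instead of being copied into, or lost from, the dataframe-like object.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class TableRef:
    """Hypothetical wrapper: plain rows plus a reference to their source table."""
    conn: sqlite3.Connection
    table: str
    rows: list

    def columns(self):
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        info = self.conn.execute(f"PRAGMA table_info({self.table})").fetchall()
        return {name: ctype for _cid, name, ctype, *_rest in info}

    def foreign_keys(self):
        # PRAGMA foreign_key_list yields
        # (id, seq, table, from, to, on_update, on_delete, match).
        fks = self.conn.execute(f"PRAGMA foreign_key_list({self.table})").fetchall()
        return {fk[3]: (fk[2], fk[4]) for fk in fks}


def load(conn, table):
    # Assumption: `table` is a trusted identifier, not user input.
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    return TableRef(conn, table, rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        amount REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO purchase VALUES (10, 1, 19.5);
""")

purchases = load(conn, "purchase")
print(purchases.rows)            # the data, as a bare "dataframe" would hold it
print(purchases.columns())       # declared column types, read from the schema
print(purchases.foreign_keys())  # relationships, read from the schema
```

The rows alone say nothing about what `customer_id` means; the reference back to the schema is what recovers that it points at `customer.id`.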

Written by Kevin Kautz

Professional focus on data engineering, data architecture and data governance wherever data is valued.