Author: Jeremy Nelson
The patterns and paradigms that the industry has followed for data warehousing over the last two decades have been stretched to their breaking point by the ever-increasing demand for data. Far too often, a business ends up with a complex, unwieldy data warehouse solution that requires large, expensive teams to maintain. The cost and difficulty of maintaining these solutions have become an accepted evil in the pursuit of data-driven analytical insights to inform decision making.
It is time for an evolution. We must abandon the methodologies we have followed for so long in favor of new ones informed by hard-earned experience and updated for the modern world. The Metadata-Driven Extract/Load/Transform (ELT) approach is that next evolution for data warehousing.
The key concept behind Metadata-Driven ELT is simple. There are common patterns in data warehousing; why not define those patterns in easily mutable metadata? One example of a common pattern that can be defined in metadata is the replication of data from source systems into a common data repository. Instead of manually creating new custom pipelines for each source system, imagine rapidly and dynamically creating new pipelines with built-in data logging and data lineage through the simple management of your metadata.
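To make the replication pattern concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for real source and target systems. The table names, the REPLICATION_METADATA structure, and the replicate and run_all helpers are all hypothetical, chosen only for illustration; an actual implementation would depend on your platform and tooling.

```python
import sqlite3

# Hypothetical metadata: one entry per source table to replicate.
REPLICATION_METADATA = [
    {"source": "crm_customers", "target": "stg_customers", "columns": ["id", "name"]},
    {"source": "erp_orders", "target": "stg_orders", "columns": ["id", "customer_id", "total"]},
]


def replicate(conn, entry):
    """Copy the configured columns from source to target and return a log record."""
    cols = ", ".join(entry["columns"])
    # Identifiers are taken from trusted metadata here; validate them in real use.
    conn.execute(f"DROP TABLE IF EXISTS {entry['target']}")
    conn.execute(f"CREATE TABLE {entry['target']} AS SELECT {cols} FROM {entry['source']}")
    rows = conn.execute(f"SELECT COUNT(*) FROM {entry['target']}").fetchone()[0]
    return {"source": entry["source"], "target": entry["target"], "rows": rows}


def run_all(conn, metadata):
    """One generic routine serves every entry; no per-source pipeline code."""
    return [replicate(conn, entry) for entry in metadata]


def demo():
    """Build two toy source tables in memory and replicate them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT, phone TEXT)")
    conn.executemany(
        "INSERT INTO crm_customers VALUES (?, ?, ?)",
        [(1, "Ada", "555-0100"), (2, "Lin", "555-0101")],
    )
    conn.execute("CREATE TABLE erp_orders (id INTEGER, customer_id INTEGER, total REAL)")
    conn.execute("INSERT INTO erp_orders VALUES (10, 1, 99.5)")
    return run_all(conn, REPLICATION_METADATA)
```

In this sketch, onboarding a new source system means appending one entry to the metadata rather than writing a new pipeline, and every run returns the same style of log record.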
Another common use case for organizations is mastering customer data across source systems. Metadata-Driven ELT allows for the use of data science techniques to generate the metadata mapping between source customer objects and the final, conformed customer analytical object. This type of metadata mapping can be used to rapidly create a uniform customer ELT pipeline.
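As a rough illustration of how such a mapping might drive a uniform customer pipeline, the sketch below applies a small mapping table to records from two source systems. The system names, field names, and the conform_customer helper are assumptions made up for this example, and the mapping is written by hand here; how it is generated (by data science techniques or otherwise) is outside the scope of the sketch.

```python
# Hypothetical mapping metadata: which source field feeds which conformed field.
CUSTOMER_MAPPING = [
    {"system": "crm", "source_field": "cust_nm", "target_field": "customer_name"},
    {"system": "crm", "source_field": "email_addr", "target_field": "email"},
    {"system": "billing", "source_field": "FullName", "target_field": "customer_name"},
    {"system": "billing", "source_field": "Email", "target_field": "email"},
]


def conform_customer(record, system, mapping=CUSTOMER_MAPPING):
    """Rename a source record's fields into the conformed customer shape."""
    field_map = {
        m["source_field"]: m["target_field"] for m in mapping if m["system"] == system
    }
    # Unmapped source fields are dropped; mapped ones are renamed.
    conformed = {
        target: record[source] for source, target in field_map.items() if source in record
    }
    conformed["source_system"] = system  # keep track of where the record came from
    return conformed
```

For example, conform_customer({"cust_nm": "Ada", "email_addr": "ada@example.org"}, "crm") and conform_customer({"FullName": "Ada", "Email": "ada@example.org"}, "billing") both produce records with the same customer_name and email fields, differing only in source_system.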
Metadata-Driven ELT starts with a metadata repository. This repository, typically a set of comma-separated values (CSV) files or tables in a dedicated database, contains the metadata that drives ELT pipelines. The key is that the metadata is in a simple, easy-to-modify format that enables even non-technical users to create, modify, and schedule ELT pipelines. Using CSV files or tables as the metadata repository provides an easy-to-use, flexible, and extendable abstraction layer that greatly reduces the effort required to manage ELT pipelines.
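A CSV-based repository of this kind could look something like the sketch below. The column names (pipeline, source_table, target_table, schedule, enabled) and the load_pipelines and describe helpers are hypothetical; the CSV is embedded as a string only so the example is self-contained, where a real repository would live in a file or database table.

```python
import csv
import io

# Hypothetical metadata repository: one row per pipeline.
METADATA_CSV = """pipeline,source_table,target_table,schedule,enabled
customers_daily,crm.customers,dw.customers,daily,true
orders_hourly,erp.orders,dw.orders,hourly,true
legacy_load,old.accounts,dw.accounts,weekly,false
"""


def load_pipelines(csv_text):
    """Parse the metadata and keep only the pipelines marked as enabled."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["enabled"].strip().lower() == "true"]


def describe(pipeline):
    """Render one metadata row as a human-readable pipeline summary."""
    return (
        f"{pipeline['pipeline']}: {pipeline['source_table']} -> "
        f"{pipeline['target_table']} ({pipeline['schedule']})"
    )
```

Under these assumptions, changing a schedule or disabling a pipeline is a one-cell edit in the CSV, which is the kind of change a non-technical user can make without touching pipeline code.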
The Metadata-Driven ELT approach addresses many of the pain points that traditional methodologies have created for the data warehousing industry. In summary, here are some of the potential advantages of using this approach:
Every organization must be forward-thinking in their data strategy. The benefits of implementing a Metadata-Driven ELT approach go beyond optimizing your data ingestion and transformation pipelines. Additional benefits include consistent logging and monitoring, built-in row-level data lineage, and innate data governance. These are just a few of the many best-practice patterns that can be defined in metadata and abstracted from the end user.
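One way to picture consistent logging and row-level lineage is a single shared load step that stamps every row with where it came from. The sketch below assumes a pipeline metadata entry with hypothetical pipeline and source_table keys, and the lineage column names (_source, _pipeline, _batch_id, _loaded_at) are likewise invented for illustration.

```python
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("elt")


def load_with_lineage(rows, pipeline):
    """Stamp each row with lineage columns and emit one uniform log line."""
    batch_id = str(uuid.uuid4())  # shared by every row in this load
    loaded_at = datetime.now(timezone.utc).isoformat()
    stamped = [
        {
            **row,
            "_source": pipeline["source_table"],
            "_pipeline": pipeline["pipeline"],
            "_batch_id": batch_id,
            "_loaded_at": loaded_at,
        }
        for row in rows
    ]
    logger.info(
        "pipeline=%s batch=%s rows=%d", pipeline["pipeline"], batch_id, len(stamped)
    )
    return stamped
```

Because every pipeline passes through the same function, the log format and lineage columns stay identical across sources without any per-pipeline effort.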
The end result of Metadata-Driven ELT is a streamlined, configurable, extendable data ecosystem powerful enough to accelerate data science and business intelligence initiatives. Exploring this approach to data warehousing is right for any organization looking to optimize their data ecosystem to deliver timely, accurate, high-fidelity data to inform data-driven decision making.
Have a question about Metadata-Driven ELT? Schedule time to chat with one of RevGen’s experts.
Jeremy Nelson is a Senior Technical Consultant specializing in cloud, data warehousing and software solutions for data initiatives. He is passionate about empowering clients to realize the full potential of their data.