The patterns and paradigms that the industry has followed for data warehousing over the last two decades have been stretched to their breaking points by the ever-increasing demand for data. Far too often a business ends up with a complex, unwieldy data warehouse solution that requires large, expensive teams to maintain. The cost and difficulty of maintaining these solutions have become an accepted evil in the pursuit of data-driven analytical insights to inform decision making.
It is time for an evolution. We must abandon the methodologies we have followed for so long in favor of new ones informed by hard-earned experience and updated for the modern world. The Metadata-Driven Extract/Load/Transform (ELT) approach is that next evolution for data warehousing.
What is Metadata-Driven ELT?
The key concept behind Metadata-Driven ELT is simple. There are common patterns in data warehousing; why not define those patterns in easily mutable metadata? One example of a common pattern that can be defined in metadata is the replication of data from source systems into a common data repository. Instead of manually creating new custom pipelines for each source system, imagine rapidly and dynamically creating new pipelines with built-in data logging and data lineage through the simple management of your metadata.
Another common use case for organizations is mastering the subject of customer objects. Metadata-Driven ELT allows for the use of data science techniques to generate the metadata mapping between source customer objects and the final, conformed customer analytical object. This type of metadata mapping can be used to rapidly create a uniform customer ELT pipeline.
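As a rough illustration of what such a mapping might look like once it exists, the sketch below applies a small mapping table to customer records from two source systems. The system names, column names, and the `conform` function are all hypothetical, and the mapping is written by hand here rather than generated by any data science technique.

```python
# Hypothetical mapping metadata: each rule maps one source column
# to a column on the conformed customer object.
CUSTOMER_MAPPING = [
    {"source_system": "crm", "source_column": "cust_nm", "target_column": "customer_name"},
    {"source_system": "crm", "source_column": "email_addr", "target_column": "email"},
    {"source_system": "billing", "source_column": "FullName", "target_column": "customer_name"},
    {"source_system": "billing", "source_column": "Email", "target_column": "email"},
]


def conform(source_system, record, mapping=CUSTOMER_MAPPING):
    """Return a conformed customer record built only from mapped columns."""
    conformed = {"source_system": source_system}
    for rule in mapping:
        applies = rule["source_system"] == source_system
        if applies and rule["source_column"] in record:
            conformed[rule["target_column"]] = record[rule["source_column"]]
    return conformed


if __name__ == "__main__":
    print(conform("crm", {"cust_nm": "Ada Lovelace", "email_addr": "ada@example.org"}))
    print(conform("billing", {"FullName": "Ada Lovelace", "Email": "ada@example.org"}))
```

In this sketch, supporting a new source system means adding rows to the mapping rather than writing a new transformation.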
Metadata-Driven ELT starts with a metadata repository. This repository, typically comma-separated values (CSV) files or tables in a special database, contains the metadata that drives ELT pipelines. The key is that the metadata is in a simple, easy-to-modify format that enables even non-technical users to create, modify and schedule ELT pipelines. The use of CSV files or tables as the metadata repository allows for an easy-to-use, flexible, and extendable abstraction layer that massively reduces the effort involved to manage ELT pipelines.
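To make the idea concrete, here is a minimal sketch of a CSV metadata repository driving replication pipelines. The column names (`pipeline_name`, `load_type`, and so on), the table names, the `updated_at` watermark column, and the generated SQL are assumptions chosen for illustration, not a prescribed schema.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical metadata repository: one CSV row per replication pipeline.
METADATA_CSV = """pipeline_name,source_table,target_table,load_type,enabled
crm_customers,crm.customers,staging.crm_customers,full,true
erp_orders,erp.orders,staging.erp_orders,incremental,true
legacy_items,legacy.items,staging.legacy_items,full,false
"""


@dataclass(frozen=True)
class PipelineSpec:
    name: str
    source_table: str
    target_table: str
    load_type: str
    enabled: bool


def load_specs(csv_text):
    """Parse the metadata CSV into pipeline specifications."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        PipelineSpec(
            name=row["pipeline_name"],
            source_table=row["source_table"],
            target_table=row["target_table"],
            load_type=row["load_type"],
            enabled=row["enabled"].strip().lower() == "true",
        )
        for row in reader
    ]


def build_statement(spec):
    """Render one replication statement from a shared template."""
    base = f"INSERT INTO {spec.target_table} SELECT * FROM {spec.source_table}"
    if spec.load_type == "full":
        return base
    if spec.load_type == "incremental":
        # Assumes the source table has an updated_at column (hypothetical).
        return base + " WHERE updated_at > :last_loaded_at"
    raise ValueError(f"unknown load_type: {spec.load_type}")


def plan(csv_text):
    """Map each enabled pipeline name to the statement it would run."""
    return {s.name: build_statement(s) for s in load_specs(csv_text) if s.enabled}


if __name__ == "__main__":
    for name, statement in plan(METADATA_CSV).items():
        print(name, "->", statement)
```

Adding a pipeline in this sketch means adding a row to the CSV; the code that builds and runs the pipeline does not change.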
Why Metadata-Driven ELT?
The Metadata-Driven ELT approach addresses many of the pain points that traditional methodologies have created for the data warehousing industry. In summary, here are some of the potential advantages of using this approach:
- Uniformity: Complexity is a persistent issue in data warehousing. This approach yields uniform data ingestion and transformation, pipeline orchestration, and logging, which ultimately results in a simpler, more consistent ecosystem.
- Agility: This approach allows for unique flexibility in the definition and orchestration of your ELT pipelines. Both template-driven and custom pipelines are available.
- Easy to Scale: New ELT pipelines can be added simply by defining metadata, so the ecosystem scales smoothly as sources are added.
- Maintainability: Since everything from business logic to data flow is in the form of documents or tables, it’s easy to maintain this approach.
- Acceleration: Go from zero to data-warehousing-in-the-cloud in a fraction of the time it takes to build a custom solution.
Is Metadata-Driven ELT right for my organization?
Every organization must be forward thinking in its data strategy. The benefits of implementing a Metadata-Driven ELT approach go beyond optimizing your data ingestion and transformation pipelines. Additional benefits include consistent logging and monitoring, built-in row-level data lineage, and innate data governance. These are just a few of the many best-practice patterns that can be defined in metadata and abstracted from the end user.
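One way row-level lineage could be built into every pipeline is to stamp each loaded row with details taken from the pipeline's metadata. The sketch below is only an illustration under that assumption; the `with_lineage` function and the underscore-prefixed lineage column names are hypothetical.

```python
from datetime import datetime, timezone


def with_lineage(rows, pipeline_name, source_table, run_id, loaded_at=None):
    """Return copies of rows with lineage columns attached to each one."""
    # Default to the current UTC time when no load timestamp is supplied.
    stamp = (loaded_at or datetime.now(timezone.utc)).isoformat()
    return [
        {
            **row,
            "_pipeline": pipeline_name,
            "_source_table": source_table,
            "_run_id": run_id,
            "_source_row": position,  # 1-based position within this batch
            "_loaded_at": stamp,
        }
        for position, row in enumerate(rows, start=1)
    ]


if __name__ == "__main__":
    batch = [{"customer_name": "Ada Lovelace"}, {"customer_name": "Alan Turing"}]
    for row in with_lineage(batch, "crm_customers", "crm.customers", run_id="run-001"):
        print(row)
```

Because the stamping happens in shared code, every pipeline defined in metadata would carry the same lineage columns without extra work per source.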
The end result of Metadata-Driven ELT is a streamlined, configurable, extendable data ecosystem powerful enough to accelerate data science and business intelligence initiatives. This approach to data warehousing is worth exploring for any organization looking to optimize its data ecosystem to deliver timely, accurate, high-fidelity data for data-driven decision making.
Have questions about Metadata-Driven ELT? Schedule time to chat with one of RevGen’s experts.