This is a high-level summary that provides a few options of data warehousing and business intelligence solutions used by organizations and is not a recommendation for any one solution. There are many variables that must go into the decision and selection of an architecture that fits your organizational needs. You must not only consider your current needs but anticipate future needs, so please be sure to understand all aspects of a solution prior to committing to it. Data solutions can be very expensive and take a long time to develop, but can be an invaluable asset in driving data-driven decisions for your organization.
This article outlines common data warehousing and business intelligence reference architectures. There are many variations to these that I will not cover, but I hope this will provide you with a general understanding of the different architectures and their pros and cons.
Data Vault Architecture
The Data Vault Architecture is a methodology that makes it easy to store structured data. Performance to load a data vault can reach near real time loads, as there are mostly inserts rather than updates or deletes. This architecture is largely used where auditing of the data is needed – what source systems changed what data. This is a flexible architecture that allows the addition of additional data sources to be integrated easily. To accomplish the speed of inserts and auditing of data, a large number of tables are needed when compared to a dimensional model. This can add additional complexity if trying to report from or get data out of a data vault. Therefore, the data vault is often between source systems and a data warehouse (dimensional model).
Figure 1: Data Vault Reference Architecture
|· Agile architecture makes it easy to integrate new data sources or make changes to existing load procedures
· Strong audit history that logs each change
· Pattern based loads allow for code generation to create load packages
· Faster load times as most data are inserts rather than updates
|· Selecting data can be slow because of the number of joins required
· Contains many tables, which complicates ad hoc reporting
· Unable to load non-structured data
Table 1: Data Vault Pros and Cons
Data Warehouse Architecture
The Data Warehouse or Dimensional Model follows the Kimball Methodology, which is a denormalized architecture that is designed for reporting. The denormalized architecture reduces the number of tables and complex table joins to improve performance on getting data out. A staging environment which could be a data vault, or a data lake is used to land data without transformations from each of the source systems. The data integration and interoperability layer handles the complex transformation logic to shape the data into a dimensional model.
Figure 2: Data Warehouse Reference Architecture
|· Transformed and structured, making it easier to report from
· Uses business names and definitions
· Conforms data from disparate systems
|· Requires upfront architecture and development
· Not flexible with changing business needs
Table 2: Data Warehouse Pros and Cons
Modern Data Warehouse Architecture
With cloud offerings, such as those offered by Microsoft Azure, Amazon Web Services (AWS) or Google Cloud Platform (GCP), organizations have even more opti