Common Data Warehouse and Business Intelligence Reference Architectures

This is a high-level summary that provides a few options of data warehousing and business intelligence solutions used by organizations and is not a recommendation for any one solution. There are many variables that must go into the decision and selection of an architecture that fits your organizational needs. You must not only consider your current needs but anticipate future needs, so please be sure to understand all aspects of a solution prior to committing to it. Data solutions can be very expensive and take a long time to develop, but can be an invaluable asset in driving data-driven decisions for your organization.

 

This article outlines common data warehousing and business intelligence reference architectures. There are many variations to these that I will not cover, but I hope this will provide you with a general understanding of the different architectures and their pros and cons.

 

Data Vault Architecture

The Data Vault Architecture is a methodology that makes it easy to store structured data. Performance to load a data vault can reach near real time loads, as there are mostly inserts rather than updates or deletes. This architecture is largely used where auditing of the data is needed – what source systems changed what data. This is a flexible architecture that allows the addition of additional data sources to be integrated easily. To accomplish the speed of inserts and auditing of data, a large number of tables are needed when compared to a dimensional model. This can add additional complexity if trying to report from or get data out of a data vault. Therefore, the data vault is often between source systems and a data warehouse (dimensional model).

Data Vault Reference Architecture

Figure 1: Data Vault Reference Architecture

 

 

 

Pros Cons
·         Agile architecture makes it easy to integrate new data sources or make changes to existing load procedures

·         Strong audit history that logs each change

·         Pattern based loads allow for code generation to create load packages

·         Faster load times as most data are inserts rather than updates

·         Selecting data can be slow because of the number of joins required

·         Contains many tables, which complicates ad hoc reporting

·         Unable to load non-structured data

Table 1: Data Vault Pros and Cons

 

Data Warehouse Architecture

The Data Warehouse or Dimensional Model follows the Kimball Methodology, which is a denormalized architecture that is designed for reporting. The denormalized architecture reduces the number of tables and complex table joins to improve performance on getting data out. A staging environment which could be a data vault, or a data lake is used to land data without transformations from each of the source systems. The data integration and interoperability layer handles the complex transformation logic to shape the data into a dimensional model.

Data Warehouse Reference Architecture

Figure 2: Data Warehouse Reference Architecture

 

Pros Cons
·         Transformed and structured, making it easier to report from

·         Uses business names and definitions

·         Conforms data from disparate systems

·         Requires upfront architecture and development

·         Not flexible with changing business needs

Table 2: Data Warehouse Pros and Cons

 

 

Modern Data Warehouse Architecture

 

With cloud offerings, such as those offered by Microsoft Azure, Amazon Web Services (AWS) or Google Cloud Platform (GCP), organizations have even more options to implement their data strategies. One newer data option is an IaaS (Infrastructure as a Service) offering called a Data Lake. Data Lakes allow you to store data, both structured and unstructured data, in a single location. This centralized data store can be a good option for organizations that have both structured and unstructured data that they plan to use, now or in the future. There is minimal design or architecture needed up front to plan how you will store the data. Think of a file system where you store your text documents, images, videos, PDFs, etc. and how you creat