Data warehouse overview and a way to approach

Traditional operational or transactional databases did not satisfy the requirements for data analysis. This only for:
– They were designed and optimized to support daily business operations, and their primary concern was ensuring concurrent access by multiple users and, at the same time, providing recovery techniques to guarantee data consistency.
– Typical operational databases contain detailed data, do not include historical data, and perform poorly when executing complex queries that involve many tables or aggregate large volumes of data.
– Furthermore, when users need to analyze the behavior of an organization as a whole, data from several different operational systems must be integrated. This can be a difficult task to accomplish because of the differences in data definition and content.

Therefore, data warehouses were proposed as a solution to the growing demands of decision-making users.

Typical Data Warehouse Architecture
  • The back-end tier is composed of extraction, transformation, and loading (ETL) tools, used to feed data into the data warehouse from operational databases and other data sources, which can be internal or external from the organization, and a data staging area, which is an intermediate database where all the data integration and transformation processes are run prior to the loading of the data into the data warehouse.
  • The data warehouse tier is composed of an enterprise data warehouse and/or several data marts and a metadata repository storing information about the data warehouse and its contents.
  • The OLAP tier is composed of an OLAP server, which provides a multidimensional view of the data, regardless of the actual way in which data are stored in the underlying system.
  • The front-end tier is used for data analysis and visualization. It contains client tools such as OLAP tools, reporting tools, statistical tools, and data mining tools.

Other applications can integrate with Data warehouse to show BI report. They can call API to Reporting Tools, OLAP tools or event query directly to OLAP server or Data warehouse tier server to get data.

The staging layer is used for temporary storage for ETL process. Because the ETL process is long run task and if ETL process has problem, the original data may have changed and thank god, the stage storage is keeping the current data for this and do the replay.

%d bloggers like this: