Total Pageviews

Friday, 29 August 2014

Basic Elements of the Data Warehouse

Source System
An operational system of record whose function it is to capture the transactions of the
Business; sometimes called Legacy Systems in mainframe environment. Most preferably source system stores only current data; so no historical data is preserved here. These systems are input to data warehouse and data is called source data. Source data can also exist in the form of flat files, excel etc.
Data Staging Area
A storage area and set of processes that clean, transform, combine, de-duplicate, and prepare source data for use in the data warehouse. The data staging area is everything in between the source system and the presentation server. Input to staging area can be from multiple data sources and its always good to have all source data at one place before loading into the data warehouse or presentation server. Data staging area may contain a system of flat files also which are letter used to load into data warehouse system.

Presentation Server
The target physical machine on which the data warehouse data is organized and stored for direct querying by end users, report writers, and other applications. Storage in the presentation area will mostly be in RDBMS tables which are later used to analysis of data by querying database tables.

Dimensional Model
A specific discipline for modeling data that is an alternative to entity-relationship (E/R) modelling. A dimensional model contains the same information as an E/R model but packages the data in a symmetric format whose design goals are user understandbility, query performance, and resilience to change.

The main components of a dimensional model are fact tables and dimension tables. A fact table is the primary table in each dimensional model that contains measurements of the business. Every Fact table contains keys from dimensional tables called as foreign keys.

Dimensional tables mostly contain textual attributes (fields) and a primary key serves as foreign key in fact tables.

Data Mart
A logical subset of the complete data warehouse & it is specific to one subject area like Sales data mart etc. A data warehouse will be union of all data marts & will contain multiple fact tables from different data marts and fact tables from multiple data marts can be joined on the basis of conformed dimensions. A conformed dimension is one which definition across data marts does not change; foe eg. Time dimension, Geography dimension.

Data Warehouse
The queryable source of data in the enterprise. The data warehouse is nothing more than the union of all the constituent data marts. A data warehouse is fed from the data staging area.

Operational Data Store (ODS)
Stores Current data and recent history like in bank ATM ODS where bank stored Current balances and recent history; also no bank would likely to access data warehouse for mini transaction at ATM and so they go for ODS. And as ODS needs continuous operational access and updates, it should be housed outside of data warehouse.

End User Application/Tools
A collection of tools that query, analyze, and present information targeted to support a business need.

No comments:

Post a Comment