Topics
- 1. What is Data Warehouse ?
- 2. Key Characteristics of a Data Warehouse
- 3. Need, Importance & Components of Data Warehousing
- 4. Data Warehouse vs OLTP
- 5. Architecture, Advantages & Applications of a Data Warehouse
- 6. Data Warehouse (DW) Delivery Process
A Data Warehouse is a centralized repository that stores integrated data from multiple heterogeneous sources (databases, CRM systems, ERP, logs, spreadsheets, etc.). It supports business intelligence (BI), reporting, and decision-making.
Formal Definition (Bill Inmon – Father of Data Warehousing):A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data to support management’s decision-making process.
Ralph Kimball’s Definition (Dimensional Model Advocate):A data warehouse is a centralized repository providing users access to data organized for analysis, typically using facts and dimensions.
| Characteristic | Description |
|---|---|
| Subject-Oriented | Organized around key business subjects such as Sales, Customers, Products, Finance, Inventory. Focuses on analysis, not transactional processes. |
| Integrated | Combines data from multiple sources: cleaned, transformed, standardized. Example: consistent customer names, date formats, currencies, product codes. |
| Non-Volatile | Data rarely changes once stored. Mostly read-only. Refreshed periodically via ETL processes. |
| Time-Variant | Stores historical data over long periods (months/years) for trend analysis, forecasting, and comparisons. Example: Sales by region over 5 years, seasonal trends. |
-
Businesses generate huge amounts of daily data.
-
Operational systems (OLTP) focus on transaction processing, speed, and routine operations, but are not ideal for analysis.
Benefits of Data Warehousing:
a) Better decision-making
b) Data consistency and accuracy
c) Fast query performance
d) Historical trend analysis
e) Supports BI and data mining
Components:
-
Data Sources – OLTP databases, CRM, ERP, spreadsheets, logs.
-
ETL (Extract, Transform, Load) – Extracts data, cleans it, standardizes formats, and loads into DW.
-
Data Storage – Can be:
-
Relational databases (RDBMS)
-
Multidimensional databases
-
Cloud warehouses (Snowflake, BigQuery, Redshift)
-
-
Metadata Repository – Stores information about data: definitions, source mapping, ETL rules, table descriptions.
-
Data Marts – Focused subsets of DW for specific business areas (e.g., Marketing Mart, Sales Mart, Finance Mart).
-
OLAP (Online Analytical Processing) – Enables complex, multi-dimensional analysis (slice, dice, drill-down, roll-up).
| Feature | Data Warehouse (OLAP) | OLTP (Operational DB) |
|---|---|---|
| Purpose | Analysis & reporting | Day-to-day transactions |
| Data Type | Historical | Current |
| Updates | Periodic (batch) | Frequent (real-time) |
| Queries | Complex | Simple |
| Orientation | Subject-oriented | Process-oriented |
| Design | De-normalized | Highly normalized |
-
Single-tier Architecture – Minimizes data redundancy; rarely used.
-
Two-tier Architecture – Client → Server; suitable for small systems.
-
Three-tier Architecture (Most Common):
-
Bottom Tier: Data sources + ETL
-
Middle Tier: OLAP server / Data Warehouse
- Top Tier: Front-end tools (reports, dashboards, BI tools)
Advantages
-
a) Improved data quality & consistency
b) Faster decision-making
c) Historical analysis & future forecasting
d) Better business intelligence
e) Supports data mining & analytics
Application
-
Retail: Sales forecasting, customer behavior analysis
-
Banking: Fraud detection, risk analysis
-
Healthcare: Treatment patterns, insurance claims
-
Telecom: Customer churn analysis
-
Government: Tax analysis, social data
-
Manufacturing: Supply chain optimization