Monday, March 31, 2014

Basic Data Base Concepts you should know before working on a Business Intelligence Project

Logical Design
Logical design can also be called as a conceptual design or an abstract. It does not consider the physical implementation details. In logical design we only define information specified by requirements. Entity-Relationship modeling is one of the most popular techniques' used for logical modeling. IN E-R Model mostly important data elements (Entities) are identified, then properties of these entities are identified and lastly relationships among different entities is established. In relational databases, an entity usually forms a table and an attribute which define the uniqueness of the entity forms a column.
To ensure data consistency unique identifiers are used , a unique identifier is added to tables so that we can differentiate between the same item when it appears at different places, unique identifier is  actually referred as primary key.
Though E-R Model has been mostly used for OLTP(Online Transactional Processing) databases, the technique is also very useful during the  data warehouse design in the form of dimensional modeling. In dimensional modeling facts and dimensions are associated using relationships.
Diagram below shows a very basic and simple E-R Diagram, where I have included only two entities 'Student' and 'School' they are linked with a relationship 'goes to' and each entity has 'attributes'.













 So to summarize

      – An entity  represents some data.
– Attributes represent properties of data.
   – Relationships are links in between an Entities.


Physical Design
In the physical design data gathered during the logical design phase is converted into the description of a physical database structure. Physical design decisions  directly affect the query performance and database maintenance aspects.
At this step we do the mapping:
•         Entities to tables
•         Relationships to foreign key constraints
•         Attributes to columns
•         Primary unique identifiers to primary key constraints
•         Unique identifiers to unique key constraints
Table spaces, Partitioned Tables, Views, Dimensions etc are created after the physical design. Materialized views, indexes etc can also be created for performance tuning.


Fact Tables & Dimension Tables
Fact tables and Dimension Tables are the two common types of objects found in any dimensional data warehouse schema.
Fact tables are tables in your data warehouse that store measurements they are typically business measures. Measures are the factual data that businesses usually want to analyze, such as total_sales or unit_cost. Fact tables typically contain facts (Measures) and foreign keys of the dimension tables. SALES, COST, and PROFIT  are examples of facts.
Dimension tables, are also called as  lookup or reference tables. Dimension tables store the information (metadata) that you normally use to fire queries. Dimension tables most of the times are descriptive. Exs. are CUSTOMERS and PRODUCTS.
Each dimension table is joined to the fact table using primary-foreign keys' relationships'.


Schema
Schemas are nothing but a way of logically grouping database objects such as tables, views, stored procedures etc. Schemas' in database concepts are also defined as containers' of database objects.
Users can then be assigned to schemas so that users can only access what they are authorized to access.
The logical schema is the structure of the tables and relationships in a database.
The physical one is how this structure and the data it holds are stored.

The star schema is the simplest form of data warehouse schema. It is called as star schema because it's logical structure refers a 'star' where' in usually at the center there happens to be a fact table that connects to many dimension tables outside. 



















Snowflakes' schema normalizes dimensions' to reduce the redundancy, which means the dimension data is grouped into multiple tables instead of one large table. Ex. a product table (dimension table) in a star schema may be normalized into a 'products table', a 'product_category' table and a 'product_manufacturer table' in a snowflake schema. Although this saves space, it increases the number of dimension tables and requires more foreign key joins. This results in more complex queries and reduced query performance.



















Normalization & Denormalization
To understand Normalization
Refer this link
To understand Denormalization
Refer this link


Thursday, March 20, 2014

A Typical Data Warehouse & It's Architecture


Today Let's see what a Data Warehouse is,

In general terms 'Warehouse' means a large place/building/house where goods are stored 
before they are distributed.
Similarly a 'Data Warehouse' is nothing but a large database where data is stored before it is
used/distributed for some analytic purpose. 
Data stored in a data warehouse is many a times referred as raw-data which could be
structured/unstructured. To simplify it further, a data warehouse is a database designed for 
querying, reporting and analyzing certain information.
Data Warehouse mostly contains historical data derived from transaction data. 
A typical data warehouse separates transactional and analytic workload. 
Data Warehouse is primarily an analytic's tool.

A typical Data Warehouse Architecture Diagram

Now lets see what a Data Warehouse with Staging Area means, in the previous diagram whatever data is put in to the data warehouse is assumed to be clean and already processed, but most of the data warehouses happen to have a staging area implemented where data cleansing and processing is done before the data is put into a data warehouse. Staging areas are used for performing operations that affect production query environment. This is many a times accomplished by using ETL(Extract-Transform-Load) tools.

Data Warehouse with Staging Area Architecture Diagram

We may require to customize our data warehouse for different group of users. This may be achieved by adding data-marts to our data warehouse, Data marts are nothing but databases/systems designed for a certain group of users/groups/businesses.

Data Warehouse with Data Marts Architecture Diagram