Monday, March 31, 2014

Basic Data Base Concepts you should know before working on a Business Intelligence Project

Logical Design
Logical design can also be called as a conceptual design or an abstract. It does not consider the physical implementation details. In logical design we only define information specified by requirements. Entity-Relationship modeling is one of the most popular techniques' used for logical modeling. IN E-R Model mostly important data elements (Entities) are identified, then properties of these entities are identified and lastly relationships among different entities is established. In relational databases, an entity usually forms a table and an attribute which define the uniqueness of the entity forms a column.
To ensure data consistency unique identifiers are used , a unique identifier is added to tables so that we can differentiate between the same item when it appears at different places, unique identifier is  actually referred as primary key.
Though E-R Model has been mostly used for OLTP(Online Transactional Processing) databases, the technique is also very useful during the  data warehouse design in the form of dimensional modeling. In dimensional modeling facts and dimensions are associated using relationships.
Diagram below shows a very basic and simple E-R Diagram, where I have included only two entities 'Student' and 'School' they are linked with a relationship 'goes to' and each entity has 'attributes'.













 So to summarize

      – An entity  represents some data.
– Attributes represent properties of data.
   – Relationships are links in between an Entities.


Physical Design
In the physical design data gathered during the logical design phase is converted into the description of a physical database structure. Physical design decisions  directly affect the query performance and database maintenance aspects.
At this step we do the mapping:
•         Entities to tables
•         Relationships to foreign key constraints
•         Attributes to columns
•         Primary unique identifiers to primary key constraints
•         Unique identifiers to unique key constraints
Table spaces, Partitioned Tables, Views, Dimensions etc are created after the physical design. Materialized views, indexes etc can also be created for performance tuning.


Fact Tables & Dimension Tables
Fact tables and Dimension Tables are the two common types of objects found in any dimensional data warehouse schema.
Fact tables are tables in your data warehouse that store measurements they are typically business measures. Measures are the factual data that businesses usually want to analyze, such as total_sales or unit_cost. Fact tables typically contain facts (Measures) and foreign keys of the dimension tables. SALES, COST, and PROFIT  are examples of facts.
Dimension tables, are also called as  lookup or reference tables. Dimension tables store the information (metadata) that you normally use to fire queries. Dimension tables most of the times are descriptive. Exs. are CUSTOMERS and PRODUCTS.
Each dimension table is joined to the fact table using primary-foreign keys' relationships'.


Schema
Schemas are nothing but a way of logically grouping database objects such as tables, views, stored procedures etc. Schemas' in database concepts are also defined as containers' of database objects.
Users can then be assigned to schemas so that users can only access what they are authorized to access.
The logical schema is the structure of the tables and relationships in a database.
The physical one is how this structure and the data it holds are stored.

The star schema is the simplest form of data warehouse schema. It is called as star schema because it's logical structure refers a 'star' where' in usually at the center there happens to be a fact table that connects to many dimension tables outside. 



















Snowflakes' schema normalizes dimensions' to reduce the redundancy, which means the dimension data is grouped into multiple tables instead of one large table. Ex. a product table (dimension table) in a star schema may be normalized into a 'products table', a 'product_category' table and a 'product_manufacturer table' in a snowflake schema. Although this saves space, it increases the number of dimension tables and requires more foreign key joins. This results in more complex queries and reduced query performance.



















Normalization & Denormalization
To understand Normalization
Refer this link
To understand Denormalization
Refer this link


1 comment: