Monday, March 31, 2014

Basic Data Base Concepts you should know before working on a Business Intelligence Project

Logical Design
Logical design can also be called as a conceptual design or an abstract. It does not consider the physical implementation details. In logical design we only define information specified by requirements. Entity-Relationship modeling is one of the most popular techniques' used for logical modeling. IN E-R Model mostly important data elements (Entities) are identified, then properties of these entities are identified and lastly relationships among different entities is established. In relational databases, an entity usually forms a table and an attribute which define the uniqueness of the entity forms a column.
To ensure data consistency unique identifiers are used , a unique identifier is added to tables so that we can differentiate between the same item when it appears at different places, unique identifier is  actually referred as primary key.
Though E-R Model has been mostly used for OLTP(Online Transactional Processing) databases, the technique is also very useful during the  data warehouse design in the form of dimensional modeling. In dimensional modeling facts and dimensions are associated using relationships.
Diagram below shows a very basic and simple E-R Diagram, where I have included only two entities 'Student' and 'School' they are linked with a relationship 'goes to' and each entity has 'attributes'.













 So to summarize

      – An entity  represents some data.
– Attributes represent properties of data.
   – Relationships are links in between an Entities.


Physical Design
In the physical design data gathered during the logical design phase is converted into the description of a physical database structure. Physical design decisions  directly affect the query performance and database maintenance aspects.
At this step we do the mapping:
•         Entities to tables
•         Relationships to foreign key constraints
•         Attributes to columns
•         Primary unique identifiers to primary key constraints
•         Unique identifiers to unique key constraints
Table spaces, Partitioned Tables, Views, Dimensions etc are created after the physical design. Materialized views, indexes etc can also be created for performance tuning.


Fact Tables & Dimension Tables
Fact tables and Dimension Tables are the two common types of objects found in any dimensional data warehouse schema.
Fact tables are tables in your data warehouse that store measurements they are typically business measures. Measures are the factual data that businesses usually want to analyze, such as total_sales or unit_cost. Fact tables typically contain facts (Measures) and foreign keys of the dimension tables. SALES, COST, and PROFIT  are examples of facts.
Dimension tables, are also called as  lookup or reference tables. Dimension tables store the information (metadata) that you normally use to fire queries. Dimension tables most of the times are descriptive. Exs. are CUSTOMERS and PRODUCTS.
Each dimension table is joined to the fact table using primary-foreign keys' relationships'.


Schema
Schemas are nothing but a way of logically grouping database objects such as tables, views, stored procedures etc. Schemas' in database concepts are also defined as containers' of database objects.
Users can then be assigned to schemas so that users can only access what they are authorized to access.
The logical schema is the structure of the tables and relationships in a database.
The physical one is how this structure and the data it holds are stored.

The star schema is the simplest form of data warehouse schema. It is called as star schema because it's logical structure refers a 'star' where' in usually at the center there happens to be a fact table that connects to many dimension tables outside. 



















Snowflakes' schema normalizes dimensions' to reduce the redundancy, which means the dimension data is grouped into multiple tables instead of one large table. Ex. a product table (dimension table) in a star schema may be normalized into a 'products table', a 'product_category' table and a 'product_manufacturer table' in a snowflake schema. Although this saves space, it increases the number of dimension tables and requires more foreign key joins. This results in more complex queries and reduced query performance.



















Normalization & Denormalization
To understand Normalization
Refer this link
To understand Denormalization
Refer this link


Thursday, March 20, 2014

A Typical Data Warehouse & It's Architecture


Today Let's see what a Data Warehouse is,

In general terms 'Warehouse' means a large place/building/house where goods are stored 
before they are distributed.
Similarly a 'Data Warehouse' is nothing but a large database where data is stored before it is
used/distributed for some analytic purpose. 
Data stored in a data warehouse is many a times referred as raw-data which could be
structured/unstructured. To simplify it further, a data warehouse is a database designed for 
querying, reporting and analyzing certain information.
Data Warehouse mostly contains historical data derived from transaction data. 
A typical data warehouse separates transactional and analytic workload. 
Data Warehouse is primarily an analytic's tool.

A typical Data Warehouse Architecture Diagram

Now lets see what a Data Warehouse with Staging Area means, in the previous diagram whatever data is put in to the data warehouse is assumed to be clean and already processed, but most of the data warehouses happen to have a staging area implemented where data cleansing and processing is done before the data is put into a data warehouse. Staging areas are used for performing operations that affect production query environment. This is many a times accomplished by using ETL(Extract-Transform-Load) tools.

Data Warehouse with Staging Area Architecture Diagram

We may require to customize our data warehouse for different group of users. This may be achieved by adding data-marts to our data warehouse, Data marts are nothing but databases/systems designed for a certain group of users/groups/businesses.

Data Warehouse with Data Marts Architecture Diagram






Wednesday, February 12, 2014

Business Intelligence and Business Intelligence with Information Technology

If somebody asks you what do you mean by 'Being Intelligent', you would probably say something like being alert, being responsive, being sharp or being active etc etc. And if you search through a few dictionaries for meaning of the word 'being intelligent' they would also talk of something similar.

Now lets talk about Business, If I ask you people what does Business mean, you might probably end up saying something like 'what sort of stupid question is this' , because most of us happen to have some or the other information about business, many people very well understand what business means and how it works, many would have a brief idea about it, many actually do business. But if we decide to do a thorough research on this we might end up developing a whole new theory. But that is not what I want to do now, so lets go back to the point what business means, in general Business is an activity by which people make profit or money for exchange of some goods or services, again I would like to say that there are many definitions of business available if you look online or in books written so far, and it might mean different things to different people, not necessarily money making would be the end objective or meaning of business for all people. But for sure what we can say is that every businessman/women happens to have some target some objective in mind which they would like to accomplish at the completion of a business transaction or activity.

So now what if we combine both the topics  I discussed earlier Intelligent person/system and Business, logically combination of both should be a business intelligent person's/system's. And that is exactly what I want to focus on. Business Intelligent people and systems. Business has been in existence since a very long time probably since the existence of mankind itself and as years passed as human kind evolved business has also evolved and so have the ways of doing business. So in the past as well concept of business intelligent people/business intelligence existed.

Now lets talk about the current scenario, currently Business Intelligence in general is a term used to combinely refer Business Intelligent resources that drive business growth or help and organization or individual achieve certain goals and objectives. Resources could be people, software systems, hardware systems, organizations, methods, theories, procedures etc.

We all know since the time Information Technology evolved a lot of things have changed, so has business and ways of doing business. Information Technology is being widely used these days by almost all the businesses, but not all the technologies being used by businesses serve the purpose of Business-Intelligence. So basically Information Technologies that can gather, process, and analyse huge unstructured/structured data and help an organization in achieving meaningful Business-Insight are known as Business Information System / Business Intelligence Systems. Almost all the major IT firms like Oracle, SAP, Microsoft, IBM have their own business information systems in place for business houses. Some of the most popular ones are OBIEE, SAP-BI, Microsoft-BI, IBM-Cognos, etc.

So on my blog over here we would be learning and understanding a few of them, Initially I'll be starting with the basic data-warehousing/data-mining concepts and then eventually we will see some of the popular BI tools and technologies being used these days. So stay tuned..... :)