Dimensional Model Schemas: Star, Snow-Flake and Constellation
A dimensional model can be organized as a star schema or a snowflake schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table, and the points of the star are the dimension tables. A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension (lookup) tables, each of which contains information about the entries for a particular attribute in the fact table.
Published (last): 15 April 2005
A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans for them.
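To make the shape of a star query concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema and data are hypothetical: a `sales_fact` table with `time_dim`, `item_dim` and `location_dim` dimensions. Every join runs from the fact table to exactly one dimension through a primary key / foreign key pair, and no dimension is joined to another.

```python
import sqlite3

# Hypothetical star schema: one fact table and three dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- The fact table holds numeric measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);

INSERT INTO time_dim VALUES (1, 2005, 'Q1'), (2, 2005, 'Q2');
INSERT INTO item_dim VALUES (10, 'Laptop', 'Acme'), (11, 'Phone', 'Acme');
INSERT INTO location_dim VALUES (100, 'Pune', 'India'), (101, 'Chicago', 'USA');
INSERT INTO sales_fact VALUES
    (1, 10, 100, 5, 5000.0),
    (1, 11, 101, 8, 2400.0),
    (2, 10, 101, 2, 2000.0),
    (2, 11, 100, 4, 1200.0);
""")

# Star query: each dimension joins only to the fact table (PK -> FK);
# the dimension tables are never joined to each other.
STAR_QUERY = """
SELECT t.quarter, l.country, SUM(f.dollars_sold) AS revenue
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN item_dim i     ON f.item_key = i.item_key
JOIN location_dim l ON f.location_key = l.location_key
WHERE i.brand = 'Acme'
GROUP BY t.quarter, l.country
ORDER BY t.quarter, l.country
"""
rows = conn.execute(STAR_QUERY).fetchall()
for row in rows:
    print(row)
```

SQLite has no dedicated star-query optimization. The sketch only illustrates the join pattern that a cost-based optimizer would recognize.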
For example, in a sales star schema the dimension tables could be time, branch, item and location. A star join is a primary key to foreign key join of the dimension tables to a fact table. The main advantages of star schemas are that they:
- Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
- Provide highly optimized performance for typical star queries.
- Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data warehouse schema contain dimension tables.

Snow-Flake Schema in Dimensional Modeling
The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema.
It is called a snowflake schema because the diagram of the schema resembles a snowflake. Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table.
For example, a location dimension table in a star schema might be normalized into a location table and a city table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. The figure above presents a graphical representation of a snowflake schema.
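The location/city split described above can be sketched the same way, again with `sqlite3` and hypothetical table names. In this version `location_dim` keeps only a foreign key to a separate `city_dim`, so each city's text is stored once. The cost is that reaching `country` from the fact table now takes one extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked location dimension: city attributes live in their own table.
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    street       TEXT,
    city_key     INTEGER REFERENCES city_dim(city_key)
);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL
);

INSERT INTO city_dim VALUES (1, 'Pune', 'India'), (2, 'Chicago', 'USA');
-- Two locations share one city row, so 'Pune'/'India' is stored only once.
INSERT INTO location_dim VALUES (100, 'MG Road', 1), (101, 'FC Road', 1), (102, 'State St', 2);
INSERT INTO sales_fact VALUES (100, 500.0), (101, 300.0), (102, 700.0);
""")

# fact -> location -> city: two joins where a star schema would need one.
SNOWFLAKE_QUERY = """
SELECT c.country, SUM(f.dollars_sold)
FROM sales_fact f
JOIN location_dim l ON f.location_key = l.location_key
JOIN city_dim c     ON l.city_key = c.city_key
GROUP BY c.country
ORDER BY c.country
"""
rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(rows)
```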
Fact Constellation Schema
This schema is used mainly for aggregate fact tables, or where we want to split a fact table for better comprehension. It contains multiple fact tables that share common dimension tables.

Dimensional modeling differs from OLTP normalized modeling so that it can support analysis and querying through massive and unpredictable queries, something a relational model is ill-equipped to handle.

How is a dimensional model different from an E-R diagram? An E-R diagram used in an OLTP or transactional system is highly normalized, even at the logical level, whereas a dimensional model aggregates most of the attributes and hierarchies of a dimension into a single entity.
An E-R diagram is a complex maze of hundreds of entities linked with each other, whereas a dimensional model is a logically grouped set of star schemas. An E-R diagram is split by entities; a dimensional model is split by dimensions and facts. In an E-R diagram, all attributes of an entity, textual as well as numeric, belong to the entity table. In a dimensional model, by contrast, dimension tables hold mostly textual attributes and fact tables hold mostly numeric measures.

Dimensional modeling is a better approach for a data warehouse than a standard (normalized) data model. The dimensional model has a number of important data warehouse advantages that the E-R model lacks.
The first advantage of the dimensional model is that it has a standard type of joins and a standard framework. All dimensions can be thought of as symmetrically equal entry points into the fact table. The logical design can be done independently of expected query patterns. The user interfaces are symmetrical, the query strategies are symmetrical, and the SQL generated against the dimensional model is symmetrical.
In other words, you will never find attributes in fact tables or facts in dimension tables. If you see a non-fact field in the fact table, you can assume that it is a key to a dimension table.

The second advantage of the dimensional model is that it is smoothly extensible to accommodate unexpected new data elements and new design decisions.
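A minimal sketch of this extensibility, assuming SQLite through Python's `sqlite3` and hypothetical `sales_fact` / `item_dim` tables. A report written before the change keeps returning the same result after a new additive fact column and a new dimensional attribute are added in place with `ALTER TABLE ... ADD COLUMN`. Existing rows are not reloaded; they simply read as NULL in the new columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE sales_fact (item_key INTEGER REFERENCES item_dim(item_key), units_sold INTEGER);
INSERT INTO item_dim VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO sales_fact VALUES (1, 5), (2, 8), (1, 3);
""")

# An "old application": a report written before any design change.
OLD_REPORT = """
SELECT i.item_name, SUM(f.units_sold)
FROM sales_fact f JOIN item_dim i ON f.item_key = i.item_key
GROUP BY i.item_name ORDER BY i.item_name
"""
before = conn.execute(OLD_REPORT).fetchall()

# Graceful changes made in place, with no reload of existing rows:
# a new additive fact at the same grain, and a new dimensional attribute.
conn.execute("ALTER TABLE sales_fact ADD COLUMN discount_amount REAL")
conn.execute("ALTER TABLE item_dim ADD COLUMN brand TEXT")

after = conn.execute(OLD_REPORT).fetchall()
print(before == after)  # the old report is unaffected by the schema change

# New loads can start populating the new fact immediately.
conn.execute("INSERT INTO sales_fact VALUES (2, 1, 15.0)")
new_fact_total = conn.execute("SELECT SUM(discount_amount) FROM sales_fact").fetchone()[0]
```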
First, all existing tables, both fact and dimension, can be changed in place, either by simply adding new data rows to the table or by altering the table definition in place. Data should not have to be reloaded. Typically, no query tool or reporting tool needs to be reprogrammed to accommodate the change, and all old applications continue to run without yielding different results. You can make the following graceful changes to the design after the data warehouse is up and running:
- Adding new, unanticipated facts (that is, new additive numeric fields in the fact table), as long as they are consistent with the fundamental grain of the existing fact table.
- Adding completely new dimensions, as long as there is a single value of that dimension defined for each existing fact record.
- Adding new, unanticipated dimensional attributes.
- Breaking existing dimension records down to a lower level of granularity from a certain point in time forward.

The third advantage of the dimensional model is that there is a body of standard approaches for handling common modeling situations in the business world.
Each of these situations has a well-understood set of alternatives that can be specifically programmed in report writers, query tools, and other user interfaces. Examples include:
- Slowly changing dimensions, for which dimensional modeling provides specific techniques depending on the business environment.
- Heterogeneous products, where a business such as a bank needs to track a number of different lines of business together within a single common set of attributes and facts, but at the same time...
The data warehouse model has to be designed with a global, reusable set of dimensions and measures. Data warehouse modeling has two components:
- Foundation: supports medium- to long-term capabilities without the need to unsettle the structure time and again.
- Phases: the individual phases of data mart development, which eventually merge into the enterprise-wide data warehouse.
A project has to address both the foundation and the phase elements. Every stage of the data warehouse project will address these two elements in a distinct and overt manner.
For dimensional modeling, the following foundation-setting elements will work like reusable components.

Standard set of foundation or conformed dimensions. This means that:
- Dimensions are supersets of all possible attributes for that dimension. Therefore, when creating the standard dimensions, make a superset of attributes.
- Dimensions include all possible levels of the business hierarchy. For example, a portfolio analysis of a channel may not require the branch-level location, but an agent productivity analysis could.
- Dimensions include not only categories but also descriptive textual attributes wherever needed. For example, a textual description of a location code could be needed for distribution analysis, but may not be needed for portfolio analysis.
- Make the dimension as granular as possible. Often the analysis does not need to go down to the most granular level, the customer ID. However, if a customer moves out of the existing customer segment, the whole dimensional model could run into issues when the dimension starts from the customer group upwards.

Examples of foundation dimensions are Customer, Location, Channel, Sales Lead, etc.
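To illustrate a conformed dimension that keeps all hierarchy levels at its most granular grain, here is a small pure-Python sketch. The agent/branch/region hierarchy, the `policies_sold` measure and all data are hypothetical. The same dimension serves a region-level portfolio view and an agent- or branch-level productivity view without any redesign.

```python
from collections import defaultdict

# Hypothetical conformed location/channel dimension at its most granular
# level (agent), carrying every hierarchy level as an attribute.
LOCATION_DIM = {
    # agent_key: attributes at all hierarchy levels
    1: {"agent": "A-01", "branch": "Pune-Camp", "region": "West"},
    2: {"agent": "A-02", "branch": "Pune-Camp", "region": "West"},
    3: {"agent": "A-03", "branch": "Mumbai-Fort", "region": "West"},
    4: {"agent": "A-04", "branch": "Delhi-CP", "region": "North"},
}

# Hypothetical fact rows at agent grain: (agent_key, policies_sold)
FACTS = [(1, 4), (2, 6), (3, 5), (4, 7), (1, 2)]


def roll_up(level):
    """Aggregate the fact to any hierarchy level of the shared dimension."""
    totals = defaultdict(int)
    for agent_key, policies in FACTS:
        totals[LOCATION_DIM[agent_key][level]] += policies
    return dict(totals)


# A portfolio analysis only needs the region level...
print(roll_up("region"))
# ...while an agent productivity analysis drills down to branch and agent.
print(roll_up("branch"))
print(roll_up("agent"))
```

If the dimension had started at the branch level instead, the agent-level roll-up would simply be impossible without redesigning the dimension.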
Standard set of foundation or conformed facts. This means that:
- A fact table will include all possible units of measure for a given set of dimensions. Both units for a given measure should be included, even if there is a standard conversion rate, because these standard conversion rates keep changing over time.
- A fact table logically groups a business instance. However, you will require the fact on the final sale to the end customer for sales analysis. As a guideline, highly linked business processes should be combined in a single fact table.

Standard set of foundation measures. This means that:
- All the measures and their possible units are to be listed out. Measures are the most susceptible to confusing definitions or to being misnamed.
- Detailed formulas behind the measures are a must. Refer to the Sales Revenue fact-measure as an example.

Examples of foundation measures are Sales Measures, Customer Measures, etc.

Slowly Changing Dimensions
Entities change over time: customer demographics, product characteristics, classification rules, customer status, etc. In a transaction system, the change is often overwritten and its history is lost. For example, a source system may hold only the latest customer PIN Code, because that is all it needs to send the marketing and billing statements.
However, a data warehouse needs to maintain all the previous PIN Codes as well, because we need to track how many customers move to new locations, and how often. A key benefit of a data warehouse is that it provides historical information, which is typically overwritten and thus lost in the transaction systems. How slowly changing dimensions are handled in a dimensional model is a key determinant of that benefit.

Type 1 SCD: Overwriting the old value
Overwriting is done only when we are not analyzing the historical information.
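A minimal sketch of a Type 1 change, assuming SQLite through Python's `sqlite3`. The `customer_dim` table and the `scd_type1_update` helper are hypothetical. The attribute is updated in place, so the previous PIN Code is gone afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (customer_id TEXT PRIMARY KEY, name TEXT, pin_code TEXT)")
conn.execute("INSERT INTO customer_dim VALUES ('C001', 'Asha', '411001')")


def scd_type1_update(conn, customer_id, new_pin):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    conn.execute(
        "UPDATE customer_dim SET pin_code = ? WHERE customer_id = ?",
        (new_pin, customer_id),
    )


scd_type1_update(conn, "C001", "560001")
rows = conn.execute("SELECT * FROM customer_dim").fetchall()
print(rows)  # only the new PIN Code remains
```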
Type 2 SCD: Adding a new record
This method is used when there is more than one change in the attributes of an entity and we need to track the date of each change. A new record is added and given a separate identifier as its primary key. This new identifier is called the surrogate key. Apart from adding a new record with a new surrogate primary key, the validity period of the new record is also recorded. For example, over time a customer gets married and also moves to a new location.
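The married-then-moved example can be sketched as a Type 2 dimension using `sqlite3`. Several details are assumptions here: the column names, the `scd_type2_change` helper, and the convention of marking the current row with a far-future `valid_to` of `9999-12-31`, with the closing date equal to the next row's `valid_from`. Each change closes the current row and inserts a new version under a new surrogate key, so the full history survives.

```python
import sqlite3

HIGH_DATE = "9999-12-31"  # assumed convention: valid_to of the current row

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer_dim (
    customer_sk    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id    TEXT,                               -- natural (source) key
    marital_status TEXT,
    pin_code       TEXT,
    valid_from     TEXT,
    valid_to       TEXT
)""")


def scd_type2_change(conn, customer_id, changes, change_date):
    """Close the current row and insert a new version with a new surrogate key."""
    sk, marital_status, pin_code = conn.execute(
        "SELECT customer_sk, marital_status, pin_code FROM customer_dim "
        "WHERE customer_id = ? AND valid_to = ?",
        (customer_id, HIGH_DATE),
    ).fetchone()
    # Carry unchanged attributes forward; apply the changed ones.
    new = {"marital_status": marital_status, "pin_code": pin_code, **changes}
    conn.execute("UPDATE customer_dim SET valid_to = ? WHERE customer_sk = ?", (change_date, sk))
    conn.execute(
        "INSERT INTO customer_dim (customer_id, marital_status, pin_code, valid_from, valid_to) "
        "VALUES (?, ?, ?, ?, ?)",
        (customer_id, new["marital_status"], new["pin_code"], change_date, HIGH_DATE),
    )


conn.execute(
    "INSERT INTO customer_dim (customer_id, marital_status, pin_code, valid_from, valid_to) "
    "VALUES ('C001', 'Single', '411001', '2001-01-01', ?)",
    (HIGH_DATE,),
)
scd_type2_change(conn, "C001", {"marital_status": "Married"}, "2003-06-15")
scd_type2_change(conn, "C001", {"pin_code": "560001"}, "2004-02-01")

history = conn.execute("SELECT * FROM customer_dim ORDER BY customer_sk").fetchall()
for row in history:
    print(row)
```

Fact rows loaded at a given date would reference the `customer_sk` valid on that date, which is what lets history be analyzed.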
Type 3 SCD: Adding a new attribute
This method has two limitations. First, it has to know from the beginning which attributes will change. Second, an attribute can change at most once in the lifetime of the entity, or at least in the lifetime of the data warehouse.
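A sketch of the usual column-based form of Type 3, assuming SQLite through `sqlite3`. The `previous_pin_code` / `pin_changed_on` columns and the `scd_type3_change` helper are hypothetical. The "previous" column must be created up front, which is the first limitation. A second change overwrites the only saved prior value, which is the second limitation. In an SQLite `UPDATE`, every right-hand expression sees the row's values from before the update, so the shift-then-overwrite happens in one statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The changing attribute must be known in advance: its "previous" column
# is part of the table design from the start.
conn.execute("""
CREATE TABLE customer_dim (
    customer_id       TEXT PRIMARY KEY,
    current_pin_code  TEXT,
    previous_pin_code TEXT,
    pin_changed_on    TEXT
)""")
conn.execute("INSERT INTO customer_dim VALUES ('C001', '411001', NULL, NULL)")


def scd_type3_change(conn, customer_id, new_pin, change_date):
    """Shift the current value into the 'previous' column, then overwrite it."""
    conn.execute(
        "UPDATE customer_dim SET previous_pin_code = current_pin_code, "
        "current_pin_code = ?, pin_changed_on = ? WHERE customer_id = ?",
        (new_pin, change_date, customer_id),
    )


scd_type3_change(conn, "C001", "560001", "2004-02-01")
after_first = conn.execute("SELECT * FROM customer_dim").fetchone()
print(after_first)

# A second change pushes the original value out: only one prior value survives.
scd_type3_change(conn, "C001", "110001", "2005-03-10")
after_second = conn.execute("SELECT * FROM customer_dim").fetchone()
print(after_second)
```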
The same methods also apply to fast-changing dimensions. Online Analytical Processing (OLAP) is the capability to store and manage data so that it can be used effectively to generate actionable information.
Data Warehouse and OLAP
OLAP makes business intelligence happen, broadly by enabling the following:
- Transforming the data into multidimensional cubes
- Summarized, pre-aggregated and derived data
- Strong query management
- A multitude of calculation and modeling functions

A data warehouse can hold data in various formats: dimensional, with a high degree of de-normalization, or highly relational, as in third normal form. As a separate note, we have covered the entire data warehouse chapter on the basis of dimensional-modeling-based storage.
Most of the concepts in the data warehouse chapter remain the same irrespective of the kind of storage and data modeling one needs to do. The detail differential between OLAP vs. ... The users of OLAP are mostly end-user tools such as business modeling tools, data mining tools and performance reporting tools. OLAP and the data warehouse work in conjunction to provide overall data access for the end-user tools.
You may like to refer to BI Architecture Scenarios to get a better background. To put it simply, there is one array for each combination of dimensions and associated measures.
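Reading "one cell per combination of dimension members" loosely, the idea of a pre-aggregated cube can be sketched in plain Python. This is an illustration only, not how any particular OLAP engine stores data. It uses a dictionary rather than a true multidimensional array, and the facts and the `ALL` marker are hypothetical. Each fact row is added to every cell where each dimension is either its actual member or `ALL`, so sub-totals and the grand total are available without re-scanning the facts.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (time, item, location, dollars_sold)
FACTS = [
    ("Q1", "Laptop", "Pune", 5000.0),
    ("Q1", "Phone", "Chicago", 2400.0),
    ("Q2", "Laptop", "Chicago", 2000.0),
    ("Q2", "Phone", "Pune", 1200.0),
    ("Q1", "Laptop", "Pune", 1000.0),
]


def build_cube(facts):
    """Pre-aggregate the measure for every combination of dimension members.

    "ALL" stands for all members of a dimension, so the cube also holds
    every roll-up (sub-totals and the grand total).
    """
    cube = defaultdict(float)
    for *members, amount in facts:
        # Each fact contributes to 2**3 = 8 cells: every mix of
        # actual member vs. "ALL" across the three dimensions.
        for cell in product(*[(m, "ALL") for m in members]):
            cube[cell] += amount
    return dict(cube)


cube = build_cube(FACTS)
print(cube[("Q1", "Laptop", "Pune")])  # base cell
print(cube[("Q1", "ALL", "ALL")])      # roll-up over item and location
print(cube[("ALL", "ALL", "ALL")])     # grand total
```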