Types of Fact Tables

In a presentation I’ve been delivering lately, Bus Matrix – the Foundation of your Dimensional Data Model, I briefly cover the hierarchy of an enterprise and how it can be conceptually broken down and mapped into a dimensional model.

  1. A single enterprise typically contains many business areas (e.g. Sales, Manufacturing, Finance)
  2. A single business area (e.g. Sales) typically contains many business processes (e.g. Orders, Shipments, Returns)
  3. A single business process can be modeled by multiple fact tables

This forms the basis of the main argument in the presentation: that dimensional models (despite being much less complex than normalized models) can quickly grow and become challenging to manage – thus requiring something other than a non-business-user-friendly ER diagram to visualize the model…like a bus matrix:

image

One point that I feel I’ve failed to make is that the complexity of a dimensional model is primarily driven by the number of fact tables…even though, in smaller models, it’s likely the number of dimensions exceeds the number of fact tables.

Fact Tables Drive the Complexity of a Dimensional Data Model

Folks rarely have trouble grasping the idea of dimensions. These tables map directly to business “entities” (e.g. Products, Territories, Customers, etc) – a clean 1 to 1 relationship leaving little room for confusion.

Note: technically the relationship between dimension tables and business entities can be many-to-one when you take into account core-vs-custom dimension tables. This concept is thoroughly explained in the best book on dimensional modeling so I won’t regurgitate it here.

Fact tables are a completely different story. They don’t map directly to business “entities”. If anything, they map to business processes…but even that is a bit misleading. Another argument is that they map to a set of metrics…again, not entirely true. The best way I can describe fact tables is that they are grouped by business area (e.g. Sales, Manufacturing, Finance) and model business processes (e.g. Orders, Shipments, Returns) in a way to support a variety of business measurements.

Types of Fact Tables

The Kimball methodology includes 3 main types of fact tables:

Transaction – the most common type of fact table, used to model a specific business process (typically) at the most granular/atomic level.

Periodic Snapshot – used to model the status of a business process at a specific point in time on a regularly recurring interval. For example, a periodic snapshot fact table might be used to track account balances on a monthly basis. In this case, a “snapshot” of the account balance would be taken at the end of each month – which represents the net of all withdrawal and deposit transactions occurring during the month. Inventory is another common scenario that makes use of periodic snapshots for tracking quantity on hand (by item) at the end of each month. In both examples, the primary “fact” (account balance and quantity on hand) in the two tables are “semi-additive” – which simply means they can’t be aggregated over time.

Accumulating Snapshot – model events in progress for business processes (e.g. Claims Processing for an Insurance Company) that involve a predefined series of steps (e.g. claim submitted, claim reviewed, claim approved/rejected). These tables prove useful in measuring/analyzing the duration between steps in a complete process and discovering bottlenecks.

In addition to the primary fact table types (above), there are a few additional fact tables to consider:

Factless Fact Tables – used to model the events (e.g. student attendance) or conditions (e.g. product promotions). One common misconception is that these tables do not contain facts – which is only partially true. For example, if we are using a factless fact table to model “student attendance” (a textbook example of factless fact tables – pun intended!), our table may also include – in addition to the keys mapping to the student, class, and date dimensions – a single fact called “attendance_count” with the value of 1 for every row. This is not necessary, but it makes it clear what we are measuring and might even make life easier for downstream development of a cube and/or reports.

Note: a factless fact table used to model conditions is known as a coverage fact table in the Kimball methodology.

Bridge Tables – which look amazingly similar to factless fact tables, are used to associate facts with multiple members from the same dimension (multi-value dimensions) or to associate a single dimension attribute with multiple values (multi-value attribute), or even to provide a solution for modeling complex hierarchies (hierarchy-bridge). A classic example involves banking transactions (e.g. deposits, withdrawals) which are typically tracked by account. While most accounts are associated with a single customer, there is the case of “joint” accounts (or accounts associated with more than one customer). In this latter case, a bridge table is used to map the relationship between customers and accounts allowing the business to report bank transactions by customer.

Note: some will argue that a bridge table is not a fact table – debating the differences between the two will make you want to claw your eyeballs out with a dirty spoon.

Aggregate Fact Tables – essentially a transaction fact table with the difference of being that records in this table are pre-aggregated to a certain level (e.g. month vs date). The primary benefit is performance although one could also make an argument for saving disk space. These tables are rarely necessary for architectures that include a multidimensional semantic layer and/or make use of in-memory technology.

Temporal Snapshot Fact Table – this is a very elegant solution for avoiding the data-explosion that comes with periodic snapshot fact tables. There is also a high degree of complexity involved when implementing this type of solution. For more information, I recommend starting with a SQL Bits X presentation by Davide Mauri (b | t) called Temporal Snapshot Fact Table – that is where I first learned of this design. (Warning: the audio quality in this presentation is terrible so you will definitely need to follow along in the slides and demo scripts available.)

Final Thoughts

Many claim that designing a data warehouse is a blend of art and science. My opinion (that’s why you’re here right?) is that it is mostly science. Determining which type(s) of fact tables to include (driven primarily by the types of questions the business wants to be able to answer) is where those “blend-of-art-and-science” folks mistake art for experience. Then you have other folks like Davide Mauri and the implementation of the Temporal Snapshot Fact table reminding you that, yes, there is a bit of art to this profession.

For a good demonstration of how different types of fact tables can have a drastic affect on performance and/or complexity (with respect to reporting), see the following SQLBits XI presentation by Alex Whittles (b | t): Data Modeling for Analysis Services Cubes

 

5 thoughts on “Types of Fact Tables

  1. Bill, I always tell my clients that Fact tables map to events. A till open, a sale, a machine drilling: all are events that have happened. There are exceptions (budgets are for a period not a point in time), but the concept holds

    1. Hi Mark – I agree with that logic especially when the target audience includes a client/business user.

      But for technical folks like you and me, I think it depends on the definition of an “event”. For example, in my mental-model, a transaction fact table maps to a single event where as an accumulating snapshot fact table maps to a series of different types of events…

      Claims Processing
      1. Claim Submitted (event)
      2. Claim Reviewed (event)
      3. Claim Approved (event)
      4. Customer Notified (event)
      5. Claim Paid (event)

  2. We have an interesting case where we have a bridge which joins facts. Our situation is odd, we have multiple entities which relate to many other entities in our system with a limited hierarchy (only Customer is required). For instance we have Activities as an entity which can relate to some or none of the following: Plans, Customers, Actions, Contacts and Outcomes. We have Objectives which can relate to some or none of the following: Plans, Customers, Actions, Contacts and Outcomes. The problem comes when people want to analyse data from any direction. One user will start at Actions and spread to Objectives, while another will want a full hierarchy Customers -> Plans -> Actions -> Activities -> Outcomes, while another will want to analyse just Plans and Outcomes.

    Our bridge table essentially models the relationships, it has a key to each of the fact tables and is either null where that Fact table is not involved in the relationship or filled in for where the relationship exists. The Bridge table is understandably huge.

    We have put this into Analysis Services, but due to the nature of the querying require bi-directional joins across the Bridge to each Fact (basically we don’t know which way the user will query). This requires a massive amount of processing power, for obvious reasons. However we can’t seem to see a way out of it, other than potentially move to Graph Tables for representing the complex relationships and having independent cubes for the most common types of user queries.

    Any suggestions for multiple inter related Fact tables?

    1. Hi Brent – this one might be too difficult to comment without more info. Feel free to shoot me an email at anton (at) opifexsolutions (dot) com with a screenshot of the data model diagram

Leave a Reply