A Closer Look at the Coverage Fact Table
One of the more interesting types of fact tables is the factless fact table for conditions – also known as Coverage Fact table by Kimball purists. I was recently reminded of this type of fact table while reading Star Schema: A Complete Reference in which Chris Adamson (b | t) describes the factless fact table for conditions as follows:
Factless fact tables for conditions are used to capture significant information that is not part of a business activity. Conditions associate various dimensions at a point in time. When compared with activities, they provide valuable insight. Examples of conditions include eligibility of people for programs, the assignment of salesreps to customers, active marketing programs for a product, or special weather conditions in effect.
In this post, we’ll explore an example of a coverage fact table using the AdventureWorks dataset. Our focus will be product/promotion analysis – for which we will use a factless fact table for the condition of products on promotion by date to enhance our analysis.
In this section we’ll walk through several ways we can model the coverage fact table and incorporate it into our dimensional data model. Using conformed dimensions to enable drill-across capability between fact tables in a dimensional model is always important, but in the case of a coverage fact table it is paramount. The reason for this is because most of the more interesting measures and calculations can only be derived when the coverage fact table is queried in conjunction with a transaction fact table – mixing conditions with actual business activity.
Note: Scripts for creating the tables shown in the following three scenarios (above) and loading them with data are included in a zip file at the end of this post.
The diagram below shows one way we could construct a factless fact table for products on promotion. In this case, FactProductPromotion1 contains a record for each day a product is associated with a particular promotion:
For example, if the marketing department created a promotion (promo A) that runs from 1/5/2014 through 1/9/2014 for products ABC and XYZ, then we should have the following records in FactProductPromotion1:
One situation to consider is a product being associated with more than 1 promotion on a single date – which is not uncommon. In that case, we’ll have multiple records for the same product on the same date. You can see this in the screenshot below where product ABC is associated to Promo A and Promo B for Jan 5, 2013 – Jan 7, 2013:
This is not a problem in and of itself – just a case where the relationship between the two entities (Product, Promotion) is many-to-many.
Another option (shown below) is where the fact table (FactProductPromotion2) only contains a single record for each day a product is associated any (nonspecific) promotion – so regardless of whether the product is associated with 1 or multiple promotions, there will only be a single record in the fact table. This may be desirable since it allows us to do a simple count to answer questions like “how many products are on promotion at a specific point in time” – where as in the first model (FactProductPromotion1) we would need to perform a DISTINCT count of products which is less efficient for large datasets.
In this case we’ve pushed the many-2-many relationship between Product/Promotion entities out of the coverage fact table and into bridge table (BridgePromotionGroupPromotion). Now products are associated with “groups of promotions” which will be constructed during the ETL process.
Note: As is the case with any many-to-many scenario, this design can add complexity to the ETL – especially if the promotion dimension has any type-2 (historical) slowly-changing attributes. It can also add confusion to some report developers who don’t have a enough experience with the peculiarities of many-to-many data models (e.g. sum of rows not adding up to the total).
One final scenario to discuss is the option of treating the coverage fact table as a snapshot. This would require no changes to the prior data model in Scenario 2 (above) – only a minor adjustment to ETL process in order to load the products that were not associated with any promotions for a particular date. The figure below shows what this looks like building from the previous examples.
Now we have records in the fact table for products ABC and XYZ for dates 20140101 through 20140104 when these products are NOT on promotion. As you can see, they are linked to the “No Promotion” key via the BridgePromotionGroupPromotion bridge table.
Note: personally, I don’t see much value in this variation – so you’ll want to have a really good reason in order to justify the increased data storage requirements and ETL work that accompany a periodic snapshot fact table. Nevertheless, I’ve included it in the DDL scripts (available at the bottom of the post) for you to play with. If you do decide to use this implementation, consider adding a basic fact (e.g. OnPromotion) with a value of 1 when the product is linked to a promotion and a value of zero when the product is linked to the “No Promotion” member. This will make it easier to create a measure to count products on promotion using the SUM aggregation function.
The Semantic Layer
In this section we’ll briefly walk through the process of implementing scenarios 1, 2, and 3 from the previous section into an existing semantic layer consisting of a trimmed down version of the AdventureWorks multidimensional cube project.
Step 1: Data Source View (DSV)
The first step to expanding a multidimensional cube is to add the new tables to the data source view (DSV).
Note: using separate diagrams for each fact table or “star” helps keep things organized – which becomes more and more important as the number of tables increases in the SSAS database project.
Step 2: Create SSAS Database Dimension(s)
Next (for scenarios 2 and 3) we will create a new dimension for the DimPromotionGroup table:
Step 3: Create New Measure Group(s)
After that, we need to create measure groups in the cube for each of the new coverage fact tables as well as the bridge table. The measures for the coverage fact tables (“count of rows” aggregation) will prove helpful when we get to the MDX in the next section. However, the measure for the bridge table measure group (also a “count of rows” aggregation) is completely worthless and only necessary as a requirement for SSAS (i.e. every measure group must have at least one measure) – therefore it should be hidden from the end users by setting the Visible property to false. To be clear, the only reason a measure group is created for the bridge table is to establish the proper relationships in the Dimension Usage tab.
Step 4: Add New Dimensions to Cube
Next we’ll add the new DimPromotionGroup dimension to the cube. The other dimensions (Date, Product, Promotion) should already exist in the cube since they are being used with the Internet Sales and Internet Orders measure groups (both of which are based on the FactInternetSales fact table).
Note: if you’re users will be accessing this cube via tools like excel, then you should strongly consider hiding the PromotionGroup cube-dimension by setting the Visible property to false. This dimension, like the BridgePromotionGroupPromotion measure group, is only in place to facilitate the appropriate relationships in the Dimension Usage tab.
The cube structure should now resemble the one below:
Step 5: Configure Relationships
Now that we have the primary structures in place, we need to set up the relationships in the dimension usage tab for our new measure groups and dimensions.
In the screenshot above, we have all regular relationships except for the 2 highlighted in red – representing the many-2-many relationship between the promotion dimension and the coverage fact tables from scenarios 2 and 3.
Here is a quick look at the configuration of this many-to-many relationship.
By removing the measure group filter making the Internet Sales and Internet Orders measure groups visible, we get a clear picture of our measure groups and how they are related via conformed dimensions…
Now that the data modeling discussion is out of the way and our cube has been augmented to include several new versions of the coverage fact table, we can now begin to switch gears and start focusing on the measures and calculations we want to create. A good first step – instead of diving right into the MDX – is to take a moment to acknowledge and appreciate the additional perspective gained by building out the coverage fact table and incorporating it into the dimensional data model with existing “activity” based fact tables (e.g. FactInternetSales).
Using the venn diagram below can help us visualize the benefits:
From this diagram, it is clear that the primary extension (from the perspective of our analytical capabilities) is linked to the area of the blue circle labeled as “products on promotion without any sales” that doesn’t overlap with the yellow circle at all.
With that in mind, I’ve created a first-pass list of measures that might be useful for promotion-focused sales analysis.
Couple of notes:
- The measures in grey are not dependent on the product/promotion coverage fact table, but they are used in conjunction with other measures (that are dependent on the product/promotion coverage fact table) to create higher-level measures (e.g. Pct of Products OnPromotion)
- This list does not include time-intelligence derivatives (e.g. prior year, YoY change) but those can easily be accommodated
- The MDX source code for the majority of these measures is included in the MDXScript in the Analysis Services project that can been downloaded using the link at the bottom of the post
- Count of Products
- Count of Products OnPromotion
- Count of Products OnPromotion with Sales
- Count of Products OnPromotion w/out Sales
- Pct of Products OnPromotion
- Pct of Products OnPromotion with Sales
- Pct of Products OnPromotion w/out Sales
- Count of Days In Period
- Count of Days on Promotion (at product-level…for products on promotion)
- Count of Days on Promotion with Sales (at product-level…for products on promotion)
- Count of Days on Promotion w/out Sales (at product-level…for products on promotion)
- Pct of Days OnPromotion (at product-level…for products on promotion)
- Pct of Days OnPromotion with Sales (at product-level…for products on promotion)
- Pct of Days OnPromotion w/out Sales (at product-level…for products on promotion)
- Count of Orders – Missed Promotion
this one is a bit of a stretch and depends on the scope and variation of promotions. For example, a sales transaction may involve a product that is associated with a promotion (e.g. volume discount) but isn’t linked to the promotion because it wasn’t ordered in a large enough quantity.
- Promotional Lift
Measures in this category require a more controlled setup. For example, you might choose to use these measures for an A-B test where stores (or regions) are clustered based on the similarity of extended attributes that are tightly correlated with product sales (i.e. data mining). Once the clusters are established, a promotion can be run in a subset of stores in a single cluster (Group B) while the remaining stores in the same cluster (Group B) don’t run a promotion. After a period of time, the sales between the two stores can be compared in a (variety of ways) to determine the actual “lift” in sales attributed to the promotion.
- Average Daily Sales – Products OnPromotion
- Average Daily Sales – Products OffPromotion
the following items are “sets”…in SSAS they would be defined as dynamic sets in order to take advantage of additional context applied downstream (e.g. in SSRS reports). These sets would show up in reports that prompt further investigation by business analysts…e.g. Why aren’t these products selling? Does it have something to do with the promotion?
- Products OnPromotion w/out Sales
- Products w/out Sales for more than X% of promotion duration
In this post we took a closer look at coverage fact tables (aka factless fact tables for conditions). We started off by covering the dimensional data modeling aspect of a coverage fact table; looking specifically at 3 different implementations. Then we walked through the process of augmenting an existing SSAS multidimensional cube. Finally, we ended with a quick list of new measures and calculations that can easily be created on top of the new coverage fact table or in combination with existing “activity” based fact tables.
This zip file contains everything you need to step through the examples in this post. You will need the AdventureWorks2012 and AdventureWorksDW2012 sample datasets (download here) if you actually want to deploy and process the cube.
Please feel free to reach out if you have any questions, comments, or interesting facts.