…brew your own Business Intelligence

Filtering a Pivot Table by a Large Dimension

A common issue I see on the MSDN forums and at clients is how to reduce the number of members available in a filter to only those with data in the pivot table. This question often originates from users wanting to filter the results of a pivot table by a dimension attribute containing a lot of members. Before we dive into a potential solution, let’s discuss why this is such a common issue.

Filtering a Pivot Table

The standard way to filter the results of a pivot table is by placing the dimension attribute (e.g. Calendar Year) in the filters area of the pivot table and selecting the desired member(s) from a drop down list…

image

image

And this works great for dimension attributes (e.g. Calendar Year) with on a few members…

image

But what about dimension attributes (e.g. Customer) with hundreds, thousands, millions of members?

image

If you click the caution message “Not all items showing”, you get the following dialog box:

image

The customer attribute in the Adventure Works dataset (shown above) contains ~18,484 members. How is our business user supposed to select from Customers 10,001 – 18,484?

Cascading Filters

Cascading filters is when multiple filters are linked such that a selection in the first filter limits the list of available values in the second filter. Consider the scenario below where we are filtering Internet Sales Amount by 5 filters: Customer Country, Calendar Year, Month of Year, Product Category, and Customer…

image

With cascading filters, once we select CY 2007 for the Calendar Year filter, the available values for the remaining filters would be limited to only members with data. By the time we worked our way down the list to the customer filter, the dropdown list would only contain the handful of customers who are from Canada and purchased a product from the Clothing category during the month of July in the year 2007.

Based on the query below, that would mean we should only see 9 customer names listed in the customer filter – much more manageable than the 18.5k.

SELECT    COUNT(DISTINCT dc.FirstName + ' ' + dc.LastName)
FROM    dbo.FactInternetSales fis
        INNER JOIN dbo.DimDate dd ON dd.DateKey = fis.OrderDateKey
        INNER JOIN dbo.DimProduct dp ON dp.ProductKey = fis.ProductKey
        INNER JOIN dbo.DimProductSubcategory dpsc ON dpsc.ProductSubcategoryKey = dp.ProductSubcategoryKey
        INNER JOIN dbo.DimProductCategory dpc ON dpc.ProductCategoryKey = dpsc.ProductCategoryKey
        INNER JOIN dbo.DimCustomer dc ON dc.CustomerKey = fis.CustomerKey
        INNER JOIN dbo.DimGeography dg ON dg.GeographyKey = dc.GeographyKey
WHERE    dg.EnglishCountryRegionName = 'Canada' AND
        dd.CalendarYear = 2007 AND
        dd.MonthNumberOfYear = 7 AND
        dpc.EnglishProductCategoryName = 'Clothing'

image

Unfortunately, this is not how multiple filters work in Excel pivot tables. Instead, each filter displays all members that exist in the dimension/attribute regardless of whether data exists in the pivot table and independent of any other selection in any of the other filters:

image

Enter the Slicer

Slicers aren’t exactly a new feature in Excel – they were introduced with version 2010 – but I often find that some of their more subtle properties aren’t fully understood. One such property that I didn’t fully grasp until recently was that they behave like cascading filters which makes them appropriate for filtering pivot tables by large dimension attributes with lots of members.

Building from the previous example, we can implement our customer filter as a slicer (instead of a regular pivot table filter) and we get a much better experience:

image

Notice in the screenshot above how only the 9 customers are displayed. These are the 9 customers from Canada who purchased a product from the Clothing category during the month of July in the year 2007. The rest of the members of the Customer dimension/attribute are greyed out because they aren’t linked to any transactions that contribute to the sales amount displayed in the pivot table. In fact we can hide these customers via the following slicer setting:

image

image

But Wait, There’s More

Not only is this a better experience for the user (since it is now clear which Customers are contributing the to the pivot table value section), but it is also more efficient on the backend because the results are “paged” … or returned to the UI in chunks of 1000. So even for a dimension attribute with 3 million members, only the first 1000 are going to be queried at a time.

Let’s take a look at the details:

Pivot Table Filter

Using a basic pivot table filter for the Customer dimension/attribute, the following query is generated by Excel and sent to the cube to populate the members displayed in the dropdown list:

SELECT
{
    AddCalculatedMembers (
        { [Customer].[Customer].[All Customers].Children }
    )
} Dimension Properties MEMBER_TYPE ON COLUMNS
FROM [Adventure Works] CELL Properties CELL_ORDINAL

This query selects all members of the Customer dimension/attribute (with no regard for the rest of the filters or measures used in the pivot table) and returns them to Excel to display to the user in the dropdown list. Imagine the user experience when your customer dimension contains 3 million members Smile with tongue out

Slicer

On the other hand, when the customer filter is implemented as a slicer, we see the following query being sent from Excel to the SSAS cube to populate the slicer values.

WITH SET __XLSlice0 AS { [Customer].[Country].&[Canada] }
SET __XLSlice1 AS { [Date].[Calendar Year].&[2007] }
SET __XLSlice2 AS { [Date].[Month of Year].&[7] }
SET __XLSlice3 AS { [Product].[Category].&[3] }
SET __XLSlice4 AS { [Measures].[Internet Sales Amount] }
SET __XLExistingMembers AS
    NonEmpty (
        [Customer].[Customer].[Customer].AllMembers
        , Crossjoin ( __XLSlice0, __XLSlice1, __XLSlice2, __XLSlice3, __XLSlice4 )
    )
SET __XLExistingRangeMembers AS Subset ( __XLExistingMembers, 0, 1000 )
SET __XLNonExistingRangeMembers AS
    IIf (
        __XLExistingRangeMembers .Count = 1000
        , { }
        , Subset (
            Except ( [Customer].[Customer].[Customer].AllMembers, __XLExistingMembers )
            , IIf (
                0 - __XLExistingMembers .Count < 0
                , 0
                , 0 - __XLExistingMembers .Count
            )
            , 1000
        )
    )
MEMBER [Measures].__XLRelated AS
    Intersect ( [Customer].[Customer].CurrentMember, __XLExistingRangeMembers ).Count
SET __XLTargetSet AS
    Subset (
        Union( __XLExistingRangeMembers, __XLNonExistingRangeMembers )
        , 0
        , 1000
    )
SELECT    { [Measures].__XLRelated } ON 0, 
        { __XLTargetSet } ON 1
FROM    [Adventure Works] CELL Properties Value

Obviously this is much more involved than the first query so let’s walk through it step by step.

The first few lines of code is creating named sets for the existing constraints on the pivot table. In this example we are filtering Internet Sales Amount by Canada, 2007, July, and Clothing.

WITH 
SET __XLSlice0 AS { [Customer].[Country].&[Canada] }
SET __XLSlice1 AS { [Date].[Calendar Year].&[2007] }
SET __XLSlice2 AS { [Date].[Month of Year].&[7] }
SET __XLSlice3 AS { [Product].[Category].&[3] }
SET __XLSlice4 AS { [Measures].[Internet Sales Amount] }

In the next snippet, another set is created consisting of the Customer members that are not empty based on the cross product with the second parameter – which is the Crossjoin of the named sets created above.

SET __XLExistingMembers AS
    NonEmpty (
        [Customer].[Customer].[Customer].AllMembers
        , Crossjoin ( __XLSlice0, __XLSlice1, __XLSlice2, __XLSlice3, __XLSlice4 )
    )

It is important to note that the there is an implicit “autoexists” when using the crossjoin function for the second parameter of the NonEmpty function…you can read more about that here.

Next we have 2 more named sets (__XLExistingRangeMembers and __XLNonExistingRangeMembers) which are used to control the paging of results back to the application. Here’s a quick breakdown:

  • __XLExistingRangeMembers uses the subset function to create a set consisting of the first (or next) 1000 members in the previous set __XLExistingMembers.
  • __XLNonExistingRangeMembers defines a complimentary set of members from the Customer dimension/attribute that don’t exist in the current pivot table context. Again, we see the use of the Subset function to limit the results of the set to 1000 members.
SET __XLExistingRangeMembers AS Subset ( __XLExistingMembers, 0, 1000 )
SET __XLNonExistingRangeMembers AS
    IIf (
        __XLExistingRangeMembers .Count = 1000
        , { }
        , Subset (
            Except ( [Customer].[Customer].[Customer].AllMembers, __XLExistingMembers )
            , IIf (
                0 - __XLExistingMembers .Count < 0
                , 0
                , 0 - __XLExistingMembers .Count
            )
            , 1000
        )
    )

Then we have the following calculated member which checks to see whether the current customer is part of the __XLExistingRangeMembers set. If so, then it returns 1 – otherwise 0.

MEMBER [Measures].__XLRelated AS
    Intersect ( [Customer].[Customer].CurrentMember, __XLExistingRangeMembers ).Count

Next we come to the following set which ensures no more than 1000 members are returned to the excel slicer.

SET __XLTargetSet AS
    Subset (
        Union( __XLExistingRangeMembers, __XLNonExistingRangeMembers )
        , 0
        , 1000
    )

Finally, we get to the actual query showing what is returned to Excel.

SELECT    { [Measures].__XLRelated } ON 0, 
        { __XLTargetSet } ON 1
FROM    [Adventure Works] CELL Properties Value

…and here are the results:

image

This result set provides Excel with all the information it needs to populate the first 1000 results with members from the target dimension/attribute as well as whether or not that member should be greyed out (or hidden) in the case that there is no associated data in the pivot table.

You can see the “paging effect” by scrolling down the members in the slicer:

image

4 thoughts on “Filtering a Pivot Table by a Large Dimension

  1. Rick Schultz says:

    Thank you SO much! I’ve been trying to do this for ages, and never found a way. This is a huge time saver for me.

    BTW, one thing you didn’t mention (but it’s a big deal) is that, if you change one of your main filters (via a field in the filter section) the Pivot table stays filtered by your slicer. I often need to filter to a small selection of a large group, and then change all the filters but still keep that small selection (to see how that segment of the population acts in other situations). You have saved me hours. Thank you!

    1. Bill says:

      Hi Rick – glad you found the material useful! Also, good point about maintaining filters on the slicer. Cheers

  2. notmyemailaddress.. says:

    I work in TV and developed an excel doc to monitor average TV ratings per network, program, and certain periods of the day.When I share this with other team members (each of us represents different clients) each of us will need to be able to filter out programs that we don’t want to air in. This will affect each networks avg rating (The avg of all the individual programs on each channel). Unfortunately there are thousands of programs and the only way to reasonably filter through this would be with a cascading filter. A slicer will not work as a pivot lookup formula will be (i hope) automatically pulling this data onto a seperate sheet with other information we factor in. Basically I need a cascading (dependent) filter so the different users can add or remove programs and see how that affects ‘avg ratings’. Hopefully this makes sense, will respond to any feedback!

    1. Bill says:

      Hi [notmyemailaddress],

      Having a bit of trouble following your scenario/question. Perhaps an email would be more appropriate?

      Below is the snippet that’s confusing to me…

      “A slicer will not work as a pivot lookup formula will be (i hope) automatically pulling this data onto a seperate sheet with other information we factor in. Basically I need a cascading (dependent) filter so the different users can add or remove programs and see how that affects ‘avg ratings’”

      …the blog post shows how a slicer can work as a cascading (dependent) filter but you state it will not work (and I’m not clear on the reason…pivot lookup formula + separate sheet).

      Feel free to send me a more detail email (perhaps a few screenshots) – link in the upper right corner of the page.

Leave a Reply to Bill Cancel reply