…brew your own Business Intelligence

Implementing Logical AND on Members From Same Hierarchy

The default behavior when filtering on multiple members from the same attribute hierarchy is a logical OR.

image

In the pivot table above, we have the [Internet Customer Count] measure (which is a distinct count measure) sliced by Product Subcategory and filtered for [Mountain Bikes] and [Tires and Tubes]. This shows that 4,089 customers bought at least 1 product from the [Mountain Bikes] subcategory and 8,490 customers bought at least 1 product from the [Tires and Tubes] subcategory. The grand total shows 11,472 – which is the number of customers who bought at least 1 product from the [Mountain Bikes] subcategory OR the [Tires and Tubes] subcategory.

Note: The astute reader will notice that the grand total does not equal the sum of the rows – this is standard behavior for Distinct Count measures. Some customers purchased products from both the [Mountain Bikes] subcategory and [Tires and Tubes] subcategory – simply adding the values on rows (i.e. 4,089 + 8,490) would double count these customers.

We can validate the grand total using the following SQL query against our source…

image

But what if we need to answer a slightly different question… How many customers purchased at least 1 product from the [Mountain Bikes] subcategory AND at least 1 product from the [Tires and Tubes] subcategory?

Note: Changing the WHERE-clause of the previous SQL query to use “AND” instead of “OR” is incorrect and returns zero. 

The answer is 1,107…

image

To add another product subcategory to the criteria, we simply add another subquery via intersect…

image

As you can see, out of the 1,107 customers who have purchased products from [Tires and Tubes] and [Mountain Bikes], only 110 of them also purchased a product from the [Jerseys] subcategory. If you throw in an other subquery for [Helmets], that drops the number down to 61.

In the rest of this post, we’ll walk through how to implement this in Analysis Services (both tabular and multidimensional).

Tabular

With tabular, we don’t have to make any changes to the model and can start immediately writing measures to implement this functionality.

The first measure is a “helper” measure that calculates the number of product subcategories  in the current filter context…

[Product Subcategory Filter Count] := 
    COUNTROWS( 'Product Subcategory' )

image

 

With that in place, we can write our [Customers – All Items] measure as follows…

[Customers - All Items] :=
    COUNTROWS (
        FILTER (
            SUMMARIZE (
                'Internet Sales',
                'Internet Sales'[CustomerKey],
                "NumRows", 
                    CALCULATE ( 
                        [Product Subcategory Filter Count], 
                        'Internet Sales' 
                    )
            ),
            [NumRows] >= [Product Subcategory Filter Count]
        )
    )

Here’s how this measure works. First we summarize the ‘Internet Sales’ table by Customer and calculate the number of product subcategories for each customer (i.e. “NumRows”). The resulting table is then filtered/reduced to only those customers where the number of product subcategories is greater than or equal to the number of product subcategories in the current filter context. Finally, we count the rows of the filtered table which gives us the number of customers who purchased products from all selected product subcategories.

image

And below we see that it is fully dynamic when adding more subcategories to the criteria…

image

It also works in Excel which generates MDX against the tabular model (whereas Power BI generates DAX against the tabular model)…

excel against tabular

Multidimensional

We can build the same functionality in a multidimensional cube – but the solution is a bit more involved.

Assuming for a moment, we have complete control over the queries generated (which we do when using reporting tools like SSRS), we might add the following calculated measures to the MDX Script…

image

…and then write our query like…

image

To adjust the selected categories, we simply change the members in the WHERE-clause…

image

However, with many popular self-service reporting tools (e.g. Excel, Power BI, etc) the developer has no control over the query getting generated. Excel, for example, will generate a query like the one below, where the filtering has been applied via a subselect…

image

…this works great for many excel/pivot table situations (e.g. Visual Totals). But it makes life difficult for the SSAS developer.

Looking at the previous screenshot, it’s clear that the subselect breaks our [Product Subcategory Filter Count] measure – which now returns the value (37) corresponding to all members of the product subcategory attribute and our [Customers – All Items] measure returns zero because no customers have purchased products from all 37 product subcategories. The issue is that we can’t easily detect the context of a subselect within a calculated measure.

In order to get this to work in our multidimensional cube, we need to be able to determine the number of members filtered in the subselect. Below are 2 approaches to accomplishing this…

Method 1: New Measure Group + Many-2-Many

When first presented with this issue at a client, this was the solution I came up with.

  1. Create a Distinct Count measure on the Product Subcategory column of the Product table.image
  2. Create a relationship between the Customer dimension and the (new) Product measure group.image
  3. Add the following measure to the MDX scriptimage

 

And voila – our users are now able to answer the question… “How many customers purchased products from each of the selected product subcategories?” …in self-service tools like Power BI…

image

…or Excel…

image

Method 2: Dynamic Set

This method is based on the comment below from Andrei Dirothcka and requires zero structural changes to the cube. We can simply add the following code to the MDX script…

dynamic set

…and we get the same results. The magic behind this solution is described in this post by Mosha Pasumansky.

Method 1 vs Method 2

Method 2 is quite elegant and easy to implement but performance is less than ideal when the attribute being counted (customers) is large (> 18k).

On my system (VM: 2 cores + 8GB memory + SSD), the following query takes ~6 seconds on both cold and warm cache.

image_thumb.png

Looking at a quick profiler trace, we see a separate subcube request for every customer…

fe-bound

And even though they are all resolved from cache, the performance hit of that many requests takes its toll. If we were counting objects with fewer members, this would be a much more reasonable approach and probably the one to go with.

On the other hand, Method 1 takes only 125ms for the same query on a cold cache (109ms on warm cache).

m2m-mg

Final Thoughts

This is a pretty good example highlighting the flexibility of DAX and tabular models over MDX and multidimensional cubes. As you saw above, no changes were needed to the generic Adventure Works tabular data model to achieve this functionality – all we had to do was write some creative (but not very complex) DAX. The multidimensional solution is a bit more involved and requires a deeper understanding of modeling techniques and MDX evaluation.

 

6 thoughts on “Implementing Logical AND on Members From Same Hierarchy

    1. Bill says:

      Thanks Thomas

  1. Great article!
    Still I have a question – in MDX case why you have to introduce intermediate measure group, there is a tectonics with dynamic sets – described in Mosha’s article http://sqlblog.com/blogs/mosha/archive/2007/08/26/mdx-in-katmai-dynamic-named-sets.aspx or more specific sample from here http://piers7.blogspot.ru/2011/05/excel-multiselect-issues-with-ssas.html.

    1. Bill says:

      Andrei – thanks again for the comment – never even considered the dynamic set approach to detecting a subselect. post has been updated based on your suggestion!

  2. Thank you for article update!
    Yoг have made an interesting point comparing performance of dynamic set vs your intermediate measure approach, and it dynamic set is not a favored solution. Thank you for this observation.

Leave a Reply