…brew your own Business Intelligence

MDX Performance Tip: Move Calculations From the Query to the MDXScript

In the world of business intelligence, most people prefer to embed as much business logic as possible into the ETL and materialize it in the data model (i.e. data mart, cube, tabular model, etc). After all, if we can take the hit during our batch processing window – then that’s time we just saved our users since they won’t need to take the hit at query time (e.g. when running a report). Plus it helps ensure 1-version-of-the-truth because report developers and business analysts don’t have to recreate the logic…it’s baked in.

(report developer)

Unfortunately, there are many scenarios where the business logic can’t be embedded in the ETL and/or data model…perhaps the data model is already well established and the client doesn’t have the resources (i.e. time/skills/money) to make the necessary changes…or maybe the business logic just doesn’t scale well (enough) in the ETL (especially in low-latency/realtime scenarios where the ETL needs to be as fast as possible). In these cases, the next best place is to embed the business logic in the semantic layer (if you’re not sure why, read this)…which, if you’re using SSAS Multidimensional as your semantic layer, means using the MDXScript!

In this post, we’ll pickup a query from a while back and I’ll highlight the benefits of leveraging the MDXScript…

The Set Up

To showcase the power of pushing the calculations into the cube via the MDXScript, let’s revisit one of the queries from a post last year. In that post, we started with a really nasty query that took ~42 sec to run on a cold-cache and ~41 sec on a warm-cache. What you see below was the final product used to highlight a few tuning techniques…it runs in ~1 sec on both cold and warm cache.

WITH
    MEMBER [Measures].[CustomerAge] AS
        DateDiff(
             "yyyy"
            ,[Customer].[Customer].CurrentMember.Properties("Birth Date", TYPED)
            ,Cdate("1/1/2007")
        )
    SET Customers18to35 AS 
        Filter(
             [Customer].[Customer].[Customer]
            ,[Measures].[CustomerAge] >= 18 AND
             [Measures].[CustomerAge] <= 35
        )
    SET CustomerBikePurchase AS
        NonEmpty(
             (Customers18to35,[Order Date].[Calendar YMD].[Date])
            ,(
                 [Product].[Product Hierarchy].[Product Category].&[Bikes] // Bike Category
                ,[Order Date].[Calendar YMD].[Year].&[2007]
                ,[Measures].[Sales Amount]
             )
        )

    MEMBER [Measures].[SalesAmountHelmet] AS 
        SUM(
            (
                 [Order Date].[Calendar YMD].CurrentMember.Lead(1):[Order Date].[Calendar YMD].CurrentMember.Lead(60)
                ,[Product].[Product Hierarchy].[Product Subcategory].&[Helmets] // Helmet Subcategory
            )
            ,[Measures].[Sales Amount]
        )
    SET CustomerHelmetPurchase AS
        Filter(
             CustomerBikePurchase
            ,[Measures].[SalesAmountHelmet] > 0
        )            
    MEMBER [Measures].[SalesAmountTire] AS 
        SUM(
            (
                 [Order Date].[Calendar YMD].CurrentMember.Lead(1):[Order Date].[Calendar YMD].CurrentMember.Lead(90)
                ,[Product].[Product Hierarchy].[Product Subcategory].&[Tires and Tubes] // TiresAndTubes
            )
            ,[Measures].[Sales Amount]
        )
    SET CustomerTirePurchase AS
        Filter(
             CustomerBikePurchase
            ,[Measures].[SalesAmountTire] > 0
        )

    /*    Customers between 18-35 who purchased a helmet within 60 days after a bike purchase */
    SET DenominatorSet AS CustomerHelmetPurchase

    /*    Customers between 18-35 who purchased a helmet within 60 days after a bike purchase 
        and who purchased a tire within 90 days after bike purchase
    */
    SET NumeratorSet AS 
        Intersect(
             DenominatorSet
            ,CustomerTirePurchase
        )

    MEMBER [Measures].[Denominator] AS Count(DenominatorSet)

    MEMBER [Measures].[Numerator] AS Count(NumeratorSet)

    MEMBER [Measures].[Performance] AS
        IIF(
             [Measures].[Denominator] = 0
            ,Null
            ,[Measures].[Numerator]/[Measures].[Denominator]
        )
    ,FORMAT_STRING = "percent"

    MEMBER [Measures].[Missing Customers] AS
        Generate(
             Except(DenominatorSet,NumeratorSet)
            ,[Customer].[Customer].CurrentMember.Name
            ,", "
        )

SELECT    {
             [Measures].[Denominator]
            ,[Measures].[Numerator]
            ,[Measures].[Performance]
            ,[Measures].[Missing Customers]
        } ON 0
FROM    [Adventure Works DW]

Here’s what this query is doing…

Out of all customers between the ages of 18 and 35 on 1/1/2007 who purchased a bike in 2007 and a helmet within 60 days following the bike purchase (but not on the same day), how many also purchased a bike tire within 90 days after the bike purchase?

In addition to calculating the components of the ratio (i.e. denominator, numerator) we also needed to generate a list of customers who satisfied the denominator (i.e. customers 18-35 who purchased a helmet within 60 days after purchasing a bike) but didn’t qualify for the numerator (i.e. didn’t purchase a bike tire within 90 days of the bike purchase)

Note: Yes, it seems contrived, but a promise you requirements like this exist. Since I’m using AdventureWorks, its intuitive to think of this from a retail perspective. In that case, this type of query might be part of an analysis focusing on missed upsell opportunities (i.e. pushing helmets and bike tires to customers purchasing a bike). However, if we use different nouns – patients, diagnosis, labs, procedures instead of customers, bikes, helmets, tires – one might be able to see how this would apply to healthcare data which in my opinion is wayyyyy more interesting (and arguably more beneficial to society) than selling more widgets) Smile

Utilizing the MDXScript

Instead of walking through the “how” (as in how we move the bulk of the calculations into the cube) let’s start off with the results and work our way backwards…

Warm Cache

image

Cold Cache

image

Results (just to make sure they are the same – yep)…

image

Here’s what the query looks like after the rewrite…

SELECT    {
             [Measures].[internal_Denominator]
            ,[Measures].[internal_Numerator]
            ,[Measures].[internal_Performance]
            ,[Measures].[internal_Missing Customers]
        } ON 0
FROM    [Adventure Works DW]

At first glance, the performance improvement looks pretty sick…we went from a ~1 sec query duration down to 25 ms on a cold-cache (and 8ms warm-cache).

Here’s what the MDXScript looks like…

/*
The CALCULATE command controls the aggregation of leaf cells in the cube.
If the CALCULATE command is deleted or modified, the data within the cube is affected.
You should edit this command only if you manually specify how the cube is aggregated.
*/
CALCULATE;

CREATE MEMBER CurrentCube.[Measures].[internal_CustomerAge] AS NULL;

SCOPE(
    [Customer].[Customer].Children,
    [Measures].[internal_CustomerAge]
);
    THIS = DateDiff(
            "yyyy"
            ,[Customer].[Customer].CurrentMember.Properties("Birth Date", TYPED)
            ,Cdate("1/1/2007")
        );
END SCOPE;

CREATE SET CurrentCube.[internal_Customers18to35] AS 
    Filter(
         [Customer].[Customer].[Customer]
        ,[Measures].[internal_CustomerAge] >= 18 AND
         [Measures].[internal_CustomerAge] <= 35
    )
,Display_Folder = 'Sets'
;

CREATE SET CurrentCube.[internal_CustomerBikePurchase] AS 
    NonEmpty(
         ([internal_Customers18to35],[Order Date].[Calendar YMD].[Date])
        ,(
             [Product].[Product Hierarchy].[Product Category].&[Bikes] // Bike Category
            ,[Order Date].[Calendar YMD].[Year].&[2007]
            ,[Measures].[Sales Amount]
        )
    )
;

CREATE MEMBER CurrentCube.[Measures].[internal_SalesAmountHelmet] AS 
    SUM(
        (
             [Order Date].[Calendar YMD].CurrentMember.Lead(1):[Order Date].[Calendar YMD].CurrentMember.Lead(60)
            ,[Product].[Product Hierarchy].[Product Subcategory].&[Helmets] // Helmet Subcategory
        )
        ,[Measures].[Sales Amount]
    )
;
CREATE SET CurrentCube.[internal_CustomerHelmetPurchase] AS
    Filter(
         [internal_CustomerBikePurchase]
        ,[Measures].[internal_SalesAmountHelmet] > 0
    ) 
;

CREATE MEMBER CurrentCube.[Measures].[internal_SalesAmountTire] AS 
    SUM(
        (
                [Order Date].[Calendar YMD].CurrentMember.Lead(1):[Order Date].[Calendar YMD].CurrentMember.Lead(90)
            ,[Product].[Product Hierarchy].[Product Subcategory].&[Tires and Tubes] // TiresAndTubes
        )
        ,[Measures].[Sales Amount]
    )
;
CREATE SET CurrentCube.[internal_CustomerTirePurchase] AS
    Filter(
            [internal_CustomerBikePurchase]
        ,[Measures].[internal_SalesAmountTire] > 0
    )
;
/*    Customers between 18-35 who purchased a helmet within 60 days after a bike purchase */
CREATE SET CurrentCube.[internal_DenominatorSet] AS 
    [internal_CustomerHelmetPurchase]
;

/*  Customers between 18-35 who purchased a helmet within 60 days after a bike purchase 
    and who purchased a tire within 90 days after bike purchase
*/
CREATE SET CurrentCube.[internal_NumeratorSet] AS 
    Intersect(
         internal_DenominatorSet
        ,internal_CustomerTirePurchase
    )
;

CREATE MEMBER CurrentCube.[Measures].[internal_Denominator] AS 
    Count(internal_DenominatorSet)
;

CREATE MEMBER CurrentCube.[Measures].[internal_Numerator] AS 
    Count(internal_NumeratorSet)
;

CREATE MEMBER CurrentCube.[Measures].[internal_Performance] AS
        IIF(
             [Measures].[internal_Denominator] = 0
            ,Null
            ,[Measures].[internal_Numerator]/[Measures].[internal_Denominator]
        )
    ,FORMAT_STRING = "percent"
;

CREATE MEMBER CurrentCube.[Measures].[internal_Missing Customers] AS
        Generate(
             Except(internal_DenominatorSet,internal_NumeratorSet)
            ,[Customer].[Customer].CurrentMember.Name
            ,", "
        )
;

All the calculated members and sets you see above are nearly-identical to the ones in the first query. The only difference is some minor syntax and one scoped-assignment for the [CustomerAge] calculated measure.

The main benefit of moving the calculations into the MDXScript is that we now gain the benefit of leveraging the global cache. The calculated sets defined in the MDXScript get computed ahead of time (when the MDXScript is executed) and stuck in the cache where they are then available for most (subsequent) MDX queries. On the other hand, defining the calculations in the actual MDX query (via the WITH-clause) means that the calculated sets will need to be computed every time…which is why the original query (where the calculations are defined in the query) always take ~1 sec no matter how many times it is executed.

To help illustrate that concept, let’s clear the cache and run the new query again with a profiler trace running in the background…

image

Wait – what happened to the 25ms query duration?

Well, earlier when I showed the 25ms cold-cache duration for the new query, I wasn’t telling the whole truth.

To get the 25ms query duration, I performed the following steps…

  1. cleared the cache
  2. ran an empty query (e.g. SELECT {} ON 0 FROM [<cube name>])
  3. executed the query giving me the 25ms duration

Step 2 is the one I didn’t mention…and the one which caused the MDXScript to be executed. Therefore, when we got to step 3 (executing the query)…and even though it was technically the first run of that query (which is typically the way we define a “cold-cache” run) the calculated sets where already computed and waiting in cache.

If we leave out step 2, then we get the query response time shown above (1sec 49ms). This is because SSAS has to execute the MDXScript before it can process the query – regardless of whether the query uses any of the calculations in the MDXScript <– (an important point, hence the red font)

We can see this effect in the profiler trace below…

image

The first event is the QueryBegin…then there’s a whole bunch of Execute MDX Script events…followed by the QueryEnd event. The total query execution time was 1042 ms, but most of that is attributed to the MDXScript execution events (1039ms). This is roughly the same duration as the first query (where all of the calculations are defined in the MDX query) and that’s no coincidence. The difference is that all subsequent executions of the new query (using the calculations in the cube) will benefit from the global cache and execute insanely fast (e.g. 8ms). On the flip side, the first version of the query can only use the query-scoped cache…which means the calculated sets will need to be recomputed for every single execution of the query.

Note: this is a pretty good method for figuring out which parts of the MDXScript need tuning. For example, in the trace above, there’s one calculated set in the script that takes > 90% of the time and we can see (below) exactly which statement is responsible for that delay.

image

However, thanks to Chris Webb (b | t), Greg Galloway (b | t), and Carl Rabeler (b | t) there’s also a utility out on codeplex called MDX Script Performance Analyzer (which I just realized was accidentally left out of my post on the top free tools for SSAS 2012 developers) that can make this even easier.

Wrap Up

Moving calculated sets from the query into the cube (by adding them to the MDXScript) is a good way to boost performance via leverage of the global cache (as opposed to the query or session scoped cache). That said, ever heard the phrase “there’s no such thing as a free lunch”? Same thing applies here. Computing the calculated sets in the MDXScript can not be avoided…it has to occur (when the MDXScript is executed). The good news is that we can in some ways control when the MDXScript is executed (e.g. after hours right after the cube is processed by sending in a dummy query) so that it doesn’t happen when the CEO brings up her favorite dashboard.

Keep in mind, when working in the MDXScript things can get complicated fast and you can find yourself in a nasty situation where having the calculations in the MDXScript is doing more harm than good. I’ve found myself there before (admittedly on more than one occasion) and troubleshooting can get very ugly and you may even begin to question your sanity.

Stay tuned for a future post where we’re going to build off of this example and add a bit of complexity by making the calculations in the MDXScript more dynamic. After all, its pretty unrealistic to expect the users to not want the ability to control the age-range, number of days, etc… people just want everything don’t they?

 

 

Update 2015-02-16 (845 ET): Initial posting was unclear with regard to “which” calculations get evaluated when the MDX Script is executed…Answer: Named Sets (not Calculated Members). H/T Chris Webb for pointing that out as it is an important distinction.

One thought on “MDX Performance Tip: Move Calculations From the Query to the MDXScript

Leave a Reply