…brew your own Business Intelligence

MDX Script: Calculated Members (vs Named Sets)

In the previous blog post I discussed the benefits of defining MDX calculations in the cube (instead of in the query using the WITH-clause). The primary benefit was having named sets computed ahead of time (when the MDX Script is executed) and available via the global cache which greatly improves query response time. The first version of that post (corrected later) neglected to specify that it was only the named set calculations that were “computed and cached” when the MDX Script was executed…and *NOT* the calculated members…which is a very important distinction (thanks again to Chris Webb for pointing out that ambiguity).

That said, even though calculated members aren’t computed (only created) when the MDX Script is executed, there is still performance benefits to be had by putting them in the MDX Script. This benefit comes in the form of global scope cache and

 

Scenario 1: Calculated Members Defined in the Query

WITH
    MEMBER [Measures].[qry_SalesAmountHelmet] AS 
        SUM(
            (
                 [Order Date].[Calendar YMD].CurrentMember.Lead(1)
                    :
                 [Order Date].[Calendar YMD].CurrentMember.Lead(60)
                ,[Product].[Product Hierarchy].[Product Subcategory].&[Helmets] // Helmet Subcategory
            )
            ,[Measures].[Sales Amount]
        )

SELECT    {
            [Measures].[qry_SalesAmountHelmet]
        } ON 1,
        {
            [Order Date].[Calendar YMD].[Date].&[20070728]
        } ON 0
FROM    [Adventure Works DW]

Step1: Clear Cache

Step2: Execute Dummy Query triggering MDXScript to be computed

This step doesn’t actually do anything since the MDX Script contains no calculations other than the “Calculate;” command. So what you see below in the profiler trace are 2 Execute MDX Script End events…the first one is for the “Calculate;” command, and the second one is simply for the entire MDX Script signifying that it has been processed.

image

image

Step3: Query Execution 1 (cold-cache)

Pretty straightforward cold-cache execution…we go to disk for one of the SE (storage engine) requests (i.e. EventSubclass 22; Non-cache data) and then hit the SE-cache (i.e. EventSubclass 21; Cache data) several times to finish resolving the rest of the query.

image

image

Step4: Query Execution 2 (warm-cache)

This time around we’re able to avoid going to disk for all of our SE-requests as all of our data is available in the storage engine cache…hence we’re 3 for 3 in cache lookups/hits.

image

image

 

Scenario 2: Calculated Members Defined in the MDX Script

This time we’ve moved the calculated member from previous scenario into MDX Script…

image

Here’s what the query looks like now…

SELECT    {
            [Measures].[internal_SalesAmountHelmet]
        } ON 1,
        {
            [Order Date].[Calendar YMD].[Date].&[20070728]
        } ON 0
FROM    [Adventure Works DW]

Step1: Clear Cache

Step2: Dummy Query to Execute MDXScript

Now that we have a calculated member in the MDX Script, we see an additional Execute MDX Script End event indicating that the [Measures].[internal_SalesAmountHelmet] calculated member has been “created”. This doesn’t mean it was cached…

image

image

Query Execution 1 (cold-cache)

As you may have expected…the results of the cold-cache execution are identical to the previous scenario where the calculation was defined in the query…

image

image

Query Execution 2 (warm-cache)

Now for the warm-cache execution we can avoid the storage engine altogether and grab our result right out of the FE (formula engine) cache (i.e. EventSubclass 2; Get data from flat cache).

If you noticed in the performance counter screenshot from the first execution, there was 1 flat-cache insert. Since our query doesn’t violate any of the rules precluding access/use of the global cache (more on that later) we are able to grab the result from the flat-cache which is typically better than going to the storage-engine (even if the data is already available in the SE-cache).

This is an important point to understand. Even though in the first scenario, where the calculation is defined in the query, we see that same flat cache insert…we can’t use it during the subsequent execution because our query violates the one of the rules which preclude it from accessing the FE-cache. So the best it can do on the second execution is use the SE-cache.

image

image 

Another point to make about the FE-cache and global scope is that objects are sharable across sessions…shown in the trace below where the same query was executed for a 3rd time from a different session (SSMS instead of MDX Studio). This can’t be insanely valuable from a performance perspective in a multi-user scenario where there’s a substantial amount of overlap between the queries submitted by many users (e.g. dashboards, reports, etc).

image

Scenario 3: Calculated Member using Non-deterministic Function in the MDX Script

Now, even though a calculated member is defined in the MDX Script, that does not necessarily mean it will be able to leverage the FE cache. Take the following calculated member which calculates the total sales amount for the last 90 days…

image

Because it uses the Now() function – which is a “non-deterministic” function (meaning it can produce a different result every time it is called) – when we run a very simple query…

SELECT    {
            [Measures].[internal_Sales_LastXDays]
        } ON 0
FROM    [Adventure Works DW]

multiple times…

image

…we see that it isn’t using the FE cache. Instead it can only use the SE cache. This is different from what we saw with the query in Scenario 2…the reason is that the query in scenario 2 was based on a calculated member that didn’t violate any of the “rules” which prevent use of the FE cache.

Note: There are quite a few other ways that prevent calculations from being cached in the FE cache…many of which are located in the Microsoft SQL Server Analysis Services Multidimensional Performance and Operations Guide. Chris Webb (b | t) has also blogged fairly extensively about this in the past (see references at the bottom of this post)

In this case its no big deal because this is a small cube and the query is already fast. But if we really wanted to leverage the FE cache we’d need to find another way to meet this same requirement while avoiding the non-deterministic Now() function.

One way would be to add a new attribute to our date dimension which flags the current date. We can then rewrite the calculation using non-deterministic functions…

image

…thereby getting us back to the FE (flat) cache…

image

Of course, if we really wanted to get nit-picky, then we’d create a set for the current date and then use it in the calculated member…

image

…saving us the 2 calculate non-empty events in the first run of the query (as seen in the previous profiler trace)…

image

Scenario 4: Simple Named Set defined using Base Measure

Let’s switch it up and throw in a few scenarios involving named sets to help illustrate the difference in behavior between named sets and calculated members defined in the MDX Script.

For example, top 10 customers based on Sales Amount…

image

When we clear the cache and execute the dummy query (to get the MDX Script to process), we see that there’s a little more going on…like the query subcube event shown below…

 image

What’s happening is that the named set is being computed and part of that process requires a dip into the storage engine (and eventually to disk) as the named set is based on the [Sales Amount] measure (a base measure of the cube).

If we were to then run a simple query involving that same base measure…

SELECT    {
            [Measures].[Sales Amount]
        } ON 0
FROM    [Adventure Works DW]

…we see that, even though this is the first execution of this query, the data is already available in SE cache. This is because the SE-caching occurred during evaluation of the named set during the execution of the MDX Script.

image

 

Scenario 5: Named Set defined using Calculated Member

Now, say we created the same named set…top 10 customers…but this time we based it on a (very) simple calculated member…

image

No difference…the named set is still computed and we see the same Query Subcube event when the MDX Script is executed…

image

And if we run a basic query with just the calculated member…

SELECT    {
            [Measures].[internal_SalesAmount]
        } ON 0
FROM    [Adventure Works DW]

…we see, again, that the data is available and waiting for us in the SE (measure group) cache…

image

 

Scenario 6: Named Set defined using Calculated Member v2

Here’s a similar example, except the calculated member does a little more than simply reference a base measure…

image

Same as before…barring the minor difference in context of the Query Subcube event…

image

And if we run a simple query with just the calculated member…

SELECT    {
            [Measures].[internal_SalesAmountHelmet]
        } ON 0
FROM    [Adventure Works DW]

…again, we see that the data was there waiting for us in the SE cache…

image

Scenario 7: Named Set defined using Calculated Member v3

This time let’s increase the complexity of our calculated member upon which the named set is based…

image

Now the named set depends on a calculated member which in turn references the current context of the [Order Date] dimension (via the CurrentMember function)…so what happens?

image

Well, this time there’s no Query Subcube during the MDX Script execution.

To understand why, check out the following query…

image

When the MDX Script is evaluated, the current context of the [Order Date] dimension is the “All” member. The query above shows what happens when you apply a relative member function (e.g. Lead) to the all member…the result is (null)…now if you apply (null) context in a calculated member…like you see in the calculation for [SalesAmount_NullContext] above…you get (null).

Wrapping Up

Even though calculated members defined in the MDX Script aren’t cached ahead of time the same way named sets are…there is still a benefit to defining them inside the cube. That benefit primarily comes in the form of the global formula engine cache.

MDX queries with subcubes involving calculations defined in the WITH-clause are unable to access the global scope cache…sorry, it’s just one of the rules :-/  So that means every time the query is executed (even if by the same user in the same session) the calculations have to be recalculated. By having the calculations (sets and members) defined in the MDX Script, they can be referenced directly in the base of the MDX query…which means your query doesn’t need a WITH-clause and can therefore access the global formula engine cache (assuming none of the other rules aren’t being broken).

The concept of scope applies only to the formula engine cache. The storage engine cache doesn’t really have the same concept (though I guess you could consider it as global with no restrictions)…and every query can make use of it, regardless of constructs (i.e. WITH-clause, dynamic security, non-deterministic functions, etc) which would preclude access to the global-FE cache. And that’s a good thing since going to disk is (typically) expensive so stopping at the measure-group cache is always nice. However,

There are quite a few “rules” that will preclude a query from leveraging the global FE cache if violated. As mentioned in the post above, these are covered quite well already so I’ll simply point you to a few reference below…

Further Reading…

3 thoughts on “MDX Script: Calculated Members (vs Named Sets)

  1. RAHUL JOSHI says:

    Thanks so much for such a detailed discussion of different scenarios. On that note, I have a specific scenario for which I am having some performance issues.

    We have a Customer Dimension – With Customer Members identified by Customer Key and Customer ID (Business Side Natural Key) – with about 600,000 unique members. We have a measure group that records the Sales Revenue – and a Measure that records the net sales value – positive (Sales) or negative (refunds / returns) for every record in Sales Fact.

    The MDX query I am writing is a two step query –

    1) Generate a running total of the net sales revenue (Sales – Returns/ Refunds) per customer – for a given Product Category (user input) and only up to a certain date in the current Sales Year (again a user input).

    2) The next step is to provide a Count of Customers – with net sales greater than a fixed amount (again a user input) – across the Store Geography Hierarchy – that consists of State, District and City levels.

    This helps us to track our target customers and also compare different store business areas – for the marketing activities – by suitably changing the value of the target net sales amount for a specific Product Category.

    Since we have a very large number of members in the Customer Dimension, I thought of building a Dynamic Set (inside the CUBE) for step 1 – using NONEMPTY() function to only include the Customers – who have at least one sales record – for the chosen Product Category.

    The net sales amount per Customer (running total of sales transaction amounts) is also defined as inline calculated member (inside the CUBE) – using the Product Category and Date user inputs. Thus we have a non-deterministic calculated member – that uses a non – deterministic dynamic set.

    The Reporting Query simply filters the calculated member – and then produce the results – slicing by the cross join of all the members of Store Geography Hierarchy on each of the level on Rows.

    The performance of the query kinds of fluctuates – and when I studied this in the MDX Studio – The following was the output (for one of the shortest running scenarios): –

    Time : 10 sec 610 ms
    Calc covers : 0
    Cells calculated : 9780
    Sonar subcubes : 8
    NON EMPTYs : 1
    Autoexists : 0
    EXISTINGs : 0
    SE queries : 13
    Flat cache insert : 0
    Cache hits : 30
    Cache misses : 28
    Cache inserts : 12
    Cache lookups : 58
    Memory Usage KB : 10924

    I was wondering if there is a way to enforce block type computation here – as I can see a cell-by-cell computation coming into play here.

    1. Bill says:

      Hi Rahul,

      with regards to your question “is there a way to force block computation”…no, you can’t “force it”, but there are certain MDX patterns that can prevent it. If you don’t mind, I’d prefer to take this conversation over to email as its a fairly specific (and complicated) scenario that will likely require screenshots and possibly sending files back and forth. Please feel free to reach out to me at the following address: anton (at) primedataintelligence (dot) com

  2. RAHUL JOSHI says:

    Thanks so much for your reply. I very much appreciate your time and help on this one. I am also sending you an e mail with more details of the problem. Thanks again.

Leave a Reply