Categories
Uncategorized

DAX: Performance Gains via Aliases and Variables

 

Consider the following DAX snippet:

SWITCH ( 
     TRUE()
    ,SUM('Internet Sales'[Sales Amount]) > 1000000, "A"
    ,SUM('Internet Sales'[Sales Amount]) > 100000, "B"
    ,"C"
)

Very straightforward, right? If the sum of sales amount is > 1mm, return “A”… if its between 1mm and 100k, return “B”… otherwise, return “C”.

Note: SWITCH-statements are merely syntactic sugar. Under the covers, they get expanded into nested IF-statements like…
IF( <condition>, <true>, IF( <condition>, <true>, <false> ) )

Since the evaluation criteria for each branch is a simple sum-aggregation over a base measure for a very small dataset, performance is not going to be an issue. However, if the measure is more complicated and takes more time to compute, then we could run into a performance problem and need to make adjustments. To understand why, we need to investigate how this code is being executed which we can easily do with DAX Studio

Here’s the snippet used in a DAX query…

DEFINE
    MEASURE 'Internet Sales'[ProductSegment] = 
        SWITCH ( 
             TRUE()
            ,SUM('Internet Sales'[Sales Amount]) > 1000000, "A"
            ,SUM('Internet Sales'[Sales Amount]) > 100000, "B"
            ,"C"
        )
        
EVALUATE
 (
    SUMMARIZE (
        'Internet Sales',
        'Internet Sales'[ProductKey],
        "Sales", SUM('Internet Sales'[Sales Amount]),
        "Segment", [ProductSegment]
    )
)
ORDER BY [Sales] DESC

Below is the (cold cache) execution details…

image

As you can see there are 4 storage engine (SE) requests. If you click on each line, you’ll see that 1 is to get a distinct list of ProductKeys and the other 3 are retrieving the sum of SalesAmount by ProductKey …

image

And here’s where those are falling in the physical query plan…

image

Note: even for this simple query, relating operations in the physical query plan to storage engine requests can be tricky. In this case, however, we can use clues like the number of records and the data types to make the mappings. For more information, check out the excellent whitepaper: Understanding DAX Query Plans

As you can see from above, 2 of the SE requests fall within the SWITCH-statement (which show up in the query plan as nested IF-statements). And even though these requests are spooled from cache, they still have a cost (albeit a very low cost in this example) of server resource (CPU/memory) and time.

Ideally the DAX query engine would be smarter and realize that it only needs to evaluate this calculation 1x for the SWITCH statement. Unfortunately, that is not yet the case (as of SSAS 2014).

Now that you understand the problem, I would like to share a solution to reduce the amount of work done to resolve this query.

Optimization using ALIASES

Here’s the same query written another way…

EVALUATE
 (
    ADDCOLUMNS (
        SUMMARIZE (
            'Internet Sales',
            'Internet Sales'[ProductKey],
            "Sales", SUM('Internet Sales'[Sales Amount])
        ),
        "Segment", SWITCH ( TRUE (), [Sales] > 1000000, "A", [Sales] > 100000, "B", "C" )
    )
)
ORDER BY [Sales] DESC

This time we’re calculating “Segment” using the alias [Sales] which is only calculated 1x in SUMMARIZE-function. The result set is the same but the amount of work done to resolve the query is much less. Below you can see we now only have 2 SE requests… 1x for the distinct ProductKeys and the other for the sum of SalesAmount by ProductKey…

image

And the physical query plan shows this as well…

image

As you can see the nested IF logic is referencing the [Sales] value instead of doing separate spools.

One thing to keep in mind is that this is only going to work when you are using a reporting client (e.g. SSRS) that allows you to hand-craft the DAX query. This will not work for other reporting clients (e.g. PowerBI, PowerView, etc) where the DAX is automagically generated via interaction w/ the UI.

Optimization using Variables

Variables are new to DAX 2015… meaning they can only be used with SSAS 2016, Excel 2016, and Power BI Desktop. Furthermore, they’ve already been covered by Marco Russo (b | t) in this blog post…so best to read that first as I’m only going to show you how the concept applies to this example.

Below is the same query re-written to take advantage of DAX variables…

DEFINE
    MEASURE 'Internet Sales'[ProductSegment] =
        VAR vSA = SUM('Internet Sales'[Sales Amount])
        RETURN
            SWITCH(
                 TRUE()
                ,vSA > 1000000,"A"
                ,vSA > 100000, "B"
                ,"C"
            )

EVALUATE(
    SUMMARIZE(
        'Internet Sales'
        ,'Internet Sales'[ProductKey]
        ,"Sales",SUM('Internet Sales'[Sales Amount])
        ,"Segment",[ProductSegment]
    )
)
ORDER BY 
    [Sales] DESC

Below is a look at the output from DAX Studio when running the query above against the same model running on an instance of SSAS 2016 CTP3.

image

As you can see, there are only 2 SE requests… 1x for the distinct list of ProductKeys, 1x for the sum of SalesAmount by ProductKey.

To me, that was a bit strange as I was expecting to see 3 SE requests…

  1. 1x for the distinct list of ProductKeys
  2. 1x for the sum of SalesAmount by ProductKey attributed to the “Sales” column
  3. 1x for the sum of SalesAmount by ProductKey attributed to the vSA variable used in the ProductSegment calculated measure

However, I was even more caught off-guard when I looked at the physical query plan…

image

…which very closely resembles the query plan from the initial execution showing what appears to be 3 SE requests related to SalesAmount. However, if you look very closely, the difference shows up in the LogOp attribute of the SpoolLookup operator.

The first one (with LogOp = Sum_Vertipaq) is the only one that actually hits the storage engine…

image

…while the second 2 which show up down in the nested IF-construct have a LogOp=ScalarVarProxy…

image

Very interesting. Despite explicitly referencing the calculation (i.e. SUM(‘Internet Sales'[SalesAmount] ) 2x we see it only being calcualted (via LogOp=Sum_Vertipaq) only 1x. Based on this observation (as well as some other discovery-type of work I’ve been doing related to “Super DAX”) and of course some blog posts that have been released by Kasper and the SSAS team (see here, here, here, & here) regarding improvements to SSAS Tabular 2016 CTP3, it seems we can expect quite a bit of internal optimizations added to the DAX query engine.

Conclusion

As you can see, using DAX variables is a much better solution than using the aliases…the performance improvement is about the same, however, variables we can wrapped up in the calculated measures inside the model allowing us to take advantage of the performance gain with all tools (not just those allowing us to hand-craft the DAX queries).

The query used in this post is too simple to highlight the performance benefit (small data dataset, simple calculation)…but it did make it easier to cruise the query plans and SE requests. In reality, a better use case for highlighting the performance benefits of these optimizations is with a query that hammers the Formula Engine (FE).

Take the query below… which runs in just over 3 seconds on my VM. The alternative (i.e. not using DAX variables) takes just over 8 seconds…

DEFINE
    MEASURE 'Internet Sales'[complex measure] = 
        SUMX(
            'Date'
            ,COUNTROWS (
                FILTER (
                    'Internet Sales',
                    'Internet Sales'[Order Date] < 'Date'[Date] &&
                    'Internet Sales'[Ship Date] > 'Date'[Date]
                )
             )
         )
    MEASURE 'Internet Sales'[ProductSegment] =
        VAR vSA = [complex measure]
        RETURN
            SWITCH(
                 TRUE()
                ,vSA > 1000000,"A"
                ,vSA > 100000, "B"
                ,"C"
            )

EVALUATE(
    SUMMARIZE(   'Internet Sales'
        ,'Internet Sales'[ProductKey]
        ,"Sales",[complex measure]
        ,"Segment",[ProductSegment]
    )
)
ORDER BY 
    [Sales] DESC

One reply on “DAX: Performance Gains via Aliases and Variables”

Leave a comment