ADLC: Final Thoughts

This is the 7th, and final, post in a series of posts covering the Aggregation Design Life-Cycle (ADLC).

  1. Introduction
  2. Create Initial Aggregation Designs
  3. Remove Ineffective Aggregations
  4. Capture Query Workload
  5. Run Usage-Based Optimization (UBO) Wizard
  6. Create Custom Aggregations
  7. Final Thoughts

This post is a catch-all for additional considerations that should be factored in before, during, and after each iteration of the ADLC.

ADLC

What about Caching?

Some are quick to dismiss Aggregations arguing that the performance-related benefits of caching in Analysis Services (and the Windows file-system to a somewhat lesser extent) render the touted benefits of aggregations all but useless.  On the other side of that argument are the masses who are diligently devoted to aggregations as the be-all-end-all when it comes to improving query performance against a cube.

Note: It is my opinion that the main driver for such large numbers of developers discounting the value of cache (relative to that of aggregations) is experience.  New developers (with little experience) have likely not given serious thought to the value of caching since it is an advanced topic that is typically only alluded to (if mentioned at all) in beginner texts.  Then there are those developers (lacking recent experience) who cut-their-teeth back in the days of 32bit systems…when the amount of memory available for cache stores was severely choked…who haven’t revisited the concept now that 64bit systems are ubiquitous and the cost of memory is cheap.

Nevertheless, neither group is 100% correct 100% of the time…because, as goes the saying, “it depends”!

In order to address this argument, it is important to know a bit about caching (hopefully you already know a little about aggregations).  Caching refers to the capability where Analysis Services retains recently accessed data in memory.  By doing so, if another request for the same dataset comes in, it can be quickly satisfied without having to go to disk.  It’s a bit more complicated than that…but from a high-level and in the context of data retrieval, the diagram below should help form a mental model:

Data Access Performance Hierarchy

Note: this process can actually be broken down quite a bit more if you distinguish between the flat and calculation cache in the Formula Engine and factor in the impact of the Windows file-system cache on aggregations and base partitions. In fact, the impact of the Windows file-system cache is quite significant…something I didn’t quite realize until reading this post by Greg Gonzalez (b | t) from SQL Sentry.

As you can clearly see in the diagram above, aggregations only come into play when the data isn’t already available in the SSAS cache stores.  That’s a *really* important concept and it forms the basis of the argument that “aggregations are somewhat pointless in the face of Analysis Services caching”.

So building from that logic, the value of having aggregations goes up under any scenario where the odds of a cache-hit decreases (implying the need to go to disk for the data)…such as in the following scenarios:

  1. when the Analysis Services instance is under memory-pressure and entries in the cache-stores are being actively “evicted” by the memory cleaner process to free up memory for other tasks and/or datasets.
  2. when partition(s) are being processed very frequently (to achieve low latency) and the related entries in the cache-stores are cleared out

In both of the scenarios above, having aggregations in place will help.

Note: Even with the pervasive use of 64bit systems and super-cheap memory, there are still plenty of scenarios where an Analysis Services instance running on a system with > 64GB of RAM run into memory-pressure.  The topic of SSAS-memory pressure is worthy of a post on its own…but you can confirm the presence of memory pressure via Perfmon counters or a performance monitoring tools such as SQL Sentry’s Performance Advisor for Analysis Services.

Always Be Closing Measuring

If you aren’t periodically measure performance of the SSAS database, how will you know if your aggregation designs are helping?  Hint: you won’t!  That’s why you should be measuring throughout the ADLC to make sure you are heading in the right direction.

Note: you should actually be measuring performance on a recurring basis regardless of whether you are explicitly implementing changes.  It’s a proactive measure (and best practice) that provides a starting point for addressing issues such as: “the cube is taking a long time to process”, “reports are slow”, etc.

glengarry_rect

The only legitimate questions you should have on this topic are “what to measure?” and “how to measure”…both of which, to a certain extent, depend on your goals/requirements (ex. raw query speed, max concurrency, etc).

Bob Duffy (b | t) gave a solid presentation at SQLBitsX titled Load Testing Analysis Services where he touches on both of these questions by showing  you how to setup a load test and compare the results.

The case-study in his presentation was primarily dealing with high-concurrency…where he measured the max number of users and query throughput achievable within predefined constraints (query response < 3s, CPU < 80%).  That may not be exactly what you’re aiming for, but a lot of the same principles apply. Perhaps you only have a handful of users and need rip-roaring speeds in which you’re primarily concerned with raw-speed?

After figuring out what you want to measure…the next step is constructing an automated test that you can easily run throughout the various steps in the ADLC.  Any tool that allows you to automate the execution of a pre-collected workload against the analysis services database will do…this list is not comprehensive:

In Bob’s presentation, referenced above, he used Visual Studio as the test harness and had a small bit of code that basically looped through MDX queries stored in a relational table and executed them against the cube.  This provides a fairly easy means of controlling (or randomizing) the order of execution, the number of concurrent connections, and frequency of queries.

Once you have your workload and testing framework in place, the final step is to simply run the workload periodically (and definitely after each major change), measure the results (ex. average query duration, throughput, resource utilization, etc) and compare with the prior test.

Timing Between Steps and Iterations

Once you wrap your head around the purpose of each step and the iterative nature of the cycle, timing becomes fairly intuitive.

For example, the first step (Create Initial Aggregation Designs) should be completed before deploying to production but after you have a basic idea of the type of queries that will be hitting the cube (reports, dashboards, etc).  The ideal time to carry out this step is in conjunction with QA or UAT (user acceptance testing).

Once the cube is in production, the first iteration (steps 2-5) should begin as soon as a production workload can be captured…which varies depending on the roll out strategy (is the cube being immediately opened up to all departments or just a select few users).  Once the production workload can be captured, each step (2-5) can be run one-after-another until the iteration is complete.  In fact, steps 3 and 4 (Capture Query Workload, Run UBO Wizard) should be run as close together as possible.

After each iteration, the aggregations should have a pretty good “fit” for supporting the production workload.  So there’s not really much point in starting the next iteration until the production workload changes…as the business metrics and analysis evolve.  That pace will vary from business to business, but a good way to tell from the IT side is when you start to see a bunch of new report requests come in…or new source systems, fact tables, dimensions, and similar changes coming down the project pipeline.

Special Scenarios

Before closing, there are two special scenarios that require some attention regarding aggregations:

 

References

Leave a Reply