Aggregation Design Life-Cycle: Introduction

This is the first post in a series of posts covering the Aggregation Design Life-Cycle (ADLC).

  1. Introduction
  2. Create Initial Aggregation Designs
  3. Remove Ineffective Aggregations
  4. Capture Query Workload
  5. Run Usage-Based Optimization (UBO) Wizard
  6. Create Custom Aggregations
  7. Final Thoughts

When it comes to maximizing Analysis Services query performance, nothing beats a solid dimensional data model with appropriately configured attribute relationships and a good measure group partitioning strategy.  This, of course, assumes the instance is already properly configured with adequate hardware resources.  If you’ve got all that in place and you’re still looking to eke out that last bit of performance, then it’s time to focus on a good aggregation strategy.

In this series of posts, we’ll cover the entire Analysis Services Aggregation Design Life-Cycle…from creating the initial set of aggregations to the recurring tuning and maintenance needed to keep your aggregations so fresh and so clean.  Along the way I’ll share some tips and recommendations based on a combination of first-hand experience and best practices.

What Are Aggregations and What Purpose Do They Serve?

From a T-SQL perspective, you can think of an aggregation as the result set from a SELECT statement with a GROUP BY clause…only the result set is materialized (like a materialized view) so the response time is near instantaneous.  In the context of a cube, that translates into a measure group being aggregated across one or more dimension attributes…the dimension attributes would be the columns in the GROUP BY clause, and the measure would be the column or expression inside the aggregate function (e.g., SUM, COUNT, MIN, MAX) in the SELECT clause.
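To make that concrete, here’s a minimal T-SQL sketch of the kind of result set an aggregation materializes.  The table and column names are hypothetical (loosely patterned after the AdventureWorks sample schema):

  -- An "aggregation" at the (CalendarYear, ProductCategory) granularity
  -- is essentially this result set, pre-computed and stored.
  SELECT
      d.CalendarYear,
      p.ProductCategory,
      SUM(f.SalesAmount) AS SalesAmount
  FROM dbo.FactInternetSales AS f
  JOIN dbo.DimDate    AS d ON d.DateKey    = f.OrderDateKey
  JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
  GROUP BY d.CalendarYear, p.ProductCategory;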

Aggregations help with query speed by pre-aggregating data (during processing), which saves the SSAS engine from scanning the entire partition and aggregating the data on the fly before returning results whenever a query comes in (e.g., from a client reporting tool).  For certain queries this can mean the difference between minutes and milliseconds.  There’s just one catch – aggregations have associated costs…mostly in the form of processing time and storage space.  The goal is to design the minimum number of aggregations that will satisfy the maximum number of queries…essentially striking a balance between average query speed and processing time/storage space.
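The “materialized” part of the analogy maps nicely onto SQL Server indexed views, where the grouped result is computed and stored up front so that queries at that grain read the stored rows instead of re-scanning the fact table.  A rough sketch, again with hypothetical table names:

  -- Pre-compute the aggregation once (roughly analogous to cube processing)...
  CREATE VIEW dbo.vSalesByYear
  WITH SCHEMABINDING
  AS
  SELECT
      d.CalendarYear,
      SUM(f.SalesAmount) AS SalesAmount,
      COUNT_BIG(*)       AS RowCnt  -- required in an indexed view with GROUP BY
  FROM dbo.FactInternetSales AS f
  JOIN dbo.DimDate AS d ON d.DateKey = f.OrderDateKey
  GROUP BY d.CalendarYear;
  GO
  -- ...then materialize it.  The trade-off is the same as with cube
  -- aggregations: faster reads in exchange for storage and refresh cost.
  CREATE UNIQUE CLUSTERED INDEX IX_vSalesByYear
      ON dbo.vSalesByYear (CalendarYear);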

Aggregation Life-Cycle

Unfortunately, aggregation design is typically thought of as a “one and done” development task…which is a bit short-sighted.

Think about it – how many businesses track the same metrics, run the same reports, and perform the same analysis year in and year out?  The answer is likely “not many” or, perhaps more accurately, “not many that will still be in business in 5 years”.

Over time, metrics will change (new strategies/initiatives lead to new goals and metrics), reports and dashboards will change (new reports added, old reports retired), and ad-hoc query patterns (generated by information workers via Excel/PowerPivot) will change.  These are the objects responsible for the majority of the queries that hit the SSAS database.  If they are changing, then it’s only logical that the aggregation designs supporting these common queries need to change as well.

Enter the Aggregation Design Life-Cycle:

[Figure: Aggregation Design Life-Cycle (ADLC) diagram]

The Aggregation Design Life-Cycle is an iterative process aimed at keeping the aggregations in your SSAS database relevant over time.

Everything starts off with an initial set of aggregations that are designed just before deploying the cube into production.  The rest of the steps are executed in a loop once the cube is in production and are based on real statistics about the workload the cube is handling.

The first step in the production-side iteration is to remove any aggregations that are no longer being used or that are otherwise ineffective from a query-response perspective.  Next, we capture queries that represent the day-to-day and period-end workload handled by the cube.  Those queries are fed into the Usage-Based Optimization (UBO) Wizard, which allows us to create aggregations targeting the slowest and most frequent queries in the workload.  Finally, we supplement the aggregation design with a few hand-crafted aggregations aimed at the one-off queries we may have missed during UBO.  After a bit of time passes, the cycle repeats…starting back up with the Remove Ineffective Aggregations step.
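As a quick preview of the workload-capture step: the UBO Wizard draws its statistics from the SSAS query log.  Assuming the query log has been enabled (via the QueryLogConnectionString and QueryLogTableName server properties) and is writing to a table named dbo.OlapQueryLog, a first look at the slowest and most frequent subcube requests might be something like this (the database name below is hypothetical):

  SELECT
      Dataset,                        -- bit-vector identifying the attributes hit
      COUNT(*)      AS QueryCount,    -- how often this subcube shows up
      AVG(Duration) AS AvgDurationMs  -- average response time (milliseconds)
  FROM dbo.OlapQueryLog
  WHERE MSOLAP_Database = N'MyCubeDatabase'  -- hypothetical database name
  GROUP BY Dataset
  ORDER BY AvgDurationMs DESC, QueryCount DESC;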

Stay tuned as we explore the steps of this process in more detail.
