SSAS Scale-Out Architectures
Processing can be a very resource intensive task which will absolutely interfere with query performance when using a single server deployment. Most clients can get around this issue by simply processing after hours and/or designing and implementing a complex incremental processing scenario (e.g. process add + process partitions).
However, when it comes to large data sizes and/or near realtime (NRT) latency requirements, neither of those are viable solutions. Large data requires a longer processing duration and decreased latency requires more frequent processing. Eventually, you get to the point where processing overlaps with an active query load.
At this point you’re left with 2 options:
ROLAP is certainly an option that addresses the large data size and latency issues…but the cost comes in the form of average query response duration. Don’t get me wrong, there are a good number of ways to optimize for ROLAP (e.g. columnstore indexes, smart use of statistics) but in my opinion, these are best suited for scenarios where we already know the types of queries we need to optimize for.
- What if our query-load is dynamic?
- What if our concurrent user-load is heavy?
This post will discuss 2 architectures that can address all of these requirements.
Scale-Out Querying for Analysis Services with Read-Only Databases
The most common architecture for large-scale SSAS multidimensional solutions (and one that has been written about in great detail) is to separate processing and querying allowing for increased availability, lower data latency (i.e. more frequent processing), and higher query throughput which are all usually pretty desirable but more difficult to achieve as datasets grow and/or latency requirements become near real-time (NRT).
Below is a typical architecture diagram of this solution:
The idea is to have a beefy processing server for the backend and then a set of query-servers in a network load-balancing (NLB) cluster for the front-end. Each front-end query server contains 2 instances of analysis services – only one of which is active at a time. After a processing cycle completes, data is synched with the idle instance on the query servers (typically via a robocopy script). The frontend mechanism for balancing the query load across the query servers is updated to point incoming queries at the idle instance that was just updated making it the active instance.
The processing server is able to begin processing again immediately after copying the data to the idle instance…making the data latency equal to the amount of time it takes to process and copy new data. Combined with an incremental processing strategy, near real-time (i.e. minutes) data latency can be easily achieved.
Note: in scenarios where data sizes are large and latency requirements are low, there’s a risk of the processing server going down and the system getting behind on processing new data. In this scenario, it might be a good idea to consider setting up the processing server in a failover-cluster configuration for high availability (HA).
In this architecture, the processing server is scaled “up” (i.e. CPU, memory) while the query servers are scaled “out” (i.e. adding nodes). Figuring out how many query servers are needed becomes trivial once you measure the capacity of a single frontend query-server from the perspective of the average user load:
a = estimated number of users
b = user capacity of single query server
(# of query servers needed) = a / b
As your user base grows, you can add more query servers…very flexible!
The major limitation with this architecture is the ability of the processing server to process new data fast enough to meet the latency requirements. There are definitely some things you can do to tune the processing environment (e.g. thread configurations, memory configurations, disk, network) once you isolate the bottlenecks – check out this great post by Henk Vandervalk (b | t).
For more details on this architecture, check out the Scale-Out Querying for Analysis Services with Read-Only Databases whitepaper by Denny Lee and Kay Unkroth.
Scale-Out Everything – OLAP Farm
The OLAP Farm is an architecture to which I was first introduced in Microsoft SQL Server 2008 Analysis Services Unleashed and it really opened my eyes to the potential for **VERY** large scale SSAS solutions where datasets and latency requirements simply can’t be handled by a single processing server. That said, you really don’t see too much written on the subject, and I’ve yet to come across one in production at a client but it’s a fun thought exercise.
Below is an architecture diagram of this solution:
- The master server is at the heart and soul of this solution – containing all of the metadata about the SSAS database.
- Measure group partition data (the bulk of data in a multidimensional database) is spread across the backend servers via the remote partitions feature. This spreads the resources necessary for processing new measure group data across multiple servers. Additionally, depending on the configuration of the query servers on the frontend, it can also help spread the resources necessary for resolving queries (i.e. partition scans) across multiple servers.
- The query servers are setup in an NLB cluster and can make use of the linked dimensions and linked measure group features in order to leverage the resources on the master and the backend servers to resolve queries. Another option is to leverage the frontend architecture of the Scale-Out Querying for Analysis Services with Read-Only Databases section above (where new data is processed remotely and synch’d) which allows queries to be resolved locally. The primary difference is that the first option (i.e. linked measure groups and dimensions) spreads query resolution across multiple servers.
There’s quite a bit of flexibility in this design depending on the bottleneck:
Processing Performance: increased by scaling out the backend servers – each backend server can dedicate 100% of its resources to process new data for the subset of partitions it manages.
Query Throughput: increased by scaling out the query servers – incoming queries are distributed equally across the query servers in the NLB cluster. This provides for a very linear scale (e.g. if 1 server can handle the load of 25 users, 4 servers can handle the load of 100 users, etc)
Query Performance: increased by scaling out both backend and query servers. If query servers are configured with linked measure groups and dimensions, partitions scans (including SE cache) will be spread across the backend servers.
For more details on this architecture, check out Chapter 25 in Microsoft SQL Server 2008 Analysis Services Unleashed.
Two Birds, One Cup Stone
Ever since reading about the “OLAP Farm” I’ve been dying to stand one up in a lab and do some benchmarking. The major problem is resources. Even the geekiest of geeks will have trouble justifying the $$$ for a proper setup…well maybe not the geekiest of geeks
So instead of taking out a second mortgage, I chose to build in the cloud (/ewwws-and-ahhhhs) using Azure Virtual Machines – another technology I’ve been interested in spending some time getting to know. Plus, I get this sweet $150/mth credit via my MSDN subscription – which has simply been going to waste for quite a while now.
Stay tuned for future posts where I describe the process of standing up an OLAP Farm in the cloud.