Press enter to see results or esc to cancel.

SSAS Tabular: Scale-Out Query Architecture

Earlier this week, while discussing high-concurrency architecture options for SSAS multidimensional, a colleague asked about options for SSAS tabular and mentioned there was very little information available online specific to tabular…and so, a blog post is born.

The short answer is that Scale-Out Query Architecture for Tabular is essentially identical to that for multidimensional. And there’s already some pretty good material available on the internet for that (e.g. Scale-Out Querying with Analysis Services) so what follows is a simple adaptation specific for tabular.

Architecture Diagram

Below is a diagram of the basic architecture (click to zoom):


Here we have 5 servers (1 for processing, 4 for resolving queries) and 9 tabular instances (1 for the processing server, and 2 per query server).

Note: the query servers technically don’t need 2 instances and if you’re already pressed for resources (i.e. memory/cores) you might simply go with 1 tabular instance per query server.

To the far right you see the reporting layer which can be any number of reporting tools (as long as they’re compatible with Tabular) which point to a network load balancer or some other device/application that spreads queries from the reporting layer across the instances on the query servers.

Refreshing Data

The processing server (PS) contains a single Tabular instance (SSAS_PS) used to process the tabular model(s) after every ETL run. Once processing completes, the tabular model(s) are detached from the instance and robocopy scripts are used to copy the files from PS to the data directories associated with the standby instances(e.g. SSAS_QS<n>_B) on each of the query servers (QS01-QS04). Meanwhile, queries continue to come in from the reporting layer and are being load balanced across the active instances (e.g. SSAS_QS<n>_A).


Note: while tabular models reside in memory (which explains the very fast query response times) they also exist as files on disk. If that weren’t the case, anytime the server was rebooted or the SSAS service was restarted, the model would have to be reprocessed from source – and that wouldn’t really fly for a variety of reasons. This is why data is stored on disk – so that it can be simply loaded back into memory once the server or service comes back up (without having to re-query the source).

Once the data is finished being copied over, the standby instances on the query servers are started and the tabular model(s) are re-attached…which loads the new data (that was just copied from the processing server) into memory on the query server. The tabular model(s) for the active instances (e.g. SSAS_QS<n>_A) are detached and those instances can be stopped. The active instance is now SSAS_QS<n>_B and the standby instance is SSAS_QS<n>_A. At this point, the mechanism that distributes the queries across the pool of query servers (e.g. network load balancer) is updated to direct incoming queries to the new active instances (e.g. SSAS_QS<n>_B).


Note: the standby/active instance swapping can be handled a number of ways – mainly depending on the latency and/or memory requirements. For example, if you’re already tight on memory, its probably not feasible to have 2 copies of the tabular model(s) loaded into memory at the same time…which means you’re going to have to detach the active instance before you can load the passive one into memory. And if you don’t have a maintenance window and need to service queries 24×7 then you’ll need to roll through the query servers 1 by 1 which adds complexity to the automation scripts.

Query Performance

As is the case with multidimensional (for the most part), this architecture will NOT increase the speed of individual queries – so if that is your problem you will need to look at hardware (scale-up) or tune your queries and/or data model. However, if your issue is a concurrency bottleneck, then this architecture will absolutely help and the scalability is nearly linear. For example, if one server can support 20 concurrent users, then 2 servers can support 40 concurrent users with similar query performance. That’s just an example out of thin air…your numbers might/will be different. It all depends on the size of your servers (cpu/memory), the size of your model/data, the number and complexity of your queries, and the query response times you require…this is why a sizing exercise is so very important and should not be left until the end of the development phase Winking smile

Quick Word On Hardware

In case you were unaware, Tabular is not “NUMA-aware”…which simply means queries perform worse on hardware with multiple NUMA nodes. Alex Whittles (b | t) did a great job demonstrating this effect in the following blog post: SSAS Tabular Performance: NUMA Update. Good to know if you haven’t already purchased your hardware. And for those who were unfortunate enough to have already pulled the trigger and purchased large servers with tons of cores and multiple NUMA nodes…you can use advice from this post and run multiple SSAS tabular instances on the server (each instance affinitized to a single NUMA node).

Regardless of your situation, this document is a must read if you’re considering a solution involving SSAS Tabular.