
Performance Monitoring for SSAS – Perfmon Counter Cheat Sheet

Welcome back! This is the fourth post in a series focusing on performance monitoring for Analysis Services.

  1. Performance Monitoring for SSAS – Intro
  2. Performance Monitoring for SSAS – Data Collection
  3. Performance Monitoring for SSAS – Extended Events Cheat Sheet
  4. Performance Monitoring for SSAS – Perfmon Counter Cheat Sheet
  5. Performance Monitoring for SSAS – Extracting Information

In this post, we’re going to run through the list of Performance Monitor (Perfmon) counters that you’ll want to collect along with an explanation of the type of performance-related information they provide.

Note: The bulk of this (very long) post is based on knowledge I’ve obtained through (multiple) readings of the books, whitepapers, and blog posts written by top Analysis Services professionals (referenced at the bottom) as well as my own personal experience dealing w/ performance-related issues for many clients over the years. 

Due to the length of this blog post (/wiki) I’ve included some links to help navigate the sections. For those who are only interested in the list of counters, feel free to skip to the bottom where you can find a simple list of all counters discussed in this post.

Performance Monitor (Perfmon)

Perfmon is the tool used to analyze and understand the impact of our SSAS workload on the system resources. It works by capturing snapshots of resource utilization at the specified interval (e.g. every 15 seconds). This data allows us to see which resources are currently maxed out and causing a bottleneck for our workload. Similarly, when looking at this data over time, we can determine which resources are likely to become a bottleneck in the future.

CPU

There are a ton of counters related to CPU, but I’ve found the following handful to be more than enough for the majority of issues related to CPU pressure.

[Windows] Processor: % Processor Time
This counter shows processor utilization across the entire system. It can be captured at the total aggregate CPU utilization (_Total) level as well as by individual (logical) CPU core. It’s essentially the same values you see on the performance tab in task manager…

[image: Task Manager performance tab showing overall and per-core CPU utilization]

You should capture it at both levels and here’s why: performance issues can present themselves in several different ways from a CPU perspective.

For example, a slow query due to a bottleneck in the formula engine will show up as a prolonged spike on a single CPU core – because the formula engine is single-threaded. However, if you’re only capturing the total aggregate CPU utilization on a server that has more than 1 CPU core, this issue is not as obvious. On a server configured with an Intel® Xeon® Processor E5-2667 v2 w/ 8 physical cores (a nice choice for a tabular implementation @ 4GHz clock speed and 25MB L3 cache!) the total aggregate spike for an FE-bound query is less than 15%.
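The masking effect above is just arithmetic: a single-threaded workload can peg at most one core, so its contribution to the aggregate counter is capped at 100/number-of-cores. A quick sketch (core counts and baseline load are illustrative):

```python
# Why an FE-bound query hides in the aggregate counter: the formula engine is
# single-threaded, so it can saturate at most one core.
def aggregate_utilization(busy_cores: float, total_cores: int, baseline_pct: float = 0.0) -> float:
    """Aggregate % Processor Time when `busy_cores` cores are pegged at 100%
    and the remaining cores idle along at `baseline_pct`."""
    return (busy_cores * 100.0 + baseline_pct * (total_cores - busy_cores)) / total_cores

# One pegged core on an 8-core box barely moves the aggregate...
print(aggregate_utilization(1, 8))   # 12.5 (%), consistent with the "less than 15%" above
# ...but the per-core counter for that core would read 100%.
```

This is exactly why you want the per-core instances of the counter and not just _Total.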

[Windows] Process (msmdsrv): % Processor Time
This counter shows processor utilization allocated specifically to the Analysis Services instance. This looks very similar to the first counter – Processor: % Processor Time – but there’s a slight and very important difference. Consider the scenario where SQL Server and SSAS are installed on the same server (not uncommon for smaller implementations). How do you determine the amount of CPU utilization due to the database engine vs. the SSAS engine? The short answer is you can’t – not if you’re only collecting the first counter. Therefore, both of the counters above should be collected to determine a) what the CPU utilization is on the system but also b) how much of that CPU utilization is attributed to Analysis Services.

Note: This does not just apply to those scenarios where SQL Server & SSAS are installed on the same server…it can also help identify scenarios where some random third-party software/process is taking up CPU cycles. 

These next few counters are related to “threads” – which are strongly correlated with CPU resources. Without getting into the weeds of processor architecture, instruction cycles, pipelines, and OS implementation specifics it will be sufficient for us to think of threads within the following framework: servers have 1 or more processors (or sockets), each processor has 1 or more CPU cores (usually 4+ for a server-class processor), and each CPU core can only process work for a single thread at a time. So the limit of (true physical) parallelism is the number of physical CPU cores. Got it? Good – now you don’t need to waste 3 years taking classes for a computer engineering degree only to realize you’d rather be writing code and designing databases.

[Windows] System: Processor Queue Length
This counter shows how many threads are ready to start doing work but can’t because there aren’t any available CPU cores for them to use – hence they are waiting in a queue.

Note: According to Microsoft, having a value greater than zero is ok and to be expected on busy systems with multiple processors but if this counter shoots up for a sustained period of time, you’ve got a CPU bottleneck.

Analysis Services manages the work it has to do using the thread pool pattern – which is a method for efficiently allocating a large number of tasks across a smaller and finite number of resources. When a query comes into the server, it is broken up into a set of “jobs” needed to satisfy the request: 1 job to parse the query, another job to manage the formula engine activity, and another (sometimes several – depending on the design) to manage the storage engine activity like retrieving data from partitions. Before each job can start, it goes to the corresponding thread pool and requests a worker thread. For example, the jobs responsible for managing FE activity request threads from the query pool, while the jobs responsible for scanning partition data request threads from the processing IO pool.

When the system isn’t very busy, threads are handed out upon request – no problem. However, if the system is really busy, a thread pool may have already allocated all of its threads to other jobs in which case the job will have to wait in queue until one of the earlier jobs finishes up and returns its thread to the pool.
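The thread pool pattern described above can be sketched in a few lines. This is a generic illustration (pool size and job durations are made up), not SSAS internals – but the queueing behavior is the same idea: more jobs than threads means some jobs wait.

```python
# A minimal sketch of the thread-pool pattern: a fixed pool of workers,
# with excess jobs waiting in a queue until a thread is returned to the pool.
from concurrent.futures import ThreadPoolExecutor
import time

POOL_SIZE = 2          # stand-in for a pool's max thread limit

def job(n):
    time.sleep(0.1)    # stand-in for FE/SE work
    return n

with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    # Submit more jobs than threads: at most POOL_SIZE run at once,
    # the rest sit in the queue (a nonzero "job queue length").
    futures = [pool.submit(job, n) for n in range(5)]
    queued = sum(1 for f in futures if not f.running() and not f.done())
    results = [f.result() for f in futures]

print(queued)    # jobs still waiting for a thread right after submission
print(results)   # [0, 1, 2, 3, 4] -- every job finishes once threads free up
```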

Note: the max thread limit for each pool is configured in the msmdsrv.ini file (details). But if you just want to determine the current limits on an existing system, check out the msmdsrv.log file.

[image: msmdsrv.log entries showing the configured thread pool limits]

Knowing when jobs are queued up waiting for threads from a particular thread pool is very helpful for identifying and resolving performance bottlenecks. Fortunately, we have the following performance counters which tell us when this is happening…

[SSAS] Threads: Query pool job queue length
This counter captures the number of jobs waiting for a thread from the query pool. Threads from the query pool are used for activity that takes place in the formula engine.

[SSAS] Threads: Processing pool job queue length
This counter captures the number of jobs waiting for a thread from the processing pool. Threads from the processing pool are used for write-activity in the storage engine during processing workloads as well as certain portions of the query-workload for multidimensional cubes with ROLAP partitions.

[SSAS] Threads: Processing pool I/O job queue length
This counter captures the number of jobs waiting for a thread from the processing IO pool. Threads from the processing IO pool are used for read-activity in the storage engine during query workloads (e.g. dimension scans, partition scans, aggregation scans).

Note: Prior to SSAS 2012, the Processing pool managed threads for all storage engine activity: both reads during query execution and writes during processing operations. However, as part of the effort to improve performance on high-end servers (i.e. those with multiple NUMA nodes and tons of CPU cores) the engineering team decided to break off the storage engine (read) activity into a new pool called the processing IO pool. More details on this can be found here.

While detecting thread bottlenecks is trivial (thanks to these perfmon counters), diagnosing the issue is quite the opposite.

For example, if you see a bottleneck at the query pool (i.e. a query pool job queue length > 0 for a sustained period of time), it’s easy to think the solution is to simply add more CPU cores. While that may resolve the issue in some cases, I’d argue (at least from my own experience) those are the minority of cases. A better approach is to take a step back and consider the bigger picture.
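The “sustained period of time” test is worth pinning down, since a single nonzero sample is normal on a busy box. A hedged sketch (sample values and thresholds are hypothetical – tune them to your interval and workload):

```python
# Flag a thread bottleneck only when the queue-length counter stays above the
# threshold for N consecutive samples, not on a single transient spike.
def sustained_bottleneck(samples, threshold=0, min_consecutive=4):
    """True if `samples` exceeds `threshold` for `min_consecutive` readings in a row."""
    run = 0
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False

query_pool_queue = [0, 0, 3, 5, 6, 4, 0]       # hypothetical 15-second samples
print(sustained_bottleneck(query_pool_queue))  # True: four consecutive nonzero readings
print(sustained_bottleneck([0, 2, 0, 1, 0]))   # False: transient spikes only
```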

CPU is expensive – licensing alone for adding 8 cores to a server running enterprise edition is going to cost just north of $50k. That’s a lot of money so you’ll want to be damn sure that’s going to solve the problem if/when you recommend it to the client.

In this particular case, I’d also want to check the CPU utilization, the level of concurrency, and the average query duration. Basically, I want to know what my query throughput is. If CPU is high (i.e. > 70%) and I’m seeing good throughput (lots of queries running quickly) then yes, adding CPU cores will probably help (assuming there’s also additional capacity in terms of memory and disk IO to support additional concurrency in the query workload). 

On the other hand, if CPU utilization is low (i.e. < 35%) then it’s not that cut and dried. For example, if the workload is such that most of the queries are severely storage engine bound, then spending $50k+ to add 8 more CPU cores to the server is going to be only marginally more effective than lighting that money on fire. Perhaps you don’t have enough memory and the SE requests are frequently having to go to disk and scan partitions because the data they need is not cached in physical memory. Do you have the right partitioning strategy? Are there aggregations that can be added? If it turns out the storage solution is simply too slow – for $50k you can purchase a pretty sweet Fusion IO storage solution that should greatly increase the performance and throughput of your workload and still have enough money left over to backpack through Europe for a few weeks!

MEMORY

Memory issues are very easy to spot. And now that most (all?) servers are running a 64-bit OS and memory prices are at historical lows, it’s rare to see a memory-related bottleneck where the cheapest solution isn’t to simply buy more memory. Nevertheless, there are still many companies who push back and demand proof that adding memory is going to resolve the issue.

[SSAS] Memory: Memory Usage KB
This counter shows how much (virtual) memory is currently reserved for Analysis Services.

Note: virtual memory = physical RAM + pagefile. Analysis Services makes requests for virtual memory and it is up to the operating system whether the allocation is made from physical memory or the pagefile.

If you can only capture a single memory-related counter for monitoring Analysis Services performance, this is the one. Armed with this value, the instance-level memory limits, and a general understanding of how Analysis Services behaves at different stages of memory pressure, you can make some fairly reasonable/accurate assumptions about what is happening under the covers in terms of memory pressure and related performance issues.

[Windows] Process: Working Set (msmdsrv)
This counter shows how much physical memory (in bytes) is being used to hold Analysis Services objects. The value will (typically) be less than the value from [Memory: Memory Usage KB] …the difference between the two represents the amount of space used by objects that have been paged out of physical memory into the pagefile on disk.

It’s not necessarily a bad thing to have Analysis Services objects in the pagefile…just know that if these objects are needed again (and they might not be), they’ll have to be read from disk, which is much slower than if they were sitting in physical memory.

Note: The pagefile is a physical file on disk that serves as an extension of physical memory. Together, physical memory and the page file represent the virtual memory for a Windows server. For example, my laptop has 16GB of physical memory and my page file is 2.4GB (according to WinDirStat), which means I have ~18.4GB of virtual memory – which we can confirm on the performance tab of task manager…

[image: Task Manager performance tab confirming total virtual memory]

For a more thorough explanation of virtual/physical memory allocations with regards to Analysis Services, I strongly recommend you check out chapter 10 (Monitoring Cube Performance and Usage) in Expert Cube Development with SSAS Multidimensional Models.

[Windows] Memory: Pages/sec
This system-wide counter captures the number of pages read from or written to disk per second – to resolve hard page faults.

Earlier, I mentioned Analysis Services memory objects sometimes get paged to disk (i.e. to the pagefile) but that it’s not a performance issue unless those objects are used again and need to be brought back into physical memory. Well, this is the counter that can provide insight as to whether those pages are being used again.

For example, during periods of low activity (e.g. overnight when no reports/queries are executed) the difference between [Memory: Memory Usage KB] and [Process: Working Set] may increase. This is because the Analysis Services objects in physical memory aren’t being used and the OS may decide to page them to disk (i.e. to the pagefile) in order to free up physical memory for other purposes (e.g. backups, virus scans, etc). However, once Analysis Services activity starts to pick up again (e.g. in the morning as users start running reports), the gap between [Memory: Memory Usage KB] and [Process: Working Set] will start to decrease and you should see a corresponding spike in [Memory: Pages/sec] as objects are read back into physical memory from the pagefile. Eventually, things should stabilize (once most Analysis Services objects have been read back into physical memory) and this counter will drop back down. If this counter remains high during extended periods of Analysis Services activity, then that’s a pretty good indication that the server could benefit from adding more physical memory.
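The gap described above is easy to compute from the two counters, with one gotcha: they use different units ([Memory: Memory Usage KB] is in KB, [Process: Working Set] is in bytes). A sketch with hypothetical values:

```python
# Approximate how much SSAS memory has been paged out: the difference between
# the virtual memory reservation and the resident working set.
# NOTE the unit mismatch: Memory Usage KB is in KB, Working Set is in bytes.
def paged_out_mb(memory_usage_kb: float, working_set_bytes: float) -> float:
    """Approximate MB of SSAS memory sitting in the pagefile."""
    return memory_usage_kb / 1024.0 - working_set_bytes / (1024.0 ** 2)

# e.g. 40 GB reserved, 34 GB resident -> ~6 GB paged out (hypothetical numbers)
print(paged_out_mb(40 * 1024**2, 34 * 1024**3))  # 6144.0 (MB)
```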

[Windows] Memory: Available MBytes
This counter shows how much physical memory is available on the server. When looking at resource monitor (below), this value is displayed as “Available” in the list at the bottom (Windows 2012) and represents the sum of standby (dark blue) and free (light blue)…

[image: Resource Monitor memory tab showing the Standby and Free sections]

In scenarios where Analysis Services is under memory pressure, this counter can provide clues as to whether (or not) there’s room to bump up those Analysis Services memory limits. Another issue to watch out for is when Analysis Services may need more memory, but doesn’t get it because this value is too low… we’ll see how to detect and triage this in a subsequent post.

Note: for a more thorough discussion on setting Analysis Services memory limits and recommended settings, start here.

[Windows] Memory: Free & Zero Page List Bytes
This counter captures the amount of physical memory immediately available on the system (i.e. the light blue section labeled “Free” in the image above).

If you subtract this counter value from [Memory: Available MBytes] you get the amount of physical memory currently allocated to the file system cache (i.e. the blue section labeled “Standby” in the image above).
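That subtraction is simple but worth writing down, since it’s the quickest way to see how much physical memory the file system cache is holding. Values below are hypothetical and already converted to MB (the Free & Zero Page List counter is actually reported in bytes):

```python
# Standby (file system cache) memory is what's "available" but not "free".
def standby_cache_mb(available_mbytes: float, free_zero_page_list_mb: float) -> float:
    """Physical memory on the standby list (file system cache), in MB."""
    return available_mbytes - free_zero_page_list_mb

# e.g. 24 GB available, of which only 3.5 GB is truly free
print(standby_cache_mb(24_000, 3_500))  # 20500.0 -- mostly file system cache
```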

The file system cache is physical memory reserved for frequently accessed files and it typically helps performance. However, in some cases (most commonly where cube size is much greater than physical memory) the file system cache can grow too large and start forcing Analysis Services objects to be paged out of physical memory to disk. If you see this, then you will want to consider limiting the size of the file system cache via the LimitSystemFileCacheMB property in the msmdsrv.ini file.

Note: See section 3.1.3.2 Memory Setting for the Analysis Service Process in the SQL Server 2008 R2 Analysis Services Operations Guide for more information on this property.

This next group of counters represents the values assigned to the corresponding instance-level properties. Having these included in the trace isn’t necessary, but it makes life easier when analyzing memory usage over time, especially if you’re tweaking/optimizing the limits and benchmarking various workloads.

[Windows] Memory: Cache Bytes
[Windows] Memory: Standby Cache Normal Priority Bytes

When it comes to the file system cache, there are 2 flavors: active and standby. Active, represented by [Cache Bytes], is physical memory in active use. Standby, represented by [Standby Cache Normal Priority Bytes], is also physical memory but classified as available for use by another process if a request is made.

[SSAS] Memory: Memory Limit Low KB
[SSAS] Memory: Memory Limit High KB
[SSAS] Memory: Memory Limit Hard KB
[SSAS] Memory: Memory Limit VertiPaq KB

Note: Memory Limit VertiPaq is specific to tabular instances and has different implications depending on the value assigned to the [VertiPaq Paging Policy] instance-level property. Marco Russo has a great post explaining the details (here).

Having Analysis Services memory usage above the Low-limit is not necessarily a bad thing – just as having periods of 100% CPU utilization is not necessarily a bad thing. However, if you’re consistently seeing Analysis Services memory usage well over the low limit and approaching the high limit then you’ll want to consider taking action.

Below is an example of a system where SSAS memory usage (black line) crosses both the Low (blue line) and High (red line) memory limits. This is very bad for performance – I’ll explain why shortly. But the takeaway is that this server (shown below) needs more memory – period.

[image: chart of SSAS memory usage crossing the Low and High memory limits]

Analysis Services has 2 types of memory allocations: shrinkable and non-shrinkable. You can get a breakdown of each using the counters below.

[SSAS] Memory: Cleaner Memory Shrinkable KB
[SSAS] Memory: Cleaner Memory Nonshrinkable KB

Shrinkable memory allocations are created for things like the formula engine cache, storage engine cache, dimension data (which also gets cached), etc. Non-shrinkable memory allocations are created for things like metadata, active sessions, temporary objects used during query activity, etc. The big difference between these 2 types of allocations comes into play when Analysis Services begins to run low on memory.

When Analysis Services memory utilization crosses the configured Low-limit, a process known as the “cleaner” is woken up and starts “evicting” older/lesser-used objects from memory on a periodic basis to make room for new objects. From an allocation type perspective, the key point is that only shrinkable memory objects can be evicted by the cleaner. Non-shrinkable memory objects stick around.

Note: the explanation above is a gross oversimplification of how Analysis Services internally manages memory. If you’re interested in a deeper explanation (and I highly recommend it because the memory management model is super elegant) check out the chapter on memory management in SQL Server 2008 Analysis Services Unleashed.

[SSAS] Memory: Cleaner Current Price
This counter is an indication of how aggressively the “cleaner” process is working to free up memory. Basically, it starts at zero until memory usage reaches the Low-Limit and then increases if memory usage continues to climb.
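Putting the limit counters and the cleaner together, you can classify where an instance sits on the memory-pressure spectrum from a single perfmon sample. This is a simplified sketch (limit values are hypothetical, and the real engine behavior – especially at the hard limit, where sessions can be cancelled – is more nuanced than four buckets):

```python
# Classify the memory-pressure stage from [Memory Usage KB] and the limits.
def memory_pressure_stage(usage_kb, low_kb, high_kb, hard_kb):
    if usage_kb >= hard_kb:
        return "hard: allocations blocked, sessions may be cancelled"
    if usage_kb >= high_kb:
        return "high: cleaner evicting aggressively (cleaner price rising)"
    if usage_kb >= low_kb:
        return "low: cleaner awake, evicting shrinkable objects"
    return "none: cleaner idle (cleaner price ~0)"

# Hypothetical limits for an instance granted roughly 60 GB
low, high, hard = 40_000_000, 52_000_000, 60_000_000
print(memory_pressure_stage(35_000_000, low, high, hard))  # none: cleaner idle...
print(memory_pressure_stage(45_000_000, low, high, hard))  # low: cleaner awake...
print(memory_pressure_stage(55_000_000, low, high, hard))  # high: cleaner evicting...
```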

DISK

Regardless of the storage technology in place (i.e. spinning rust, SSDs/flash, or some other fancy array that gets you invited to steak & lobster parties at conferences) you will want to keep an eye on disk performance as it’s a common bottleneck in multidimensional solutions.

Note: Even though it’s an in-memory technology, Tabular instances also depend on the storage subsystem during certain activities such as processing, backups/restores, as well as after server reboots when the models are read from disk back into memory. And while I don’t expect to see many disk-IO related bottlenecks with Tabular instances, it is possible.

In order to be effective, you’ll also need to understand the performance capabilities/limits of the underlying storage subsystem in terms of latency (typically measured in milliseconds) and throughput (e.g. MB/s). Without this information it’s going to be nearly impossible to determine whether there’s an actual problem (e.g. configuration) with the storage subsystem or you’ve simply reached its limits.

When evaluating the disk/storage performance, the first thing to look at is latency – how long does it take to read/write data. If latency is low (i.e. < 20ms or whatever your storage admin said to expect) we have nothing to worry about – life is good. On the other hand, if latency is high we need to look at throughput (i.e. how much data is being transferred per time interval).

If we see high latency and high throughput, then the only way to improve is to reduce the IO load or increase the bandwidth capacity of our storage solution. For a query workload, we can reduce the IO load by adding aggregations, using partitions so that our queries are scanning less data on average, or use a cache-warming strategy to reduce the number of queries having to go to disk during business hours. Increasing the bandwidth capacity of our storage solution (e.g. striping across more drives, short stroking the disks, moving to flash, etc) is also an option, but usually a last resort (and technically something we should have caught during initial benchmarking).

If we see high latency and low throughput, then we may have a configuration issue (e.g. wrong/old driver) and need to engage the storage admin.
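The triage logic from the last three paragraphs boils down to a simple decision tree over the two counters discussed next. A sketch, with the 20ms latency threshold from above and a hypothetical throughput ceiling (ask your storage admin for the real one):

```python
# Disk triage: latency first, then throughput to decide between
# "saturated" and "misconfigured".
def triage_disk(avg_sec_per_transfer, disk_bytes_per_sec,
                latency_ok_sec=0.020, max_throughput_bytes=500 * 1024**2):
    if avg_sec_per_transfer <= latency_ok_sec:
        return "ok: latency within expectations"
    if disk_bytes_per_sec >= 0.8 * max_throughput_bytes:
        return "saturated: reduce IO load (aggregations/partitions/cache warming) or add bandwidth"
    return "suspect config: high latency at low throughput -- engage the storage admin"

MB = 1024**2
print(triage_disk(0.005, 100 * MB))  # ok: latency within expectations
print(triage_disk(0.045, 480 * MB))  # saturated: reduce IO load...
print(triage_disk(0.045, 20 * MB))   # suspect config: high latency...
```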

[Windows] LogicalDisk: Avg Disk Sec/Transfer
This counter captures the average disk transfer duration in seconds for both reads and writes. If you were limited to collecting a single perfmon counter to measure disk performance, this is the one. It does not factor in the size of the transfer – only the speed – and is used to measure latency.

When setting up this counter, you will want to specify the (logical) drives associated w/ your analysis services data files.

[image: Perfmon counter selection for LogicalDisk: Avg Disk Sec/Transfer on the data drives]

Note: Alternatively you can capture [Avg Disk Sec/Read] and [Avg Disk Sec/Write] which capture the same information but break it down into reads vs writes. However, the reality is this… unless your cubes are being processed in the middle of the day while users are running reports (not as common as you might think), the majority of the disk transfers are either going to be reads (during query workloads) or writes (during processing workloads) – so as long as you know which workload you’re reviewing you can make assumptions about which type of transfer (read vs write) is driving this counter.

[Windows] LogicalDisk: Disk Bytes/Sec
This counter captures the overall disk throughput in bytes per second for both reads and writes. It essentially tells you how much load (in terms of data transfers) is being placed on the disk subsystem and makes it easy to determine how close you are to the theoretical maximum – or how far away 😉

Note: Alternatively you can capture [LogicalDisk: Disk Read Bytes/Sec] and [LogicalDisk: Disk Write Bytes/Sec] which capture the same information filtered for reads or writes.

[Windows] Process (msmdsrv): IO Data Bytes / Sec
This counter captures the overall IO (not just disk IO, but also network IO) throughput in bytes per second for both reads and writes for Analysis Services. Alternatively you can capture [Process (msmdsrv): IO Read Bytes/Sec] and [Process (msmdsrv): IO Write Bytes/Sec] which capture the same information filtered for reads or writes.

Subtracting [LogicalDisk: Disk Bytes/Sec] from this value provides a rough estimate of the IO being satisfied by the file system cache – which is MUCH better than having to actually go to disk. For example, during a query workload, if you see this counter with a value of ~200MB/sec while [LogicalDisk: Disk Bytes/Sec] is only showing ~5MB/sec, then you can be fairly confident those queries are benefiting immensely from the file system cache.
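That estimate is worth sketching out, with the caveat already noted: [IO Data Bytes/Sec] also includes network IO, so during anything other than a pure query workload treat the result as an upper bound. The example numbers below are the hypothetical ones from the paragraph above:

```python
# Rough estimate of IO served by the file system cache: process-level IO
# that never turned into physical disk IO.
def fs_cache_bytes_per_sec(process_io_bytes_sec, disk_bytes_sec):
    """Upper-bound estimate of IO satisfied by the file system cache."""
    return max(0.0, process_io_bytes_sec - disk_bytes_sec)

MB = 1024**2
# ~200 MB/s of process IO, only ~5 MB/s hitting the disk
print(fs_cache_bytes_per_sec(200 * MB, 5 * MB) / MB)  # 195.0 -- queries mostly served from cache
```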

Note: For more info on why the Windows file system cache is a great thing for Analysis Services, check out the following two posts: Why doesn’t SSAS cache the entire cube? by Richard Lees and Analysis Services and the Windows File System Cache by Greg Gonzalez.

[Windows] LogicalDisk: Free Megabytes
[Windows] LogicalDisk: % Free Space
These counters capture the amount (or percent) of free space available on your (logical) drives. Obviously this isn’t a “must-have” from a performance monitoring perspective – other than the fact that performance will definitely suffer if you run out of disk space 😉

NETWORK

Network resources come into play during query workloads when sending the result sets back out across the network to users. This is rarely a problem unless your users are generating queries that return a large number of cells. The more common place to find a bottleneck is during processing workloads where the data source is located on a separate server and large amounts of data have to travel across the network during processing.

Note: network resources are also used in scale-out architectures involving linked measure groups, linked dimensions, and remote partitions. However, those features were rarely used and have been marked as deprecated since Analysis Services 2014.

[Windows] Network Interface: Bytes Total/sec
This counter tracks the rate at which data is sent and received by the server over each network adapter.

Similar to hard drives, network cards have specified limits. If you find your processing workloads are being slowed down due to a network resource bottleneck, adding an additional network card and implementing NIC-teaming might be a good option.

Note: optionally we can break this data out by Sent vs Received using the following counters: [Network Interface: Bytes Sent/sec] and [Network Interface: Bytes Received/sec]. 

OTHER

In addition to all of the resource-specific performance counters above, here are a few more that I recommend including…

[SSAS] Connections: Current User Sessions
[SSAS] Connections: Current Connections
These counters capture the number of open user sessions and network connections.

Note: In most cases there is a 1-to-1 ratio of sessions to connections (i.e. a connection is opened for each session; each session sends commands over its corresponding connection). However, it is possible (though fairly rare) for an application to create multiple sessions over the same connection. I’m not fully versed on all pros/cons of this approach – but it is a thing.

Keep in mind, seeing a high number of users connected to the Analysis Services instance is not a bottleneck in and of itself. However, it does provide additional context which can be helpful when combined with other cues such as Query Pool Thread counters.

[SSAS] Processing: Rows Processed/Sec
[SSAS] Processing: Rows Read/Sec
[SSAS] Processing: Rows Written/Sec
These counters display the number of rows being read/written/processed per second and provide a very easy way to benchmark the performance of a processing workload. Sure, there are other parts of a processing workload (e.g. building indexes and aggregations) that aren’t picked up through these counters. However, for most processing workloads, the bulk of the duration is associated with reading rows from the source (sometimes over the network) and then writing rows to the buffer and to disk.

Note: Henk van der Valk has one of the best blog posts on optimizing SSAS cube processing performance. It’s a 2-part series and absolutely worth a read: part1, part2. Even though these blog posts focus on multidimensional cubes, most of the optimizations are also applicable to Tabular models.

[SSAS] Proc Aggregations: Rows Created/Sec
This counter displays the number of rows being created per second for aggregations during a processing workload. It’s similar to [Processing: Rows Processed/Sec] but only applies to aggregations – as opposed to base partition data.

[SSAS] Proc Indexes: Rows/Sec
This counter displays the number of rows per second being read from MOLAP stores to create indexes during a processing workload.

[SSAS] Storage Engine Query: Calculation cache hits/Sec
[SSAS] Storage Engine Query: Calculation cache lookups/Sec
[SSAS] Storage Engine Query: Flat cache hits/Sec
[SSAS] Storage Engine Query: Flat cache lookups/Sec
These counters capture the number of calculation and flat cache hits (and lookups) per second. From this data we can determine and track formula engine cache utilization during query workloads.

[SSAS] Storage Engine Query: Dimension cache hits/Sec
[SSAS] Storage Engine Query: Dimension cache lookups/Sec
[SSAS] Storage Engine Query: Measure group cache hits/Sec
[SSAS] Storage Engine Query: Measure group cache lookups/Sec
These counters capture the number of dimension and measure group cache hits (and lookups) per second. From this data we can determine and track storage engine cache utilization during query workloads.

A cache hit occurs when the data needed to satisfy (some portion of) a query is sitting in one of the Analysis Services cache stores. Cache hits are good – you want as many as possible – the alternative is a partition scan which typically takes much longer. The difference in throughput for a query-workload with a high cache hit ratio vs a similar query-workload with a low cache hit ratio can be several orders of magnitude.
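From the hits/lookups counter pairs above, the natural derived metric is a cache hit ratio per sample interval. A small sketch (sample values are hypothetical; guard against intervals with zero lookups):

```python
# Cache hit ratio from a pair of hits/lookups counter samples.
def cache_hit_ratio(hits_per_sec, lookups_per_sec):
    """Fraction of lookups served from cache, or None if there were no lookups."""
    return hits_per_sec / lookups_per_sec if lookups_per_sec else None

print(cache_hit_ratio(950, 1000))  # 0.95 -- a warm cache, most lookups avoid partition scans
print(cache_hit_ratio(0, 0))       # None -- no lookups in this sample interval
```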

Note: On a busy/production system it will be nearly impossible to associate the cache lookups and hits represented by these counter values to any particular query or queries. For that level of detail, we must turn to our trusty xEvents trace.

 

Observer Overhead

Before wrapping up this post we need to touch on an important and often misunderstood detail regarding the sample interval of perfmon counters. A shorter sample interval between counter readings gives us better/higher resolution when measuring the impact of the workload on system resources. Put another way, it allows us to more accurately attribute commands in the workload with spikes in resource utilization. That’s a good thing.

On the flip-side, it takes system resources to capture data about system resources. That’s right, there’s a “cost” – in the form of CPU cycles, memory objects, and IO – associated with capturing perfmon counter values. This cost is commonly referred to as “observer overhead” and can be a bad thing if left unchecked.

Note: Jonathan Kehayias wrote a nice blog post on SQLPerformance.com showing a method for measuring observer overhead. While the post focuses on the SQL Server database engine, the same technique can be used for Analysis Services. Another solid (though much more in depth) read on observer overhead is Overhead Analysis by Aaron Bertrand @ SQL Sentry.

As I’ve mentioned before, SQL Server licensing is not cheap. Therefore, it’s important to minimize the amount of CPU (and other resources) used for performance monitoring or any activity not directly related to providing business value. At the same time, if we don’t collect performance counters often enough, then we won’t have the resolution necessary to identify and attribute bottlenecks to various commands in our workloads.

Bottom line, we must strike a balance between observer overhead and trace resolution to be effective.
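One concrete face of that balance is sheer data volume: a shorter interval multiplies the number of readings you collect, store, and pay overhead for. A quick sketch (counter count and intervals are illustrative):

```python
# How many counter readings a day a perfmon trace produces at a given interval.
def samples_per_day(num_counters: int, interval_seconds: int) -> int:
    return num_counters * (86_400 // interval_seconds)

print(samples_per_day(50, 15))   # 288000 readings/day at 15s resolution
print(samples_per_day(50, 60))   # 72000 -- 4x fewer, at the cost of resolution
```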

Summarized Counter List

  • CPU
    • [Windows] Processor: % Processor Time
    • [Windows] Process (msmdsrv): % Processor Time
    • [Windows] System: Processor Queue Length
    • [SSAS] Threads: Query pool job queue length
    • [SSAS] Threads: Processing pool job queue length
    • [SSAS] Threads: Processing pool I/O job queue length
  • MEMORY
    • [SSAS] Memory: Memory Usage KB
    • [Windows] Process: Working Set (msmdsrv)
    • [Windows] Memory: Pages/sec
    • [Windows] Memory: Available MBytes
    • [Windows] Memory: Free & Zero Page List Bytes
    • [Windows] Memory: Cache Bytes
    • [Windows] Memory: Standby Cache Normal Priority Bytes
    • [SSAS] Memory: Memory Limit Low KB
    • [SSAS] Memory: Memory Limit High KB
    • [SSAS] Memory: Memory Limit Hard KB
    • [SSAS] Memory: Memory Limit VertiPaq KB
    • [SSAS] Memory: Cleaner Memory Shrinkable KB
    • [SSAS] Memory: Cleaner Memory Nonshrinkable KB
    • [SSAS] Memory: Cleaner Current Price
  • DISK
    • [Windows] LogicalDisk: Avg Disk Sec/Transfer
    • [Windows] LogicalDisk: Avg Disk Sec/Read
    • [Windows] LogicalDisk: Avg Disk Sec/Write
    • [Windows] LogicalDisk: Disk Bytes/Sec
    • [Windows] LogicalDisk: Disk Read Bytes/Sec
    • [Windows] LogicalDisk: Disk Write Bytes/Sec
    • [Windows] Process (msmdsrv): IO Data Bytes / Sec
    • [Windows] Process (msmdsrv): IO Read Bytes / Sec
    • [Windows] Process (msmdsrv): IO Write Bytes / Sec
    • [Windows] LogicalDisk: Free Megabytes
    • [Windows] LogicalDisk: % Free Space
  • NETWORK
    • [Windows] Network Interface: Bytes Total/Sec
    • [Windows] Network Interface: Bytes Sent/Sec
    • [Windows] Network Interface: Bytes Received/Sec
  • OTHER
    • [SSAS] Connections: Current User Sessions
    • [SSAS] Connections: Current Connections
    • [SSAS] Processing: Rows Processed/Sec
    • [SSAS] Processing: Rows Read/Sec
    • [SSAS] Processing: Rows Written/Sec
    • [SSAS] Proc Aggregations: Rows Created/Sec
    • [SSAS] Proc Indexes: Rows/Sec
    • [SSAS] Storage Engine Query: Calculation cache hits/Sec
    • [SSAS] Storage Engine Query: Calculation cache lookups/Sec
    • [SSAS] Storage Engine Query: Flat cache hits/Sec
    • [SSAS] Storage Engine Query: Flat cache lookups/Sec
    • [SSAS] Storage Engine Query: Dimension cache hits/Sec
    • [SSAS] Storage Engine Query: Dimension cache lookups/Sec
    • [SSAS] Storage Engine Query: Measure group cache hits/Sec
    • [SSAS] Storage Engine Query: Measure group cache lookups/Sec

Wrapping Up

Now that we have our list of performance counters and extended events to collect, we’re ready to dive in and start analyzing all this data. In the next post we’ll look at how to do just that: extract information from our perfmon and extended events traces, correlate data between the 2 data sets, and find existing bottlenecks. We’ll also spend some time discussing and showing how to set up proactive alerts for key trends (e.g. memory usage, cpu usage, etc) in order to detect if (and estimate when) we need to make adjustments to avoid potential bottlenecks in the future.

In the meantime, go ahead and start playing around with this stuff in your development environments. As the saying goes, use it or lose it!

Also, if you’re interested in automating this process (i.e. collecting perfmon data) here’s a link to get you started.

Note: If you’re feeling overwhelmed, it’s totally ok. There was a lot of information in this post and it may take some time to digest if you don’t have a lot of experience dealing with Analysis Services performance issues. Don’t get discouraged. How do you eat an elephant? One bite at a time 😉

Additional References
