Quickly Analyze Large Objects in #SSAS Databases with WinDirStat
There are tons of solid free tools out there that make life much easier for SSAS developers. A while ago, I blogged about the Top FREE tools for SSAS 2012 Developers, most of which I still use on a daily basis. But even that list is getting a bit stale, with some new heavy hitters making a splash recently (I'm looking at you, DAX Studio 2.x), and it's in need of an update.
In the meantime, I wanted to get a quick post out about a tool that I've been using for many years now, but never in conjunction with Analysis Services. In fact, the only time I've used it in the past was when trying to free up disk space on my workstations and the occasional server. The tool is called WinDirStat, which stands for Windows Directory Statistics. You can read up on the tool (here) before downloading it (here)…but the quick 'n dirty is that this utility will crawl a hard drive (or a specific folder) and create a hierarchical report (and interactive heat map) based on file sizes.
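Conceptually, that hierarchical report is just a bottom-up rollup of file sizes per folder. Here's a rough sketch of the core calculation in Python (my own illustration of the idea, not WinDirStat's actual implementation):

```python
import os

def directory_sizes(root):
    """Walk `root` bottom-up and return a dict mapping each directory
    to the total size (in bytes) of all files beneath it."""
    sizes = {}
    # topdown=False yields child directories before their parents,
    # so child totals exist by the time we roll them up.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = sum(
            os.path.getsize(os.path.join(dirpath, f)) for f in filenames
        )
        # Add each child directory's already-computed total.
        total += sum(sizes[os.path.join(dirpath, d)] for d in dirnames)
        sizes[dirpath] = total
    return sizes
```

WinDirStat does this (plus the heat map) for you, but a rollup like this is handy when you want the same numbers in a script.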
Below is a screenshot after running the utility on the C: drive of a VM I'm using. That big red square is a SQL Server data file (hey, it's a DEV environment…save the spiel about SQL data/log files on OS drives).
WinDirStat is a self-contained executable (which pleases my inner-OCD-spirit) and is insanely simple to use…
You can even use it to map out a specific folder. Here, I've taken the liberty of pointing it at one of the database directories in the Data directory of my local SSAS Tabular instance…
And just a few seconds later, we have the screenshot below…
The upper left window provides a hierarchical breakdown of folder/files in the target directory. Below is a closer look.
To the right (in the same upper-left window) are details such as the absolute size of the folder contents, the percent of total, and the number of child items (broken out by folders and files, which you can't see in the screenshot above). You can also expand the nodes on the left to drill into the subfolders. For example, in the screenshot above, the folder corresponding to the ICD10 table is selected. This table has files that add up to 8.5MB in size. Being familiar with this model, I know that the majority of that size is due to the dictionary file for a large string-valued column in the table.
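Those same per-node statistics are easy to reproduce outside the GUI. The sketch below (a helper of my own, not part of WinDirStat) computes each immediate child's absolute size, percent of total, and file count for a folder such as an SSAS database directory:

```python
import os

def folder_stats(folder):
    """For `folder`, report each immediate child's absolute size,
    percent of the folder's total, and number of files it contains --
    roughly the columns WinDirStat shows per node."""
    entries = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isdir(path):
            size = file_count = 0
            # Sum every file beneath this subfolder.
            for dirpath, _, filenames in os.walk(path):
                for f in filenames:
                    size += os.path.getsize(os.path.join(dirpath, f))
                    file_count += 1
        else:
            size, file_count = os.path.getsize(path), 1
        entries.append({"name": name, "size": size, "files": file_count})
    total = sum(e["size"] for e in entries) or 1  # avoid divide-by-zero
    for e in entries:
        e["pct"] = 100.0 * e["size"] / total
    return sorted(entries, key=lambda e: e["size"], reverse=True)
```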
The upper right window (shown above) provides a breakdown of the file types and is always from the perspective of the highest level (meaning it's not limited to the files in the currently selected level of the upper left window).
Both of those screens are very informative but the way I prefer to use this tool is to drive the analysis from the heat map visualization in the bottom window…
Clicking on a block in the heat map auto-magically navigates to the exact file in the upper left window so you can quickly see which folder it rolls up under. You can also take a quick glance at the upper right window to see what type of file each block represents (they’re color-coded!).
Just the other day, I used this tool to discover that a single column in my fact table was responsible for nearly 65% of the entire table's size. It turns out this column was just an identity column for the clustered index in the source database that made its way over to the tabular model. Since it wasn't being used in any measures or relationships, I could simply delete it and reduce my model size by a substantial amount.
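If you'd rather script that kind of space-hog hunt, a simple "largest files" scan gets you the same answer. This sketch (hypothetical helper name, my own code) lists the biggest files under a model's data folder, which is exactly where oversized column and dictionary files jump out:

```python
import os

def largest_files(root, n=10):
    """Return the `n` largest files under `root` as (size, path) pairs,
    biggest first -- a quick way to spot oversized column/dictionary
    files in an SSAS Data directory without a GUI."""
    all_files = []
    for dirpath, _, filenames in os.walk(root):
        for f in filenames:
            path = os.path.join(dirpath, f)
            all_files.append((os.path.getsize(path), path))
    # Sort descending by size and keep the top n.
    return sorted(all_files, reverse=True)[:n]
```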
Note: when it comes to Tabular models, the file size on disk can be used to approximate the size in memory, but it's definitely not 100% accurate. As a reference point, I used Vertipaq Analyzer (which measures the size of tabular models in memory via queries against the DMVs) to review the object sizes in memory and found that the on-disk and in-memory numbers differed by only a small amount.
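For reference, those in-memory numbers come from DMVs that any MDX query window (or XMLA/ADOMD client) can hit. Below is a minimal sketch of the kind of query involved; the actual set of DMVs Vertipaq Analyzer uses is more extensive, and the constant name is my own:

```python
# Minimal sketch (my own naming): the kind of DMV query that tools like
# Vertipaq Analyzer build on. DISCOVER_STORAGE_TABLE_COLUMNS reports the
# in-memory dictionary size per column; run the query from an MDX window
# in SSMS or via any ADOMD/XMLA client connected to the Tabular instance.
DICTIONARY_SIZE_QUERY = """
SELECT DIMENSION_NAME, ATTRIBUTE_NAME, DICTIONARY_SIZE
FROM $SYSTEM.DISCOVER_STORAGE_TABLE_COLUMNS
WHERE COLUMN_TYPE = 'BASIC_DATA'
"""
```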
- WinDirStat is a handy tool for mapping out the space-hogs in your Analysis Services databases.
- While this post focused on Tabular, rest assured it can be used in a similar fashion for Multidimensional.
- There are other tools out there that may be more appropriate depending on your accuracy requirements and the type of analysis you are looking to perform. Vertipaq Analyzer is a great example and absolutely worth checking out.