PSA: Stop Putting Sorts in SSIS Data Flows…
…or any fully-blocking transforms for that matter.
The primary function of SSIS is to move data from location A to location B, with the option of performing transformations to the data along the way. That's certainly an oversimplification, but it gives you the general framework from which to build.
Note: there are many secondary functions of SSIS, all of which are outside the scope of this post.
Data comes in from the source (i.e., location A, which could be a text file, database, etc.) and is placed into a memory buffer. When trying to visualize this, I've found it helpful to think of a memory buffer as a table with a fixed number of rows. At some point, as more and more data comes in from the source, the current memory buffer fills up. At that point, a new memory buffer is created and becomes the new target for incoming data. Meanwhile, the previous memory buffer is sent down the pipeline to undergo whatever transformations lie ahead. Given this context of memory buffers, you can think of SSIS as a stream of memory buffers flowing from location A to location B.
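A minimal sketch of that buffering behavior (illustrative Python, not SSIS itself — the buffer size and row values are made up): rows stream in from the source, each fixed-size buffer fills up, and full buffers are handed down the pipeline while the next one fills.

```python
def buffered_stream(rows, buffer_size=3):
    """Group an incoming row stream into fixed-size memory buffers."""
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == buffer_size:
            yield buffer      # send the full buffer down the pipeline
            buffer = []       # start filling a fresh buffer
    if buffer:
        yield buffer          # flush the final, partially filled buffer

source = range(1, 8)          # pretend rows 1..7 arrive from location A
buffers = list(buffered_stream(source, buffer_size=3))
# → [[1, 2, 3], [4, 5, 6], [7]]
```

The key point is that downstream work can begin on the first buffer while the source is still producing rows — nothing waits for the whole dataset.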
Note: the size of memory buffers is configurable and should be configured when seeking optimal performance. Chris Schmidt (b | t) wrote a nice blog on this a while back that does a good job of explaining the how and why.
The really cool thing about SSIS is that, when designed correctly, it's blazing fast. The really UNCOOL thing about SSIS is that it can be difficult (and sometimes unintuitive) to design correctly…which makes it blazing SLOW!
One common mistake I see in packages designed by less-experienced SSIS developers is the misuse of blocking transformations (e.g. Sort, Aggregate, etc.) in the data flow.
To illustrate why this is bad, let's relate it to your morning commute. Picture yourself sitting in your car, cruising down the highway until you hit a toll gate. Now imagine you have to wait at that toll gate until every single other car heading down the highway reaches the same toll gate before you can proceed. Sounds HORRIBLE, right? Well, that's exactly how blocking transforms work in SSIS. The highway is the data flow, the cars are memory buffers, and the toll gate is the blocking transform.
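The toll-gate effect can be seen in miniature with plain Python (a sketch, not SSIS — the row values are made up): a row-based transform can emit its first output after reading just one input row, while a fully-blocking sort cannot emit anything until it has consumed every input row.

```python
def rows():
    """Pretend source: yields rows one at a time, recording each read."""
    for value in [3, 1, 2]:
        reads.append(value)   # note that this row has left the source
        yield value

# Streaming (row-based) transform: first output is available after
# reading only the FIRST input row.
reads = []
streamed = (v * 10 for v in rows())
first = next(streamed)
assert reads == [3]           # only one row read so far

# Fully-blocking transform: sorted() must consume EVERY input row
# before producing any output at all.
reads = []
blocked = sorted(rows())
assert reads == [3, 1, 2]     # the entire source was read up front
```

With three rows the difference is invisible; with hundreds of millions of rows, "read the entire source into memory first" is exactly what blows past available memory.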
The performance impact of blocking transforms can go unnoticed for a long time because developers rarely work with production-size datasets. Combine that with the poor practice, seen at many clients, of not load-testing until just before release to production (if at all), and you have a recipe for disaster!
To drive home the evilness of blocking transforms, here’s a quick example from a recent performance tuning effort at a client.
The project started at the beginning of the year, and the SSIS piece was completed sometime in May. After we begged for a few days to load-test the solution prior to go-live, the client caved and gave us a Game Boy-sized virtual machine for the job. After getting the solution deployed to the new environment (which took very little time thanks to the new continuous-integration paradigm this client adopted earlier in the year!), we fired it up against a backup from one of the production servers…
…and it immediately puked an error.
Ruh-roh. This is a solution that had been running like butter in DEV/QA…whatever could be the issue?
Without going too deep, the error messages indicated that the SSIS solution was paging data to disk and that the disk drive had filled up. This diagnosis was further confirmed with a quick check of the counters in the performance monitor (perfmon) trace we had running in the background.
Below is a quick look at available memory (Available Mbytes in blue) which starts to bottom out around the 6:40pm mark and hits rock bottom just before 6:48pm. At the same time, you see the page file (% Usage in green) getting slammed…
By default, when SSIS pages data to disk it writes to the location specified in the TEMP/TMP environment variables (which defaults to the C: drive). Unfortunately, we weren't tracing the free space on the C: drive. But if we had been, we would have seen the available space (under Logical Disk: Free Megabytes) collapse as the data was being paged out of memory onto disk.
Note: It would have been nice if we’d thought to include Memory\Committed Bytes to provide more context around the PagingFile\%Usage. See this post for more details.
If paging to disk were acceptable (hint: it's not…ever), we could adjust the following data flow properties (or configure the TEMP/TMP environment variables) to direct paging activity to a bigger/faster disk drive: BLOBTempStoragePath, BufferTempStoragePath (details).
Paging to disk in SSIS is not acceptable because performance becomes abysmal. When paging is noticed, troubleshooting should begin until the culprit is found and resolved. Typically, the culprit is a blocking transform in a data flow that moves a lot of data *or* an undersized SSIS server that may just need more memory. If the problem can't be avoided by adding more memory, then more drastic adjustments should be strongly considered (e.g. landing the data in a staging table and using a set-based ELT pattern instead of an ETL pattern).
In this case, the culprit was a bit of both…
- Blocking Transforms
There was a (fully-blocking) Sort transform at the very beginning of each major data flow…which meant all the data (hundreds of millions of rows) had to be extracted from the source and placed into memory buffers before anything else could happen. This caused SSIS to start paging to disk once all available memory was consumed.
- Not Enough Memory
The server being used to load-test this ETL was much smaller than the architecture originally recommended for the production environment. According to the architecture documentation, SSIS was supposed to be on its own server with 256GB of memory. At the last minute, in a move to reduce costs, the architecture was drastically changed to a single-server solution running both the database engine and SSIS on the same box with 8 (physical) cores and 96GB of memory (after talking them up from 40GB). A super-risky change to make at the last minute.
Note: Having to convince the infrastructure team to give us more than 40GB of memory was an interesting conversation…
“Son, do you know how much 128 GB of memory costs?”
“Not exactly…but I know it’s a lot less than the $50-some-THOUSAND dollars you folks are going to be paying for the SQL Server licensing.”
Fortunately, this client was smart enough to allocate time for a load test prior to production (*cough* after learning a very hard lesson the previous year *cough*). So all of this was caught in advance.
There are a number of ways to handle this, and a lot depends on the specifics of your goals, environment, etc. In this case, the fix was simple: we ripped the Sort transforms out of the data flow, pushed the sorting down into the source queries, and marked the source components as "sorted" (details).
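The pattern, sketched with Python's built-in sqlite3 standing in for the real source database (the table and column names here are hypothetical): let the database engine do the ORDER BY so rows arrive already sorted and can stream straight through the pipeline. In SSIS itself, you'd then open the source component's Advanced Editor and set IsSorted = True on the output, plus SortKeyPosition on the sort-key column(s), so downstream components know the data is pre-sorted.

```python
import sqlite3

# Stand-in for the production source (hypothetical table/columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (CustomerID INTEGER, Amount REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [(3, 10.0), (1, 25.0), (2, 5.0), (1, 7.5)],
)

# The fix: push the sort into the source query. The database engine
# does the ordering, so the pipeline never needs a blocking Sort.
cursor = conn.execute(
    "SELECT CustomerID, Amount FROM Sales ORDER BY CustomerID"
)

# The cursor streams rows already sorted by CustomerID; downstream
# transforms can consume them buffer by buffer as they arrive.
ordered_ids = [row[0] for row in cursor]
# → [1, 1, 2, 3]
```

The design point: a database engine sorting via an index (or a tuned sort operator) close to the data is almost always cheaper than hauling every row into SSIS memory buffers just to sort them there.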
Just because your SSIS solution can process a 100MB dataset in under a minute does NOT mean it will be able to process a 20GB dataset in under 4 hours. It might…but if you have a fully-blocking transform in the data flow and/or an undersized server, probably not. Extrapolation is not a substitute for real load-testing!
- Avoid fully-blocking transforms in SSIS data flows unless you're absolutely certain the amount of data won't be a problem. If you can't be certain of this, be sure to document your assumptions and include a quick note about design alternatives.
- Load-Test early! Load-Test often!
- Get a job w/out a commute