Rapid DW Development with WhereScape RED
Earlier this month, I spent an entire week evaluating WhereScape RED with a new client. If you’ve never heard of it, RED is a platform for rapid data warehouse development. Admittedly, I’ve always been skeptical of these types of tools…”build your entire data warehouse and ETL in a few hours“…but after a week of “kicking the tires” I think my mind is opening up to the idea.
It started earlier this year when Doug Barrett (b | t), a solution architect with WhereScape and veteran of the data warehousing industry, “wow’d” this client at the PASS Business Analytics Conference. A quick demo and a few conversations later…Doug flew into Atlanta to lead an on-site sales demo/workshop with a few members of the DW/BI development team.
The workshop started off with a fairly in-depth demo…throughout which we stopped to ask questions and discuss relevant and hypothetical “can it do this” scenarios…which Doug answered/demonstrated with ease. By the end of the first day, everyone had installed a trial version of the software, pointed it at the development environment, and was poking around to get a feel for the tool and the process. Over the next 4 days, we used WhereScape RED to reproduce a few of the client’s existing solutions.
What follows is an outline of my thoughts about the tool…what it does well, what it doesn’t do well, etc.
Documentation and Impact Analysis
In my experience, lack of good documentation is one of the top 2 reasons for poor user adoption with DW/BI solutions (the other one is lack of proper training). The problem is that developers hate hate HATE documentation. It’s boring, mundane, and makes us bleed from the ears. Fortunately, with WhereScape RED, documentation can be generated, literally, with the click of a button…which means it can be updated after every single change!
Not only does it generate documentation for end-users, but it also generates documentation for DW developers and DBAs. For example, you can (again with the click of a button) generate impact analysis diagrams/reports which show the lineage of all the connected objects. So if, for example, I wanted to remove or rename a column in a source table, I can quickly generate a diagram that shows me which downstream objects are impacted and also need to be updated.
This is all possible because of the metadata! See, under the covers, RED is a metadata-driven development experience…every object created in the development environment (tables, columns, cubes, measure groups, etc) is persisted in a metadata repository. It isn’t until you explicitly “build” an object…that it gets created in the target environment. Unfortunately for Microsoft shops, there is no out-of-the-box metadata management utility for BI/EIM solutions…and “rolling your own” metadata management solution is VERY difficult.
There are quite a few places ripe for standardization in DW development…Dimension processing, Fact processing, etc. RED does a nice job of generating clean TSQL stored procs (with logging/auditing constructs) that accomodate the common loading patterns in the DW world (ex. Slowly Changing Dimensions). It also provides the developer with the flexibility to easily override the standard load procedure for any 1-off scenarios.
Partitioning is another area where RED shines. Partitioning is a very common part of DW development that is fairly simple to setup…but creating the procedures to manage certain partitioning strategies, such as the “sliding window” can be a real P.I.T.A. … especially in the SSAS analytical layer. Thankfully, RED provides a nice UI that allows the developer to configure the partitioning strategy with just a few clicks. RED then generates all the metadata needed to implement and maintain the partitioning strategy over time.
RED comes with it’s own scheduling service for scheduling jobs to deploy and/or process objects in the target environment. It roughly plays the same role as SQL Agent in SQL Server DW environments…but it has a bit more of a DW-focus. For example, you can easily establish (and visually review) “dependencies” between the various objects in the job so that your “load” tables are processed before your “stage” tables. This interface makes it very easy to create one-off processing jobs, to handle the odd-ball tasks that arise…like reloading a fact table after a slight modification to the source system or underlying schema without having to reprocess the entire warehouse.
The Not Good
ELT vs ETL
When loading a DW there are two main methodologies for loading data: ELT vs ETL. The decision on which methodology to use depends on architecture and rest of the resources in your environment (source systems resources, network resources, etc).
Unfortunately, RED only supports the ELT pattern where transformations and key lookups are performed via TSQL. This could be a deal breaker for some shops who require complex tranformations that are better handled with an application like SSIS and/or there isn’t enough the database (cpu, memory, disk space) resources to handle the transformations in TSQL.
Fortunately, this particular client is already running an ELT loading pattern and is way over-provisioned with hardware relative to the size of their data and processing requirements (so rare and so freaking awesome!) … so this isn’t going to be a problem for them.
When it comes to cubes, there are a couple of major limitations of which to be aware:
- Only Regular (one-to-many) relationships are supported. I can get over the fact that it doesn’t support reference relationships…but no many-to-many? C’monnnnnnnnnnnnn /Peter Griffin
- Does not support composite keys…which means you have to take extra special care when creating the dimension attributes if you want to be able to create a “natural” user hierarchy in order to take advantage of all the internal optimizations that SSAS provides for natural user hierarchies.
- There’s no UI for creating roles…they have to be scripted out via XMLA and added after the fact.
- Impact Analysis diagrams don’t carry through to calculated members in the cube. So you can’t see which calculated measures are affected by renaming a fact in the DW.
Despite the big kick in the nuts when it comes to cube development in RED, the documentation and impact analysis features alone would be worth the hassle of managing SSAS development in a separate environment (ex. SSDT + TFS). However, if your environment can’t support an ELT loading strategy, you might have a more difficult decision.