Analyzing SQL Server Object Dependencies with NodeXL
One time I was tasked with developing a handful of custom reports for a legacy application. The data model was an absolute nightmare – overflowing with technical debt and being held together with duct tape and views…tons and tons of views. In this post, I’ll show you how NodeXL can be used to help navigate your way through a large and poorly documented data model.
Mapping the Schema
The first step in this type of scenario (after crying) is to grab another cup of coffee. Once you’ve calmed down and are fully caffeinated its time to tackle the data model. If you’re lucky there will be documentation…and if you’re really lucky, the documentation will be up to date. In this case there was no documentation and the schema was fairly large (hundreds of tables, views, etc) so I ended up hacking together a query based on this one from Chad Miller (b | t) that build a table of all the object dependencies in the database.
Below is an updated version of query:
SELECT DISTINCT ParentObjectName = s.name + '.' + OBJECT_NAME(referencing_id) ,ParentObjectType = o.type_desc ,ObjectName = COALESCE(referenced_server_name + '.','') + COALESCE(referenced_database_name + '.','') + COALESCE(referenced_schema_name + '.','') + referenced_entity_name ,ObjectType = o2.type_desc FROM sys.sql_expression_dependencies AS sed INNER JOIN sys.objects AS o ON sed.referencing_id = o.OBJECT_ID AND o.type_desc != 'CHECK_CONSTRAINT' INNER JOIN sys.schemas AS s ON s.schema_id = o.schema_id INNER JOIN sys.objects AS o2 ON sed.referenced_id = o2.object_id AND o2.type_desc != 'CHECK_CONSTRAINT' ;
And here is the output when run against the AdventureWorksDW2012 database:
As you can see the output resembles a parent-child hierarchy including object types.
To take things a bit further, we can run a second query based on the result set from the first:
;WITH CTE_ObjectReferenceHierarchy AS ( SELECT DISTINCT ParentObjectName = s.name + '.' + OBJECT_NAME(referencing_id) ,ParentObjectType = o.type_desc ,ObjectName = COALESCE(referenced_server_name + '.','') + COALESCE(referenced_database_name + '.','') + COALESCE(referenced_schema_name + '.','') + referenced_entity_name ,ObjectType = o2.type_desc FROM sys.sql_expression_dependencies AS sed INNER JOIN sys.objects AS o ON sed.referencing_id = o.OBJECT_ID AND o.type_desc != 'CHECK_CONSTRAINT' INNER JOIN sys.schemas AS s ON s.schema_id = o.schema_id INNER JOIN sys.objects AS o2 ON sed.referenced_id = o2.object_id AND o2.type_desc != 'CHECK_CONSTRAINT' ) ,CTE_DistinctOjects AS ( SELECT ObjectName = ParentObjectName ,ObjectType = ParentObjectType FROM CTE_ObjectReferenceHierarchy UNION SELECT ObjectName = ObjectName ,ObjectType= ObjectType FROM CTE_ObjectReferenceHierarchy ) SELECT ObjectName = tmpDI.ObjectName ,ObjectType = tmpDI.ObjectType ,NumInboundRefs = ISNULL(ds1.NumInboundRefs,0) ,NumOutboundRefs = ISNULL(ds2.NumRefs,0) FROM CTE_DistinctOjects tmpDI LEFT OUTER JOIN ( SELECT tmpOH.ObjectName ,NumInboundRefs = COUNT(*) FROM CTE_ObjectReferenceHierarchy tmpOH GROUP BY tmpOH.ObjectName ) AS ds1 ON ds1.ObjectName = tmpDI.ObjectName OUTER APPLY( SELECT tmpOH.ParentObjectName ,NumRefs = COUNT(*) FROM CTE_ObjectReferenceHierarchy tmpOH WHERE tmpOH.ParentObjectName = tmpDI.ObjectName GROUP BY tmpOH.ParentObjectName ) AS ds2 ;
…which will give us the following output:
This is a distinct list of objects where for each object, we are calculating the number of objects that reference it (NumInboundRefs) as well as the number of objects directly referenced by it (NumOutboundRefs). These two attributes will be used in the next section when we load the data into NodeXL to visualize the object-dependencies as a graph.
Visualizing the Schema
NodeXL is a really cool add in for excel that makes it easy to visualize and explore network graphs. And when exploring the data model where we have views on top of views on top of views…a graph visualization can prove very useful.
After firing up the NodeXL excel template, we are presented with a blank workbook that includes a few sheets:
In this case we only really need to deal with the Vertices and Edges worksheets. But first let’s add a sheet called “Data” where we’ll store the 2 result sets from the first part of this post.
Notice in the screenshot above that I also included an additional column in there called “Size” that computes the product of Inbound and Outbound columns. This will be used to dynamically size the vertices we define in the next step.
On the Vertices tab, we’ll go ahead and add the distinct list of objects and formulas for determining the color and size of each object:
And here are the formulas for color and size:
Color = IF(VLOOKUP([@Vertex],Data!$A:$B,2,FALSE)="VIEW","DarkBlue",IF(VLOOKUP([@Vertex],Data!$A:$B,2,FALSE)="USER_TABLE","Blue","Red"))
Size = (1000/MAX(Data!$E:$E)/2)*(VLOOKUP([@Vertex],Data!$A:$E,5,FALSE))
The color formula defines the color of the vertices based on the object type (VIEW, USER_TABLE, etc). The size formula defines the size of the vertices based on the number of references (or the size column in the Data-worksheet).
Now we can define the Edges (or relationships between the vertices) by copying over the object/parent object relationships from the second table in the Data-worksheet:
Finally, we can switch over to the graph viewer on the right side of the Excel window and display the graph:
As you can see, we have a very clear visualization of the various objects and how they are interconnected.
Note: There are many layout option/patterns to choose from using the drop down menu, and in this case I’ve chosen the “Harel-Koren Fast Multiscaler” layout option. I’ve also specified that I want to see a “directed” graph (as opposed to an “undirected” graph) which is just to say whether or not we want to see the edges with arrows indicating the direction of the relationship…does A reference B or vice-versa?
Just for fun, below is an image of the graph (Harel-Koren Fast Multiscale) for one of the legacy databases I’ve worked on:
And if we switch over to the grid layout:
…or a horizontal sine wave layout:
As you can see, these are interesting visualizations, but perhaps not very useful as static images. Luckily, NodeXL provides functionality for interacting with the graphs…via clicking, filtering, grouping, transparency, etc…to aid the user in exploring the graph in ways that make the relationships, patterns, and information more easily discernible.
In this post, we covered how to extract object dependencies from a SQL Server database, load them into Excel, and visualize the objects and relationships using a free add-in called NodeXL (which you can download from codeplex).
Despite the fairly basic example in this post where color was determined by object type and size was determined by the number of inbound and outbound references there are plenty of other (more useful) applications. For example, imagine doing some performance troubleshooting and using NodeXL to visualize queries from the plan cache that are color coded and sized based on relevant criteria such as resource utilization and duration.
If nothing else, I encourage you to download a copy and spend a few hours playing around with it. And, be sure and check out the graph gallery where you can see some pretty amazing visualizations put together by other much more talented than myself.