…brew your own Business Intelligence

Analyzing SQL Server Object Dependencies with NodeXL

One time I was tasked with developing a handful of custom reports for a legacy application.  The data model was an absolute nightmare – overflowing with technical debt and being held together with duct tape and views…tons and tons of views.  In this post, I’ll show you how NodeXL can be used to help navigate your way through a large and poorly documented data model.

Mapping the Schema

The first step in this type of scenario (after crying) is to grab another cup of coffee.  Once you’ve calmed down and are fully caffeinated  its time to tackle the data model.  If you’re lucky there will be documentation…and if you’re really lucky, the documentation will be up to date.  In this case there was no documentation and the schema was fairly large (hundreds of tables, views, etc) so I ended up hacking together a query based on this one from Chad Miller (b | t) that build a table of all the object dependencies in the database.

Below is an updated version of query:

[sql] SELECT DISTINCT
ParentObjectName = s.name + ‘.’ + OBJECT_NAME(referencing_id)
,ParentObjectType = o.type_desc
,ObjectName =
COALESCE(referenced_server_name + ‘.’,”) +
COALESCE(referenced_database_name + ‘.’,”) +
COALESCE(referenced_schema_name + ‘.’,”) +
referenced_entity_name
,ObjectType = o2.type_desc
FROM sys.sql_expression_dependencies AS sed
INNER JOIN sys.objects AS o
ON sed.referencing_id = o.OBJECT_ID AND
o.type_desc != ‘CHECK_CONSTRAINT’
INNER JOIN sys.schemas AS s
ON s.schema_id = o.schema_id
INNER JOIN sys.objects AS o2
ON sed.referenced_id = o2.object_id AND
o2.type_desc != ‘CHECK_CONSTRAINT’
;
[/sql]

And here is the output when run against the AdventureWorksDW2012 database:

click to zoom

click to zoom

As you can see the output resembles a parent-child hierarchy including object types.

To take things a bit further, we can run a second query based on the result set from the first:

[sql] ;WITH
CTE_ObjectReferenceHierarchy AS (
SELECT DISTINCT
ParentObjectName = s.name + ‘.’ + OBJECT_NAME(referencing_id)
,ParentObjectType = o.type_desc
,ObjectName =
COALESCE(referenced_server_name + ‘.’,”) +
COALESCE(referenced_database_name + ‘.’,”) +
COALESCE(referenced_schema_name + ‘.’,”) +
referenced_entity_name
,ObjectType = o2.type_desc
FROM sys.sql_expression_dependencies AS sed
INNER JOIN sys.objects AS o
ON sed.referencing_id = o.OBJECT_ID AND
o.type_desc != ‘CHECK_CONSTRAINT’
INNER JOIN sys.schemas AS s
ON s.schema_id = o.schema_id
INNER JOIN sys.objects AS o2
ON sed.referenced_id = o2.object_id AND
o2.type_desc != ‘CHECK_CONSTRAINT’
)
,CTE_DistinctOjects AS (
SELECT ObjectName = ParentObjectName
,ObjectType = ParentObjectType
FROM CTE_ObjectReferenceHierarchy
UNION
SELECT ObjectName = ObjectName
,ObjectType= ObjectType
FROM CTE_ObjectReferenceHierarchy
)
SELECT ObjectName = tmpDI.ObjectName
,ObjectType = tmpDI.ObjectType
,NumInboundRefs = ISNULL(ds1.NumInboundRefs,0)
,NumOutboundRefs = ISNULL(ds2.NumRefs,0)
FROM CTE_DistinctOjects tmpDI
LEFT OUTER JOIN
(
SELECT tmpOH.ObjectName
,NumInboundRefs = COUNT(*)
FROM CTE_ObjectReferenceHierarchy tmpOH
GROUP BY tmpOH.ObjectName
) AS ds1
ON ds1.ObjectName = tmpDI.ObjectName
OUTER APPLY(
SELECT tmpOH.ParentObjectName
,NumRefs = COUNT(*)
FROM CTE_ObjectReferenceHierarchy tmpOH
WHERE tmpOH.ParentObjectName = tmpDI.ObjectName
GROUP BY tmpOH.ParentObjectName
) AS ds2
;
[/sql]

…which will give us the following output:

click to zoom

click to zoom

This is a distinct list of objects where for each object, we are calculating the number of objects that reference it (NumInboundRefs) as well as the number of objects directly referenced by it (NumOutboundRefs).  These two attributes will be used in the next section when we load the data into NodeXL to visualize the object-dependencies as a graph.

Visualizing the Schema

NodeXL is a really cool add in for excel that makes it easy to visualize and explore network graphs.  And when exploring the data model where we have views on top of views on top of views…a graph visualization can prove very useful.

After firing up the NodeXL excel template, we are presented with a blank workbook that includes a few sheets:

click to zoom

click to zoom

In this case we only really need to deal with the Vertices and Edges worksheets.  But first let’s add a sheet called “Data” where we’ll store the 2 result sets from the first part of this post.

click to zoom

click to zoom

Notice in the screenshot above that I also included an additional column in there called “Size” that computes the product of Inbound and Outbound columns.  This will be used to dynamically size the vertices we define in the next step.

On the Vertices tab, we’ll go ahead and add the distinct list of objects and formulas for determining the color and size of each object:

click to zoom

click to zoom

And here are the formulas for color and size:

Color = IF(VLOOKUP([@Vertex],Data!$A:$B,2,FALSE)="VIEW","DarkBlue",IF(VLOOKUP([@Vertex],Data!$A:$B,2,FALSE)="USER_TABLE","Blue","Red"))
Size = (1000/MAX(Data!$E:$E)/2)*(VLOOKUP([@Vertex],Data!$A:$E,5,FALSE))

The color formula defines the color of the vertices based on the object type (VIEW, USER_TABLE, etc).  The size formula defines the size of the vertices based on the number of references (or the size column in the Data-worksheet).

Now we can define the Edges (or relationships between the vertices) by copying over the object/parent object relationships from the second table in the Data-worksheet:

click to zoom

click to zoom

Finally, we can switch over to the graph viewer on the right side of the Excel window and display the graph:

click to zoom

click to zoom

As you can see, we have a very clear visualization of the various objects and how they are interconnected.

Note: There are many layout option/patterns to choose from using the drop down menu, and in this case I’ve chosen the “Harel-Koren Fast Multiscaler” layout option.  I’ve also specified that I want to see a “directed” graph (as opposed to an “undirected” graph) which is just to say whether or not we want to see the edges with arrows indicating the direction of the relationship…does A reference B or vice-versa?

Just for fun, below is an image of the graph (Harel-Koren Fast Multiscale) for one of the legacy databases I’ve worked on:

click to zoom

click to zoom

And if we switch over to the grid layout:

click to zoom

click to zoom

…or a horizontal sine wave layout:

click to zoom

click to zoom

As you can see, these are interesting visualizations, but perhaps not very useful as static images.  Luckily, NodeXL provides functionality for interacting with the graphs…via clicking, filtering, grouping, transparency, etc…to aid the user in exploring the graph in ways that make the relationships, patterns, and information more easily discernible.

Conclusion

In this post, we covered how to extract object dependencies from a SQL Server database, load them into Excel, and visualize the objects and relationships using a free add-in called NodeXL (which you can download from codeplex).

Despite the fairly basic example in this post where color was determined by object type and size was determined by the number of inbound and outbound references there are plenty of other (more useful) applications.  For example, imagine doing some performance troubleshooting and using NodeXL to visualize queries from the plan cache that are color coded and sized based on relevant criteria such as resource utilization and duration.

If nothing else, I encourage you to download a copy and spend a few hours playing around with it. And, be sure and check out the graph gallery where you can see some pretty amazing visualizations put together by other much more talented than myself.

 

5 thoughts on “Analyzing SQL Server Object Dependencies with NodeXL

  1. Rik says:

    Have you used NodeXL to do a Dependency Modeling for risk assessment as proposed by the Open Group?

    1. Bill says:

      Hi Rik,

      No, admittedly my experience with NodeXL is little…and for “dependency modeling for risk assessment” it is nonexistent 😉

      Have you?

  2. Tony says:

    Simply brilliant. I too was tasked with modeling dependencies in a database and this worked like a charm. Thanks for the hard work! I’ll be donating to NodeXL shortly!

    1. Bill says:

      glad you liked it 😉

      If you’re interested in more from NodeXL – check out Marc Smith on twitter (@marc_smith)

  3. Ramon says:

    Very interesting. Is it possible to get in the objectname field the stored procedures called too?

Leave a Reply