Categories
Uncategorized

Extracting Facebook Data With SSIS

This is the 3rd post in this quasi-series on Facebook data from a business intelligence perspective.  First, we discussed the value of customer analysis and how integrating Facebook data can enrich such analysis.  Next we established a basic understanding of the Facebook social graph from which we wish to extract customer data.

Now, with that out of the way, we are ready to tackle the process of extracting Facebook data with SSIS.

Overview

As we covered in the previous post, Facebook data is accessed via the Graph API web service.  Requests for data are submitted and JSON responses are returned that contain the requested data.  So, as an SSIS developer tasked with extracting data from the Facebook social graph, you’re going to need to design a SSIS package/project that can format and send requests to the Graph API web service and parse/flatten the JSON response object returned.

If you’re handy with .NET programming, this should be no problem.  However, if you’re “.NET-challenged” (like me) then you’re stuck hacking something together based on examples from the internet or taking the 3rd party route.  In this post, I’m going to use the 3rd party Facebook SSIS components from RSSBus to extract data from Facebook and join it up to a customer dimension.

Note: RSSBus tools are very easy to use (as you will soon see). When compared with a custom solution on the basis of turn-around time for standing up a similar solution – the RSSBus price tag of $239 for the full version of the Facebook SSIS components seems very attractive.

In a production solution, access tokens will need to be acquired via an app that your customers use for one reason or another.  The idea is to build a lookup table that pairs the CustomerId (from your data warehouse) with a Facebook access token generated via your company’s Facebook app.  The SSIS project can then loop through this table (example below) extracting data for each customer, 1 after the other.

[sql]
— Create Schema and Tables for Mapping Facebook Data
USE Demo_SocialData
;
GO
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = ‘map_fb’)
EXEC (‘CREATE SCHEMA [map_fb]’)
;
GO
IF OBJECT_ID(‘map_fb.CustomerMapping’,’U’) IS NOT NULL
DROP TABLE map_fb.CustomerMapping;
CREATE TABLE [map_fb].[CustomerMapping](
[CustomerID] [nvarchar](15) NOT NULL
,[ACCESS_TOKEN] [nvarchar](2000) NOT NULL
,[EffectiveDate] [datetime] NOT NULL
CONSTRAINT DF_CustomerMapping_EffectiveDate DEFAULT (GETDATE())
,[IsActive] [bit] NOT NULL
CONSTRAINT DF_CustomerMapping_IsActive DEFAULT (1)

,CONSTRAINT [PK_CustomerMapping]
PRIMARY KEY CLUSTERED ([CustomerId],[EffectiveDate])
)
;
GO
[/sql]

But since this is only a demo, we’re just going to use the Facebook utility to generate an access token 😉

The Setup

Before we start building the SSIS solution, we need to install the RSSBus Facebook SSIS components and create a demo database to store the data we extract from Facebook.

The RSSBus installer is very straight forward and of the “next, next, next…., finish” variety.  Afterwards, you should see a few new items in your SSIS project environment:

Click to Enlarge
Click to Enlarge

Next, we’re going to create the demo database and tables to hold the data extracted from my personal Facebook account.  You can download a copy of the database and table creation script here.

Demo_SocialData Schema

  • DimCustomer

This is the generic customer dimension (copied from the AdventureWorks DW sample database) that will be found in most retail data warehouses. As you can see, I’ve created a dummy record representing myself as a customer.

  • facebook.Likes
  • facebook.Friends
  • facebook.Posts
  • facebook.Status

The schema for these 4 tables is based on the output of the RSSBus Facebook Source component. I’ve also included a CustomerId column in each of these tables so that they can easily be joined to the customer dimension for analysis.

Now we are ready to create our SSIS package:

Click to Enlarge
Click to Enlarge

As you can see, this is a very simple package and does not represent a complete solution – in fact, it’s really just a child package. The parent package would loop through a set of customer Access Tokens and execute this child package for each item in the set.

Parameters

  1. ACCESS_TOKEN: value used to connect to Facebook identifying which permissions and objects are accessible. 
  2. CustomerId: value from the internal customer dimension that will be included in the destination Facebook tables to easily join them back to DimCustomer.
Click to Enlarge
Click to Enlarge

Connection Managers

  1. ANTON-PC.Demo_SocialData: connection to my local instance of SQL 2012
  2. RSSBus Facebook Connection Manager: connection to Facebook via the RSSBus tools
Click to Enlarge
Click to Enlarge

Below I’ve parameterize’d the RSSBus Facebook Connection Manager so that it uses the package parameter ACCESS_TOKEN.  I’m a really BIG fan of the “parameters” concept in SSIS 2012.

Click to Enlarge
Click to Enlarge

Data Flow

Here’s a quick shot of the Data Flow – again, simple and straightforward.

Click to Enlarge
Click to Enlarge

The properties of the RSSBus Facebook Source component are very intuitive.  Choose the Facebook connection manager, choose a table (as you can see there are quite a few to choose from) and voila’…

Click to Enlarge
Click to Enlarge

The rest of the data flow just adds an additional column containing the CustomerId value (via a Derived Column Transform) and then loads the data into the destination SQL Server table.

Big Money, No Whammies!

So now we’re ready to run our package…

Click to Enlarge
Click to Enlarge

And here are the results in the database…

Click to Enlarge
Click to Enlarge

Note: Because I’m only using the trial version of the RSSBus components, the extraction is limited to just the first 20 rows of data.

Also, as I mentioned in the previous post, Access Tokens have an expiration date.  If I try to run the same package after the access token expires, the package will blow up…

Click to Enlarge
Click to Enlarge

…and I’ll see an error message like the follow which indicates the the access token has expired:

Click to Enlarge
Click to Enlarge

And that’s all there is to it.

Leave a comment