Facebook as a Data Source
In an earlier post, I discussed the value of integrating Facebook data for customer analysis. Originally, I planned to follow up with a post covering the process of extracting data from Facebook using SSIS…but as I started working through the material, I quickly realized the background information on ‘Facebook as a data source’ was worthy of its own post.
What is a graph, and why do I care?
A graph, in the context of this post, is a mathematical construct consisting of a set of objects (nodes) and the connections between these objects (edges). From a data science point of view, graphs are extremely flexible structures, which makes them an ideal vessel for modeling rapidly evolving networks…such as social networks…and that’s exactly how Facebook social data is modeled!
For the Facebook social graph, we have the following breakdown:
- nodes are the entities such as users, groups, pages, etc
- edges are the relationships (likes, friends, etc) between each entity
- properties are the metadata/attributes about entities (name, email, etc)
Below is a graph diagram which I’ll use to walk through an example of a social graph. Let’s start with 5 nodes (circles), 5 edges (lines connecting circles), and 2 property sets:
Now, suppose my mom decides to join Facebook and creates an account. Next, she sends friend requests to Jena and me, which we accept…and we immediately add her to the “family-earmuffs” group (love you mom). This new activity is easily integrated into the social graph diagram by adding a new node to represent my mom’s Facebook account and 2 new edges connecting my mom to Jena and me.
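To make the "one new node, two new edges" idea concrete, here is a minimal sketch in Python. The `SocialGraph` class, the node ids, and the "friend" edge label are all hypothetical names made up for illustration (as is the starting friendship between Jena and me); this is not how Facebook stores anything, just a small in-memory model of nodes, edges, and properties.

```python
from dataclasses import dataclass, field


@dataclass
class SocialGraph:
    """A tiny in-memory graph: nodes carry properties, edges carry a label."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: set = field(default_factory=set)    # (node id, label, node id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.add((source, label, target))

    def neighbors(self, node_id):
        """Return the ids connected to node_id, in either direction."""
        found = set()
        for source, _label, target in self.edges:
            if source == node_id:
                found.add(target)
            elif target == node_id:
                found.add(source)
        return found


# Assumed starting point: two users who are already friends.
graph = SocialGraph()
graph.add_node("me", type="user")
graph.add_node("jena", type="user", name="Jena")
graph.add_edge("me", "friend", "jena")

# Mom joins: one new node and two new edges, nothing else changes.
graph.add_node("mom", type="user")
graph.add_edge("mom", "friend", "me")
graph.add_edge("mom", "friend", "jena")

print(sorted(graph.neighbors("mom")))
```

Notice that nothing about the existing nodes or edges had to be restructured to fit mom in, which is the flexibility the diagram is meant to show.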
As you can see, the graph is an extremely flexible structure and a great way to model and visualize objects with complex relationships…such as all the Facebook friendships worldwide:
The Facebook social graph is also highly extensible
Via the Open Graph platform/layer, developers have the ability to extend/enrich the social graph by creating new types of nodes, edges, and properties.
For example, if you were creating a soccer app for Facebook, you might decide to define a few new objects (player, team, ball) and actions (pass, shoot, slide tackle). From these objects and actions (nodes and edges, respectively), you can model the activity (objects and interactions over time) of an entire soccer match.
The diagram above shows 2 players (Bill, Eric) who play for 2 different teams (Greensboro United, Winston-Salem Twins). Time stamps are added to each edge (Slide Tackle, Kick) in the graph indicating the order of activities as they occurred during the match – Eric slide tackled Bill @ 8:45:16 am and then Bill kicked the ball @ 8:48:16 am.
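The time-stamped edges can be sketched in a few lines of Python. The `Action` tuple and `timeline` function are hypothetical names of my own; the only data used is the two actions and time stamps described above.

```python
from datetime import time
from typing import NamedTuple


class Action(NamedTuple):
    actor: str   # node performing the action
    verb: str    # edge label
    target: str  # node receiving the action
    at: time     # time stamp stored on the edge


# The two timed edges from the example, deliberately listed out of order.
actions = [
    Action("Bill", "kick", "ball", time(8, 48, 16)),
    Action("Eric", "slide tackle", "Bill", time(8, 45, 16)),
]


def timeline(edges):
    """Return the edges ordered by their time stamps."""
    return sorted(edges, key=lambda edge: edge.at)


for action in timeline(actions):
    print(f"{action.at.isoformat()}  {action.actor} -> {action.verb} -> {action.target}")
```

Because the time stamp lives on the edge, replaying the match in order is just a sort over the edges.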
Imagine the kinds of analysis that could be performed (and the questions that could be answered) if all the activity in every match of the World Cup were modeled as a graph!
That’s cool, but how do “I” actually get the data out?
Facebook provides (and controls) access to social graph data via the Graph API. The Graph API is a RESTful web service through which requests for data (web service calls) are made and data is returned in JSON objects (web service response).
For example, you can request all of the publicly available information for the Coca-Cola Facebook object by submitting the following URL in the address box of your browser: https://graph.facebook.com/cocacola.
The Graph API web service “responds” by returning a JSON object shown in the image below:
Note: JSON, like XML, is extremely flexible but with less overhead.
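If you would rather do this from code than from the browser, the request/response pattern looks roughly like the sketch below. The `object_url` helper and the sample response body are hypothetical (the body is a made-up stand-in, not what the Graph API returns for any real object), so the snippet runs without calling the live service.

```python
import json
from urllib.parse import quote

GRAPH_ROOT = "https://graph.facebook.com"


def object_url(name_or_id):
    """Build the Graph API URL for one object from its name or id."""
    return f"{GRAPH_ROOT}/{quote(str(name_or_id), safe='')}"


# Hypothetical, trimmed response body used in place of a live call.
sample_body = '{"id": "0000000000", "name": "Example Page"}'

page = json.loads(sample_body)  # JSON object -> Python dict
print(object_url("cocacola"))
print(page["name"])
```

Once the JSON is parsed into a dictionary, each property of the object is just a key lookup.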
And this works the same for all objects…you just need the name (or Id) of the object…which you can pretty much guess for popular ones:
- Target: Bill Gates
- Target: Mike Tyson
- Target: Lady Gaga
Now, if you want to retrieve more detailed information, as will often be the case, then you’ll need to obtain an access token.
What’s an Access Token and How Do I Obtain One?
An Access Token is a method for protecting privacy by controlling which data is accessible (token permissions) and the duration of time for which it will be accessible (token expiration). According to Facebook:
An access token is a random string that provides temporary, secure access to Facebook APIs.
A token identifies a User, App or Page session and provides information about granted permissions. They also include information about when the token will expire and which app generated the token. Because of privacy checks, the majority of API calls on Facebook need to be signed with an access token. All access tokens are generated with OAuth 2.0 authentication and authorization procedures.
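One way to picture the two controls (permissions and expiration) is the small Python model below. This is purely a thinking aid: the `AccessToken` class, its fields, and the two-hour lifetime are assumptions for illustration, and a real token is an opaque string issued by Facebook rather than an object you build yourself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessToken:
    # Hypothetical model of what a token represents.
    value: str
    app_id: str
    permissions: frozenset
    expires_at: datetime

    def allows(self, permission, now=None):
        """True if the token is unexpired and carries the permission."""
        now = now or datetime.now(timezone.utc)
        return now < self.expires_at and permission in self.permissions


issued = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary example date
token = AccessToken(
    value="not-a-real-token",
    app_id="hypothetical-app",
    permissions=frozenset({"email"}),
    expires_at=issued + timedelta(hours=2),  # assumed lifetime
)

print(token.allows("email", now=issued + timedelta(hours=1)))       # granted, unexpired
print(token.allows("email", now=issued + timedelta(hours=3)))       # granted, expired
print(token.allows("user_likes", now=issued + timedelta(hours=1)))  # never granted
```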
You can read more about the types of permissions available and how they are grouped here. But basically, you should assume that your app will need extended permissions to get the juicy details necessary to facilitate enhanced customer analysis/mining…which is the whole point, right?
By the way, did you notice how I used the phrase “your app” in the last sentence? That’s a pretty key concept. The intention is that *apps* (not users) will request permissions from users to gain access to the Facebook data associated with that user in order to “enhance the user experience”.
- Apps request permissions from users
- Users grant permissions to apps
- Users do not request permissions from other users
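In practice, "apps request permissions from users" usually means the app sends the user to a Facebook login dialog that lists the permissions being asked for. The sketch below builds such a URL using standard OAuth 2.0 query parameters. Treat the dialog endpoint, the placeholder app id, the redirect URI, and the permission names as assumptions for illustration, and check Facebook's current login documentation before relying on any of them.

```python
from urllib.parse import urlencode

# Assumed endpoint for the login dialog; verify against the current docs.
DIALOG_URL = "https://www.facebook.com/dialog/oauth"


def permission_request_url(app_id, redirect_uri, permissions):
    """Build the URL an app sends a user to when asking for permissions."""
    query = urlencode({
        "client_id": app_id,              # identifies the app making the request
        "redirect_uri": redirect_uri,     # where the user lands after deciding
        "scope": ",".join(permissions),   # the permissions being requested
        "response_type": "code",
    })
    return f"{DIALOG_URL}?{query}"


url = permission_request_url(
    app_id="APP_ID",                               # placeholder
    redirect_uri="https://example.com/callback",   # placeholder
    permissions=["email", "user_likes"],
)
print(url)
```

The user, not the app, makes the final call: the token the app eventually receives only carries the permissions the user agreed to grant.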
Yes, Facebook has provided developers with a utility that can be used to generate Access Tokens on the fly to test various Graph API calls. This, however, is not the intended method for generating Access Tokens in a production solution.
So, in order to connect with users in a way that allows your system to extract information, you will need an app.