Social Graph Paper: 2011

Sunday, December 4, 2011

Facebook Social Network Research

This is awesome reading: two embedded, detailed research papers from the Facebook data team.

Sunday, October 30, 2011

Zuck Interview

Start at 43:30: - http://www.justin.tv/startupschool/b/298692604

Thought this was a really transparent and authentic interview with Zuckerberg @ YCombinator. The main take-aways:

If you don't do what you love and solve a problem that you think genuinely exists, you'll never maintain the tenacity to succeed.
If you only do something a bit better than someone else is doing it, how much can you win?
As a manager, you'll have a tension between centralizing vs decentralizing functions (eg. marketing). Creating 'growth groups' has allowed FB to avoid pitfalls of environmentalism. (sounds like the Strategic Product Management function)
You don't need to be in the Valley to succeed - it helps if you know nothing. (Waterloo + communitech?)

----
My notes:
Most companies mess up by moving too slowly, and trying to be too precise.

The biggest risk is not taking any risk. (100% of putts left short don't go in)

Old skool: Index and content served to a lot of people -> caching, scale systems
Social: different fundamental experience. memcache (open source project)

Having the people who make product decisions also understand the technical issues is fundamentally important.

Everything is better with social: Emotional and informational efficiency - matched up with better stuff.

Doing this because it's awesome and should exist.

Thing about engineering, you never do the same thing twice, you just abstract it.

3-4-5 is the unit of a team- 50 is too big

growth groups - first insights: the main feature of FB is that your friends are there. two levels: strategically - grow and scale the company, and a better experience. Didn't just leave it to chance - wanted to build a competence in growing, scaling, AND finding friends most easily. key things that drive engagement - Dropbox started one, many other companies did. eg discovery; didn't have great analytics around engagement. Needed 10 friends.

acquisitions - saw true colors of people around me. Saw that some just wanted the money.

drop.io - ceo led timeline project at FB. Decided he'd have a bigger impact at facebook than in his old company.

FB at its essence is the products we build.

Thiel: advice as founder, not money manager. Don Graham: has had the Washington Post in his family for generations -> build companies for long term.

most inspiring, surprising thing: can be so bad at so many things, but if you stay focused on providing value to your customers, and you're doing something unique, then you'll get through it.

Self selection bias; do stuff that you're passionate about -> leads to company. Companies that work are those where the founders are passionate about what they're doing.

The last 5 years of social networking have been about getting people connected. the next 5 will be: what are the things you can build now that people are connected? We're at, or close to, that tipping point (you can or can't build on a social graph, can or can't ship CDROMs eg aol).

Things built outside of the valley seem to be on a longer term cadence.

Need to do something totally different, because if you do something a little bit better than someone else, how much can you really win?

Saturday, October 15, 2011

Social Network Analysis of Twitter... you had to know it was coming

You had to know this was coming - after a mobile and Facebook post, what was left? This is probably the most interesting/ counter-intuitive of the 3.

What is Twitter, a Social Network or a News Media? [Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon] mined ~42mm profiles, tweets, and trends to better understand the nature of Twitter. They broke it into 3 parts:

Network Analysis - understanding the structure of the Twitter network.
Popularity Analysis - based on # of followers, pagerank, and retweets
Information Diffusion - how (re)tweeting diffuses through the network.

Here are the major outcomes:

There is basically (basically) a 1-1 correlation with # of tweets, and # of followers/ followings. Tweet more, get more followers.
Low reciprocity. Because Twitter is asymmetrical (ie I can follow you without you following me back), roughly 78% of links are one-way; only about 22% are reciprocated.
Degree of separation: on twitter, there are 4 degrees of separation. This is really unintuitive at first due to the directedness of the network, but if you consider the "super nodes" on twitter (eg Oprah), this makes sense. Conversely, on facebook, most people can't be friends with Oprah.
Homophily: People who have a lot of followers tend to be friends with people who have a lot of followers. The more followers you have, the more likely your friends are in other timezones.
User Popularity: Ranking by followers is interesting, but they're actually not generating the most retweets (this is a better measure of influence).
Trending items vs Google: items stay trending on Twitter longer than Google due to the retweeting phenomenon. Most active periods are less than a week, but 31% are 1 day long.
Retweet impact: (this is weird) regardless of how many followers you have, if your tweet is retweeted, 1000 people will see it. Of course, if you have more followers, your tweet is more likely to be retweeted, but a retweet view remains constant at 1000 incremental people.
On average, if they happen, first retweets occur ~1 hour after the tweet, 2nd - 6th occur within 10 minutes. Crazy diffusion rate.
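
To make the reciprocity number concrete, here's a minimal sketch of how it could be computed from a list of follow links. The `reciprocity` function and the tiny `follows` list are made up for illustration; this is not the paper's code or data.

```python
def reciprocity(follows):
    """Share of linked user pairs where both users follow each other."""
    # Directed follow links, ignoring self-follows and duplicates.
    edges = {(a, b) for a, b in follows if a != b}
    # Every pair of users with at least one link between them.
    linked_pairs = {frozenset(edge) for edge in edges}
    # Pairs where the link exists in both directions.
    mutual_pairs = {frozenset((a, b)) for a, b in edges if (b, a) in edges}
    return len(mutual_pairs) / len(linked_pairs) if linked_pairs else 0.0


# Hypothetical follow list: (follower, followed).
follows = [
    ("me", "oprah"),
    ("you", "oprah"),
    ("me", "you"),
    ("you", "me"),
]

share = reciprocity(follows)
print(f"reciprocated: {share:.0%}, one-way: {1 - share:.0%}")
```

Only the me/you pair follows back here, so one of the three linked pairs is reciprocal and the other two are one-way.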

Tuesday, October 11, 2011

Basic Social Network Analysis Criteria

Just finished two interesting papers which analyze social networks: Planetary-Scale Views on a Large Instant-Messaging Network (Leskovec, Horvitz) and Statistical Analysis of Real Large-Scale Mobile Social Network (Zhengbin Dong, Guojie Song, Kunqing Xie, Ke Tang, Jingyao Wang).

The former was an analysis of a month's worth of MSN Messenger traffic and network structure. The latter, an analysis of Chinese phone logs and the corresponding network structure.

Though the results were interesting (I won't share them here), I was actually looking for the criteria they analyzed:

Degree: simply put, the number of connections a user (node) has.
Shortest Path: the fewest number of users between two users.
Diameter: the largest shortest path in a network.
Clustering Coefficient: the ratio of actual connections a user has to potential connections. Measures the transitivity of a network (ie the propensity for your friends to also be friends themselves).
Betweenness Centrality: the ratio of the count of shortest paths (between user A and user B) that pass through a user (user C) to all shortest paths (between user A and user B).
K-Core Distribution of Component Size: gives us an idea of how quickly the network shrinks as we move towards the core. Or, how large (number of users/ nodes) is the core component when a constraint of the minimum degree (k) is applied. (ie for a network where nodes have degree, k >20, how many total nodes in the component?)
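
Here's a rough sketch of a few of the measures above in plain Python, on a made-up five-person friendship graph. The graph and function names are my own assumptions. Shortest paths are counted in hops (edges), the k-core keeps users with at least k remaining friends, and betweenness centrality is left out to keep it short.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected friendship graph: user -> set of friends.
graph = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}


def degree(g, u):
    """Number of connections a user has."""
    return len(g[u])


def shortest_path_lengths(g, source):
    """Breadth-first search: hops from source to every reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(g):
    """Largest shortest path over all reachable pairs of users."""
    return max(max(shortest_path_lengths(g, u).values()) for u in g)


def clustering_coefficient(g, u):
    """Share of a user's friend pairs who are themselves friends."""
    friends = g[u]
    if len(friends) < 2:
        return 0.0
    possible = len(friends) * (len(friends) - 1) / 2
    actual = sum(1 for v, w in combinations(friends, 2) if w in g[v])
    return actual / possible


def k_core_size(g, k):
    """Users left after repeatedly removing anyone with fewer than k friends."""
    alive = set(g)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if len(g[u] & alive) < k:
                alive.discard(u)
                changed = True
    return len(alive)


print(degree(graph, "B"), diameter(graph))
print(clustering_coefficient(graph, "B"), k_core_size(graph, 2))
```

Running each function over every node (instead of just "B") gives the distributions the papers plot.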

Most of these characteristics are represented as a distribution (ie what is the degree distribution of all nodes in a network?) and tend to provide insight into the stability and density of a network. For example, a network with a higher-than-average skewed degree distribution (ie people have a lot of friends), will tend to be more stable (ie be more resilient to the k-core test), have shorter paths (on average) and therefore a smaller diameter, will be clustered more, and have higher betweenness centrality.

This is really nerdy stuff...

Sunday, October 9, 2011

Facebook's Research: Leveraging Friendship for Determining Location

A chapter in Networks, Crowds, and Markets on the small-world phenomenon describes a generalized relationship between rank-based distance and friendship probability. In research leveraging data from the site LiveJournal, the relationship is approximately: probability of friendship = 1/r, where r is the rank of the other person when everyone is ordered by how close they are to you. So, co-present people are basically 100% likely to have a tie, and your probability of friendship with the 100th-closest person to you is 1/100. A quick note on using rank instead of geographic distance - numerically this approach is more meaningful because population is non-uniformly distributed geographically (eg. in the US, the majority of people are on the coasts, not spread uniformly over the area of the country).
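
A tiny sketch of the rank idea, assuming the simplified 1/r form quoted above. The function name and the distances are invented for illustration, and ties in distance are simply left in input order.

```python
def friendship_probability_by_rank(distances_km):
    """Order people by distance from you and assign probability 1/rank."""
    ordered = sorted(distances_km, key=distances_km.get)
    return {person: 1 / r for r, person in enumerate(ordered, start=1)}


# Hypothetical distances (in km) from you to four other people.
distances_km = {
    "roommate": 0.0,
    "neighbour": 0.2,
    "coworker": 8.0,
    "cousin": 3000.0,
}

for person, p in friendship_probability_by_rank(distances_km).items():
    print(f"{person}: {p:.2f}")
```

Note that only the ordering matters: the cousin could be 30 km or 3000 km away and still be rank 4, which is the point of using rank instead of raw distance.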

Researchers at Facebook (Lars Backstrom, Eric Sun, and Cameron Marlow) built on this research to investigate the relationship between friendship and 'distance' and to develop algorithms that use it to predict a user's location.

As an aside, the typical approach used by smart phone application developers to determine geographic location is to leverage the handset's IP address. For example, Skyhook allows developers to submit a handset IP address and Skyhook will return a geographic lat-long. This approach provides a reasonable estimation of location; however, due to the 'reassignment' nature of IP addresses, it is error-prone and, according to Backstrom, Sun, Marlow, is only accurate 57.2% of the time.

Amazingly, in Find me if you can: Improving geographical prediction with social and spatial proximity, Backstrom, Sun, Marlow used the locations of a user's friends to determine that user's location with accuracy greater than that of IP geolocation. With 16 friends sharing location, they were able to determine a user's location to within 25 miles ~67.5% of the time!

The obvious implication of this research is that the historical approach of lat-long:IP relationships could be augmented with friendship-only data to improve results. More interesting, and potentially controversial: even where users opt not to explicitly share their location with vendors such as Skyhook, approaches exist to determine YOUR location, more accurately, if your friends share theirs (ie your friends are indirectly providing your location to services such as Facebook when they 'check in').

Saturday, April 16, 2011

Balanced Signed Networks in Social Media

In previous posts, something that sat uncomfortably with me in social networking theory is the lack of description and weight of edges (ties). I should have kept reading because, of course, the related concept exists in the research; ties between individuals can be positive (friends) or negative (enemies).

An interesting analysis is the paper "The slashdot zoo: mining a social network with negative edges" [Kunegis, Lommatzsch, Bauckhage]; Slashdot, a popular 'geek culture' site, allows members to tag each other as friends or foes.

Further research discusses the concept of balancing these graphs. For example, a network of 3 people, A, B, C is only balanced if all are friends, or only A-B are friends (C is a common enemy). Imbalance occurs when A-B and A-C are friends, but B-C are enemies - this creates a sort of structural instability.
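
The balance rule for a triad can be sketched in a couple of lines, assuming each tie is coded +1 (friend) or -1 (enemy). Under classic structural balance, a triangle is balanced when the product of its three signs is positive. The function name is my own.

```python
def is_balanced(ab, ac, bc):
    """Triad is balanced when the product of the three tie signs is positive.

    Each argument is +1 (friends) or -1 (enemies).
    """
    return ab * ac * bc > 0


print(is_balanced(+1, +1, +1))  # all three are friends
print(is_balanced(+1, -1, -1))  # A-B friends, C is the common enemy
print(is_balanced(+1, +1, -1))  # A is friends with both, but B-C are enemies
print(is_balanced(-1, -1, -1))  # everyone is enemies
```

The first two cases come out balanced and the third unbalanced, matching the examples above. The all-enemies case is also unbalanced under this strict version of the rule.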

What continues to sit uncomfortably is that the reading seems to overly simplify nuances in real social dynamics and the way that these dynamics are represented online:

This representation of friend/ foe ignores the context of the measurement. For example, the signs of a graph may reverse if the context is "I agree with what you have to say" vs. "I respect what you have to say".
The ties themselves are really an aggregate of 'types' and 'weights' of ties. Consider a political corporate environment where allegiances (friend/foe lines) are formed on power dynamics and corporate structure as well as on personal similarity/friendships. The model doesn't take the mix into account.
In online social networks, there seems to be little explicit definition of 'foe'. For example, you 'friend' people on Facebook, you don't have the concept of 'foe'. An interesting research area might be to determine implicit foes based on friend data (ie if A-B are friends on Facebook and A-C are friends, then you'd think B-C should be friends. If they're not, does that mean they're real-life enemies?).

A related talk on the topic by Jure Leskovec at Microsoft Research (video).

Saturday, February 19, 2011

Social Influence vs. Selection

First, a couple of definitions:
* Selection is a person's characteristics (mutable or immutable) that drive link (friendship) creation.
* Social Influence is the propensity of a person's friendships to drive characteristics.

The first, for example, would be an ethnic group finding a neighborhood where members of the same group live. The second would be how the discovery of a new music act by one friend drives the adoption of the same act by their friends.

I came across some research that looks at selection vs social influence in the context of page editors of Wikipedia. The question posed is: how does friendship influence the pages editors work on? Crandall et al approached this by looking at the similarity (ie the number of common articles they edited) between two editors pre- and post-meeting.

The following graph is an aggregate/ average of many pairs of editors, but the surprising conclusion is the positive-linear nature of similarity and the ramp up/down surrounding when they meet. Of course, there's a level of historical retrospective going on here (looking at behaviors of people that met in the past), but it's interesting to see the build up pre-meeting (selection) and the continued ramp post-meeting (social influence).

Here's the full presentation from "Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining":

Feedback Effects Between Similarity And Social Influence In Online Communities


Representations of Social Networks

In my study of social networks, I keep asking myself why they are commonly represented so simply? The concept of a "graph" is simple enough, and many of the natural extensions I'd like to see never seem to come up.

An artificial graph, below, contains 3 nodes (people), 2 edges (friendships), and 1 "non-friendship".

Let's assume you're trying to assess triadic closure (the propensity for B-C to become friends if A-B and A-C are friends). What would be helpful for this graph would be:
1. The nature of the edge between A-B and A-C.
- Are the edges representing true friendships?
- Are the edges actually a blend of two types of edges (professional/ affiliate and personal)? "A" may be great personal friends with "B", but belong to the same club as "C". This is unlikely to drive triadic closure.
2. The weight of the edge between A-B and A-C. Something that has always made me uncomfortable is the qualitative nature of how edges are described. Perhaps this is because weight is, historically, difficult to quantify. Even so, a scale such as 1 = acquaintance, 2 = best friend would add a more comfortable quantitative layer.
3. The direction of the edges. Edges between pairs are drawn as a single, undirected tie, when the relationship as stated/ perceived by each individual in the pair may not be reciprocal. I'd like to see every edge actually be made up of 2 edges: one for the nature perceived by each node.

Lastly, flying in the face of the triadic closure concept, I'd like to see ties with a weight ranging from -1 (avoidance) to +1 (closeness); 0 would represent non-friendship/ non-tie. If A-B are friends and B-C are friends, and A is a drug dealer, and C is a recovering addict, triadic closure likely won't result here.

I'd think adding these details would provide a more nuanced analysis. Perhaps as I dig a bit deeper, these practices will surface.
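
As a thought experiment, here's a minimal sketch of what such a richer edge might look like in code. Everything here is hypothetical: the `Tie` class, the `kind` labels, the 0.5 threshold, and the closure heuristic are my own assumptions, not an established model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tie:
    """One direction of a relationship, as perceived by `source`."""
    source: str
    target: str
    kind: str      # hypothetical labels, e.g. "personal" or "professional"
    weight: float  # -1.0 (avoidance) .. 0.0 (no tie) .. +1.0 (closeness)

    def __post_init__(self):
        if not -1.0 <= self.weight <= 1.0:
            raise ValueError("weight must be between -1 and +1")


def closure_plausible(ties, hub, x, y, threshold=0.5):
    """Guess whether x-y might close, given their ties to a shared hub.

    Assumed heuristic: closure is plausible only when the hub has a strong
    tie of the same kind with both x and y, perceived in both directions.
    """
    def strong_kinds(u, v):
        return {t.kind for t in ties
                if t.source == u and t.target == v and t.weight >= threshold}

    hub_x = strong_kinds(hub, x) & strong_kinds(x, hub)
    hub_y = strong_kinds(hub, y) & strong_kinds(y, hub)
    return bool(hub_x & hub_y)


ties = [
    Tie("A", "B", "personal", 0.9), Tie("B", "A", "personal", 0.8),
    Tie("A", "C", "professional", 0.6), Tie("C", "A", "professional", 0.6),
    Tie("A", "D", "personal", 0.7), Tie("D", "A", "personal", 0.9),
]

print(closure_plausible(ties, "A", "B", "C"))  # different kinds of tie
print(closure_plausible(ties, "A", "B", "D"))  # both strong personal ties
```

Storing two `Tie` objects per pair covers the "one edge per perception" idea, and a negative weight on either side would also block closure under this heuristic.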

Saturday, February 5, 2011

More on contagions...

Just summarizing a great paper that combines social network data and co-location data: "Distinguishing between Drivers of Social Contagion: Insights from Combining Social Network and Co-location Data".

The field now seems ready to move from investigating whether contagion is really at work to why it occurs (Aral 2011; Godes 2011; Iyengar et al. 2011b).

Social contagion may occur for at least five reasons:
1. The process may operate through spreading awareness and interest,
2. Through social learning about the new product’s risks and benefits,
3. Through social-normative influence increasing the legitimacy of the new product,
4. Through concerns that not adopting may result in a competitive or status disadvantage, or
5. Through direct and indirect “network” or installed base effects (Van den Bulte and Lilien 2001).

Who is prone to influence: "Physicians who perceive themselves to be opinion leaders are less sensitive to peer behavior whereas true sociometric leaders are not. This finding indicates that self-confidence rather than true expertise moderates sensitivity to contagion, which is consistent with risk reduction as well as status maintenance mechanisms but not with awareness (e.g., Berger and Heath 2008; Van den Bulte and Stremersch 2004)."

Product-type-specific influence: "what drives contagion is to consider characteristics of the product, and possibly also the influencers. For instance, for products that do not benefit from standard marketing communication and present little perceived risk, contagion may foster adoption by operating at the awareness stage. In such cases, occasional users may be more effective in creating additional awareness than regular users. This is because the latter are more likely to be connected to other regular users and others who are already aware of the product, as noted in a study by Godes and Mayzlin (2009) of stimulated word of mouth for a restaurant chain"

Paper conclusion: "spatial structure overlaps little with network structure, which is why contagion from co-located peers can provide information over and above what can be gleaned from contagion from network peers"

Information-type-specific influence: "Some information and knowledge is quite complex and possibly even tacit. It is hard to convey through “lean” channels such as written documents and presentations at conferences by high-profile speakers, and typically requires “richer” channels, esp. face-to-face interaction (Daft and Lengel 1986)"

Friday, January 21, 2011

Social Contagions and Social Media Marketing Effectiveness

A little background terminology here: things that transmit between nodes in a social network are known as contagions. The most simple real-life contagion example is, of course, diseases, but intuitively we also know that attitudes (eg. product preference) are influenced by who our friends are.

Further, as with disease, in order for contagions to effectively spread, they must be in an appropriate environment. In the case of diseases, not only must the shape of the network be appropriate, but varying degrees of physical contact (as simple as a handshake, or as intimate as sexual contact) and genetic predisposition may be factors. For product preference, as an example, the degree of susceptibility to a contagion is similarly nuanced. Of course, centrality and number of connections may be large factors, but there are others.

Companies are spending a lot of money promoting their products on Facebook. Estimates have Facebook's advertising revenue at ~$1.86B for 2010 (not inconsequential vs. competitive "portals"). Much of the advertiser interest in advertising on Facebook stems from the belief that "social ads" are more effective than non-social standard banner ads due to the influence our contacts have over us.

Much of the research by mainstream analysts (1, 2) frames the impact of social media in the form of "influence". This implies a sort of cognitive awareness and formality by consumers; that they make their purchase decisions rationally based on friends' behaviors (eg. "ah, I see Jim has bought , so I will also get one."). Though there may be some contribution by rational decision-making processes, I hypothesize that modeling influence as a contagion (eg disease) is a more effective way to measure impact. In other words, you can't control your desire for a product any more than you can control your ability to catch a cold.

Google is often criticized for the lack of transparency in their advertising marketplace. Advertisers don't know the publishers, targeting and ad rotation is opaque, and the true price is unclear. Facebook has a similar problem, but it's not as direct or obvious as Google's.

Here's the scenario: An advertiser creates a facebook page and buys ads to promote it. Nested in the ad is a "like" button that, when pressed, acts like other like buttons on the site: for some of your contacts, it inserts an item into their newsfeed. Here's the key problem: Facebook doesn't permit the advertiser visibility into who and where they surface these "likes". For brand advertisers, not all impressions are created equal.
* People are not monolithic influences or non-influencers. My mom might influence cold remedies, but not music taste, for example.

While tastes do signal social identity, what others infer from one’s choice depends upon group membership (Berger and Heath 2007; McCracken 1988; Muniz and O’Guinn 2001). For example, Berger and Heath (2007) find that people may converge or diverge in their tastes based on how much their choice in a given context signals their social identity. [Do Friends Influence Purchases in a Social Network? Raghuram Iyengar]

* People that I have strong ties with, for some types of products, have already influenced me offline resulting in a wasted cost of an impression. For example, don't bother showing my closest friends that I liked "Against Me's" latest album, we all already have it. That said, if I liked a car brand, it's probably worth showing them my "Like".

With Facebook, as with Google, this targeting is done algorithmically. You may get a lot of impressions, and people may say that they're influenced by social media, but are they actually being influenced?

Some further research from Harvard Business School talks about how relative social standing may paradoxically reduce influence:

Our results show that there are three distinct groups of users with very different behavior.
The low-status group (48% of users) are not well connected, show limited interaction with other members and are unaffected by social pressure. The middle-status group (40% users) is moderately connected, show reasonable non-purchase activity on the site and have a strong and positive effect due to friends’ purchases. In other words, this group exhibits “keeping up with the Joneses” behavior. On average, their revenue increases by 5% due to this social influence. The high-status group (12% users) is well connected and very active on the site, and shows a significant negative effect due to friends’ purchases. In other words, this group differentiates itself from others by lowering their purchase and strongly pursuing non-purchase related activities. This social influence leads to almost 14% drop in the revenue of this group. We discuss the theoretical and managerial implications of our results.

This is consistent with what's known as the middle-status conformity thesis, detailed here (Phillips and Zuckerman 2001). Not to put too fine a point on this implication, but if 48% of the population is "low-status", and these people are not influenced socially, then social advertising to them is ineffective.

Other background reading:
* Impact of Social Influence in E-Commerce Decision Making
* Distinguishing between Drivers of Social Contagion: Insights from Combining Social Network and Co-location Data

Saturday, January 15, 2011

"Giant Components" Implies Winner-Takes-All in the Social Network Race

In graph theoretic social networking analysis, there's a concept known as "Giant Components". As the name implies, in any given human social network, there exists one main, extremely large, set of connected "nodes" (people) surrounded by significantly smaller, disconnected from the giant component, peripheral clusters of social networks.

This is illustrated qualitatively in "Networks, Crowds, and Markets" (free version here) by the following example: consider your current friend group, and who they're connected to, and so on. Ultimately, you'll find you're indirectly connected to people from other countries. Another way to put it: if everyone has 100 (unique) friends, you very quickly get to large numbers of connected nodes (100 of your friends x (who have) 100 friends x (who have) 100 friends x (who have) 100 friends x (who have) 100 friends = 10B people). However, there will be people, isolated on an island somewhere, who are not connected to the giant component.
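
The back-of-envelope arithmetic is easy to check. This sketch keeps the same (unrealistic) assumption that every person brings 100 entirely new friends at each hop; `reach_after` is just an illustrative name.

```python
def reach_after(hops, friends_per_person=100):
    """People reached at a given hop if every friend list is entirely new."""
    return friends_per_person ** hops


for hop in range(1, 6):
    print(f"{hop} hop(s): {reach_after(hop):,} people")
```

In real networks friend lists overlap heavily, so the true reach grows more slowly than this upper bound.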

Random Example (from here). You can see that a high proportion of nodes belong to one connected cluster.

If any one person, in any one of the smaller clusters, becomes connected to the "Giant Component", the entire cluster is then considered part of the "Giant Component". So, it's reasonable to assume that, at some point, the desert island person will eventually meet one person in the giant component. It seems, in this connected world, we're almost fatalistically destined to be part of the giant component.
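
Here's a small sketch of that merging effect, using a breadth-first search to find connected components. The people and friendships are made up, and `components` is my own helper, not a library call.

```python
from collections import deque


def components(people, friendships):
    """Connected components of an undirected graph, largest first."""
    adjacency = {p: set() for p in people}
    for a, b in friendships:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, result = set(), []
    for start in people:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            u = queue.popleft()
            component.add(u)
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        result.append(component)
    return sorted(result, key=len, reverse=True)


# Hypothetical network: a big cluster A-E, a small cluster F-G, and H alone.
people = list("ABCDEFGH")
friendships = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("F", "G")]

before = components(people, friendships)
after = components(people, friendships + [("G", "A")])  # one new friendship

print([len(c) for c in before])
print([len(c) for c in after])
```

A single new tie between G and A pulls the whole F-G cluster into the largest component, while H (the desert island person) stays outside until they meet someone.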

It is inevitable then, that we become part of the Facebook giant component, right? They're nearing 600 million users, and check out this giant component.

In reality, things aren't as inevitable. It's not obvious initially, but a few things to consider:

The definition of the edges (connections between people) is a little more nuanced than simply "knowing" someone. What if, instead of drawing a social graph based on Facebook-stated friendships, you drew it based on spending greater than 10 hours a day together? The graph would become much more fragmented.
Graphs can be used to represent different classes of social graphs. For example, and Facebook even does this, my family, and my coworkers could be represented as separate graphs. In other words, people are capable of belonging to multiple networks.

Both of these facts create an opportunity for emerging or niche social networks to evolve and grow -- and not necessarily at the expense of Facebook either! In retrospect, Livemocha, an interest-based social network, benefited from this.

Another example (or maybe a 3rd bullet is required above stating "cultural norms") is Mixi, a Japanese social network. As recently featured in the NYTimes, Facebook has been relatively unsuccessful in Japan. Some speculate it is cultural in nature; that the Japanese are more private and that Facebook's religious-like fervor towards unfettered openness doesn't resonate there. Allegedly, on Mixi only 5% of users use their real picture as an avatar.

The "giant component" question seems to simply be one of definition. Existentially, or environmentally, aren't we all connected?

As an aside, I'm taking a Social Media Analysis reading course this semester (similar to this one at Carnegie Mellon). I have a weekly blog-writing assignment - this is the first post of many.
