Social Graph Paper: October 2011

Sunday, October 30, 2011

Zuck Interview

Start at 43:30: http://www.justin.tv/startupschool/b/298692604

Thought this was a really transparent and authentic interview with Zuckerberg at Y Combinator. The main takeaways:

If you don't do what you love and solve a problem that you genuinely think exists, you'll never maintain the tenacity to succeed.
If you only do something a bit better than someone else is doing it, how much can you win?
As a manager, you'll face a tension between centralizing and decentralizing functions (e.g., marketing). Creating 'growth groups' has allowed FB to avoid the pitfalls of environmentalism (sounds like the Strategic Product Management function).
You don't need to be in the Valley to succeed; it helps if you know nothing. (Waterloo + Communitech?)

----
My notes:
Most companies mess up by moving too slowly, and trying to be too precise.

The biggest risk is not taking any risk. (100% of putts left short don't go in)

Old skool: Index and content served to a lot of people -> caching, scale systems
Social: a fundamentally different experience. memcached (open source project)

Having the people who make product decisions also understand the technical issues is fundamentally important.

Everything is better with social: emotional and informational efficiency, matched up with better stuff.

Doing this because it's awesome and should exist.

The thing about engineering: you never do the same thing twice; you just abstract it.

3, 4, or 5 people is the unit of a team; 50 is too big.

Growth groups: the first insight was that the main feature of FB is that your friends are there. Two levels: strategically, growing and scaling the company, and delivering a better experience. Didn't just leave it to chance; wanted to build a competence in growing, scaling, AND helping people find their friends as easily as possible. Focused on the key things that drive engagement (Dropbox started a growth group, as did many other companies), e.g. discovery; didn't have great analytics around engagement. Needed 10 friends.

Acquisitions: saw the true colors of the people around me. Saw that some just wanted the money.

Drop.io: its CEO led the Timeline project at FB. He decided he'd have a bigger impact at Facebook than at his old company.

FB, at its essence, is the products we build.

Thiel: gave advice as a founder, not as a money manager. Don Graham: his family has run the Washington Post for generations, so he builds companies for the long term.

Most inspiring, surprising thing: you can be so bad at so many things, but if you stay focused on providing value to your customers, and you're doing something unique, you'll get through it.

Self-selection bias: do stuff that you're passionate about, and that leads to a company. Companies that work are those where the founders are passionate about what they're doing.

The last 5 years of social networking have been about getting people connected. The next 5 will be about what we can do now that people are connected. We're at, or close to, that tipping point (you either can or can't build on a social graph, just as companies either could or couldn't ship CD-ROMs, e.g. AOL).

Things built outside of the Valley seem to be on a longer-term cadence.

Need to do something totally different, because if you do something a little bit better than someone else, how much can you really win?

Saturday, October 15, 2011

Social Network Analysis of Twitter... you had to know it was coming

You had to know this was coming: after a mobile post and a Facebook post, what was left? This is probably the most interesting/counterintuitive of the 3.

What is Twitter, a Social Network or a News Media? [Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon] mined ~42 million profiles, tweets, and trends to better understand the nature of Twitter. They broke it into 3 parts:

Network Analysis - understanding the structure of the Twitter network.
Popularity Analysis - based on # of followers, PageRank, and retweets.
Information Diffusion - how (re)tweeting diffuses through the network.

Here are the major outcomes:

There is basically a 1-1 correlation between # of tweets and # of followers/followings. Tweet more, get more followers.
Low reciprocity: due to the asymmetrical nature of Twitter (i.e., I can follow you without you following me back), 78% of links are one-way; only 22% are reciprocal.
Degrees of separation: on Twitter, there are 4 degrees of separation. This is counterintuitive at first given how directed the network is, but it makes sense once you consider Twitter's "super nodes" (e.g., Oprah). By contrast, on Facebook, most people can't be friends with Oprah.
Homophily: people who have a lot of followers tend to be friends with people who also have a lot of followers. The more followers you have, the more likely your friends are in other time zones.
User popularity: ranking by followers is interesting, but the most-followed users aren't actually generating the most retweets (retweets are a better measure of influence).
Trending items vs. Google: items stay trending on Twitter longer than on Google due to the retweeting phenomenon. Most active periods last less than a week, with 31% lasting just 1 day.
Retweet impact: (this is weird) regardless of how many followers you have, if your tweet is retweeted, about 1,000 people will see it. Of course, if you have more followers, your tweet is more likely to be retweeted, but a retweet still reaches a roughly constant ~1,000 incremental people.
On average, if a tweet gets retweeted at all, the first retweet occurs ~1 hour after the original tweet, and the 2nd through 6th occur within 10 minutes. Crazy diffusion rate.
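To make the reciprocity figure concrete, here is a minimal Python sketch that measures the share of one-way links in a directed follow graph. The toy graph and the function name `reciprocity_stats` are hypothetical; it counts each linked pair of users once, which is my reading of how the 78% figure is framed, not necessarily the paper's exact computation.

```python
def reciprocity_stats(follows):
    """Return (one_way_share, reciprocal_share) over user pairs linked in at
    least one direction. `follows` holds (follower, followee) tuples."""
    edges = {(a, b) for a, b in follows if a != b}  # drop self-follows and duplicates
    pairs = {frozenset(e) for e in edges}           # each linked pair counted once
    if not pairs:
        return 0.0, 0.0
    # Each mutual pair appears twice in `edges` (a->b and b->a), hence // 2.
    mutual = sum(1 for a, b in edges if (b, a) in edges) // 2
    one_way = len(pairs) - mutual
    return one_way / len(pairs), mutual / len(pairs)


# Hypothetical toy graph: three users follow a "super node"; only alice and bob follow each other.
follows = [
    ("alice", "oprah"), ("bob", "oprah"), ("carol", "oprah"),
    ("alice", "bob"), ("bob", "alice"),
]
print(reciprocity_stats(follows))  # (0.75, 0.25)
```

Super nodes push the one-way share up quickly: each new follower of Oprah adds a one-way pair, while a reciprocal pair needs both users to opt in.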

Tuesday, October 11, 2011

Basic Social Network Analysis Criteria

Just finished two interesting papers which analyze social networks: Planetary-Scale Views on a Large Instant-Messaging Network (Leskovec, Horvitz) and Statistical Analysis of Real Large-Scale Mobile Social Network (Zhengbin Dong, Guojie Song, Kunqing Xie, Ke Tang, Jingyao Wang).

The former was an analysis of a month's worth of MSN Messenger traffic and network structure; the latter, an analysis of Chinese phone logs and the corresponding network structure.

Though the results were interesting (I won't share them here), I was actually looking for the criteria they analyzed:

Degree: simply put, the number of connections a user (node) has.
Shortest Path: the fewest number of hops (links) needed to get from one user to another.
Diameter: the largest shortest path in a network.
Clustering Coefficient: the ratio of the connections that actually exist among a user's friends to the number of connections that could exist among them. Measures the transitivity of a network (i.e., the propensity for your friends to also be friends with each other).
Betweenness Centrality: for each pair of users (A and B), the fraction of all shortest paths between A and B that pass through a given user (C), summed over all pairs.
K-Core Distribution of Component Size: gives us an idea of how quickly the network shrinks as we move towards its core. In other words, how large (in number of users/nodes) is the core component when we require every remaining node to have at least a minimum degree k? (i.e., if we repeatedly remove nodes with degree below k = 20, how many nodes are left in the component?)
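A minimal Python sketch of the first five criteria on a small undirected graph stored as an adjacency dict (the graph and all function names are hypothetical). Shortest paths are counted in hops via breadth-first search, and betweenness uses the brute-force pairwise definition above, which is fine for a toy graph but far too slow for networks the size of MSN Messenger or a national phone log; those would call for a faster method such as Brandes' algorithm.

```python
from collections import deque
from itertools import combinations

def bfs(graph, source):
    """Hop distance and number of shortest paths from `source` to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:            # first time we reach v
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u is on a shortest path to v
                sigma[v] += sigma[u]
    return dist, sigma

def degree(graph, node):
    return len(graph[node])

def shortest_path_length(graph, u, v):
    return bfs(graph, u)[0].get(v)       # None if v is unreachable

def diameter(graph):
    """Largest shortest-path length over all reachable pairs."""
    return max(max(bfs(graph, s)[0].values()) for s in graph)

def clustering_coefficient(graph, node):
    """Fraction of possible links among node's friends that actually exist."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (k * (k - 1) / 2)

def betweenness(graph):
    """For each pair (s, t), add the fraction of shortest s-t paths through v."""
    info = {s: bfs(graph, s) for s in graph}
    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue                     # s and t are in different components
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            dist_v, sigma_v = info[v]
            if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

# Hypothetical toy network: a triangle a-b-c with a tail c-d-e.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
print(degree(graph, "c"))                     # 3
print(shortest_path_length(graph, "a", "e"))  # 3
print(diameter(graph))                        # 3
print(clustering_coefficient(graph, "c"))     # 0.3333333333333333
print(betweenness(graph))  # {'a': 0.0, 'b': 0.0, 'c': 4.0, 'd': 3.0, 'e': 0.0}
```

Note how "c" scores highest on betweenness even though its clustering coefficient is low: it is the only bridge between the triangle and the tail.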

Most of these characteristics are represented as a distribution (e.g., what is the degree distribution of all nodes in a network?) and tend to provide insight into the stability and density of a network. For example, a network whose degree distribution is skewed toward higher degrees (i.e., people have a lot of friends) will tend to be more stable (i.e., more resilient to the k-core test), have shorter paths on average and therefore a smaller diameter, be more clustered, and have higher betweenness centrality.
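The degree distribution and the k-core test can be sketched the same way. This hypothetical Python sketch counts nodes per degree, peels off nodes with degree below k until none remain, and then reports the size of the largest connected component of what is left, for each k. The toy graph and function names are made up for illustration.

```python
from collections import Counter

def degree_distribution(graph):
    """Map each degree value to the number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in graph.values())

def k_core(graph, k):
    """Nodes of the k-core: repeatedly remove nodes whose remaining degree is < k."""
    alive = set(graph)
    deg = {v: len(graph[v]) for v in graph}
    stack = [v for v in graph if deg[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue                     # already peeled off
        alive.remove(v)
        for u in graph[v]:
            if u in alive:
                deg[u] -= 1              # u loses a neighbour
                if deg[u] < k:
                    stack.append(u)
    return alive

def largest_component_size(graph, nodes):
    """Size of the largest connected component induced on `nodes`."""
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for u in graph[v]:
                if u in nodes and u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best

# Hypothetical toy network: a triangle a-b-c with a tail c-d-e.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
         "d": {"c", "e"}, "e": {"d"}}
print(sorted(degree_distribution(graph).items()))  # [(1, 1), (2, 3), (3, 1)]
core_profile = {k: largest_component_size(graph, k_core(graph, k)) for k in range(1, 4)}
print(core_profile)  # {1: 5, 2: 3, 3: 0}
```

The tail disappears at k = 2 and everything is gone by k = 3; a denser network would keep a large component for much higher k, which is the resilience the paragraph above describes.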

This is really nerdy stuff...

Sunday, October 9, 2011

Facebook's Research: Leveraging Friendship for Determining Location

A chapter in Networks, Crowds, and Markets on the small-world phenomenon describes a generalized relationship between rank-based distance and friendship probability. In research leveraging data from the site LiveJournal, the relationship is approximately: probability of friendship is proportional to 1/r, where r is a person's rank by distance from you (your nearest neighbour has r = 1). So the people closest to you are by far the most likely to be friends, and the 100th-closest person is roughly 1/100 as likely. A quick note on using rank instead of geographic distance: numerically, this approach is more meaningful because population is non-uniformly distributed over geography (e.g., in the US, the majority of people live on the coasts, not spread uniformly over the area of the country).
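A small worked sketch of the rank-based rule, assuming planar (x, y) coordinates and 1-based ranks (nearest person = rank 1) so that the 100th-closest person gets weight 1/100, as in the example above. The names, coordinates, and function names are hypothetical, and since the relationship is a proportionality, the raw 1/r values are relative weights; `normalise` rescales them to sum to 1.

```python
import math

def friendship_weights(me, others):
    """Rank everyone in `others` by distance from `me` (nearest = rank 1) and
    return {name: 1 / rank}. Ties are broken by name for a deterministic order."""
    ordered = sorted(others, key=lambda name: (math.dist(me, others[name]), name))
    return {name: 1 / rank for rank, name in enumerate(ordered, start=1)}

def normalise(weights):
    """Rescale weights to sum to 1 (the 1/r rule only fixes relative odds)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical people on a plane, with "me" at the origin.
others = {"ann": (1, 0), "ben": (0, 3), "cat": (10, 10)}
print(friendship_weights((0, 0), others))  # {'ann': 1.0, 'ben': 0.5, 'cat': 0.3333333333333333}

# The 100th-closest of 100 people on a line gets weight 1/100.
line = {f"p{i}": (i, 0) for i in range(1, 101)}
print(friendship_weights((0, 0), line)["p100"])  # 0.01
```

Because only the ordering matters, the same physical distance yields a much lower weight in a dense city (many people closer) than in a sparse rural area, which is exactly why rank is the better yardstick.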

Facebook researchers Lars Backstrom, Eric Sun, and Cameron Marlow built on this research to further investigate the relationship between friendship and 'distance', and to develop reasonable algorithms that use it to predict a user's location from their friends' locations.

As an aside, the typical approach used by smartphone application developers to determine geographic location is to leverage the handset's IP address. For example, Skyhook allows developers to submit a handset's IP address and returns a geographic lat-long. This approach provides a reasonable estimate of location; however, because IP addresses get reassigned, it is error-prone and, according to Backstrom, Sun, and Marlow, is only accurate 57.2% of the time.

Amazingly, in Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity, Backstrom, Sun, and Marlow used the locations of one's friends to determine one's location more accurately than IP geolocation. With 16 friends sharing their location, they were able to place you within 25 miles ~67.5% of the time!
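A simplified sketch of the idea, not the paper's exact algorithm: assume the probability of friendship decays as a power of distance, p(d) = A * (d + B) ** -C, and pick, from among your friends' locations, the one that maximizes the log-likelihood of all your friendships. The constants A, B, C, the coordinates, and all function names are hypothetical placeholders chosen for illustration, not fitted values from Backstrom, Sun, and Marlow.

```python
import math

def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))  # 3958.8 mi = mean Earth radius

# Hypothetical distance-decay curve; these constants are placeholders, not fitted values.
A, B, C = 0.002, 0.2, 1.0

def friend_prob(d):
    """Assumed probability that two people d miles apart are friends."""
    return A * (d + B) ** -C

def predict_location(friend_locations):
    """Pick the friend location that maximizes the log-likelihood of all friendships."""
    def log_likelihood(candidate):
        return sum(math.log(friend_prob(haversine_miles(candidate, f)))
                   for f in friend_locations)
    return max(friend_locations, key=log_likelihood)

# Hypothetical friends: three around Waterloo, ON, and one in Vancouver.
friends = [(43.46, -80.52), (43.45, -80.49), (43.47, -80.54), (49.28, -123.12)]
print(predict_location(friends))  # (43.46, -80.52)
```

The outlier friend in Vancouver barely moves the answer: a power-law decay rewards sitting close to the cluster of friends far more than it penalizes being far from one distant friend.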

The obvious implication of this research is that the historical lat-long:IP approach could technically be augmented with friendship-only data to improve results. More interesting, and potentially controversial: even where users opt not to share their location explicitly with vendors such as Skyhook, approaches exist to determine YOUR location more accurately if your friends share theirs (i.e., your friends are indirectly providing services such as Facebook with your location when they 'check in').
