Sunday, January 1, 2012

Prediction Data as an API in 2012

A good post by Tristan Walker from FourSquare got me thinking about the 'data space'. In his post, he mentions a company, Palantir (BusinessWeek article); from what I gather, they emphasize developing prediction models applied to terror prevention and consumed by non-technical field analysts. Their founder provides a good overview in this video (it's $10) from the O'Reilly Strata conference last year. As an aside, Palantir sounds like it has an interesting corporate culture (it caps salaries and is really driving home a 'solving real problems' message).

Reading the BusinessWeek article raised the question: how is Palantir's approach different from Recorded Future's? Recorded Future seems to have a similar vision, but their approach is framed somewhat differently: they're creating a 'temporal index', a big data/semantic analysis problem, as a basis for predicting future events. Again, check out the O'Reilly video; it features a panel with Recorded Future, Palantir, and Twitter (the Twitter speaker, Rion Snow, mentions a lot of really interesting research that has been done on Twitter data).

A few more companies that seem to be circling this big data/prediction modeling space:

  • Sense Networks - I've been following Sense for a while due to the Sandy Pentland/Nathan Eagle connection (both have done extensive Reality Mining work in mobile that I find interesting). They do similar big data prediction analysis, but the domain seems to be mobile-centric. 
  • Digital Reasoning - mentioned here
  • BlueKai - I'm not sure most people would put BlueKai in the same boat as Palantir and Recorded Future, but I'm starting to frame this space as a value chain, and BlueKai seems to have found a way to aggregate, improve, and repackage data for use in digital marketing. 
  • Primal - a Waterloo-based company. Their angle isn't entirely clear to me at this point, but they appear to sit between BlueKai and Recorded Future/Palantir and emphasize "filling out the sparseness of information in social networks". 

Finally, "Follow the Data" seems to be a good source of information on what's happening in the industry. I look forward to spending some time reading through the content. 

It seems to me that the near future of this space will be reminiscent of what happened in mobile in the last few years: API-ization of 'state' data (e.g. what Skyhook did in Wi-Fi location). There will be data-domain experts (making meaning out of transcripts of phone calls between terrorists is a different domain than, say, making meaning out of online shopping behaviors) whose capabilities span making sense of unstructured data, aggregating it from multiple sources, running prediction models on it, and making the results available to various "application" providers. What we're seeing today are very purpose-built algorithms/applications, and I suspect the long-term value here is in making and interpreting meaning in the unstructured data (rather than storing the data).