Thinking about email: Implied social networks in the inbox

Note: if you like this post, you’ll want to learn more about the Tinkerpop stack.

A few months ago, my health improved for a bit, and I was able to do some thinking about email.  I believe your most valuable, untapped social network is in your inbox.  I thought about competing with Xobni, Rapportive, etc. from a more analytics-intensive, social network analysis perspective.  More than that, I wanted to create a new kind of social network based on a new kind of sharing.

So I set about slurping Gmail and Yahoo mail via OAuth IMAP.  To grab a few emails is trivial.  To reliably get the entire mailbox, often hundreds of thousands of messages, is actually hard.  Many exceptions can crop up at many levels of the stack, and you’ve got to catch all you can and resume on failure.  And who the fuck knew that Ruby has Error in addition to Exception as a base level Error/Exception thingy?  Not me, for about a week of debugging.

  • Slurping data can be exhaustingly time-consuming.  Plan for it.
  • Consider - maybe you don’t need all the data to ship your first version.
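To make "catch all you can and resume on failure" concrete, here is a minimal sketch in Python rather than the JRuby I actually used.  The names `slurp`, `fetch` and `checkpoint_path` are made up for illustration, and `fetch` stands in for whatever IMAP call you really make.  The idea is just to retry each message a few times and write progress to disk so a crash does not send you back to message one.

```python
import json
import os


def slurp(message_ids, fetch, checkpoint_path, max_retries=3):
    """Fetch every message id, skipping ids already recorded in the checkpoint.

    `fetch` is a hypothetical callable that downloads one message and raises
    on failure. Returns (done_ids, failed_ids).
    """
    done = set()
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = set(json.load(f))

    failed = []
    for mid in message_ids:
        if mid in done:
            continue  # already fetched in an earlier run
        for attempt in range(max_retries):
            try:
                fetch(mid)
                done.add(mid)
                break
            except Exception:  # deliberately broad: errors come from many layers
                if attempt == max_retries - 1:
                    failed.append(mid)
        # Persist progress so the next run can resume from here.
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(done), f)
    return done, failed
```

Writing the checkpoint after every message is the simplest thing that works; for a mailbox with hundreds of thousands of messages you would probably batch those writes.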

I built an in-RAM, Voldemort-backed summary of your inbox social network, updated in real time, using JRuby and Pacer.  Then I set about visualizing it to see where I was at.

  • Visualize first, or you cannot product manage your analytics product.  Remember, there’s an extra variable here: what is possible to derive from the data.  You want to be informed and surprised.  Plan for epiphany.

This is a print of an early iteration of a visualization of an inbox using the OpenOrd layout in Gephi.  Grey lines are unreciprocated ties; blue lines are reciprocated ones.  I thought about improving and selling these for ramen profitability.
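The grey/blue split is easy to compute.  Here is a small Python sketch, assuming each message has been reduced to a (sender, recipient) pair; the function name `classify_ties` is invented for this example and is not what the real code looked like.

```python
def classify_ties(messages):
    """Split ties into reciprocated and unreciprocated sets.

    `messages` is an iterable of (sender, recipient) pairs. A tie is
    reciprocated when mail has flowed in both directions.
    """
    # Directed edges, ignoring mail sent to yourself.
    sent = {(s, r) for s, r in messages if s != r}

    reciprocated, unreciprocated = set(), set()
    for s, r in sent:
        pair = frozenset((s, r))  # undirected key for the tie
        if (r, s) in sent:
            reciprocated.add(pair)
        else:
            unreciprocated.add(pair)
    return reciprocated, unreciprocated
```

Given `[("a", "b"), ("b", "a"), ("a", "c")]`, the a-b tie would be drawn blue and the a-c tie grey.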

The problem here, in thinking about what kind of features to build into a simpler, list-based HTML interface, is that there is way too much noise.  You can’t easily make inferences about your own network.  So I looked at all the filters in Wasserman and the literature.  The first one I tried was the k-core, which works great in social graphs.

It turns out, though, that k-cores are pretty terrible in weighted social networks.  There are several implementations of weighted k-cores for social networks.  After some searching, I found one that worked reasonably well and implemented it in Pacer.
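To give a flavor of what a weighted core filter does, here is a Python sketch of one simple variant: repeatedly drop any node whose total edge weight (its "strength") falls below a threshold, until nothing more can be dropped.  This is not necessarily the variant I settled on, and the name `strength_core` and the dict-of-dicts input format are assumptions made for the example.

```python
def strength_core(adj, threshold):
    """Return the nodes that survive strength-based pruning.

    `adj` is a symmetric dict of dicts: adj[u][v] == adj[v][u] == weight,
    e.g. the number of emails exchanged. Nodes whose summed edge weight is
    below `threshold` are removed repeatedly until the graph is stable.
    """
    adj = {u: dict(nbrs) for u, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            if sum(adj[u].values()) < threshold:
                # Detach u from its neighbors, then drop it.
                for v in list(adj[u]):
                    del adj[v][u]
                del adj[u]
                changed = True
    return set(adj)
```

The plain k-core is the special case where every weight is 1.  With weights, one heavy correspondent can keep you in the core where a dozen one-off contacts would not.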

After I had these really clear summaries of social networks, I started doing simple NLP of the emails themselves as they streamed in: splitting emails into significant n-grams (2 and 3-grams mostly) and doing a TF-IDF summarization of its content.  This enables topic-specific maps, and it enables you to track the flow of n-grams across your network, and all kinds of interesting things.
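Roughly, the n-gram plus TF-IDF step looks like the Python sketch below.  It assumes a naive regex tokenizer and the plain tf × log(N/df) weighting; the function names are invented for the example, and it skips the stop-word and significance filtering you would want on real mail.

```python
import math
import re
from collections import Counter


def ngrams(text, sizes=(2, 3)):
    """Lowercase, tokenize, and return all 2-grams and 3-grams as strings."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [" ".join(tokens[i:i + n])
            for n in sizes
            for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    """Return one {ngram: score} dict per document."""
    counts = [Counter(ngrams(d)) for d in docs]
    # Document frequency: in how many emails does each n-gram appear?
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    scores = []
    for c in counts:
        total = sum(c.values()) or 1
        scores.append({g: (cnt / total) * math.log(n_docs / df[g])
                       for g, cnt in c.items()})
    return scores


def top_terms(score, k=3):
    """The k highest-scoring n-grams, ties broken alphabetically."""
    return sorted(score, key=lambda g: (-score[g], g))[:k]
```

Run `top_terms` over each email's scores and you get a short summary per message; hang those summaries on the edges of the graph and you have the raw material for topic-specific maps.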

What I was doing next was the point of the entire thing, and just in case… I’m not telling.  It is really cool, and if I had my health I’d be kicking ass with this idea.  Or it would have flopped.  But I would have known.

Then my pain got bad again and I fell on my ass.  I have an application up at http://kontexa.com that will OAuth into your Gmail.  If I fire off a command, I can make you a poster and, with a little fine tuning, email it to you, or print it and mail it in one of those tubes taking up so much space in my office.  That’s as far as I got before I flared.

Now I’m able to work part time, and I need to pay the bills… you can’t part-time a startup.  I got a part-time gig with a very cool company.  So this whole thing is shelved.  But this is the direction of what I was doing.  I thought I would share.

Can you guess the cleverness from the picture above?