Indexing the firehose

October 22nd, 2009

Both Google and Bing have signed agreements with Twitter to index the live feed of ‘tweets’. There are several things I’d love to know about this.

Firstly, purely out of technical curiosity: how fast is that data flow, exactly? I wonder what kind of infrastructure is needed to index it in real time. Presumably they’re going to index everything?

Secondly, the business side… Several companies have exited successfully by creating something interesting enough for Google or Microsoft to want to buy. I wonder how many healthy ongoing businesses can be made from creating a data stream interesting enough for them to want to index?

And thirdly, the statistics will be fascinating, if we ever get to hear them. For example, I wonder how often the search query will now be longer than the item returned…
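
The statistic itself is easy to pin down. Here is a minimal sketch in Python, assuming a hypothetical log of (query, result) text pairs. The function name and the sample data are made up for illustration and are not anything Google, Bing or Twitter has published:

```python
def fraction_query_longer(pairs):
    """Share of (query, result) pairs where the query has more characters than the result."""
    if not pairs:
        return 0.0
    longer = sum(1 for query, result in pairs if len(query) > len(result))
    return longer / len(pairs)


# Invented sample data, purely illustrative.
sample = [
    ("what time does the train to london leave tonight", "Train delayed again"),
    ("lunch", "Having lunch at my desk today"),
    ("ok", "ok"),
    ("weather", "Rain"),
]

print(fraction_query_longer(sample))
```

Length is counted in characters here; counting words instead would be an equally reasonable choice and might give a different answer.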