On Twitter Data

I’ve been getting various kinds of private communication about this, so it’s probably worth some commentary…

For some time now, I’ve been wondering when someone would start to use systems like Twitter as a way to deliver information between programs. A few weeks ago, Todd Fast, a colleague at Sun, gave me a preview of what is now the Twitter Data proposal. Todd and Jiri Kopsa have done all the heavy lifting on this, so if you have substantive comments or requests, they are really the people you should be dealing with. They were kind enough to recognize me as a reviewer of their work, but the initial idea is theirs.

Twitter Data is a bit different from what I was envisioning. I was thinking more along the lines of jamming JSON or XML data into a Twitter message as a starting point for program-level data exchange. That would allow us to leverage existing tools and libraries and make the entire thing straightforward. The interesting part, then, would be in the distribution network that arose from programs following other programs. This could also be embedded into a person’s Twitter feed by allowing clients to ignore tweet payloads that were structured data.
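To make the JSON variant of that idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names (`encode_payload`, `decode_payload`, `human_readable`) are hypothetical, nothing here talks to the Twitter API, and the rule "a tweet that parses as a JSON object or array is a data payload" is my own simplification rather than anything from the Twitter Data proposal.

```python
import json

TWEET_LIMIT = 140  # the per-tweet character limit


def encode_payload(data):
    """Serialize data as compact JSON, refusing anything too long for one tweet."""
    text = json.dumps(data, separators=(",", ":"))
    if len(text) > TWEET_LIMIT:
        raise ValueError("payload is %d characters, limit is %d" % (len(text), TWEET_LIMIT))
    return text


def decode_payload(tweet):
    """Return the parsed value if the tweet is a JSON object or array, else None."""
    try:
        value = json.loads(tweet)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    # Assumption: bare numbers or strings are treated as ordinary human text.
    return value if isinstance(value, (dict, list)) else None


def human_readable(tweets):
    """Keep only the tweets a person would want to read (no data payloads)."""
    return [t for t in tweets if decode_payload(t) is None]
```

A program following another program would call `decode_payload` on each incoming tweet, while a client for people would run the feed through `human_readable` so the structured payloads stay out of sight.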

Twitter Data proposes a way to annotate the data-oriented parts of a regular tweet in order to make it easier for machines to extract the data. Some people think this is a good idea, and some people think it’s a terrible idea. It’s easy to see the arguments on both sides. The pro is that you could turn your tweet stream into a way to deliver information about you to programs, and Twitter Data would make that much easier to do. The cons (that I’ve seen so far) are that people don’t want this kind of data exchange mixed into their Twitter stream, or that parsing the natural language that appears in the 140 characters of a tweet shouldn’t be that hard.

So we have two dimensions (at least) to the problem that Twitter Data is trying to address:

  1. Is it a useful thing to have structured or semi-structured information about a person included in their Twitter feed?
  2. If so, should that data be out of band, mixed in, or extracted (natural language processing)?

Independent of the merits of the specific Twitter Data proposal (and I definitely think that there are merits), I think that these two questions are worth some discussion and pondering.


4 Responses to “On Twitter Data”

  1. rgz says:

    In other words microformats.

  2. Todd Fast says:

    Actually, you’re confusing the role of Twitter Data with the particular uses of Twitter Data. Microformats are a convention for presenting certain, specific types of data. This convention is built on top of a lower-level, well-defined syntax (HTML) which makes it possible to embed that data.

    Like HTML in the Microformats case, Twitter Data is one abstraction level lower, and concerned with making abstract data embeddable in Twitter in a natural and searchable way. What that data actually is (for example, a microformat, a JSON string, or Base64-encoded bits) is what Twitter Data enables, but not strictly what Twitter Data is about.

  3. […] Twitter Data — using Twitter as a conduit for messages that have semantic markup. My gut reaction is that I’d prefer pure JSON in the data tweets, because a hybrid gives you poor use of the limited bandwidth and there seems no strong reason to care about human readability. (via Ted Leung) […]

  4. Do they call it nanoformats, if it’s just part of a microblog?
