When URLs Collide

There is something visceral and horrible about seeing a car hit a human body and then seeing that body topple to the ground. The last time I saw this, it stuck in my mind for a few days. Maybe it will depart more quickly this time. I was heading over to take pictures of the idiots at Beta-Theta-Pi (the same boozehounds who have made fools of themselves numerous times over the years I've been here), who were trying to mock the G20 protests, when I saw the accident.

I think many of the times I've been most passionate in political arguments have been when I've argued with people who held positions I once strongly held and have since rejected. This extends well beyond libertarianism, into a variety of individual positions as well as former ideas. Behind my feet is a graveyard of ideas - it may add a bit of spice to my feelings when I so argue that perhaps I am trying to reassure myself that the many changes I've been through are not in fact random, and the example of people my age moving towards positions I once had feels like a tiny counterintuition pushing against that. Maybe. Perhaps this means I am not a good enough relativist :P

When it comes to applied, software-for-end-users computer stuff, I have long been passionate about annotation. Part of this is the continuation of an admiration for Vannevar Bush's ideas about information in society - I feel that core human potential in information is best studied beginning with his ideas. The ability to take any piece of information, any object, any topic on the internet and share one's thoughts, starting a discussion, rating, categorising, etc, is something we need to become better thinkers, both individually and collectively. It is not enough to rely on content providers for this - they often lack the means, they desire too much control, and the deepest forms of collaboration are not possible by their hand. Examining this individually,

  • We cannot expect people providing snippets of content on the web, on Twitter, on their own site or some shared one, to produce systems for comment and collaboration for the entire public. Unless shared content platforms become very sophisticated and very easy, this is too much of a burden, particularly given the social obligations of a forum host (content takedown, spam, etc). Some content platforms are lightly enabled for this kind of thing (like YouTube or Slashdot), but it's not good enough.
  • They insist on control in order to serve their own interests. Some offer content producers control over annotation/comment/discussion that should not be given (Google/YouTube should absolutely prohibit uploaders from turning these things off, although more ideally a platform should make it impossible in principle to block such things), some actively censor mention of competitors, and others snip out things they dislike.
  • They are incapable of giving us the depth of what we want. An annotation-rich environment would let anyone comment on anything, rate anything, start discussions on anything, and it would also let people filter what they see by their friends, by communities, or by trusted moderators (or promoters). A given piece of content could and should have a number of separate annotation pads (some private-personal, some private-shared, some public), and similarly for the other kinds of content. Because of the complexity of getting this done, we would like to see a single architecture to do this (ideally one that is distributed but offers strong notions of identity/community/moderator/channels).
How might we prototype such a system? Livejournal, plus some client-side hacks, might do the trick for most of what we'd like.

These ideas were inspired by noting that the blog of Scott Adams (of Dilbert) has an Atom feed, to which I subscribe on LJ. LJ has its own comment system, distinct from that of the content hosting of his blog (very few blog systems, if any, provide only a feed without any primary web content store). If I comment on the LJ syndication, he'd never know about it unless he took the effort to learn that the feed exists, and that same feed might create secondary content stores with their own content, annotation, or rating systems outside his control. This is in fact pretty cool (people who really believe in authors' rights might disagree, but as I've stated before, I hold that the only control an author should expect or have over content they create and allow distribution of is one against misattribution). I would be amused if, for example, conservative Christian sites were to slurp content from my blog or website and criticise it (I might even join in discussions held by their rules if the rules weren't too onerous). The problem with just blogging everything is two-pronged: first, it is difficult to stumble onto all the commentary/annotation for a bit of content (even more so if one imagines wanting to share some of this with just a limited bunch of people), and second, it doesn't scale well.

If we wanted to solve the indexing while using LJ as a backend, we'd consider using LJ's existing syndication system, with a custom feed source that would accept new posts generated by a web request, providing a forum. This might be ok for very recent news, but LJ syndicated accounts both have content that expires (generally not desirable for our new information area) and don't let us do a neat trick that solves the indexing (how-do-I-find-it) problem. Let's instead consider a regular LJ account, but using time as a hash index. In particular, if content has a URL, we would use a well-known hash function (or one reliably retrieved from the LJ account's first post) to transform that URL into a time between FIRST_TIME_PLUS_5_SECONDS_LJ_BACKDATED_POSTS_CAN_BE_POSTED and LAST_TIME_FORWARD_DATED_POSTS_CAN_BE_POSTED_MINUS_ONE_YEAR. Whenever we want to annotate a site, we place the content into a post at the time the hash function provides, initialise a rating thread where other clients would provide their rank, initialise a categorisation thread for the same, and have all regular discussion happen in later threads. If a bucket collision happens, new times are allocated in that year we left above to hold that content, and a quick pre-note is added to the colliding post listing the URL-to-postid mappings so allocated. We'd need a language for annotations (particularly inline ones). This might not scale particularly well, but it would allow a fair amount of existing figured-out-tedness in LJ to be used until a really good design for this is paired with someone willing to face down content providers used to control (and other providers that act as middlemen, e.g. Amazon).
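To make the time-as-hash-index idea concrete, here is a rough Python sketch. Everything specific in it is an assumption rather than anything LJ defines: the two date bounds are placeholders for the real back-dating and forward-dating limits, SHA-256 stands in for the "well-known hash function", post times are assumed to have one-minute granularity, and `AnnotationIndex` is a hypothetical in-memory stand-in for the journal itself (no LJ calls are made).

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Hypothetical bounds: the real limits LJ places on back-dated and
# forward-dated posts are not stated in the post.
FIRST_POSTABLE = datetime(1970, 1, 1, tzinfo=timezone.utc)
LAST_POSTABLE = datetime(2037, 1, 1, tzinfo=timezone.utc)

HASH_START = FIRST_POSTABLE + timedelta(seconds=5)
HASH_END = LAST_POSTABLE - timedelta(days=365)  # final year kept for collisions
SLOT = timedelta(minutes=1)  # assumed granularity of post times


def url_to_time(url: str) -> datetime:
    """Map a URL to a deterministic post time in [HASH_START, HASH_END)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    n_slots = (HASH_END - HASH_START) // SLOT
    return HASH_START + (int.from_bytes(digest[:8], "big") % n_slots) * SLOT


class AnnotationIndex:
    """In-memory stand-in for the journal: which URL owns which post time."""

    def __init__(self) -> None:
        self.owner = {}      # post time -> URL
        self.time_of = {}    # URL -> post time
        self.pre_notes = {}  # hashed time -> {URL: overflow time}
        self._next_overflow = HASH_END

    def allocate(self, url: str) -> datetime:
        """Return the post time for a URL, claiming a slot on first use."""
        if url in self.time_of:
            return self.time_of[url]
        home = url_to_time(url)
        when = home
        if home in self.owner:  # bucket collision: use the reserved year
            if self._next_overflow >= LAST_POSTABLE:
                raise RuntimeError("overflow year is full")
            when = self._next_overflow
            self._next_overflow += SLOT
            # The "pre-note" on the colliding post points at the new slot.
            self.pre_notes.setdefault(home, {})[url] = when
        self.owner[when] = url
        self.time_of[url] = when
        return when

    def lookup(self, url: str):
        """Find a URL the way a fresh client would: hash, then pre-note."""
        home = url_to_time(url)
        if self.owner.get(home) == url:
            return home
        return self.pre_notes.get(home, {}).get(url)
```

A client that has never seen the journal only needs the hash function and the bounds: it computes `url_to_time(url)`, reads the post at that time, and follows the pre-note if the post belongs to a different URL. Handing out overflow slots sequentially is just the simplest choice for a sketch; it does nothing about two clients racing for the same slot.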

⸘The Port Authority has twitter‽ That's pretty cool.


I have spent a lot of time thinking about the sort of thing you are talking about. I don't really think about it directly in terms of 'annotations', although I see that as an important application. What seems most valuable to me is the idea that 'stuff', some of which is in the form of URLs, can be ambiently discussed. Discussable objects other than URLs: people, books, restaurants, businesses, ice cream flavors...

As you might guess, this would be an undertaking, since to wield the full power of such a system, one would first have to solve the ontology problem, so that one can name the objects one wishes to speak about. But one can design a system that can be used to comment on webpages, with the extension in mind that it someday be able to comment on other things, too.

Twitter's hashtags give a tantalizing glimpse of such a system. They allow commentary on any topic, where the topics are defined in an unstructured way, as free text tokens. It's already possible to use twitter to comment on websites -- just pick some convention as to how you include the URL in your tweet. (In fact, just today I ran into a blog which syndicated all tweets mentioning the post URL into the comment feed. This is pretty close to the right thing.)

The use of LiveJournal to build this system is cute, but it doesn't solve the problem. It divorces control of the comments from the original content provider, but it is still beholden to a centralized censor (as is Twitter). As I see it, the solution is this: allow people to publish their commentary in a distributed way, i.e. on whatever server or service is available to them; then use technology to syndicate and assemble these comment threads for display, to hide their disparate "locations".
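The "assemble" half of that can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Comment` record, the `normalize` rule, and the `assemble` function are all hypothetical names, and the feeds are assumed to have been fetched already from wherever people published them (the fetching and parsing is the hard part and is left out).

```python
from collections import defaultdict
from dataclasses import dataclass
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Comment:
    target: str  # URL of the thing being discussed
    author: str
    posted: int  # Unix timestamp
    body: str
    source: str  # server or service the comment was published on


def normalize(url: str) -> str:
    """Crude canonical form so the same page is not split across threads."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    # Lower-case scheme and host, drop the trailing slash and the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def assemble(feeds):
    """Merge comments from many sources into one thread per target URL."""
    threads = defaultdict(list)
    for feed in feeds:
        for comment in feed:
            threads[normalize(comment.target)].append(comment)
    for comments in threads.values():
        comments.sort(key=lambda c: c.posted)  # oldest first
    return dict(threads)
```

The `source` field is kept on each comment so a display layer can still show where it came from, or filter by it, even though the thread hides the disparate locations by default.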

If instead of Twitter, one used StatusNet/laconi.ca/identi.ca, various names for a Twitter-alike which supports federated servers, one would be pretty close to victory. I don't think the 140-character limit and the real-time nature of these services are quite the right fit, but they give a good flavor for what the right thing might look like...