Posts Tagged ‘live web’

WatchWithMe.net is a project I’ve been kicking around (and prototyping off and on) for over a year. It is, at its heart, and like many great projects, an attempt to solve a personal problem. But it’s also a problem that I think many people have, even if they don’t know it yet.

A bit of background: during the summer of 2008, knowing that I was soon to move to New York City for graduate school, I quit my job and took a ten-week road trip. I drove all over the United States, crashing on friends’ couches and gorging myself on social interaction. I watched a lot of DVDs that summer, with literally dozens of people. Movies, TV shows, anything that looked even remotely interesting. And it was fun. A lot of fun. It turns out that watching things with your friends is incredibly enjoyable in a whole different way from watching things alone. Of course, I already knew this because I’d spent years watching stuff with my close friend Nikki in Alabama, and things were always more fun that way.

But that trip inevitably ended, and I packed up my tiny apartment and moved into an even tinier apartment (because, hello, New York City). While I was still settling into life in the big city, it dawned on me that all that fun I’d had watching stuff with people, and especially with those specific people, was a thing of the past. Who was I going to get to watch under-appreciated kids’ shows with in New York? (Though, to be fair, in a city like New York the problem is more one of finding them.)

Still, the experience was compelling enough that I wanted to find a way to repeat it, and given the explosion in internet technologies it seemed like the sort of thing that technology could help with. The first thing I tried was the most slap-dash: I talked to some friends via instant messaging and we figured out what we wanted to watch that was available on YouTube. Then we set up a chat room and tried to hit play all at once. Rather predictably, this resulted in coordination problems: no one hit play at exactly the same time, and things only got worse when we tried to pause so someone could answer the phone or run to grab a drink.

It struck me that timing and coordination are, in fact, things that computers do extremely well. So it seemed obvious that software should handle coordination so that people can focus on watching and talking.

Thus was born WatchWithMe.net. The first prototype was built on the open-source Flash player Flowplayer, but since I’m not a Flash programmer the rest of the system was in HTML and JavaScript, and it was hard getting the two to play well together. So I put the project on the back burner.

And while it simmered away, picking up a couple of semi-successful tests, a whole bunch of browsers shipped versions with support for the HTML5 video tag. That’s when it dawned on me that this was what I had been waiting for. With the ability to write the entire thing in a single environment, I started over, which brings us to where things stand now.

The Tech Stuff

The technologies that run the current system are:

HTML5 video, specifically the Ogg Theora implementations in Firefox and Chrome

AJAX, with a terribly inefficient 1-second polling interval to keep each client synchronized with the server

PHP, to handle the server-side interface with the database

MySQL, mediated through PHP to handle all the dynamic data storage and retrieval

AJAX polling is obviously not the most efficient way to manage this sort of tight synchronization, and there are actually a number of huge headaches that have to be managed by using it, but it’s effective and avoids Flash. (And it sets the stage for a migration to WebSockets whenever they manage to take off.)
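To make the polling cycle concrete, here’s a rough sketch of what each 1-second tick looks like. The names (buildPollQuery, poll.php, applyServerState) are hypothetical stand-ins, not the actual WatchWithMe code; the testable piece is the query builder, with the browser wiring shown in comments.

```javascript
// Illustrative sketch only: each tick, the client bundles its state
// into a query string and sends it to a PHP endpoint.
function buildPollQuery(userId, videoTime, chatDraft) {
  var parts = [
    "user=" + encodeURIComponent(userId),
    "time=" + videoTime.toFixed(2)
  ];
  if (chatDraft) {
    parts.push("chat=" + encodeURIComponent(chatDraft));
  }
  return parts.join("&");
}

// Browser side, roughly:
// setInterval(function () {
//   var video = document.querySelector("video");
//   var xhr = new XMLHttpRequest();
//   xhr.open("GET", "poll.php?" + buildPollQuery(me, video.currentTime, draft));
//   xhr.onload = function () { applyServerState(JSON.parse(xhr.responseText)); };
//   xhr.send();
// }, 1000); // the terribly inefficient 1-second interval
```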

One of the other things that drove the current, inefficient design was a desire to avoid running a server-side timing process for users to synchronize against. Instead, the system picks a single user; on every polling action that user’s current video timestamp is sent to the server (along with whatever other information needs to go up, such as chat messages or system messages), and every other user receives that timestamp as a synchronization target the next time they poll. It’s messy and latency becomes an issue, but it works.
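The per-poll decision of whether a client should seek can be sketched as a tiny helper. These are hypothetical names, not the production code, and the tolerance value is my own guess at a sensible default given the 1-second poll rate:

```javascript
// Illustrative sketch only: a following client compares its playback
// position against the lead user's reported timestamp and seeks only
// when the drift exceeds a tolerance, so ordinary AJAX latency jitter
// doesn't cause constant stuttering.
var DRIFT_TOLERANCE = 1.0; // seconds; an assumed value, not measured

// Returns the position to seek to, or null if we're close enough.
function syncTarget(localTime, leaderTime, tolerance) {
  if (tolerance === undefined) tolerance = DRIFT_TOLERANCE;
  return Math.abs(localTime - leaderTime) > tolerance ? leaderTime : null;
}

// Browser side, roughly:
// var video = document.querySelector("video");
// var target = syncTarget(video.currentTime, serverReply.leaderTime);
// if (target !== null) video.currentTime = target;
```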

With mid-terms coming up in Live Web, I need to pick a project. The problem is that I have two I might work on. The first is the project I know will end up being my final in the course, the project I came into the course planning to improve: Watch With Me. The other option is to build a Flash-based version of the game Telephone.

Watch With Me, at this point, mostly needs grunt-work retooling. I’ve already proved the concept works, so any work for class wouldn’t really be about improving the concept at this stage. Still, it’s work that needs doing, and the project is really cool and worth executing.

Telephone, however, would be a relatively new thing for me: a project started from scratch, and thus one where a lot of the design and conceptual work still needs to be done. It’s also a much smaller project, the sort of thing that could be done to my satisfaction (and not need any more work) by the time mid-terms are due. Basically, it would use webcams and the built-in mics on laptops to create a chain of video chat users. You’re only connected to the people directly ahead of and behind you in the chain, so you have to pass any messages from one neighbor to the other. It might also simply be fun to play with.
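The chain topology is simple enough to pin down in a few lines. This is an illustrative helper with hypothetical names, not anything built yet: given the ordered list of participants, each user only ever sees the link before and after them.

```javascript
// Illustrative sketch of the Telephone chain topology: each user is
// connected to at most two peers, the previous and next link in the
// chain, and must relay messages between them manually.
function chainNeighbors(chain, user) {
  var i = chain.indexOf(user);
  if (i === -1) return null; // user isn't in the chain
  return {
    prev: i > 0 ? chain[i - 1] : null,              // link ahead of you
    next: i < chain.length - 1 ? chain[i + 1] : null // link behind you
  };
}
```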

Which is why I’m leaning in the direction of the Telephone project. While Watch With Me is, in the long term, far more compelling, in the short term Telephone could be more fun and represents more of a conceptual stretch for me. Especially since I’m intending to use Watch With Me as my final project for the class.

For a long time we turned to live media because its “immediacy” (more on the scare quotes in a second) was unsurpassed. You got literally current news from live updates. Of course, “live” generally had a built-in delay. Not a huge one, but an appreciable (seconds to minutes) one. Still, this was pretty darn close to immediate, and we came to associate the concept of live media with the idea that “media doesn’t go any faster”.

Except that it does. Because, traditionally, “live” media has been filtered through the same publishing apparatus as other heavily produced media, and that has built-in delays. Modern communications technology, however, has given us access to unfiltered live media production (Twitter, Facebook, and more media-rich applications such as live streaming from a cell phone camera). And it turns out that for raw information, text is faster, more compact, and generally more useful than the forms of media we’ve traditionally considered “live”.

With news-delivery seemingly eliminated as an interesting application for live rich media, we’re left with only one obvious significant use: simulating real-world space. This is the realm of live performance (in both the entertainment and the educational sense of the word). Lectures, plays, improv, music, etc. These are things that we understand work best in live contexts where there is potential for feedback and interaction, and feedback and interaction of the sort which impacts the performance directly requires real-time speed.

One thing worth noting here is that while screens tend to be a great way to receive a live stream of audio and visual data, they make terribly restrictive systems for creating that data. That is to say: most live streams are one-way. It is hard to perform while watching a screen because it restricts your movements and involves multi-modal interaction with mismatched turn-taking. (That is: most feedback systems have data incoming at the same time that a performer has data outgoing. This is in contrast to performances in live space, where the tendency is to coordinate turn-taking so that data is only flowing in one direction at a time.)

One of the reasons this problem arises is that we still haven’t solved the turn-taking problem for online discussion. In face-to-face interaction we’ve come to intuitively handle multi-modal communication in ways that let us pass turn-taking information on a separate channel from the one carrying the actual content. Usually turn-taking is a body-language (visual) thing while content is vocal (aural). Most online interaction, however, is handled purely through visual data. Or, in the case of live voice chat, there is no good way to pass visual turn-taking data.

Which, at this stage, leaves me rather uninterested in live streaming of rich media. The point of going live should, I think, be that it enables interaction, but it’s not at all clear that interaction is enabled by current communication tools. I think it is well within our ability to create new tools which support interaction at this level, but at the moment I’m far more interested in interaction by audiences around media than in the way audiences impact performance. Consider, for instance, that watching a live stream of a concert is very different from going to that concert in person, even if you have the same audio/visual setup. The difference is that you aren’t sharing interaction space with the audience. It’s audience interaction that really drives a lot of the power of live performance, and I find that extremely compelling. But the weird thing to note is that audience interaction happens even with pre-recorded media (cinema, for instance), so there’s nothing particularly compelling about the live component there.

I realize that all of this is in extremely sloppy form, but that’s about how my thoughts run on the subject at the moment.

The idea for this project grew out of an offhand comment during class, one that Shawn suggested was interesting and one that, as I thought about it, struck me the same way.

“Pichat” is an attempt to force people into creativity in two separate, and rather unrelated, ways. It is, at its core, a relatively standard AJAX-based chat interface, but instead of transmitting text back and forth, Pichat transmits images. The only text used is that necessary to identify users.

The interface is relatively simple. Using an AJAX call to Google’s search API, a user may enter any valid Google search string and get the first four image results that fall within a specific size class. These four results are displayed for the user to consider. If the user decides to use one of these images, all they need to do is click on it, and it will be sent as their message to the chatroom. If none of the four images seems suitable, then they must search for a different term.
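For illustration, the lookup side of that flow might look like the sketch below. The endpoint and parameter names are from my memory of Google’s (since-deprecated) AJAX Search API and may not match exactly, so treat this as an assumption-laden sketch rather than Pichat’s actual code:

```javascript
// Illustrative sketch: build the image-search request for a query and
// a size class (e.g. "medium"). Parameter names ("v", "rsz", "imgsz",
// "q") are assumed from the old Google AJAX Search API.
function buildImageSearchUrl(query, sizeClass) {
  return "https://ajax.googleapis.com/ajax/services/search/images" +
    "?v=1.0" +
    "&rsz=4" +                                  // ask for four results
    "&imgsz=" + encodeURIComponent(sizeClass) + // restrict the size class
    "&q=" + encodeURIComponent(query);
}

// Browser side, roughly: fetch the URL via XMLHttpRequest, parse the
// JSON, render the four results as clickable thumbnails, and on click
// post that image's URL to the chatroom instead of text.
```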

There are two points of restriction here, and both of them are in some way compelling. The first is that communication is image-only. Trying to compress a thought or expression into a single image (even when given the entire internet to draw from) can be an extremely difficult and creative process, and I’d love to explore the sorts of conversations that arise in this environment. Additionally, you only get access to the top four search results from Google. If none of them do what you want, rather than being able to page through to more, you must refine your search. Crafting a search string specific enough to get you what you want provides an interesting challenge, as you cannot simply search for something “close enough” and then dig through the pile of results manually until you find what you want.

On a down note, the code for this thing is abysmal. The back end is a flat-file-backed PHP implementation that dumps the full text store on every request, which makes for massive back-end processing and bandwidth inefficiency, significant front-end inefficiency, and a number of interface issues that just make it ugly.

One of the first things that stands out to me (and justifies my obsessive time-stamping) is the fact that the setup and storytelling took twenty-five minutes, and the story itself took just over twelve. Since the story probably took about three or four minutes to relate in class (maybe five or six if you include discussion time), this is a rather significant slow-down. Not that this is particularly surprising, since there’s always a slow-down when moving from a high-bandwidth mode of communication, like face-to-face discussion, to a low-bandwidth mode like instant messaging.

But the slow-down isn’t entirely tool-based. Or, perhaps more accurately, it’s not tied to the technical aspects of the tools. Because while I do type slower than I talk, I can type extremely quickly. Combined with the way we tend to distill things when they shift to text (elaborating less in order to make things more compact and coherent), I probably could have whipped the story out in a minute or two. Just looking at the content of my story-telling shows how little there actually is there. The story may be a hundred and fifty words long at most, and it’s probably less than that, yet it still took significant time to compose and transmit.

So there’s clearly more at work here than text being slower than speech, and I think it has a lot to do with the social conventions of IM, which are quite deeply drilled into my head. IM is a give-and-take medium. Turn-taking is indicated by message submission, so the conversation tends to pause slightly after every line in an implicit offer to everyone else to respond. Only if there is no response for a while does the thread of the story get picked back up, which would give a sort of stilted feeling if you read it aloud in real time, but seems to be a natural expression of the IM medium.

Further, IM is generally considered to be a conversational medium rather than a performer-audience one. People are expected to interject and comment, and when they do the discussion is briefly derailed as people respond to that response. We don’t really have, or at least I and the people I spend time with online don’t really have, a set of norms for non-conversational story-telling. That means that most of our stories tend to come out looking like conversations rather than a more formal sort of presenter-audience interaction.

I don’t know if there’s much more to say than that. It’s not something that bothers me, after all. In fact, I think I rather like it. It does, however, highlight two important things:

1) Various mediums lend themselves to various uses. Picking a medium that is unsuited to your intended use may be a bad idea, or it may just result in something interestingly unexpected. After all, while it doesn’t look much at all like the in-class presentation of my story, I rather enjoyed the online telling of it too.

2) Computer-mediated interactions are about more than just the technological tools being used. While there’s nothing inherent in the technology of instant messaging that prevents a story from being told in a much more traditional form (what one might call a “wall of text”), there are cultural norms about the use of instant messaging that would make that feel weird, almost like a violation.

And so, I feel that the exercise was well worth undertaking, and that it was fun, to boot.

The assignment was to tell our story in an online environment. Ironically, I was assigned one of the two I was fully prepared to do already: an IM-based chat. Since I know many, many people who are on AIM, and many of them are frequent chatters, I sort of cheated and dropped into an existing (and regularly occurring) chatroom. It also, conveniently, is peopled by total nerds. Who have questionable senses of humor. As you’ll see. I sanitized the chat logs of their SNs and then checked to see if they wanted anything else cut (which they didn’t), and here are the results (this is just the record of the discussion; my analysis will be undertaken in a later post (now linked)):