g-speak overview 1828121108 from john underkoffler on Vimeo.

If Minority Report has become the benchmark by which gestural interaction is judged, that was always intentional. The film’s production team wanted to work with the people actually developing science fiction-like technology. And this is sci-fi-like technology.

So, let’s not talk about how cool-looking the clip above is – not that it doesn’t look cool. After all, most of what you actually see on the screen is stuff you can do with your desktop computer and some projectors. So the question is, what benefit do you get from really nailing a gestural input? It’s the input that matters.

Even if you engage only your right brain on this, there’s quite a lot that’s impressive – the properties that proponents of this kind of interface have been advocating for many years:

  • The interface is 3D. Not to overstate the obvious here, but the ability to intuitively navigate in 3D is no small matter. This sort of interface might not work for detailed 3D modeling, but for quicker, more comfortable 3D navigation, the mouse / mouse wheel has always been woefully inadequate. The mouse is fundamentally designed as a 2D pointing device, which is why it requires awkward conventions like WASD keyboard navigation in 3D games. Joysticks work for spatial navigation (ask your friendly fighter pilot who relies on them in life-or-death situations). But actually moving stuff around in 3D requires something different.
  • Gestures are intuitive. We hear a lot about gestures, but these are actual, human gestures – the kinds of motions you’d make to a person, the kinds you’d use when running a dog around an agility course. (And, believe me, if you can keep up with a border collie, you’ve got a good interface!)
  • It’s collaborative. Here’s an experiment: share your mouse with a friend. How’d that work out for you?
  • It could help navigate information. This to me is actually the least convincing part of the demo – but I think that’s an opportunity. We’ve had a chicken and egg problem: our interface is 2D, so our information is 2D. Sure, there’s the odd exception, like Google Earth – but how much time do you spend in Google Earth compared to Google Maps? Thought so. Some of the demos here remind me of Apple’s 1990s tag navigation interface for the Web. Others return to the odd, needlessly-3D photo organizing app model that seems to permeate these demos. (And until you can shout “enhance” at your computer like on Star Trek to see some tiny area of an image, I wonder how useful that will be.) I think we have to re-learn how to organize information in three dimensions, having done it in two dimensions for so long.
  • It blurs the lines between computing and performance. The reason we focus so much on live performance on this site is that, at its heart, it’s all about real-time communication. If you can make something work live onstage, or live in a club in front of drunken people, you’ve probably mastered it on some important level.

Myron Krueger’s shadow is cast over more recent gestural interface developments – in a good way. Photo by Dave Pape.

g-speak is really, truly, brilliant work – not just as a video demo, but from what I can see, in the detailed work they’ve done with the gestural interface and the way screens are networked together. To say that it’s “the first major step” in interfacing since 1984 would require ignoring the extensive work done on this sort of interface. Look back to Myron Krueger’s work in the 1970s, which predated even today’s UI as we know it, and to work looking more like this in the years since. Then again, maybe that’s the point. This isn’t about novelty; on the contrary, it’s trying to work out how to design interfaces connected to metaphors and human physical wiring that pre-date the invention of computers.

Theremins: best played shirtless. (Um, I’m sure it … interferes with the signal. Or something.) Leon Theremin nailed this sort of interface in 1919 for sound. Almost 90 years later, it’s reaching the same level of sophistication in computer interfaces. Photo (CC) Joshua Vernon-Rogers.

Odds are, you can’t afford Oblong’s platform. But that leaves tons of other possibilities this sort of thing could inspire. Musicians already know that moving your hand around in space with no tactile feedback makes precision challenging – the Theremin requires years of practice to master, eludes many would-be players, and limits certain kinds of controls. (Oh yeah … the Theremin also came before 1984. Quite a few years before 1984 … think 1919.)

In other words, we’re now seeing the first realization of the level of sophistication that we knew was coming. But it’s only one implementation. Look out for more gestural interface development in the future. And now that people are nailing the input/output method, the bigger challenge is next: content.

Visualists, unite!

As seen on Engadget; via Burak Arikan’s FriendFeed (and Burak is one of the people doing some interesting work himself!)

  • Andy

    Re: "It could help you navigate information." Jef Raskin, in his classic _The Humane Interface_, talks about his idea for a "zoomable" interface where all of your information is laid out spatially, and you can move around at a high level to see clusters of things, and then zoom in to get to the specific information you want. It is the next step from the interface he helped create with the Canon Cat, which effectively put all of the user's information in a big file and then gave the user tools to search and manipulate the information. I have my doubts about the universal applicability of the "zoomable" idea, but there are specific instances where I think it could work.

  • http://createdigitalmusic.com Peter Kirn

    Exactly. Well, at the risk of pointing out the obvious or being circular, it should help you navigate information that makes sense spatially. Oddly, most of the spatial stuff we've seen in demos has been maps, even though the map data itself is mostly two-dimensional. Funny, right?

  • http://vade.info vade

    Very nice. I can think of a lot of things to do with a rig like that. I think, though, that after a while, standing and holding your arms out like the users do in the demo will get very, very tiring. Gesture interfaces have to be somewhat subtle and not too demanding on the user physically. I certainly would not want to do a set of two or more hours on a rig like that, forced to use those gestures, but that's not the target market (obviously), and it's still *impressive as hell*. The few clips with video in there are pretty cool.

  • http://createdigitalmusic.com Peter Kirn

    Good point. Those kinds of issues aren't to be ignored, either — someone pointed out that this can be an issue with touchscreen displays, too…

  • http://www.accentfeed.blogspot.com Miguex

    That's great!
    No fingerprints on the screen, you gotta love that.

    I will probably never be able to afford this, but I can see someone else taking good advantage of it for live visuals.

    Expect VJs on stage as the focus of attention with something like this.

  • http://www.digitalfunfair.co.uk gavspav

    I can't help feeling the video would be more effective if the gloves were white and there was some old school rave track playing!

  • http://www.newmagic.com Virtual Magician

    Good for training mimes… Marcel Marceau would wet his pants.

  • http://www.postlude.co.uk/blog Jamie Bullock

    I wonder if you could fake this with 3 orthogonal webcams, multiblob detection and a hell-of-a-lot of data smoothing?

  • pietro

    Hey, how did he take the man with the snake out of the movie? Is that actually possible, or just something to make the demo more out of my league and futuristic (which is kind of the same, since the future is not in my league…)?
