Magical 3D-Warping Technique Steadies Your Videos

Technology still has the power to appear like magic. And one place we may desperately need magic: straightening out our horribly shaky, handheld video shots. Software makers like Apple have already offered up some techniques for doing this - in the case of Apple’s Final Cut Studio, optical flow analysis attempts to track the image as it shakes around the screen and compensates by adjusting the orientation of the frame. But a research team at the University of Wisconsin, partnering with Adobe, will present a new approach at the legendary graphics-geeky SIGGRAPH conference in August. They go one step further, applying a 3D mesh to the image to warp your image three-dimensionally to make the stabilization even more seamless.

Me writing about it is basically useless. Check out the mind-blowing results in the video. From the description:

In this paper, we describe a technique that transforms a video from a hand-held video camera so that it appears as if it were taken with a directed camera motion. Our method can adjust the video to appear as if it were taken from nearby viewpoints, allowing for 3D camera movements to be simulated. By aiming only for perceptual plausibility, rather than accurate reconstruction, we are able to develop algorithms that can effectively recreate dynamic scenes from a single source video. Our technique first recovers the original 3D camera motion and a sparse set of 3D, static scene points using an off-the-shelf structure-from-motion system. Then, a desired camera path is computed either automatically (e.g., by fitting a linear or quadratic path) or interactively. Finally, our technique performs a least-squares optimization that computes a spatially-varying warp from each input video frame into an output frame. The warp is computed to both follow the sparse displacements suggested by the recovered 3D structure, and avoid deforming the content in the video frame. Our experiments on stabilizing challenging videos of dynamic scenes demonstrate the effectiveness of our technique.
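To make the "fitting a linear path" step from that description a little more concrete, here is a minimal sketch in Python. It is not the researchers' code: the function names, the plain per-axis least-squares fit, and the made-up camera positions are all assumptions for illustration, and it covers only the linear case (the paper also mentions quadratic paths and interactive ones).

```python
# Hypothetical sketch: replace a jittery recovered camera path with a
# straight-line path fitted by ordinary least squares, one axis at a time.
# Names and data are illustrative assumptions, not taken from the paper.

def fit_linear_path(times, positions):
    """Fit p(t) = a + b*t per axis; returns [(a, b)] for x, y, z."""
    n = len(times)
    t_mean = sum(times) / n
    var_t = sum((t - t_mean) ** 2 for t in times)
    coeffs = []
    for axis in range(3):
        vals = [p[axis] for p in positions]
        v_mean = sum(vals) / n
        cov = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, vals))
        slope = cov / var_t
        coeffs.append((v_mean - slope * t_mean, slope))
    return coeffs


def smooth_path(times, positions):
    """Evaluate the fitted line at each original frame time."""
    coeffs = fit_linear_path(times, positions)
    return [tuple(a + b * t for a, b in coeffs) for t in times]


if __name__ == "__main__":
    # A camera moving steadily along x while shaking up and down in y.
    times = [0, 1, 2, 3]
    shaky = [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0), (2.0, -1.0, 0.0), (3.0, 1.0, 0.0)]
    for point in smooth_path(times, shaky):
        print(point)
```

The fitted path is only the target camera motion. The harder part, warping each frame so it looks as if it were shot from that path, is what the paper's content-preserving warp handles.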

The research, at the University of Wisconsin-Madison:
Content-Preserving Warps for 3D Video Stabilization

You can view all the techie details there, as well as many more demo videos. This is promising stuff, and in recent years we’ve seen the gap between academic research and shipping commercial products shrink dramatically, especially with cheap computational power on home computers to play around with, and with software vendors finding it ever harder to differentiate what they’re doing in a mature application space.

Side note: boy, do I want to go to SIGGRAPH this year.

Also along these lines: Spacetime Fusion, tests of Final Cut’s SmoothCam feature, more SmoothCam tests

For you purists, yes, it’s still worth considering the art of Steadicam shots - at least before technology obliterates it for us clueless masses. Previously: B&H Interviews Steadicam Inventor: Shooting is Like Dancing

More on Project Natal: Latency Concerns, Johnny Chung Lee, Freaky Interactions with a Fake Kid

Microsoft’s Project Natal unveiling for Xbox 360 was without question a blockbuster among technology presentations, nothing short of sheer magic in a games industry that has lately seemed somewhat backward-looking. The combination of a 3D-capable camera with facial and object recognition, voice recognition, and mic interaction takes already-smart elements and puts them together into something bigger. But demos are just that – it’s the reality of what’s happening in interaction design that’s interesting.

So, some more details on Project Natal:

Latency?

Note that the video in the post yesterday carries a significant disclaimer: it’s essentially a conceptual mockup, not a real demo. In videos we’ve seen of the current prototype, there does seem to be a significant lag between an action and its representation on the screen. This may have to do with the sheer amount of data being captured and the analysis being done on it. Unfortunately, as this is only in prototype stage, it’s impossible to do much more than speculate.

I’m not the only one to notice this: Keith Lang, interaction designer at Plasq, sees the same concern in his (excellent) round-up of coverage of Project Natal:

Microsoft Announces ‘Natal’ 3D System [UI&us]

Don’t underestimate how important the latency could be, either. Even tiny differences in latency can have a major impact on how someone feels about an interaction. This is also significant to music people, who generally want latencies small enough that their interactions keep pace with the audio they’re controlling.
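For a rough sense of scale, here is a back-of-the-envelope sketch in Python. The numbers are assumptions picked for illustration (a 256-sample audio buffer at 44.1 kHz, a camera delivering 30 frames per second); nothing here reflects measured or published figures for Natal.

```python
# Back-of-the-envelope latency arithmetic. The example values are
# illustrative assumptions, not specifications of any real device.

def buffer_latency_ms(frames, sample_rate_hz):
    """Time it takes to fill one audio buffer, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def frame_interval_ms(fps):
    """Time between successive camera frames, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    print(round(buffer_latency_ms(256, 44100), 1))
    print(round(frame_interval_ms(30), 1))
```

Under those assumptions, a single audio buffer accounts for about 5.8 ms, while simply waiting for the next camera frame can cost about 33 ms before any analysis has even started.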

I’ll reserve judgment until the final version, naturally! But it’s something to watch.

Johnny Chung Lee and the 3D Technology

The ingenious creator of various Wii tracking hacks, it seems, is now with Microsoft. (Nintendo, your loss. Rest of the world, he has code tools on his site, so even without hiring the guy, you can benefit from his knowledge.) Cristian Campo spots the news in our comments.

For his part, Johnny is careful to note that he’s not responsible for what you see, but is working with them on productization.

Project Natal [procrastineering]

He can’t reveal anything but what’s public, but he does have some more extensive details on the technique – essentially, information that is public but in a more technically-specific form:

The 3D sensor itself is a pretty incredible piece of equipment providing detailed 3D information about the environment similar to very expensive laser range finding systems but at a tiny fraction of the cost. Depth cameras provide you with a point cloud of the surface of objects that is fairly insensitive to various lighting conditions allowing you to do things that are simply impossible with a normal camera.

But once you have the 3D information, you then have to interpret that cloud of points as "people". This is where the researcher jaws stay dropped. The human tracking algorithms that the teams have developed are well ahead of the state of the art in computer vision research. The sophistication and performance of the algorithms rival or exceed anything that I’ve seen in academic research, never mind a consumer product. At times, working on this project has felt like a miniature “Manhattan project” with developers and researchers from around the world coming together to make this happen.

We would all love to one day have our own personal holodeck. This is a pretty measurable step in that direction.
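If "point cloud" sounds abstract, the basic idea can be sketched in a few lines of Python. This is generic pinhole-camera back-projection, not anything known about Natal's sensor or Microsoft's algorithms; the function name and the intrinsics fx, fy, cx, cy are assumptions for illustration.

```python
# Hypothetical sketch: turn a depth image into a list of 3D points using a
# simple pinhole camera model. Parameter names and values are assumptions.

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """depth: rows of depth values (same unit as the output); 0 means no reading.

    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    Returns a list of (x, y, z) tuples, one per valid pixel.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels where the sensor saw nothing
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    tiny_depth = [
        [0.0, 2.0],
        [1.0, 0.0],
    ]
    for p in depth_to_point_cloud(tiny_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5):
        print(p)
```

Getting the points is the easy half. As Johnny says, interpreting that cloud as people is where the hard work lies.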

Creepy Kids

Seaman, you’ve got nothing on this. (Sorry, Leonard Nimoy.)

Yes, it seems Peter Molyneux’s latest project uses Project Natal to simulate interactions with a kid. This does start to make me wonder if – as “realityengager” wonders in CDM comments – we should just go out into the real world and interact with that. (Daddy? Why won’t you play with me any more? Why are you only playing with Xbox 360 Milo kid?) But as a tech demo, of course, it’s mind-boggling – and it’s nice to think what it might mean for storytelling.

See the video at top. Molyneux suggesting that even science fiction hasn’t written about this sort of technology is especially absurd, as it seems science fiction spends most of its time writing about exactly this, but you get the point.

I just want Project Natal support in XNA so artists can play with this stuff. Hear us, Microsoft?

Gorgeous Timeline-Less Audiovisual Multi-Touch Sequencer, Built in Flash


CASTALIAN / New concept Audio Visual Touch Sequencer from nucode on Vimeo.

We’ve talked in the past about the idea of user interfaces and visual output merging. Instead of a UI on one screen and visuals on another, the idea is that the interface itself melds into the output. I can think of few better examples of how this begins to evolve than a video recently posted on Vimeo by user nucode. Working with a projected, camera-tracked multi-touch interface and audiovisual loops in custom Flash-based software, nucode manipulates samples as though on an alien, futuristic interface.

The result: a sequencer that has no timeline and seamlessly pulls content from online sources:

  • Audiovisual loops, set as rotating circles/bubbles, palettes of sounds and visuals
  • Sequence events together by attaching bubbles to one another – no timeline needed
  • Gesture triggering of YouTube video search (make a gesture, get a video from YouTube)
  • Simple real-time audio effects (low-pass filter, echo, and so on – sounds like there’s either some live synthesis or more sophisticated scrubbing going on, too)
  • Runs in the browser on any OS, built with Flash and ActionScript 3
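
One way to picture sequencing without a timeline is as a graph of loops, where attaching one bubble to another means "play this when that one finishes." The Python sketch below is purely a guess at the concept, not nucode's implementation (which is built in Flash); the Bubble class, the attach method, and the beat-based scheduling are all hypothetical.

```python
# Hypothetical sketch of a timeline-less sequencer: loops are nodes, and
# attaching one to another schedules it to start when the first ends.
from dataclasses import dataclass, field


@dataclass
class Bubble:
    name: str
    beats: int  # length of the loop in beats
    attached: list = field(default_factory=list)

    def attach(self, other):
        """Attach another bubble so it plays after this one; returns it for chaining."""
        self.attached.append(other)
        return other


def schedule(start, max_events=16):
    """Walk attachments breadth-first and return (start_beat, name) pairs."""
    events, queue, seen = [], [(0, start)], set()
    while queue and len(events) < max_events:
        beat, bubble = queue.pop(0)
        if id(bubble) in seen:
            continue  # each bubble is scheduled once, so cycles terminate
        seen.add(id(bubble))
        events.append((beat, bubble.name))
        for nxt in bubble.attached:
            queue.append((beat + bubble.beats, nxt))
    return events


if __name__ == "__main__":
    kick, bass = Bubble("kick", 4), Bubble("bass", 8)
    pad, hat = Bubble("pad", 4), Bubble("hat", 2)
    kick.attach(bass).attach(pad)
    kick.attach(hat)
    print(schedule(kick))
```

The order of events falls out of how the bubbles are connected, so rearranging a piece means re-attaching bubbles rather than dragging clips along a ruler.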


Locative Art, Now: Microsoft’s Photosynth Makes Photography into 3D Virtual Reality

In William Gibson’s novel Spook Country, released last year, artists create a new generation of “locative art.” Peer through goggles at a real-world scene, and see something that isn’t literally there. Few would say it was Gibson’s best novel – perhaps partly because the plotline didn’t live up to how compelling the locative art ideas were. But the art has already moved from science fiction into reality.


Don’t Call it Minority Report; Call g-speak a Spatial, Gestural Operating Environment


g-speak overview 1828121108 from john underkoffler on Vimeo.

If Minority Report has become the benchmark by which gestural interaction is judged, that was always intentional. The film’s production team wanted to work with the people actually developing science fiction-like technology. And it is sci-fi-like technology.

So, let’s not talk about how cool-looking the clip is above – not that it doesn’t look cool. After all, most of what you actually see on the screen is stuff you can do with your desktop computer and some projectors. So the question is, what benefit do you get from really nailing a gestural input? It’s the input that matters.

Even if you engage your right brain exclusively on this, there’s quite a lot that’s impressive – the very properties that proponents of this kind of interface have been advocating for many years:

  • The interface is 3D. Not to overstate the obvious here, but the ability to intuitively navigate in 3D is no small matter. This sort of interface might not work for detailed 3D modeling, but for quicker, more comfortable 3D navigation, the mouse / mouse wheel has always been woefully inadequate. The mouse is fundamentally designed as a 2D pointing device, which is why it requires awkward conventions like WASD keyboard navigation in 3D games. Joysticks work for spatial navigation (ask your friendly fighter pilot who relies on them in life-or-death situations). But actually moving stuff around in 3D requires something different.
  • Gestures are intuitive. We hear a lot about gestures, but these are actual, human gestures – the kinds of motions you’d make to a person, the kinds you’d use when running a dog around an agility course. (And, believe me, if you can keep up with a border collie, you’ve got a good interface!)
  • It’s collaborative. Here’s an experiment: share your mouse with a friend. How’d that work out for you?
  • It could help navigate information. This to me is actually the least convincing part of the demo – but I think that’s an opportunity. We’ve had a chicken-and-egg problem: our interface is 2D, so our information is 2D. Sure, there’s the odd exception, like Google Earth – but how much time do you spend in Google Earth compared to Google Maps? Thought so. Some of the demos here remind me of Apple’s 1990s tag navigation interface for the Web. Others return to the odd, needlessly-3D photo organizing app model that seems to permeate these demos. (And until you can shout “enhance” at your computer like on Star Trek to see some tiny area of an image, I wonder how useful that will be.) I think we have to re-learn how to organize information in three dimensions, having done it in two dimensions for so long.
  • It blurs the lines between computing and performance. The reason we focus so much on live performance on this site is that, at its heart, it’s all about real-time communication. If you can make something work live onstage, or live in a club in front of drunken people, you’ve probably mastered it on some important level.
