When Microsoft gobbled up vision technology and announced they were channeling their own research into a product for their game console, artists, researchers, and hackers lamented. It seemed the tech might be destined only for a handful of mainstream game titles. Hours after the product launch, however, and one open source bounty later, it was clear the opposite was happening: Kinect was opening to new possibilities. Some of the world’s leading visual experimenters, many of them regulars in this site’s stories, were quickly pulling in data and reimagining what the device could do. And that’s just in the first days: given its sophistication, the real potential lies ahead.
I pulled together a number of the artist-hackers to get their thoughts:
Theo Watson, OpenFrameworks co-originator, is one of the original hackers and has built Mac support
Memo Akten, OpenFrameworks contributor, is building expressive and artistic applications of the tech
Kyle McDonald, artist and visual researcher, is working with massive clouds of point data, building on his previous work in 3D scanning
Dan Shiffman, Processing guru and NYU faculty, is working on tools to make this more accessible to Processing and Java coders
Adafruit on the Competition to Hack Kinect
Phil Torrone of Adafruit explains what went on behind the scenes as Adafruit Industries offered a bounty to hack Kinect.
CDM: What was important about this particular project?
Torrone: The results speak for themselves: the creative potential was unlocked.
When did you actually make the decision to commit to this?
The day before the Kinect was launched in the USA.
Have you been surprised by anything that’s happened? Was this the pace of progress you anticipated?
We never underestimate the creativity and passion of people who do, and love, open source.
You of course gave some cash to the EFF and not just the winner … have you had conversations with EFF about how to protect artists working on the project, or the legality of the work?
We did not talk with EFF at all prior to this effort; we did let them know we were sending them $2k after we declared a winner.
With so much going on, what’s the best way for interested parties to keep track of what’s going on?
Likely the Google Group
How can someone best contribute?
There’s a Google Group, and there’s GitHub, where we put our data dump and code.
Where should people go to learn more about this stuff?
Shiffman: In terms of doing Kinect with Processing, I think learning the basics of Processing first (duh), with a focus on image processing is probably good:
Also the main libfreenect development is happening on GitHub:
There is a big cleanup coming to the API, so things might be in a bit of a state of flux for the next few days, but hopefully soon we will have super solid drivers/APIs for all platforms.
You’ve probably seen the post on CAN [Creative Applications], which has a good summary of the early demos and the history of how the open source drivers came about (Hector etc.).
And I saw a tweet that someone had it working with Cinder.
McDonald: For my work in general, see http://kylemcdonald.net/
For my pre-Kinect 3d scanning work, see:
With Kinect, everything I’ve done with 3d scanning for the last two years is starting to take on a new meaning…
The best place for following Kinect stuff is:
1 OpenKinect Google Group
2 #openkinect on freenode (super active discussion)
3 the GitHub wiki
Why hack the Kinect in the first place?
McDonald: It’s essential that we develop drivers and libraries for Kinect, because we have to decide what new technology means to us.
Kinect has taken a technology out of academic labs and defense agencies, and put it in our living room. Now we need to decide where we want to point the camera.
Shiffman: A cheap (relatively speaking) “3D” camera is killer technology for the interaction design / computational art community. This kind of tech has been around, but it’s either been too hard to find or prohibitively expensive. I think that you will see a ton of creative uses (in digital art, exhibition design, assistive tech, etc.) that you wouldn’t find if it was only used for console gaming.
Watson: It’s a really amazing piece of hardware for a really affordable price. To put it in perspective, I currently have a commercial depth camera on loan which produces a similar quality depth image, and it retails for $7000! That is really way out of reach for most people who might be hobbyists, artists or researchers, but $150 is incredibly cheap for what the technology allows you to do.
Akten: First, check out Kyle’s little poem
For me, it’s very simple. I like to make things that know what you are doing, or understand what you are wanting to do, and act accordingly. There are many ways of implementing these ideas. You can strap accelerometers to your arms and wave them around, and have the accelerometer values drive sound or visuals. You can place various sensors in the environment, you can use camera(s) to track movement etc. Ultimately, you create an environment that ‘knows’ what is happening inside it, and responds as you designed and developed it to. What excites me is not the technology, but how you interpret that environment data, and make decisions as a result of it. How intuitive is the interface? You can randomly wire the environmental parameters (e.g. orientation of arm) to random parameters (e.g. in audio and/or visuals), and it will be fun for a while, but I don’t think it will have longevity; it won’t be an *instrument* that you can ultimately learn to play and naturally express yourself with.
In order to create an instrument, you first need to establish a language of interaction – which is the fun side of interaction design – but you always have the technical challenge of making sure you can create a system which can understand that language. It’s too common to design an interaction, but not have the technical capabilities to detect or implement it – then you have a system which reports incorrectly, and makes inaccurate assumptions, resulting in confusing, non-intuitive interaction. So you need a smarter system, and the more data you have about the environment, the better you can understand it, and the smarter, more informed decisions you can make. You don’t *need* to use all the data all the time, but it is there if you need it.
Kinect is ultimately a depth-sensing camera. To put it simply, it returns a normal RGB image just like a webcam, but for every pixel in the image, it also returns a ‘distance to camera’. This kind of tech has been around for a while, but it has been very expensive (minimum thousands of dollars), and definitely not a consumer device, more for labs, robotics, military etc. That depth information is a ton of extra data. With that extra data, we are a lot more knowledgeable about what is happening in our environment, we can understand it more accurately, and thus we can create smarter systems that respond more intuitively.
One point which is often overlooked – which is a very important point – is not only ‘what can you do with the Kinect that you couldn’t before’, but ‘how much simpler is it technically to do something with the Kinect, as opposed to using other consumer devices’. This really is a very important point. A simple example is the recent rough demo I posted of drawing in 3D with your hands.
That is completely possible to do pre-Kinect. You would need two webcams, and you would need to set up your lighting quite specifically. You would want control over your background and overall lighting of the space. And then you would need a lot of hairy maths and code. With the Kinect, you just plug it in, make sure there isn’t any bright sunlight around, and with a few lines of code you have the information you need. So now that interaction is available for developers / artists of *all* levels, not just hardcore math geeks – and that is very important. Once you have loads of people playing with these kinds of interactions (who pre-Kinect would not have been able to), then we are bound to see loads of really innovative, fresh applications for it. Sure, we’ll get a ton of “pinch to zoom and rotate the photo” demos which will get sickening after a few thousand, but people will be developing ideas that you or I would never have thought of, but instantly love – which in turn will spark new ideas in us to go off and play with – which in turn will feed others.
It’s still really early days yet. So far it’s just been a case of getting the data off the Kinect into the computer, and then seeing what that data actually is, how reliable it is, how its performance is, and what we can do with it. Once this gets out to the masses, that’s when the fun will start pouring in.
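To make the “few lines of code” point concrete, here is a minimal sketch (not Memo’s actual demo code) of one very crude way to follow a hand: treat the closest valid pixel in a depth frame as the pointer. The frame layout (a list of rows of per-pixel distances in centimeters) and the use of 0 to mean “no reading” are assumptions made for illustration only, as is the helper name nearest_point.

```python
from typing import List, Optional, Tuple

# Assumption for this sketch: depth is in centimeters and 0 means
# the sensor produced no reading for that pixel.
NO_READING = 0


def nearest_point(depth: List[List[int]]) -> Optional[Tuple[int, int, int]]:
    """Return (x, y, depth) of the closest valid pixel, or None if none is valid."""
    best: Optional[Tuple[int, int, int]] = None
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            if d == NO_READING:
                continue  # skip pixels with no depth reading
            if best is None or d < best[2]:
                best = (x, y, d)
    return best


# A tiny made-up 3x3 "frame": the hand is the 85 cm pixel in the middle.
frame = [
    [0, 210, 205],
    [190, 85, 200],
    [195, 198, 0],
]
print(nearest_point(frame))  # (1, 1, 85)
```

A real tracker would smooth this over time and reject noise, but the core lookup really is this short once depth is available per pixel.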
What might people do with these tools as artists?
Watson: There is quite a lot that it can be used for. For interactive installations, we are often dealing with trying to track people in a space. Typically this requires careful lighting and IR cameras, and it can be quite a tricky issue, but with the Kinect the depth image allows us not only to track people but to understand where they are in relation to each other in z-space. This is just one application, however. Another really nice feature is that it has pixel-matched color and depth cameras, and this could allow for a ‘greenscreen-less’ live greenscreening. And then of course there is its use as a 3D scanner, for building depth maps, understanding the space around us etc., and more possibilities than I probably realise.
Shiffman: All sorts of things I can’t possibly imagine! (Just the fact that having depth makes background removal so easy is killer for my students.)
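As a rough illustration of why depth makes background removal so easy, the sketch below keeps only the pixels closer than a chosen distance and blanks everything else. It assumes the color and depth images are already aligned pixel for pixel, that depth is in centimeters, and that 0 means “no reading”; the function name remove_background and the sample values are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b)


def remove_background(
    rgb: List[List[Pixel]],
    depth_cm: List[List[int]],
    max_cm: int,
    fill: Pixel = (0, 0, 0),
) -> List[List[Pixel]]:
    """Keep pixels whose depth is valid and within max_cm; replace the rest with fill."""
    return [
        [
            pixel if 0 < d <= max_cm else fill  # 0 is treated as "no reading"
            for pixel, d in zip(rgb_row, depth_row)
        ]
        for rgb_row, depth_row in zip(rgb, depth_cm)
    ]


# Made-up 2x2 example: one person pixel at 120 cm, the wall at 300 cm.
rgb = [[(200, 150, 120), (90, 90, 90)],
       [(80, 80, 80), (85, 85, 85)]]
depth = [[120, 300],
         [0, 310]]
print(remove_background(rgb, depth, max_cm=150))
```

No lighting control or color keying is involved: the background is dropped because it is far away, not because of what color it is.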
McDonald: I’ve noticed tendencies to work at very different levels of abstraction.
Some people are most interested in the raw data, the inherent glitches, the aesthetic of 3d scanning.
Others are interested in slightly generalized data, maybe the idea of ‘scenes’ that are being captured and analyzed, reconstructed.
Some people are interested in specific applications: object recognition, pose estimation, gestures. These are the most abstracted.
I expect work to come from all different levels, in every different medium.
Sculptors will record and build unusual models of spaces informed by 3d scanning, spatial mash-ups will be standard fare, 3d printing for 3d slit scanning. Motion spaces, negative spaces. Paths through space over time.
Sound artists and musicians will use the device to control standard audio parameters, or use the values as input parameters to complex synthesis environments and for controlling spatialized sound with large speaker arrays.
Photographers will work with long exposures in combination with 3d-reactive projection to augment layers of the space over time.
Interaction designers will invent new gestures and modes of interaction specifically targeted at the strengths of the sensor.
Interactive art will experience a minor renaissance as a variety of tasks that were previously very difficult become very simple (e.g., tracking someone against a background that is the same color, or even tracking someone against a moving background).
… etc., etc.
What’s technically possible with the libraries now; what’s coming?
Watson: At the moment, we can get back the depth image and color image from the two cameras, and access the motor, LED and the accelerometer of the device. Some developers are now working on accessing the four microphones, which allows for location of sounds in 3D space. Also, a big part of the Kinect as it relates to the Xbox is the full body skeletal tracking, which from a researcher’s or artist’s perspective is a very valuable feature. This is implemented in software on the Xbox and is the result of many years’ work by some of the top people in the field. A big part of the future research will be at the software level, developing tools that build off of and extend the functionality of the hardware, like open source implementations of the realtime skeletonization code.
McDonald: The general rundown is that Linux is fastest, OS X is 5-10 fps behind, and Windows is just starting to work.
ofxKinect was originally developed by Dan Wilcox and Theo Watson, with some minor contributions from me, and is now also being developed by Arturo Castro. It runs well on OS X and Arturo is still adding Linux support.
Right now it’s only possible to get the RGB and depth images, and to get the depth image in centimeters (which is not what the sensor returns by default). Next will be alignment of the RGB and depth images, and of course making it cross platform. Other suggestions are on the OF forum.
Shiffman: Right now the library just returns two pixel arrays (640×480 RGB image and 640×480 image with depth mapped to grayscale). My to-do list is (a) make all the raw data available, (b) optimize for speed, and (c) add any little analysis tricks / features that might be particularly useful. Basically, anything people do with the openkinect project and OF, I’ll try to add as a feature for Java / Processing.
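For readers wondering what “depth mapped to grayscale” might look like, here is a small sketch. It is not the Processing library’s code. It assumes raw depth values sit in an 11-bit range (0 to 2047), which is an assumption here rather than something stated above, and it chooses to draw near things bright and far things dark. The names depth_to_gray and frame_to_gray are hypothetical.

```python
from typing import List

MAX_RAW = 2047  # assumed 11-bit raw depth range


def depth_to_gray(raw: int, max_raw: int = MAX_RAW) -> int:
    """Map a raw depth value to 0-255, with nearer pixels brighter."""
    clamped = min(max(raw, 0), max_raw)  # guard against out-of-range input
    return 255 - (clamped * 255) // max_raw


def frame_to_gray(frame: List[List[int]]) -> List[List[int]]:
    """Convert a whole depth frame (list of rows) to grayscale values."""
    return [[depth_to_gray(d) for d in row] for row in frame]


print(frame_to_gray([[0, 1023, 2047]]))  # [[255, 128, 0]]
```

The mapping throws away precision (2048 levels squeezed into 256), which is one reason access to the raw data is a natural next step.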
Stay tuned to CDMotion for more… and let us know if you have specific comments or questions, or have seen work that inspires you. Ed.
Fantastic round-up of what’s happened so far from our friend Creative Applications Network:
Kinect – OpenSource
Memo reflects on his blog…
Kinect – why it matters
And on Music, I’ve got more for anyone interested in MIDI or C#/.net:
Kinect with MIDI