Ed.: Our friend Momo the Monster (aka Surya Buchwald) joins us for a guest column with a proposal: what if messages sent between music and visual software could be expressive? His idea is simple, but powerful: musical semantics for live visual messages, as basic as knowing when there's a bass drum hit. Momo introduces the concepts here, with more audiovisuals coming shortly, so feel free to hit him up with questions. -PK

Last October, I was approached by the management of American Werewolf about creating some custom visuals for their show. This was the first time I’d been contacted to work directly with an artist for an upcoming show with a good lead-time and some budget for creating something from scratch. That’s when I started developing a system to abstract their songs into more uniform data so I could develop interchangeable synced visuals.

The video you see above shows the granularity of the system. For this little demo track, we can read the incoming kick drum, snare, hi-hat, bassline, and a special parameter I call intensity. First, we see a very straightforward example, where each instrument triggers an animation of its own description. Then I change scenes to something I designed for an American Werewolf song, and we can see the modularity of the system in action.

I'd like to explain this system as it exists so far, and get some feedback. I'll break it down into three parts: the Standard, which is the set of terms I use to abstract the music; the Protocol, which is how I mapped that language onto OSC; and the Implementation, which is the set of audio and video programs I used to make things happen. It's important to view these separately, because something like the implementation could be switched to another system without affecting the Standard or the Protocol.

The Standard

The idea here is to go a step beyond MIDI, and make use of OSC’s advantages. American Werewolf are an Electro House act, and as such there are many things that will be the same for each song. There will likely be a prominent Kick and Snare drum, a Bass line, some Vocals, and perhaps a Lead.

With this in mind, I started breaking it down thus:

  • Rhythm: Kick, Snare, Hat, Crash (or other large accent)
  • General Instruments: Bass, Lead, Vox
  • Specialty: Lyrics, Intensity, Speed, Scene

Thinking this way enabled me to design visuals that reacted to a subset of musical data that was present in most songs. I could always create custom bits beyond that, but this basic framework lets me try visuals with different songs.

The Protocol

OSC seemed a natural fit. I wanted to keep the mapping simple, so I wound up with:

/instrument float

A kick drum played at half velocity would be /kick 0.5, for example. The float argument is always a floating-point number between 0 and 1, and the instrument simply uses its natural name. In the case where there is more than one of an instrument, like two different lead instruments (perhaps a synth and a guitar), the first is /lead/1/, the second is /lead/2/, and messages to /lead will hit both channels.
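
To make this concrete, here is a minimal sketch of the sending side in Python using the python-osc library. This is not the Max4Live setup described later, and the host and port are placeholders; it just shows the address/argument shape.

    # Minimal sketch of the protocol from the sending side (python-osc assumed).
    from pythonosc.udp_client import SimpleUDPClient

    client = SimpleUDPClient("127.0.0.1", 9000)  # placeholder host/port

    client.send_message("/kick", 0.5)    # kick drum hit at half velocity
    client.send_message("/lead/1", 0.8)  # first of two lead instruments
    client.send_message("/lead/2", 0.3)  # second lead instrument
    client.send_message("/lead", 1.0)    # a receiver routes this to both lead channels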

Lyrics are especially useful this way. You pass along:

/lyrics “We Want Your Soul”

And your VJ software can read the actual words and integrate them into a dynamic visual.
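
As a rough illustration of the receiving end (again in Python with python-osc, which is my assumption rather than the AIR app described below), a lyrics handler might look like this:

    # Sketch of a receiver that reacts to /lyrics messages.
    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer

    def on_lyrics(address, *args):
        line = args[0] if args else ""
        if not line:
            print("[clear lyrics]")   # treat an empty message as a clear
        else:
            print(f"LYRICS: {line}")  # a real VJ app would draw this as dynamic type

    dispatcher = Dispatcher()
    dispatcher.map("/lyrics", on_lyrics)
    BlockingOSCUDPServer(("127.0.0.1", 9000), dispatcher).serve_forever()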

I also wound up using the abstraction 'Intensity' when I wanted to modify a variety of visuals at once. This is a human-readable aspect of a song: while you could possibly design a method to compare volume peaks, beat changes, frequency ramps, etc., it's easiest just to spit out some data that reads 'the song just jumped from an intensity of 0.5 to 0.7, and now it's ramping up to 0.9'. Then all of the visuals in your scene react as appropriate: getting bigger, more vibrant, shaking more, morphing, etc.
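
As a toy sketch of the idea, a single intensity value can drive several visual parameters at once; the parameter names and curves below are invented for illustration, not taken from the actual visuals.

    # One incoming intensity value (0-1) modulating several visual parameters at once.
    def apply_intensity(intensity):
        return {
            "scale": 1.0 + intensity * 0.5,       # everything gets bigger
            "saturation": 0.4 + intensity * 0.6,  # more vibrant
            "shake": intensity ** 2,              # shaking ramps up faster near the top
        }

    print(apply_intensity(0.5))
    print(apply_intensity(0.9))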

The 'scene' command is very handy for narrative visuals. Some songs I designed have a large variety of scenes which all react to this same musical data in different ways, and it's important that they trigger whenever the music reaches a certain point. The command could also be used generically, by following existing song conventions (a minimal handler sketch follows the list):

  • /scene intro
  • /scene verse
  • /scene chorus
  • /scene verse
  • /scene chorus
  • /scene bridge
  • /scene outro
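
A minimal handler for those cues might look like the following; the scene names come from the list above, while everything else is illustrative.

    # Sketch of a /scene handler that swaps the active visual scene on song-structure cues.
    SCENES = {"intro", "verse", "chorus", "bridge", "outro"}
    current_scene = "intro"

    def on_scene(address, *args):
        global current_scene
        name = args[0] if args else ""
        if name in SCENES:
            current_scene = name
            print(f"switching visuals to: {name}")
        # unknown names are ignored, so custom song-specific scenes can coexist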

The Implementation

So theory is great, but it’s just theory. Here is how this all came together for me:

Ableton Live MIDI -> Max4Live Plugins (convert MIDI to OSC) -> Storyteller custom AIR Application.

I developed a number of custom Max4Live plugins that would take MIDI information from within Live and turn it into logical music data:

  • NoteBangs: Turns a MIDI note into an OSC message, converting velocity into a floating-point argument (a rough sketch of this conversion, outside Max, follows this list)
  • NoteRange: Turns a range of notes into an OSC message, converting the position in the range into the first argument and the velocity into the second
  • CC: Turns a MIDI CC into an OSC message, converting the 'amount' into a floating-point argument
  • Lyrics: Takes a text file and a MIDI note range, and triggers lines from the text file as OSC messages based on the incoming MIDI note, with a velocity of '0' triggering a /lyrics clear message
  • OSCOut: Takes all the messages coming from the various plugins and sends them out to an OSC receiver (in this case, set to localhost)
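
To give a sense of what a NoteBangs-style conversion does outside of Max, here is a rough sketch in Python using the mido and python-osc libraries; both libraries and the note-to-address map are my own assumptions.

    # Rough NoteBangs-style conversion: MIDI note-ons become OSC messages,
    # with velocity scaled to a 0-1 float.
    import mido
    from pythonosc.udp_client import SimpleUDPClient

    client = SimpleUDPClient("127.0.0.1", 9000)
    NOTE_TO_ADDRESS = {36: "/kick", 38: "/snare", 42: "/hat"}  # example GM drum notes

    with mido.open_input() as port:  # default MIDI input port
        for msg in port:
            if msg.type == "note_on" and msg.velocity > 0:
                address = NOTE_TO_ADDRESS.get(msg.note)
                if address:
                    client.send_message(address, msg.velocity / 127.0)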

These plugins are Max MIDI Effects, so they take MIDI on their channel and convert it in realtime, passing the data along the chain. In the case of recorded MIDI data playing back from Live, the logic is straightforward. There are many cases, however, when MIDI data needs to be turned into baked audio clips for performance purposes (too many stacked effects), in which case the MIDI clip can be Grouped with the audio clip so they launch simultaneously.

A big catch here, of course, is that Max4Live currently requires a substantial investment beyond just the purchase of Live. For the performance with American Werewolf, I had them just send me the MIDI notes, and I ran Live with the custom plugins on my end. These plugins could easily be created as standalone Max patches, though one of the huge conveniences of the method as I have it above is that all the mapping is saved within the Live Project.

For the visuals, I wound up writing a basic VJ playback application in Adobe AIR, which can read data from UDP sockets as of AIR 2.0. The visuals are all written in ActionScript, using Flash libraries for the graphics, and the application reads an XML file which maps the music data (/kick, /snare) to methods in the Flash files (triggerLights(), blowUpAndSizzle()). I've got a modification to this system in the works which will allow the Flash files to play back in any supported software (VDMX, Modul8, Resolume, GrandVJ, etc.) while still receiving method calls from the Master app. Realistically, that project is probably about six months away from launch.
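
The XML file itself isn't reproduced here, but the dispatch idea it encodes can be sketched in Python; the method names echo the examples above, and the class is purely illustrative.

    # Sketch of the address-to-method mapping the XML file describes.
    class Visual:
        def triggerLights(self, value):
            print(f"lights triggered at {value}")

        def blowUpAndSizzle(self, value):
            print(f"sizzle at {value}")

    visual = Visual()
    MAPPING = {"/kick": visual.triggerLights, "/snare": visual.blowUpAndSizzle}

    def on_message(address, *args):
        handler = MAPPING.get(address)
        if handler:
            handler(args[0] if args else 1.0)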

Looking Ahead

I want to stress that the implementation above is just that – one implementation. You could run the music from Ableton Live and have it control visuals in OpenFrameworks, Processing, Quartz Composer, what have you. You could run the music from Max/MSP or Logic or Cubase if you could figure out how to do the MIDI to OSC translation. I’d love to get feedback on this system as I’m developing it because I think it could be useful for others, and the more of us that use a common language, the easier collaboration will be in the long run.

Imagine creating visuals that react to a Kick Drum or Bassline and knowing that they’ll sync effortlessly to the variety of acts you’ll be playing with over the next year. It could happen.

  • http://www.listn.to/DJAutom8 Jonathan

    This looks amazing! I've VJ'ed for a while  alongside my DJing, and I always love anything that is beatsynced, or intensity synced. This is exactly what I would like! I love modular systems, and this looks like it would be extremely powerful! Side note: Until your VJ application is released, could you use the same OSC commands for Resolume, except apply it directly to zooms, fades, different scenes, etc? Second side note: I would be extremely delighted to beta test your app, and/or help with converting it to support Resolume! Thanks!

  • http://mmmlabs.com Momo The Monster

    Thanks, Jonathan. You could absolutely use the same OSC commands with Resolume as-is. That's the idea – the same audio data should easily be able to drive any VJ software.

    If there is interest, I would be happy to run a small beta.

  • http://www.blairneal.com Blair Neal

    Very cool stuff..it is nice to get down to that level of complexity.

    Do you worry at all about being…too beat synced or too tied into the data and not the feeling of the song? I tried a project a few years ago where I tried running an automated system for music analysis..but for me it became all about the mapping and didn't offer much spontaneity.

    Obviously this is meant to be a standalone system and someone's not there performing along with it… it's way better than sending a band off with completely pre-recorded visuals… but I always have a battle within myself between doing too much mapping and leaving things more open-ended. I tend to abide by the "beware the sync" slogan because syncing lets me turn my brain off too frequently… I'm wondering what your thoughts were on that in regards to this project.

  • http://mmmlabs.com Momo The Monster

    Blair Neal – excellent points! I think there is always a balance to be made. A system like this could map just a few basic elements, and then a human could apply their touch on top, matching the more ethereal elements and emotions in the song.

  • http://bearmod.tumblr.com/ Bearmod

    Momo – It's cool to see this all laid out and see your system. I run a similar setup when doing visuals for electronic musicians as well as for my live electronic A/V PA performances. Standardization that takes advantage of the "language"-based addressing in OSC is a really smart idea to put out there. The reality is that, as a live visualist, once you've set up a set with these elementary mappings, taking a fundamental set of MIDI from musicians is a really simple task for every band after that. I think a lot of artists are unnecessarily still intimidated by the prospect of live video and OSC today, and this kind of standardization could potentially help what can all too often be an uninteresting combination of visuals with music.

    Blair Neal – Your point about constant synchronicity between visuals and music throughout a set is important, especially when mapping to the same elemental parts of each song. I avoid this by appending kill switches to each of the MIDI notes I take in from another musician or from my own drum machines and synths. I call them 'sync mutes' and I find it makes for a much more dynamic way to perform.

  • http://www.lawriecape.co.uk Lawrie Cape

    Absolutely fascinating. I tried something similar, to try and do some live visuals for an electro artist – but we never got past the proof-of-concept phase. This is far better thought out and implemented.

    I'm really looking forwards to seeing where this goes – and would love to see a beta!

  • http://www.kasperkamperman.com/projects/extract/ Kasper Kamperman

    Great piece! Five years ago I did a graduation project (the Extract-Project) that was about lights synced to music. We translated audio tracks into MIDI tracks and synced that to a light table (via Art-Net).

    (video: http://vimeo.com/1909014)

    In the end, a general standard (like the one you describe) was really missing. So for every song we had to re-invent the mapping. Quite a lot of work.

    Thanks to OSC you can do more descriptive things than with channel and controller numbers.

    You talk about MIDI baked into audio clips. Is there already a format for that? Back then we used an Audiobox (for multichannel sound) and SMPTE to synchronise a MIDI sequencer.

    It would be nicer to have kind of an editor to add OSC information to files directly (especially the human 'readable' things like intensity and structure). Maybe a subtitle track could serve for that. 

    Interesting read. I wish I had more time to get into this subject again.

  • prevolt

    This kind of tight A/V sync is the best!

    I've been interested to find ways in the future that each member of a music collective could generate visual information with their audio performances, and this is a great step in that direction.

    Squarepusher's "Planet Gear" from a couple of years ago comes at the problem from a video synthesis angle and is just reacting to frequency bands (and not sending control messages), but I thought of it as I watched this one.

    http://squarepusher.net/planetgear/

  • http://deeje.com deeje

    I'm working on an iPad app called tappr.tv that allows users to create interactive visuals and am building up to a version that allows for syncing visuals to songs from the iPod library.  I've also been laying the foundation to support network/OSC input and/or output, and your proposal sounds very compelling…  I'm in SF, how can we collaborate to explore this more?

  • http://www.namethemachine.com matt davis

    possibilities for cross-medium collaboration are endless now. exciting times.

  • http://mmmlabs.com Momo The Monster

    Kasper: just watched the video of the Extract Project – love it! Would you care to share the system (or one of the systems) that you used to break down the music?

    Kasper – I dig the idea of an editor for this info. My current focus is on songs created from scratch, so the Metadata is created within the composition app, but a separate way to do it could benefit already-created songs. Something to keep an eye on.

    deeje – sounds cool. Feel free to contact me – username surya with the domain mmmlabs and then dot com.

  • Steve Elbows

    Great stuff. I've been obsessed with this sort of stuff for a decade, and have been repeatedly flabbergasted that there is not really a lot of stuff out there where visuals are more tightly coordinated with the music.

    So bravo for laying down some nice straightforward foundations for this sort of thing. When I've tried myself in the past I've been too easily put off by a variety of issues, the latest being the realisation that MIDI may not convey what is actually being heard, e.g. if the soft instrument being played has an evolving sound, or audio FX plugins are used to glitch the beat and other elements.

    Given the popularity of grid-type controllers for music, visually representing grid patterns is another related area I'm fascinated by.

  • Steve Elbows

    Any plans to share the max4live patches?

  • http://www.alphabomb.com Brendan

    Love what you're doing! Ableton (or any host application that can share its data, MIDI/OSC or otherwise) obviously makes it easy to get that tight A/V sync that otherwise eludes us. The holy grail (as I see it) is making this work with all audio on the fly. Frequency range and amplitude mapped to visual aspects (which is what I've been up to for a few years) is so-so in a live environment (it's why I've gravitated to Bass music), but muddy when specific sounds span a large range of frequencies. The next big leap forward is going to come from the development of SELECTIVE hearing for computers, and its adaptation into usable control messages for visualists.

    If anyone can comment on the state of selective hearing (the ability to pull out individual sounds that span frequencies which overlap with other sounds), I would LOVE to hear more.

    Any Chuck Audio http://chuck.cs.princeton.edu/ enthusiasts (or others) able to comment?

  • http://www.kasperkamperman.com/ Kasper Kamperman

    The music for the Extract Project was also especially created for that project. However the musicians worked with different kinds of software (Acid, Reason) and we didn't want to disturb that creative process by demanding MIDI data (and so restrict them to Cubase, Logic etc.).

    However we had all the exports of their individual tracks (drums, synths etc.), so that makes audio analysis a lot easier.

    A programmer wrote some software in Matlab (a mathematical analysis program) and a program to transform that to MIDI.

    Later I used the KT Drumtrigger VST plugin (http://koen.smartelectronix.com/) together with Bidule (http://www.plogue.com/?page_id=56) for drums.

    For volumes of synths I think I used a plugin as well, but I forget the name (there are plenty of plugins available).

    Analysis of chords I tried with Widisoft (http://www.widisoft.com/) but that wasn't a big success (though maybe the software has improved in the last 4 years).

    Kasper

  • http://mmmlabs.com Momo The Monster

    Brendan – that's currently beyond my abilities. I believe I have found a middle ground, however. Listening to music live, I can use this system to quickly map patterns for playback. For example, I can listen to a bassline and approximate the intervals on a MIDI keyboard, which gets run through Ableton Live and converted into logical OSC in realtime. If it's a theme in the song, I can quickly record that into an empty slot in my Bassline channel for easy recall later. Doing the same for the Lead and Rhythm parts, one by one, I can build up song pieces within a few bars.

  • andreas

    great work! 

    it's funny that now, doing realtime audio analysis is trivial. Getting Live to transmit OSC to an external application? Also trivial.

    For a skilled max patcher these devices can be built relatively easily, once the protocol is settled upon.

    The non-trivial aspect is doing the specification for it. And in this case the work done is very useful. I love the simplicity of it; something that I think eludes many who work in this field.

    I will now blatantly steal the "lyrics, intensity, etc" bit for my own use. 

  • http://www.timreha.com Tim Reha

    Very cool!

    Super work Surya! Love the breakdown into logical bits. Visit us in Seattle soon!

  • http://www.alivemachine.com racer

    That's interesting that you used Adobe AIR and Flash libraries with UDP functions. Is there any potential for real 3-D using this?

    On the concept of a future universal mapping paradigm, doesn't it already exist as 0 to 1? With the creative part left open to what the programmer does with all those decimal numbers in between?  

    It looks to me like you have already developed something that would work the same with future projects. A kick drum from American Werewolf is pretty much the same thing as a kick drum on a Spice Girls track, waveform or MIDI. In your last sentence, are you talking about the technological issue of synchronization, or more like a universal visual language of synchronization? That's something I've been interested in for a while.

    An alternative for live sync data is using incoming audio on separate tracks as input and mapping those. That won't give data for chords (maybe you could if you've got some sort of math PhD?), but pretty much everything else can be mapped to with some work.

  • http://tagmagic.wordpress.com Jaime Munarriz

    I've built a similar setup running Live and PureData. As Max4Live didn't exist then, I was sending internal MIDI messages through MidiYoke. Each channel served a special purpose, and then each note number. I even managed to Open/Close patches by this method.

    Sadly, Max4Live + Live is expensive, and Live alone lacks OSC…

    So I stay with MIDI messages till I build a Live alternative in Pd. And that's hard work.

  • http://mmmlabs.com Momo The Monster

    andreas – please do! And it's the opposite of stealing as far as I'm concerned. If we ever get together to jam, I could control your visuals muahahaha.

    tim – I'll be up in April, we're going to do an NWAV workshop and showcase at Fred Wildlife Refuge.

    racer – sure, you could do real 3d. With Molehill in beta, 3d is becoming more practical in Flash. Alternatively, you could use Unity, which I believe has methods for OSC.

    I think we're agreed on the 0-1 concept. It makes conversion to/from other standards pretty simple, too.

    I love your suggestion for live audio detection via separate tracks. It does make me wonder if there should be a parent node in the address that describes the instrument in generic terms, like:

    /rhythm/kick/1 0.24

    /melody/lead/2 0.23

    Jaime, you bring up an excellent point. I'm aiming to get a beta of some of the Max4Live patches up for download this week. I'm hoping some enterprising geeks will create compatible versions that do not require M4L.

  • http://mmmlabs.com Momo The Monster

    I've started a Google Doc, attempting to describe the info we have so far, and analyze suggestions. Take a look and let me know if you would like write access to contribute, explain, organize.

    https://docs.google.com/document/pub?id=1jrRKBWVi

    Once we make some progress, I'll write up another article to spread the word.

  • Steve Elbows

    Unity does not have built-in OSC support but there are a few different scripts around which enable OSC and work well. When you make available the Max4Live patches I will knock up a few Unity examples.

  • Meierhans

    I really like the idea of creating a standard. We should not close the doors to new ideas too early, but really, that's a great idea. If you are willing to share your Max patches I would for sure give it a try.

  • http://www.alivemachine.com racer

    I was thinking along the same lines as Meierhans. Could you post your current M4L patch and ActionScript code on the Google Doc? I am interested to see an example of how to get OSC messages from Ableton into Flash.

  • http://tappr.tv deeje

    so, where are you with all this? I'd love to hear an update, chat about potentials…
