Grant Stento, “Sound in Color”

Music is often written to represent the world around us. Many artists have written music that directly mimics nature, exemplifying mimesis. Others, such as Xenakis, have formulated mathematical approaches to representing nature in music. I have tried to take that approach to its limit, formulating algorithms that convert videos of nature directly into sound.

Processing and SuperCollider are two free-to-use programs. Processing is a programming environment well suited to image analysis, and it can communicate with SuperCollider over OSC (Open Sound Control). SuperCollider is an audio synthesis environment that generates sound in real time. Processing analyzes either live or recorded video and sends the extracted information to SuperCollider, which turns it into sound as the video plays.
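
Below is a minimal sketch of the Processing side of that pipeline, assuming Processing's Video library and the oscP5/netP5 libraries; the message address "/frameParams" and the placeholder value are illustrative, not the project's actual names, and SuperCollider is assumed to be listening on its default port 57120.

```java
import oscP5.*;
import netP5.*;
import processing.video.*;

Capture video;                 // live camera input (a Movie could be used for recorded video)
OscP5 oscP5;
NetAddress superCollider;

void setup() {
  size(640, 480);
  video = new Capture(this, width, height);
  video.start();
  oscP5 = new OscP5(this, 12000);                      // local listening port (arbitrary)
  superCollider = new NetAddress("127.0.0.1", 57120);  // SuperCollider's default OSC port
}

void draw() {
  if (video.available()) {
    video.read();
    image(video, 0, 0);
    // Hypothetical message: one float per measured parameter, sent once per frame
    OscMessage msg = new OscMessage("/frameParams");
    msg.add(0.5);                                      // e.g. average brightness, normalized 0..1
    oscP5.send(msg, superCollider);
  }
}
```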

Simultaneously, the information gathered from the video for the sound is fed back into the video itself. In this way, the video is changed visually to reflect how it is being interpreted for sound: the parameters measured from the video are used to create sound, and those same parameters are then piped back into the video. This visually emphasizes the sounds you are hearing, demonstrating how visual stimuli affect aural stimuli and vice versa.

The parameters gathered from each frame of the video are the average values of the red, green, and blue channels, the average brightness, the horizontal location of maximal brightness (representing the focal point), the average color warmth, and the "complexity" of the image, measured by using edge detection to count the number of objects in the frame. Each parameter is used in a distinct way to create sound that corresponds to that frame.
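
Here is a sketch of how those parameters could be computed in Processing for a single frame. The warmth formula (red minus blue) and the edge threshold are my assumptions, and counting strong-edge pixels stands in for the project's full edge-detection object count.

```java
// Per-frame analysis sketch, assuming `frame` is the current video frame.
float[] analyzeFrame(PImage frame) {
  frame.loadPixels();
  float sumR = 0, sumG = 0, sumB = 0, sumBright = 0;
  float maxBright = -1;
  int maxBrightX = 0;
  int edgeCount = 0;
  int w = frame.width, h = frame.height;

  for (int y = 0; y < h; y++) {
    for (int x = 0; x < w; x++) {
      color c = frame.pixels[y * w + x];
      sumR += red(c);
      sumG += green(c);
      sumB += blue(c);
      float b = brightness(c);
      sumBright += b;
      if (b > maxBright) {        // track where the brightest pixel sits horizontally
        maxBright = b;
        maxBrightX = x;
      }
      if (x < w - 1) {            // crude edge measure: strong horizontal brightness jumps
        float bRight = brightness(frame.pixels[y * w + x + 1]);
        if (abs(b - bRight) > 40) edgeCount++;
      }
    }
  }

  int n = w * h;
  float avgR = sumR / n, avgG = sumG / n, avgB = sumB / n;
  float avgBright = sumBright / n;
  float warmth = (avgR - avgB) / 255.0;       // green ignored; positive = warm, negative = cool
  float focalX = maxBrightX / (float) w;      // 0 = left edge, 1 = right edge
  float complexity = edgeCount / (float) n;   // rough stand-in for the object count

  return new float[] { avgR, avgG, avgB, avgBright, focalX, warmth, complexity };
}
```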

Warmth, calculated from the red and blue channels (green has no effect on color warmth), is used to decide whether the music should sound "happy" or "sad". High warmth (red or yellow images) leads to happier sounds, and cool images (blue or purple) lead to sadder ones. Of course, this is only on average: a mostly-warm picture will still produce "sad" sounds, just less often than "happy" ones. The "happiness" of a sound is based on the Western convention of major chords and scales sounding happier than minor ones.
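
As a rough illustration of that mapping (the probability curve is my assumption, not the project's exact numbers), a warmer frame simply makes a major chord more likely:

```java
// Warmer frames favor major ("happy") chords, cooler frames minor ("sad") ones.
boolean chooseMajor(float warmth) {
  // warmth ranges roughly from -1 (very cool) to 1 (very warm)
  float pMajor = constrain(map(warmth, -1, 1, 0.1, 0.9), 0, 1);
  return random(1) < pMajor;   // a mostly-warm frame can still produce "sad" chords, just less often
}
```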

Average brightness and the horizontal location of maximal brightness affect volume and panning, respectively. The brighter the video, the louder the sound. If most of the brightness is coming from the left side of the frame, the sound is panned more to the left than the right. "Complexity" (the number of edges/objects in the frame) determines how fast the music plays: the melody speeds up as the imagery becomes more complex.
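
In sketch form, assuming the values returned by the analysis step above, the mappings might look like this (the output ranges are illustrative, not the project's exact settings):

```java
float mapVolume(float avgBright) { return map(avgBright, 0, 255, 0.0, 1.0); }  // brighter -> louder
float mapPan(float focalX)       { return map(focalX, 0, 1, -1.0, 1.0); }      // -1 = hard left, 1 = hard right
float mapTempo(float complexity) { return constrain(map(complexity, 0, 0.2, 60, 180), 60, 180); }  // BPM
```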

The sounds are synthesized in SuperCollider. There are three channels: bass, harmony, and melody. Melody and bass use a synth I created called "bing", and the chords use a synth I made called "harmonica". The sounds aren't meant to mimic any natural sound; I named them based on what they sounded most like to me. The bass and melody rely on the underlying harmony, with the bass playing the root and the melody playing notes from the scale that corresponds to the current chord. Chords change randomly but stay within the same key until a dom7 or m6 is played, at which point the key changes to the chord's relative IV or V, respectively. The bass and melody then follow, each playing at its own random times. The rate of change, as mentioned above, is determined by the complexity of the video.
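
A much-simplified sketch of that harmonic logic, written here in Java/Processing terms even though the real version lives in SuperCollider: the chord probabilities are assumptions, and the modulation intervals are my reading of the "relative IV or V" rule above.

```java
// Key and chord roots as pitch classes 0..11; `major` comes from the warmth mapping.
int key = 0;            // start in C
int chordRoot = 0;
String chordType = "maj";

void nextChord(boolean major) {
  int[] degrees = {0, 2, 4, 5, 7, 9, 11};                    // diatonic roots within the key
  chordRoot = (key + degrees[(int) random(degrees.length)]) % 12;
  float r = random(1);
  if (r < 0.1) {
    chordType = "dom7";
    key = (chordRoot + 5) % 12;   // dom7 resolves as V: new key a fourth above the chord root
  } else if (r < 0.2) {
    chordType = "m6";
    key = (chordRoot + 7) % 12;   // m6 treated as iv: new key a fifth above the chord root
  } else {
    chordType = major ? "maj" : "min";
  }
}
```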

The video is also manipulated. The original and manipulated versions are shown as the top and bottom videos, respectively. Warm images are made warmer, cool images cooler, bright areas brighter, and dark areas darker. This reflects the way the sound is interpreting the video, and it also helps you feel the emotion the sounds are trying to convey.
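
As a sketch, assuming the warmth and average-brightness values computed earlier (the gain constants here are arbitrary), the manipulation pushes each pixel further in the direction it already leans:

```java
void emphasize(PImage frame, float warmth, float avgBright) {
  frame.loadPixels();
  for (int i = 0; i < frame.pixels.length; i++) {
    color c = frame.pixels[i];
    float r = red(c), g = green(c), b = blue(c);
    // Warm frames get warmer (boost red, cut blue); cool frames get cooler
    r = constrain(r + 40 * warmth, 0, 255);
    b = constrain(b - 40 * warmth, 0, 255);
    // Bright pixels get brighter, dark pixels darker: stretch contrast around the mean
    float gain = 1.3;
    r = constrain(avgBright + (r - avgBright) * gain, 0, 255);
    g = constrain(avgBright + (g - avgBright) * gain, 0, 255);
    b = constrain(avgBright + (b - avgBright) * gain, 0, 255);
    frame.pixels[i] = color(r, g, b);
  }
  frame.updatePixels();
}
```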
