Jared Wong, “Sounds Across America: A Collection of Stochastic Soundscapes”

Background and Creation

This project was inspired by several artists.  Iannis Xenakis's use of statistical processes to generate music and sound, together with my experiences in Computing in the Arts with Graeme Bailey (CS 1610), led me to explore stochastic processes for generating natural music and soundscapes.  Anna Lindemann's mappings of natural and algorithmic processes onto musical sound inspired me to look further into the patterns in natural sounds and recordings that a computer might be able to discover, patterns which might not be audible to humans.  The ultimate goal of this project is to use statistical processes, biased randomness, and existing sound analysis libraries and tools to create new soundscapes which can be "placed" – soundscapes which bring the listener into a nonexistent world where these patterns are reflected in some way.

Software and Methodology

To perform this analysis, I developed a program called Stochastic Soundscape.  The program analyzes sound files to gather statistical information, and ultimately generates the randomly-created soundscapes from which my collection was curated.  Stochastic Soundscape is effectively a set of tools; simply downloading the program and running it on random sounds is unlikely to create soundscapes with any interesting features or meanings.  It consists of a utility which generates musique concrète sound samples from longer input audio files (useful for creating a large library of samples), a utility which generates structural data from input files, a utility which analyzes and categorizes sample data, and a generator which puts it all together.  Some scripts in the codebase perform multiple steps together, but in practice it is often more useful to run the individual steps separately in order to fine-tune the sound of the output.

More details of the generative process can be found on the project homepage.  The settings file details many different ways in which data can be analyzed and soundscapes generated – for example, the code can split all sounds into low, medium, and high frequency bands and perform the analysis/generation process separately across these bands; this is employed in my final curated selections.
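
As an illustration of the band-splitting idea, here is a minimal sketch using SciPy's Butterworth filters; the cutoff frequencies, filter order, and function names are assumptions for this example rather than values from the actual codebase.

    # Illustrative three-band split (not the project's actual code).
    # The 200 Hz / 2000 Hz cutoffs and 4th-order filters are assumptions.
    from scipy.signal import butter, sosfiltfilt

    def split_bands(samples, sample_rate, low_hz=200, high_hz=2000):
        """Split a mono signal into low, medium, and high bands."""
        nyq = sample_rate / 2
        low = butter(4, low_hz / nyq, btype="lowpass", output="sos")
        mid = butter(4, [low_hz / nyq, high_hz / nyq],
                     btype="bandpass", output="sos")
        high = butter(4, high_hz / nyq, btype="highpass", output="sos")
        return (sosfiltfilt(low, samples),
                sosfiltfilt(mid, samples),
                sosfiltfilt(high, samples))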

In summary, the program takes two types of data: structural and sample.  Structural data is analyzed by first looking for pulses of some sort (bumps in loudness which might indicate individual sounds).  Each file is then iterated over clip by clip between pulses, with transition data recorded based on the type of clip that follows the previous one.  Samples are analyzed in the same way, and then reassembled probabilistically using this data (with some randomness intentionally introduced to keep the output from being deterministic).
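
As a rough illustration of the transition-recording and reassembly steps, here is the idea in simplified Python; classify() is a hypothetical stand-in for the project's clip categorization, and the real data structures may differ.

    # Sketch of transition recording and probabilistic reassembly
    # (simplified; not the project's actual code).
    import random
    from collections import defaultdict

    def build_transitions(clips, classify):
        """Count how often each clip category follows another."""
        counts = defaultdict(lambda: defaultdict(int))
        labels = [classify(c) for c in clips]
        for prev, nxt in zip(labels, labels[1:]):
            counts[prev][nxt] += 1
        return counts

    def next_category(counts, current):
        """Pick a successor category in proportion to observed transitions."""
        successors = counts[current]
        if not successors:                       # unseen/terminal category:
            return random.choice(list(counts))   # fall back to any known one
        categories = list(successors)
        weights = [successors[c] for c in categories]
        return random.choices(categories, weights=weights, k=1)[0]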

Curation

To generate this collection, I used a large set of natural and artificial data, and adjusted settings until I was able to generate audio that I felt had some intrinsic meaning or reflected some concept of nature.  The following are curated selections of the program's output, chosen based on the features explained below.  Every piece comes from a run of the program – the selections are merely curated portions of generated outputs.

*Creative note: Many of the ambient source files were NPR shows featuring ambient recordings which also contained human voices.  The samples were selected based on length and category, and I did not realize voices were so present until I began generating soundscapes from the dataset.  Although I had initially wanted to remove the sounds with voices, I found that the voices added an additional layer of liveness and energy to the generated soundscapes.

The Collections

  1. Nature Across America –– The soundscapes contained in this collection are generated from a library including sound samples from 37 different locations across the United States (and some from Canada).  Both structural data and sample material were drawn from the same library of sound.
  2. Pop by Nature –– The soundscapes in this collection are generated using pop music for structural material, and natural sounds for samples.  The natural sounds were taken from the same set of 37 sounds across America.
  3. Nature by Pop* –– The soundscapes in this collection are generated using natural soundscapes for structural material, and pop music sounds for samples.

*Pop music is very heavily compressed – with the current implementation of the Stochastic Soundscape software, this results in unavoidable clipping.  For this reason, I am including only a couple of interesting (albeit rather noisy) clips.

Stochastic Soundscape: Technical Notes and Reflections

Originally, this project was conceived as very open-ended.  The initial plan was to design a robust system in Python which could take sounds as input and create something new.  Over the course of development, the project took its current shape.  Throughout the process, I considered or explored various extensions – some had to be shelved due to technical limitations, and some I simply did not have time to implement because of technical issues in other parts of the project.

One of the shelved ideas was an endless soundscape generator.  This exists in a very crude form in the file "endless.py", but development ceased prior to the implementation of band analysis.  The concept was simple: using the same process by which soundscapes are generated, I wanted to create an endless soundscape which could simply be turned on and left running – it would play the selected samples directly instead of combining them into a file.  The technical issue is that selecting a sample takes a long time while the sounds themselves are short.  It takes roughly 10-15 minutes to generate a 2-3 minute soundscape, so following the same process in real time simply is not feasible.  When I was developing this, I had not yet introduced the level of complexity that now exists in selecting sounds, so it was a more reasonable goal at the time (even then, however, I experienced latency issues which made for a less-than-stellar experience).
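
For illustration, an endless player would take a producer/consumer shape like the sketch below, with a background thread filling a buffer of selected samples; select_next_sample() and play() are stand-ins for the real selection and playback code.  As the timings above suggest, though, buffering alone would not help unless selection could on average keep pace with playback.

    # Sketch of a pre-buffered endless player (hypothetical; endless.py
    # does not work this way).
    import queue
    import random
    import threading
    import time

    def select_next_sample():
        time.sleep(1)                    # stand-in for the slow selection step
        return "sample_%d.wav" % random.randint(0, 99)

    def play(sample):
        print("playing", sample)         # stand-in for actual audio output

    buffered = queue.Queue(maxsize=8)    # assumed buffer depth

    def producer():
        while True:
            buffered.put(select_next_sample())  # blocks when the buffer is full

    threading.Thread(target=producer, daemon=True).start()
    while True:
        play(buffered.get())             # only waits if the buffer runs dry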

Another shelved idea was the use of SuperCollider to generate sounds based on desired characteristics.  This idea was shelved partly due to time constraints, and partly because once I had a large enough sound library, I simply did not need to generate new sounds.  Technical issues with my existing implementation, discovered when I finally received the large sound set, required me to spend more time making the existing library efficient on large datasets, so I did not have time to experiment with this.

Much time was spent on making data processing efficient.  Although the implementation was adequate for small sound sets, it would have taken days to process the nearly 20 GB of sounds I obtained from the Macaulay Library for use with this project.  To make this manageable, I was forced to use only roughly the first 6-7 minutes of each sample.  Given the number of samples I had, this still created enough variety of sounds that I was satisfied artistically.  With the original implementation, however, processing would still have taken an excessively long time – by default, Python uses only one processor, so all data would be processed in sequence.  To make this more reasonable, I spent a significant amount of time rewriting the codebase to enable multicore processing.  By utilizing all of the processing power on my laptop, I was able to cut sample clipping, soundscape analysis, and sample library generation down to a reasonable time – with the current sound set and implementation, it takes approximately 40 minutes to go from a brand new sound set with no samples or analysis to being able to create soundscapes.
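
One common way to parallelize this kind of per-file work in Python is the standard library's multiprocessing pool; the sketch below illustrates the pattern (analyze_file() is a stand-in, and this is not necessarily how the codebase itself is structured).

    # Minimal sketch of the multicore approach: map per-file analysis
    # across a pool with one worker per core.
    from multiprocessing import Pool

    def analyze_file(path):
        return path, len(path)           # stand-in for the real analysis

    if __name__ == "__main__":
        sound_files = ["a.wav", "b.wav", "c.wav"]  # placeholder file list
        with Pool() as pool:             # defaults to one worker per core
            results = pool.map(analyze_file, sound_files)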

The last shelved idea was a GUI.  I spent a bit of time experimenting with a user interface, but decided that it was not really necessary.  To keep the project simple and straightforward, I defined the file directories used for each kind of file – the project can be run by simply downloading the code, installing the requirements, placing files in the appropriate folders, and running the various files in the src/ folder according to the directions in the README.  In lieu of a GUI for configuring settings, I moved all user-relevant settings into a file called settings.py.  This file is located in the src/ folder along with everything else; adjusting its values prior to running the project configures it as desired.  The settings file is thoroughly documented within the code itself, to ensure that the meaning of every setting is clear.
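
For a sense of what this looks like, the sketch below shows the general shape of such a file; SAMPLE_SELECTION_SIZE is a real setting discussed further below, while the other names and values are invented for illustration.

    # Illustrative shape of a settings.py (names other than
    # SAMPLE_SELECTION_SIZE are invented for this example).
    SAMPLE_SELECTION_SIZE = 10   # number of "close" classifiers to draw from
    CROSSFADE_MS = 50            # hypothetical name for the crossfade setting
    USE_FREQUENCY_BANDS = True   # hypothetical: split into low/mid/high bands
    SAMPLE_DIR = "samples/"      # hypothetical: directory for clipped samples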

Finally, a note on the level of polish of the code – the code is not without its bugs.  There are two bugs that I am aware of which are relevant to know about: one is a result of a library I am using, and the other is (very likely) a side effect of the design of the project, one I am not sure how to overcome.  The first bug sometimes causes soundscape generation to never complete.  When generating soundscapes where the sample library contains very short sounds, samples may be selected with lengths shorter than the crossfade setting (these cannot be added, so the program skips them).  This can result in an endless loop in which the program keeps trying to select sounds with similar characteristics that cannot be applied.  I have considered a few ways of fixing this bug; at the time of submission of this project, the bug is still present in the code.  To create the final soundscapes for this project, I simply ran the program many times and terminated the runs that got stuck; it does not always get stuck, since the problem depends on the random selection of sounds.  The crossfade length in the settings affects this as well – a very short crossfade length means the bug occurs rarely, while a crossfade length that is too long means it occurs too often to generate an output.
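
One possible guard, sketched below but not implemented in the submitted code, would be to filter out samples shorter than the crossfade before selection, or to bound the number of retries; the attribute and function names here are invented for illustration.

    # Two possible guards against the stall (hypothetical; not implemented).
    # duration_ms is an assumed attribute of a sample object.
    import random

    def usable_samples(samples, crossfade_ms):
        """Drop samples too short to be crossfaded into the output."""
        return [s for s in samples if s.duration_ms > crossfade_ms]

    def pick_sample(candidates, crossfade_ms, max_tries=50):
        """Bound the retries instead of looping forever."""
        for _ in range(max_tries):
            s = random.choice(candidates)
            if s.duration_ms > crossfade_ms:
                return s
        raise RuntimeError("no sample longer than the crossfade was found")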

The second bug (less critical, perhaps, than the first) causes the very beginning of every generated soundscape to be very loud.  I spent some time attempting to resolve this and concluded that it is likely because the first sound is not crossfaded into: since the shortness of the samples means crossfades lower the amplitude of most of the generated audio, the un-faded opening simply ends up much louder.
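
A possible mitigation, sketched below but likewise not implemented, would be to apply a short fade-in to the opening of each generated file; the sketch uses pydub, which may or may not match the audio library the project actually uses.

    # Hypothetical mitigation: fade in the opening of the generated file so
    # it starts at a level comparable to the crossfaded material.
    from pydub import AudioSegment

    def soften_opening(path, fade_ms=50):
        soundscape = AudioSegment.from_file(path)
        return soundscape.fade_in(fade_ms)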

The last improvement I made to the software base (in addition to cleaning it up and multithreading it) was the addition of some intentional randomness to the sample selection method.  A setting now exists called SAMPLE_SELECTION_SIZE – it sets how many "close" sample classifiers will be used when selecting samples from the sample library.  The transition matrix generated from the structural material is based on classifiers; this number specifies the number of "close" classifiers from which a sample is randomly selected.  For example, a SAMPLE_SELECTION_SIZE of 10 means that for a classifier X, the program will collect all samples matching the 10 classifiers most "similar" to X (where similarity is defined by some form of distance function) and randomly select a sample from this set.  This feature was introduced to manually add some level of randomness to the sounds being chosen – prior to implementing it, the soundscapes would often end up repetitive and seemingly deterministic.
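
As a sketch of the mechanism (assuming numeric classifiers and absolute difference as the distance function, which may not match the project's actual measure):

    # Sketch of SAMPLE_SELECTION_SIZE: pool the samples of the N classifiers
    # closest to the target, then pick one at random.
    import random

    def select_sample(target, library, selection_size=10):
        """library: hypothetical dict mapping classifier -> list of samples."""
        close = sorted(library, key=lambda c: abs(c - target))[:selection_size]
        candidates = [s for c in close for s in library[c]]
        return random.choice(candidates)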

One final note: I made the artistic decision to have generated soundscapes follow the transition information from the sequence generated by a single Markov chain – meaning the chain does not mutate based on the selected samples.  For example, if classifier A was selected, and a sample of classifier B was chosen due to the randomness introduced by SAMPLE_SELECTION_SIZE, the next transition is determined by the structural material's probabilities for classifier A, not classifier B (or the closest classifier to B that occurs in the structural material).  I made this decision because I wanted to keep the outputs tied to the structural material's data – I suspected that if I allowed the chain to mutate based on this introduced randomness, the outputs might simply become too random to mean anything.  As it stands, the outputs already sound quite random, but a close listen does reveal certain interesting textures and features.
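
In terms of the earlier sketches, the difference is that the chain state always advances from the structural classifier, regardless of which sample the randomness substituted:

    # Sketch of the non-mutating walk, reusing the hypothetical helpers
    # sketched earlier (next_category, select_sample).
    def generate_sequence(counts, library, start, num_steps):
        state, output = start, []
        for _ in range(num_steps):
            output.append(select_sample(state, library))  # may pick a sample
                                                          # of classifier B...
            state = next_category(counts, state)          # ...but the chain
                                                          # still steps from A
        return output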
