Saturday, 7 April 2012

2D Fourier Transform on Music

For a long time, I have thought about how to visualize music. The need arises when posting music recordings on YouTube, which is a very nice platform for distributing music. Just showing a black screen does not really do justice to the visual medium, so one has to think about what to show alongside the music. Many music pieces are accompanied by a still image, or by a collection of images relevant to the music. This is fine, but it does not fully use the possibilities that video offers. Ideally, one would have a video that is tightly synchronized to the music and complements the acoustic impression with a visual one, as is done in popular music videos ever since MTV pushed this forward ("Video Killed the Radio Star" comes to mind). But for classical music there is really not much in terms of appropriate visualizations: there are videos of the performers, and there are those series of smartly arranged still images, but rarely is there a video which truly complements the audio by directly visualizing the music and its elements.

There has been some work on this in the past, and I am especially fascinated by what Oskar Fischinger did from the 1920s to the 1940s. A few of his films have occasionally been shown on Turner Classic Movies in between regular movies. He really tried to take music and visualize it with abstract shapes which represent the music.

Most media players include visualization tools which show automatically generated animations based on the audio. However, I have never found a satisfying visualizer, because most of the visualizations appear to be random motions layered on top of a bit of wave analysis.

That kept me thinking, and then in December 2011 I had the idea to use the Fourier Transform to detect periodicities in music. The Fourier Transform (FT) is a well-known approach for analyzing audio: it is generally used to create spectrograms and show the frequency content of music. The FT detects periodicities in signals, and this is usually used to find the frequencies present in an acoustic signal. Music is based on such frequencies, which relate to musical notes. But music also has "slower" periodicities, related not to the high frequencies of sound (>20 Hz) but to the measures and the beat. So I decided to apply the FT to music with a larger window: for a sound spectrum, the FT is usually applied over a time window of 100 ms, but to reveal longer-period periodicities in music, I would apply it over a time window of several seconds.
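To illustrate the idea of a longer window, here is a minimal sketch in Python with NumPy. It is not the software I used; the sampling rate, the 8-second window and the synthetic "onset" signal (one short pulse every half second) are all assumptions chosen for the example. The FT of such a slowly sampled signal peaks at the beat frequency rather than at any audible pitch.

```python
import numpy as np

rate = 100             # samples per second of the onset signal (assumed)
window_seconds = 8     # a window of several seconds instead of ~100 ms
n = rate * window_seconds

# Synthetic onset signal: a 0.1 s pulse every 0.5 s, i.e. 120 beats per minute.
onsets = np.zeros(n)
for start in range(0, n, rate // 2):
    onsets[start:start + rate // 10] = 1.0

# Remove the mean so the zero-frequency bin does not dominate the spectrum.
spectrum = np.abs(np.fft.rfft(onsets - onsets.mean()))
freqs = np.fft.rfftfreq(n, d=1.0 / rate)

beat_hz = freqs[np.argmax(spectrum)]
print(f"strongest periodicity: {beat_hz:.2f} Hz ({beat_hz * 60:.0f} bpm)")
```

With a 100 ms window the lowest resolvable frequency would be 10 Hz, so a 2 Hz beat like this one could not show up at all.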

To make the computations easier, I decided not to use the sound wave itself, but the abstract notation of music as input. This is a series of discrete note events, each with a pitch and a volume. In music sequencers, the notes are usually shown on a piano roll plot, which has time as its horizontal axis and pitch (that is, the note frequency) as its vertical axis. An FT can be applied to this piano roll plot, using the 2-dimensional form of the FT that is used in image processing (noise reduction, feature detection).
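The step from note events to a 2D FT can be sketched as follows. This is only an illustration under assumptions: the `piano_roll` helper, the tuple layout `(pitch, start, duration, volume)` and the four example notes are hypothetical, and NumPy's `fft2` stands in for whatever implementation one prefers.

```python
import numpy as np

def piano_roll(notes, n_pitches=128, n_steps=16):
    """Rasterize (pitch, start, duration, volume) events into a 2D array.

    Rows are pitches, columns are time steps (hypothetical layout).
    """
    roll = np.zeros((n_pitches, n_steps))
    for pitch, start, duration, volume in notes:
        roll[pitch, start:start + duration] = volume
    return roll

# Hypothetical input: a rising C major arpeggio, four time steps per note.
notes = [(60, 0, 4, 1.0), (64, 4, 4, 1.0), (67, 8, 4, 1.0), (72, 12, 4, 1.0)]
roll = piano_roll(notes)

# 2D FT of the piano roll; fftshift moves zero frequency to the center,
# which is the usual layout for plotting.
spectrum = np.fft.fftshift(np.fft.fft2(roll))
magnitude = np.abs(spectrum)   # what gets plotted
phase = np.angle(spectrum)     # needed again for the inverse transform

print(magnitude.shape, magnitude[64, 8])
```

The center value of the magnitude is simply the sum of all cells in the piano roll; everything away from the center describes how the notes are distributed in time and pitch.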

The figure to the left here shows the 2D Fourier Transform of the piano roll plot of Gustav Mahler's "Urlicht" (4th movement of Symphony No. 2). In this plot, the horizontal axis shows the spatial frequencies of the rhythm and the beat distribution. The vertical axis indicates the distribution of intervals and chords. Some peaks are clearly visible, which indicate predominant intervals in this music. Most of the relevant data lie around the central axis (x=0) of the FT plot, which corresponds to low rhythmic frequencies, indicating long notes (or pauses). Along this central vertical axis, the musical chord and interval elements can be seen as distinct points or regions.

This 2D FT can be transformed back into a piano roll plot by applying the Fourier Transform algorithm again. I have tried this, and indeed it comes out correctly. However, this back-transformation also needs the phase of the FT (which is computed numerically as part of the transformation, using complex numbers). If the phase is not available, then the resulting piano roll plot is overlaid with its mirror image, due to symmetry. It would be quite beneficial to find a way to remove this mirror image without using the phase of the FT. Then the inverse transformation could be used to edit the music in its Fourier space. This would provide completely new ways of creating music.
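The mirror-image effect can be reproduced in a few lines. This is a sketch under assumptions, not my original code: the small piano roll is made up, and "losing the phase" is modeled here by keeping only the real part of the spectrum, for which the overlay is exact. Using the magnitude alone also gives a point-symmetric result, but not exactly this overlay.

```python
import numpy as np

# Small made-up piano roll: 12 pitches by 16 time steps, two notes.
roll = np.zeros((12, 16))
roll[3, 2:6] = 1.0
roll[7, 8:10] = 1.0

spectrum = np.fft.fft2(roll)

# With the full complex spectrum, the inverse transform restores the roll.
restored = np.fft.ifft2(spectrum).real
print("round trip ok:", np.allclose(restored, roll))

# Dropping the imaginary part (an assumed model of lost phase) gives the
# average of the roll and its point-mirrored copy.
real_only = np.fft.ifft2(spectrum.real).real
mirrored = np.roll(roll[::-1, ::-1], shift=(1, 1), axis=(0, 1))
print("mirror overlay:", np.allclose(real_only, 0.5 * (roll + mirrored)))
```

The `np.roll` by one cell is needed because the DFT mirrors indices modulo the array size, so index 0 maps to itself rather than to the last index.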

Overall I think that this 2D FT is quite significant for music: it is invariant to pitch and tempo (at least if musical time is used as the reference, instead of real time), and this can lead to unique fingerprints of music or musical phrases, which can be used for identifying music. It can also provide new ways of creating music by editing its Fourier Transform. And last but not least, I have found a new way of visualizing music. At the very least, the FT algorithm provides a new dataset (a 2D array) as the basis for an interesting visualization that directly represents the musical content.
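The invariance behind the fingerprinting idea can be checked numerically. The sketch below makes two assumptions: the fingerprint is taken to be the magnitude of the 2D FT, and transposition is modeled as a circular shift with `np.roll`, which matches a real transposition only as long as no note wraps around the edge of the pitch range. The example notes are made up.

```python
import numpy as np

# Made-up phrase: 24 pitches by 16 time steps.
roll = np.zeros((24, 16))
roll[5, 0:4] = 1.0
roll[9, 4:8] = 1.0
roll[12, 8:16] = 1.0

transposed = np.roll(roll, shift=3, axis=0)  # up three semitones
delayed = np.roll(roll, shift=2, axis=1)     # two time steps later (circular)

fingerprint = np.abs(np.fft.fft2(roll))
same_after_transposition = np.allclose(fingerprint, np.abs(np.fft.fft2(transposed)))
same_after_delay = np.allclose(fingerprint, np.abs(np.fft.fft2(delayed)))

print(same_after_transposition, same_after_delay)
```

A shift only changes the phase of each Fourier coefficient, which is why the magnitude stays the same while the piano roll itself clearly differs.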

I have published this approach recently at the Music, Mind, Invention workshop in Ewing, NJ. The officially published paper for this is here, and the presentation which I gave is here, although that presentation does not have me speaking, so it may be a bit hard to understand the context and background.

I plan to develop this further and make a few videos with this approach as visual accompaniment to my music, and I will work towards releasing the software as a toolkit so that it can be used by others. Lots of work to do then.