A realtime spectrogram in under 100 lines of code

Or, how I built Simple Spectrogram

Motivation

Lately, I’ve learned a lot about sound and music theory. A question I often asked myself was: how do I know what I’m hearing? In a related vein, I’d ponder things like how I could replicate the sound of a piano, or how precise birds are when making calls. Listening is one way to understand, but I knew I could also visualize sounds using one of my favorite kinds of pictures, a spectrogram. Put simply, a spectrogram is a graph of a sound whose axes are time and frequency, with brightness showing how strong each frequency is at each moment.

I set out to find an app that would generate a live spectrogram for me. The Google Music Lab Spectrogram is, of course, beautiful, but it doesn’t work on my phone. I also tried some apps from the iOS App Store: some charged individually for basic features such as pinch-to-zoom, while others shared my data. I didn’t try any paid options, though. I felt that “do some math on the sounds around me” was something I should expect my phone to do easily, for free, and privately. After learning that most browsers support fast Fourier transforms through the Web Audio API, so I could build this without implementing the algorithm myself, I decided to make my own. The barebones version comes in comfortably under 100 lines of HTML, CSS, and JavaScript.

Screenshots of the spectrogram in use

A spectrogram I annotated while sitting in a coffee shop writing the first draft of this piece.

A French horn sustaining notes. The gentle warbling of a French horn is attributed to all of its frequencies interacting in a way that is pleasing to the human ear; it harmonizes with itself. The horn player told me that seeing the spectrogram made him a better musician, because he could see more of his overtones and know when he was playing a note properly.

A hidden image in an Aphex Twin song. We can encode all kinds of things in sound, though the result won’t necessarily sound pretty.

My housemate screaming as long as they can. Notice how they lose steam at the end as they run out of breath: both the frequency and intensity drop over time.

Overview

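This approach uses the browser’s Web Audio API (an AudioContext and an AnalyserNode) to turn microphone input into frequency data, and an HTML canvas to draw that data one column per animation frame.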

Create an AudioContext in response to user interaction

In order to process sound in a web browser, we need to instantiate an AudioContext object. However, modern browsers apply autoplay policies to Web Audio: an AudioContext is only allowed to start running in response to a user interaction (one created earlier starts out suspended), to prevent abusive behavior like advertisements that auto-play when the page loads. Since this app is quite simple, I use a button to request user permission, and instantiate the AudioContext in response to a button press. I also chose to request microphone permission in the same function. Since we need both for our spectrogram to function, there’s no special reason to handle them separately:

<body>
    <button
        onclick="spectrogram()"
    >Tap to start</button>
    <canvas 
        id="spectrogram"
        height="1024"
        width="1024"
    ></canvas>
</body>

Creating the AudioContext in response to a user action and requesting microphone access:

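// Shared with draw(), so declared outside spectrogram()
var analyser;

function spectrogram(event) {
    // Created inside the click handler, so autoplay policies let it run
    var audioCtx = new (window.AudioContext || window.webkitAudioContext)();
    analyser = audioCtx.createAnalyser();
    // Ask for microphone access (helper not shown here)
    requestMicrophone();
}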
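The post calls a requestMicrophone() helper without showing it. Below is a minimal sketch of what such a helper might look like, assuming the standard navigator.mediaDevices.getUserMedia API. The (audioCtx, analyser, onReady) parameters are my assumption, not the original signature:

```javascript
// Hypothetical signature: the original helper's parameters are not shown.
function requestMicrophone(audioCtx, analyser, onReady) {
    // Prompts the user for microphone permission; resolves with a MediaStream
    return navigator.mediaDevices
        .getUserMedia({ audio: true })
        .then(function (stream) {
            // Wrap the stream in a source node and feed it to the analyser
            var source = audioCtx.createMediaStreamSource(stream);
            source.connect(analyser);
            // Left unconnected from audioCtx.destination so the microphone
            // isn't played back through the speakers
            if (onReady) {
                onReady();
            }
        });
}
```

With this signature, the call inside spectrogram() would become something like requestMicrophone(audioCtx, analyser, draw), so the drawing loop starts once audio is flowing.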

Limitations

A sinusoid has to be sampled more than twice per cycle for its frequency to be recoverable, so the maximum detectable frequency is a direct consequence of the sample rate: it is half the sample rate, a limit called the Nyquist frequency. AudioContext.sampleRate returns 44100 samples per second in my browser. This could vary between devices, but in this case the Nyquist frequency is 22050 Hz. That is beyond the upper limit of nearly every human’s hearing (roughly 20 kHz), so the spectrogram covers the entire range of audible sound.
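To make the Nyquist limit concrete, here is a small sketch. aliasedFrequency is a hypothetical helper for illustration: it shows that a tone above the Nyquist frequency, once sampled, is indistinguishable from a lower one. In practice, microphone hardware typically filters such tones out before sampling.

```javascript
// Highest frequency representable at a given sample rate
function nyquistFrequency(sampleRate) {
    return sampleRate / 2;
}

// A sampled sinusoid looks the same as one shifted by any multiple of the
// sample rate, so fold the frequency back into the range 0..Nyquist
function aliasedFrequency(freqHz, sampleRate) {
    return Math.abs(freqHz - sampleRate * Math.round(freqHz / sampleRate));
}

console.log(nyquistFrequency(44100));         // 22050
console.log(aliasedFrequency(30000, 44100));  // 14100
```

A 30 kHz tone sampled at 44100 Hz would therefore show up at 14100 Hz, which is why nothing above 22050 Hz can be plotted reliably.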

By default, the AnalyserNode’s fftSize is 2048, which yields frequencyBinCount = 1024 frequency bins spread evenly from 0 Hz up to the Nyquist frequency. Our canvas has exactly 1024 logical pixels in the vertical direction, one per bin. So each bin covers a range of 22050 / 1024 = 21.533203125 Hz. This isn’t super precise: if someone sang 21 Hz off a note, you’d probably notice. But it’s plenty to paint a mosaic of the sound and understand how it’s constructed.
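The resolution arithmetic can be sketched as follows, assuming the default fftSize of 2048. The helper names binWidthHz and binForFrequency are hypothetical:

```javascript
// frequencyBinCount (fftSize / 2) bins evenly span 0 Hz .. sampleRate / 2
function binWidthHz(sampleRate, fftSize) {
    var binCount = fftSize / 2;
    return (sampleRate / 2) / binCount;
}

// Index into the getByteFrequencyData array, i.e. the canvas row counted
// up from the bottom
function binForFrequency(freqHz, sampleRate, fftSize) {
    return Math.floor(freqHz / binWidthHz(sampleRate, fftSize));
}

console.log(binWidthHz(44100, 2048));            // 21.533203125
console.log(binForFrequency(440, 44100, 2048));  // 20 (concert A)
```

Raising analyser.fftSize would narrow each bin, but the canvas height would then need to grow to keep one row per bin.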

Drawing on a canvas

We can start drawing. My approach here is very direct, but seems to perform well:

  1. Plot frequencies in the rightmost column of the canvas
  2. Translate the entire canvas left by 1 column
  3. Repeat on the next animation frame
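function draw() {
    // read the latest magnitudes (0-255), one per frequency bin
    analyser.getByteFrequencyData(dataArray);
    // draw 1 time slice in the rightmost column, low frequencies at the bottom
    for (var y = 0; y < bufferLength; y++) {
        var intensity = dataArray[y];
        canvasCtx.fillStyle = `rgb(${intensity},${intensity},${intensity})`;
        // rows run 0..height-1, so bin 0 goes in the bottom row (height - 1)
        canvasCtx.fillRect(canvas.width - 1, canvas.height - 1 - y, 1, 1);
    }
    // shift canvas contents left by 1 pixel
    canvasCtx.drawImage(canvasCtx.canvas, -1, 0);
    // schedule the next column for the next animation frame
    requestAnimationFrame(draw);
}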
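draw() reads canvas, canvasCtx, bufferLength, and dataArray, whose setup the post doesn’t show. A sketch of that setup, assuming an analyser already exists; setupDraw is a hypothetical name, and in the page you would assign its results to the variables draw() reads:

```javascript
function setupDraw(analyser, canvas) {
    return {
        canvas: canvas,
        // 2D drawing context used for fillRect and drawImage
        canvasCtx: canvas.getContext("2d"),
        // frequencyBinCount is fftSize / 2 (1024 with the default fftSize)
        bufferLength: analyser.frequencyBinCount,
        // getByteFrequencyData fills this with one 0-255 value per bin
        dataArray: new Uint8Array(analyser.frequencyBinCount),
    };
}
```

For example, setupDraw(analyser, document.getElementById("spectrogram")) returns everything draw() needs for a 1024-row canvas.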

One question I received was how the performance of using drawImage this way compares to redrawing the entire canvas for each frame. I like this approach because it minimizes memory usage: only one array of size height is needed, which we paint to the canvas and then reuse, as opposed to keeping an array for every column of the canvas. That’s O(height) memory rather than O(height × width). It also saves work per frame: the scrolling approach paints height pixels and shifts everything else with a single drawImage call, whereas redrawing a 2-dimensional matrix 60 times per second means painting height × width pixels every frame, which is much costlier.
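A back-of-the-envelope comparison for a 1024 x 1024 canvas, assuming one byte per frequency bin (as getByteFrequencyData provides) and one fillRect per painted pixel. Both helper names are hypothetical:

```javascript
// Only the newest column is stored and painted; drawImage shifts the rest
function scrollingCost(height) {
    return { bytesStored: height, fillRectsPerFrame: height };
}

// Every column's history is kept and every pixel is repainted each frame
function fullRedrawCost(height, width) {
    return { bytesStored: height * width, fillRectsPerFrame: height * width };
}

console.log(scrollingCost(1024));         // { bytesStored: 1024, fillRectsPerFrame: 1024 }
console.log(fullRedrawCost(1024, 1024));  // { bytesStored: 1048576, fillRectsPerFrame: 1048576 }
```

At 60 frames per second, the full redraw would mean roughly a thousand times more fillRect calls than scrolling.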

Adding CSS

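Our canvas has a logical size of 1024 x 1024. That the height matches the analyser’s frequencyBinCount (1024 with the default fftSize) is no accident: it lets us fill the canvas from top to bottom, regardless of how we squish or stretch the element to display it on screen. Let’s pretty it up with a black background, a large full-width start button, and a canvas stretched to fill the page, hidden until it is needed: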

<style>
    body, button {
        margin: 0;
        background: black;
        width: 100%;
        height: 100%;
        color: white;
        font-size: 3em;
    }
    canvas#spectrogram {
        width: 100%;
        height: 100%;
        display: none;
    }
</style>
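The stylesheet hides the canvas with display: none, so something has to reveal it after the tap. The post doesn’t show this step. A minimal sketch, using a hypothetical showSpectrogram helper that could be called at the top of spectrogram(), for example as showSpectrogram(document):

```javascript
function showSpectrogram(doc) {
    var button = doc.querySelector("button");
    var canvas = doc.getElementById("spectrogram");
    // Hide the "Tap to start" button...
    button.style.display = "none";
    // ...and override the stylesheet's display: none on the canvas
    canvas.style.display = "block";
}
```

Inline styles take precedence over the stylesheet rule here, so no CSS changes are needed.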

Closing

Thanks for reading this far; making a fun interactive tool was a treat. And while this spectrogram isn’t highly polished, it is open source, so comments and contributions are welcome.