Audio Feature Extraction With JavaScript and Essentia

Here's how you can use Essentia.js to find out how energetic your favorite songs are!

Have you ever wondered how Spotify recommends music to its users?

In October, I wrote a winning guide for the Analytics Vidhya Blogathon 25. In that article, I shared how to build a replica Spotify backend, including a recommendation system.

Today, though, we will focus on analyzing different aspects of music with Node.js. As in my Blogathon article, we'll use a tool named Essentia.

What is Essentia?

Well, Essentia is a library of tools for retrieving information from audio. It contains a corpus of analysis algorithms that can perform tasks such as tempo and key extraction, loudness detection, and much more. Their free and open-source library also has a set of TensorFlow deep learning models for more advanced analysis, like genre and mood recognition.

Essentia comes from the Music Technology Group, the same team behind freesound.org.

Their original library is in C++, but they have bindings for both Python and JavaScript.


To see which features we could use for song analysis, we can look at some of the audio features available from Spotify's API.

Here are the ones I picked: danceability, duration, energy, key, mode, loudness, and tempo.

Now let's get to the analysis!

Installing Essentia.js

In a terminal within your Node project, run the following command.

npm i essentia.js

Decoding Audio Files

Before an Essentia algorithm can process audio, the file needs decoding into an array of samples, which is then converted into a vector. To handle the decoding, we can install audio-decode:

npm i audio-decode

This library can decode both MP3 and WAV files and returns the sample data for each audio channel. The decoded samples then need converting to a C++-style vector, which is the input type Essentia's algorithms accept.

We can now use the following to import the libraries and set up Essentia:

const { Essentia, EssentiaWASM } = require("essentia.js");
const fs = require("fs");
const decode = require("audio-decode");
const essentia = new Essentia(EssentiaWASM);

And here is the function that will decode the audio:

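const decodeAudio = async (filepath: string) => {
    // Read the whole file into memory and decode it into per-channel sample data
    const buffer = fs.readFileSync(filepath);
    const audio = await decode(buffer);
    // Use only the first channel, converted to the vector type Essentia expects
    const audioVector = essentia.arrayToVector(audio._channelData[0]);
    return audioVector;
};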
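
The function above passes only the first channel to Essentia. For stereo files, one alternative is to average the channels into a single mono signal first. Here is a minimal sketch of that idea; downmixToMono is a hypothetical helper, not part of essentia.js or audio-decode, and it assumes every channel has the same length.

```typescript
// Hypothetical helper: averages equal-length channels into one mono signal.
function downmixToMono(channels: Float32Array[]): Float32Array {
    if (channels.length === 0) {
        throw new Error("expected at least one channel");
    }
    const length = channels[0].length;
    const mono = new Float32Array(length);
    for (const channel of channels) {
        if (channel.length !== length) {
            throw new Error("channels must have equal length");
        }
        // Add this channel's share of each sample to the running average
        for (let i = 0; i < length; i++) {
            mono[i] += channel[i] / channels.length;
        }
    }
    return mono;
}
```

The result could then be handed to essentia.arrayToVector in place of the first channel's data.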

To test Essentia's algorithm implementations, we will use this audio file from Pixabay. Place it inside your project folder as "audio.mp3".

(async () => {
    const path = "./audio.mp3";
    const data = await decodeAudio(path);

    // ...
})()

Audio Analysis

Danceability

Danceability refers to how appropriate a song would be for dancing. It is a mixture of other factors like beat strength and rhythm.

const computed = essentia.Danceability(data);
// { danceability: N }

const danceability = computed.danceability;

Duration

Duration is the length of a piece of music.

const computed = essentia.Duration(data);
// { duration: N }

const duration = computed.duration;
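
Under the hood, duration is just the number of samples divided by the sample rate. The sketch below shows that arithmetic with a hypothetical helper, durationInSeconds. It is worth checking the Essentia.js reference for how Duration picks its sample rate: to my understanding it takes a sampleRate parameter that defaults to 44100 Hz, so a file recorded at another rate may need that value passed in. Treat that as an assumption.

```typescript
// Hypothetical helper: length of a signal in seconds.
function durationInSeconds(sampleCount: number, sampleRate: number): number {
    if (sampleRate <= 0) {
        throw new Error("sampleRate must be positive");
    }
    return sampleCount / sampleRate;
}
```

For example, 88,200 samples at 44,100 Hz is two seconds of audio.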

Energy

Mathematically, the energy of a signal is the area under the curve of its squared amplitude over time. In terms of music, energy measures intensity and activity.

const computed = essentia.Energy(data);
// { energy: N }

const energy = computed.energy;
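
For sampled audio, signal energy is commonly computed as the sum of the squared sample values. Here is a minimal sketch of that formula. signalEnergy is a hypothetical helper for illustration, and Essentia's Energy may differ in details such as normalization.

```typescript
// Hypothetical helper: sum of squared samples.
function signalEnergy(samples: Float32Array): number {
    let sum = 0;
    for (let i = 0; i < samples.length; i++) {
        sum += samples[i] * samples[i];
    }
    return sum;
}
```

Squaring makes negative and positive swings count equally, which is why a louder, busier signal ends up with more energy than a quiet one of the same length.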

Key & Mode

A musical key is a group of notes that form the basis of a song.

Mode refers to the type of scale - major or minor.

const computed = essentia.KeyExtractor(data);
// { key: "C" | "D" | "E" ..., scale: "major" | "minor", strength: N }

// Only natural notes are listed, so a key with an accidental (such as "C#" or "Eb") maps to -1
const KEYS = ["C", "D", "E", "F", "G", "A", "B"];

const key = KEYS.indexOf(computed.key);
const mode = computed.scale === "major" ? 1 : 0;
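
Spotify's API reports key as a pitch class from 0 to 11 (0 is C, 1 is C♯/D♭, and so on), with -1 meaning no key was detected. If we want our numbers to line up with that convention, and to cope with keys that carry a sharp or flat, one option is a lookup table. This is a sketch under the assumption that the extractor can return spellings such as "C#" or "Eb"; PITCH_CLASSES and keyToPitchClass are hypothetical names.

```typescript
// Pitch classes 0-11 starting at C, covering both sharp and flat spellings.
const PITCH_CLASSES: Record<string, number> = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
    "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8,
    "A": 9, "A#": 10, "Bb": 10, "B": 11,
};

// Hypothetical helper: returns -1 for any name that is not in the table.
function keyToPitchClass(key: string): number {
    const known = Object.prototype.hasOwnProperty.call(PITCH_CLASSES, key);
    return known ? PITCH_CLASSES[key] : -1;
}
```

With this, keyToPitchClass(computed.key) could stand in for KEYS.indexOf(computed.key).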

Loudness

Loudness refers to how loud a song is in decibels (dB).

const computed = essentia.DynamicComplexity(data);
// { dynamicComplexity: N, loudness: N }

const loudness = computed.loudness;
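
Decibels are a logarithmic unit, which can feel abstract at first. Purely to illustrate the unit (this is not how DynamicComplexity arrives at its loudness value), the sketch below converts the root-mean-square level of some samples into decibels relative to full scale, where an RMS level of 1.0 maps to 0 dB. rmsDecibels is a hypothetical helper.

```typescript
// Hypothetical helper: RMS level of a signal in dB relative to full scale.
function rmsDecibels(samples: Float32Array): number {
    if (samples.length === 0) {
        throw new Error("expected at least one sample");
    }
    let sumSquares = 0;
    for (let i = 0; i < samples.length; i++) {
        sumSquares += samples[i] * samples[i];
    }
    const rms = Math.sqrt(sumSquares / samples.length);
    // 20 * log10(rms), written with Math.log so it works on older targets
    return 20 * (Math.log(rms) / Math.LN10);
}
```

Quieter signals give more negative values: halving the amplitude lowers the level by roughly 6 dB, and pure silence comes out as -Infinity.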

Tempo

Tempo is the speed of a piece of music in beats per minute.

const computed = essentia.PercivalBpmEstimator(data);
// { bpm: N }

const tempo = computed.bpm;
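
To make the unit concrete: if a beat lands every 0.5 seconds, the tempo is 60 / 0.5 = 120 BPM. The sketch below applies that to a list of beat timestamps by taking the median gap between beats. It is only an illustration of the arithmetic, not the method PercivalBpmEstimator uses, and bpmFromBeatTimes is a hypothetical helper.

```typescript
// Hypothetical helper: tempo from beat timestamps given in seconds.
function bpmFromBeatTimes(beatTimes: number[]): number {
    if (beatTimes.length < 2) {
        throw new Error("need at least two beats");
    }
    // Gaps between consecutive beats
    const intervals: number[] = [];
    for (let i = 1; i < beatTimes.length; i++) {
        intervals.push(beatTimes[i] - beatTimes[i - 1]);
    }
    // The median is less sensitive to one mistimed beat than the mean
    intervals.sort((a, b) => a - b);
    const mid = Math.floor(intervals.length / 2);
    const median = intervals.length % 2 === 1
        ? intervals[mid]
        : (intervals[mid - 1] + intervals[mid]) / 2;
    return 60 / median;
}
```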

And that's all the audio features analyzed. But now that we have all this information, what should we do with it? Well, one suggestion would be to build a song recommendation system, which I have already explained here. Another idea would be to make it available through a REST API.

Building a REST API

For this API, we'll use express to handle incoming requests and send responses. Also, we can use formidable to handle file uploads, so check out how to use it here.

So, when the client uploads a file, we will decode it, analyze it, and then return the audio features.

import express, { NextFunction } from "express";

import formidable from "formidable";

import fs from "fs";
import { IncomingMessage } from "http";
import { Essentia, EssentiaWASM } from "essentia.js";
import decode from "audio-decode";
import IncomingForm from "formidable/Formidable";

const app = express();
const port = 3000;

const essentia = new Essentia(EssentiaWASM);

// Only natural notes are listed, so a key with an accidental (such as "C#" or "Eb") maps to -1
const KEYS = ["C", "D", "E", "F", "G", "A", "B"];

app.use(express.json());
app.use(express.urlencoded({ extended: true }));

const parseForm = async (
    form: IncomingForm,
    req: IncomingMessage,
    next: NextFunction
): Promise<{ fields: formidable.Fields; files: formidable.Files }> => {
    return await new Promise((resolve) => {
        form.parse(
            req,
            function (
                err: Error,
                fields: formidable.Fields,
                files: formidable.Files
            ) {
                // On failure, hand the error to Express; the promise is left unsettled
                if (err) return next(err);
                resolve({ fields, files });
            }
        );
    });
};

const decodeAudio = async (filepath: string) => {
    const buffer = fs.readFileSync(filepath);
    const audio = await decode(buffer);
    const audioVector = essentia.arrayToVector(audio._channelData[0]);
    return audioVector;
};

app.post("/upload", async (req, res, next) => {
    const form = formidable();

    const { files } = await parseForm(form, req, next);

    // The file uploaded must have the field name "file"
    const file = files.file as any;

    const data = await decodeAudio(file.filepath);

    const danceability = essentia.Danceability(data).danceability;
    const duration = essentia.Duration(data).duration;
    const energy = essentia.Energy(data).energy;

    const computedKey = essentia.KeyExtractor(data);
    const key = KEYS.indexOf(computedKey.key);
    const mode = computedKey.scale === "major" ? 1 : 0;

    const loudness = essentia.DynamicComplexity(data).loudness;
    const tempo = essentia.PercivalBpmEstimator(data).bpm;

    res.status(200).json({
        danceability,
        duration,
        energy,
        key,
        mode,
        loudness,
        tempo,
    });
});

app.listen(port, () => {
    return console.log(`Express server listening at http://localhost:${port}`);
});
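
One thing to watch in the handler above: if decodeAudio or an Essentia call throws inside the async callback, the result is a rejected promise, and Express 4 does not pass that to its error middleware on its own. A common workaround is a small wrapper that forwards rejections to next. The sketch below assumes Express 4 and uses stand-in types so it runs by itself; asyncHandler is a hypothetical helper, not something Express provides.

```typescript
// Stand-ins for Express's types so the sketch is self-contained; in a real
// project these would be Request, Response, and NextFunction from "express".
type Next = (err?: unknown) => void;
type AsyncRoute<Req, Res> = (req: Req, res: Res, next: Next) => Promise<void>;

// Hypothetical helper: wraps an async route handler so that any rejection
// is forwarded to next() instead of being left unhandled.
function asyncHandler<Req, Res>(handler: AsyncRoute<Req, Res>) {
    return (req: Req, res: Res, next: Next): Promise<void> => {
        return Promise.resolve()
            .then(() => handler(req, res, next))
            .catch((err: unknown) => next(err));
    };
}
```

The upload route could then be registered as app.post("/upload", asyncHandler(async (req, res, next) => { ... })), leaving the body of the handler unchanged.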

Conclusion

And that's all! If you liked this article, consider following me.

And if you have any other suggestions on how to use these audio features, drop a comment.

You can find the entire code for this article at GitHub.

Goodbye for now.
