Make Your AI-Powered Assistant Play Music! (Featuring Web3)

Integrate the Houndify API to make a personal assistant like Alexa or Siri

In the last article, we set up our application and added audio visualization with Wave.js. Now we'll make the client play music. The demo below shows what we are trying to do.

First, the server downloads the song from YouTube. Then, it uploads the file to a decentralised storage network. Finally, the client retrieves the file and plays it.

Setting up

If you haven't read the last article in the series, I recommend following it first.

But if you'd rather start afresh, the starter code can be found here.

Remember how some domains on the Houndify dashboard required client integrations to function? We now need two of them:

[Image: the two domains to enable on the Houndify dashboard]

Enable these two domains, then click Save Changes in the top right of the page. Together they allow the user to request a specific song, play a song from the music charts, or specify a genre.

Initiating command handlers

Now that those domains are enabled, go back to your editor, open up src/App.tsx and insert this code:

...
import handleCommand from "./handlers";
...

function App() {
    ...
    const handleResult = async (result: any) => {
        // Give command handlers a chance to transform the result first
        const newResult = await handleCommand(result);
        say(newResult.SpokenResponseLong);
    };
    ...
}

If the user's command needs extra client-side logic, handleCommand runs it and returns a transformed result; otherwise the result passes through unchanged.

Create a file named index.ts in a new src/handlers directory and insert the following code:

// Houndify command kinds this client has a handler for
const COMMANDS = ["MusicCommand"];

export default async function handleCommand(result: any) {
    const command = result.CommandKind;
    if (!COMMANDS.includes(command)) return result;

    // Load the handler module with the same name, e.g. ./MusicCommand
    const { default: commandHandler } = await import(`./${command}`);

    return await commandHandler(result);
}

First, COMMANDS stores the Houndify command kinds the client can handle. MusicCommand is the command kind behind playing music and the charts feature.

Then the function looks at the result's CommandKind. If we can handle it, we dynamically import the handler module of the same name from the handlers folder and let it transform the result. If not, it returns the result unchanged, so the assistant speaks Houndify's default response.
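For comparison, the same dispatch can be written with a static lookup table instead of a dynamic import(). This is an alternative sketch, not the approach used in this series: HoundResult, CommandHandler, HANDLERS, and fakeMusicHandler are hypothetical names, and the result type models only CommandKind.

```typescript
// Hypothetical, simplified Houndify result: only CommandKind is typed,
// every other field passes through untouched.
type HoundResult = { CommandKind: string; [key: string]: unknown };

type CommandHandler = (result: HoundResult) => HoundResult | Promise<HoundResult>;

// Hypothetical stand-in for ./MusicCommand: tags the result so we can
// tell that the handler ran.
const fakeMusicHandler: CommandHandler = (result) => ({ ...result, handled: true });

// Static table playing the role of COMMANDS + import().
// A Map avoids matching inherited object keys such as "constructor".
const HANDLERS = new Map<string, CommandHandler>([
    ["MusicCommand", fakeMusicHandler],
]);

async function handleCommand(result: HoundResult): Promise<HoundResult> {
    const handler = HANDLERS.get(result.CommandKind);
    // Unknown command kinds are returned as-is, so Houndify's
    // default spoken response is used.
    if (!handler) return result;
    return handler(result);
}
```

With real modules, HANDLERS would hold statically imported handlers, so a missing file fails at build time rather than when the command arrives; the trade-off is losing the lazy loading that import() provides.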

Handling the Music Command

Wait, how will this actually work?

Well, when the user requests a song, the result contains info like the track name and artist. It also contains two alternative versions of the result for when the client tries to play the song: we return AutoPlayResult if playback succeeds and AutoPlayFailedResult otherwise.

We use the track name and artist to find a matching YouTube video. The server downloads it and uploads the audio to storage. Then, on the front end, we retrieve the file and play it.
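To make the flow concrete, here's a typed sketch of it. The interfaces model only the fields this article uses (NativeData.Tracks, TrackName, ArtistName, AutoPlayResult, AutoPlayFailedResult, SpokenResponseLong); they are assumptions for illustration, not Houndify's published types, and playFirstTrack and Pipeline are hypothetical names. The download, upload, and retrieval steps are injected as fetchAudio so the selection logic can run without a server.

```typescript
// Hypothetical subset of a Houndify MusicCommand result.
interface Track {
    TrackName: string;
    ArtistName: string;
}

interface SpokenResult {
    SpokenResponseLong: string;
}

interface MusicResult extends SpokenResult {
    CommandKind: "MusicCommand";
    MusicCommandKind: string;
    NativeData: { Tracks: Track[] };
    AutoPlayResult?: SpokenResult;
    AutoPlayFailedResult?: SpokenResult;
}

// The steps described above, injected so they can be stubbed out.
interface Pipeline {
    fetchAudio: (track: Track) => Promise<string>; // find, download, upload, retrieve
    play: (audioURL: string) => void;
}

async function playFirstTrack(result: MusicResult, pipeline: Pipeline): Promise<SpokenResult> {
    const track = result.NativeData.Tracks[0];
    const { AutoPlayResult: played, AutoPlayFailedResult: failed } = result;

    // Without a track or the auto-play variants, keep the default response.
    if (!track || !played || !failed) return result;

    try {
        pipeline.play(await pipeline.fetchAudio(track));
        return played; // playback started
    } catch {
        return failed; // any step failed
    }
}
```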

First, define these two constants at the top of src/handlers/MusicCommand.ts:

const SUCCESS_RESULT = "AutoPlayResult";
const FAILED_RESULT = "AutoPlayFailedResult";

Then, create a handle function which will be the main part of the handler:

// src/handlers/MusicCommand.ts

export default function handle(result: any) {
    // If there is no successful result
    // then no song will be able to play
    if (!result[SUCCESS_RESULT]) {
        return result;
    }

    switch (result.MusicCommandKind) {
        // Both music command kinds carry
        // the same data: a list of tracks
        case "MusicChartsCommand":
        case "MusicSearchCommand":
            return handleMusicCommand(result);
        default:
            return result;
    }
}

Then add the following code, which implements the flow described above:

import axios from "axios";
import playSound from "../lib/playSound";
import { Howler } from "howler";

...

const handleMusicCommand = async (result: any) => {
    try {
        // Houndify lists matching tracks; play the first one
        const track = result.NativeData.Tracks[0];
        const cid = await downloadTrack(track);
        const audioURL = await retrieveFileURL(cid);

        // Stop anything already playing, then play the new song
        Howler.stop();
        playSound(audioURL, { format: "webm" });

        return result[SUCCESS_RESULT];
    } catch {
        // Any failure along the way means the song can't be played
        return result[FAILED_RESULT];
    }
};

We'll get back to the cid variable later. It's part of the web3 bonus!

Next, create a helper function that builds an optimised search query from the track info, and another that calls a new endpoint on our server:

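const getTitle = (title: string, artist: string) => {
    return `${title} ${artist}`
        .toLowerCase()
        // Drop text in parentheses, e.g. "(remastered)" or "(feat. x)"
        .replace(/\([^)]*\)/g, " ")
        // Drop text in square brackets, e.g. "[official video]"
        .replace(/\[[^\]]*\]/g, " ")
        // Drop standalone "feat." and "ft." (the dot must be escaped)
        .replace(/\b(?:feat|ft)\./g, " ")
        // Collapse the leftover whitespace
        .replace(/\s+/g, " ")
        .trim();
};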
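To see what the query builder produces, here's a standalone version you can run against a few sample tracks. buildSearchQuery is a hypothetical name; it drops parenthesised and bracketed text, removes standalone "feat." and "ft." markers, and replaces each removed piece with a space so neighbouring words don't run together.

```typescript
// Standalone query builder for experimenting with sample inputs.
function buildSearchQuery(title: string, artist: string): string {
    return `${title} ${artist}`
        .toLowerCase()
        .replace(/\([^)]*\)/g, " ")       // "(remastered)", "(feat. x)"
        .replace(/\[[^\]]*\]/g, " ")      // "[official video]"
        .replace(/\b(?:feat|ft)\./g, " ") // standalone "feat." / "ft."
        .replace(/\s+/g, " ")             // collapse whitespace
        .trim();
}

const samples: Array<[string, string]> = [
    ["Blinding Lights (Remastered)", "The Weeknd"],
    ["Stay [Official Video]", "The Kid LAROI ft. Justin Bieber"],
    ["After Hours", "The Weeknd"],
];

for (const [title, artist] of samples) {
    console.log(buildSearchQuery(title, artist));
}
```

It prints blinding lights the weeknd, stay the kid laroi justin bieber, and after hours the weeknd. Escaping the dot matters: an unescaped ft. pattern matches any character after "ft", so it would strip the "fte" out of "after" and search for "ar hours the weeknd" instead.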

const downloadTrack = async (track: any) => {
    const title = getTitle(track.TrackName, track.ArtistName);

    // The query is sent in the URL (?q=...), not in the request body
    const { data } = await axios.post("/yt-download", null, {
        params: { q: title },
    });

    return data.cid as string;
};
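Because the search query is passed as params rather than as a request body, it travels in the URL's query string, which is why the server reads req.query.q. A rough sketch of that round trip: buildDownloadURL and readQuery are hypothetical helpers, and axios and Express use their own encoder and parser (axios writes spaces as +, for example), so this only illustrates the idea. Encoding is what keeps a title containing & from splitting the query string.

```typescript
// Hypothetical helper: what `params: { q }` turns into on the wire.
function buildDownloadURL(query: string): string {
    return `/yt-download?q=${encodeURIComponent(query)}`;
}

// Simplified stand-in for the server-side query parser (req.query.q).
function readQuery(url: string): string | null {
    const queryString = url.split("?")[1] ?? "";
    for (const pair of queryString.split("&")) {
        const eq = pair.indexOf("=");
        const key = eq === -1 ? pair : pair.slice(0, eq);
        const value = eq === -1 ? "" : pair.slice(eq + 1);
        // Form encoding may write spaces as "+", so normalise before decoding.
        if (key === "q") return decodeURIComponent(value.replace(/\+/g, " "));
    }
    return null;
}

const url = buildDownloadURL("rock & roll (live)");
console.log(url);            // /yt-download?q=rock%20%26%20roll%20(live)
console.log(readQuery(url)); // rock & roll (live)
```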

To implement this endpoint, the server first has to use the YouTube Data API to search for a matching video and get its ID. So, let's jump over to the server we created last time and do that.

Using the YouTube Data API

You can find a guide here to set up the API and retrieve the API key.

Once that is set up, add the API key to the .env file like so:

...
YOUTUBE_DATA_API_KEY={YOUR_API_KEY}

Let's first install body-parser to parse JSON request bodies, and youtube-search to wrap the API:

npm i body-parser youtube-search

Then, add the following code to the server.js file:

const bodyParser = require("body-parser");
const search = require("youtube-search");

...

app.use(bodyParser.json());

...

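const searchYT = async (q) => {
    const { results } = await search(q, {
        key: process.env.YOUTUBE_DATA_API_KEY,
        maxResults: 3,
    });

    // Use the top match; fail clearly if nothing was found
    const result = results[0];
    if (!result) throw new Error(`No YouTube results for "${q}"`);

    return result.id;
};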
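searchYT takes the first of the three results, whatever kind it is. If the top hit is a channel or a playlist, its ID won't work as a video ID. A minimal sketch of skipping non-video hits, assuming each hit exposes an id and a kind string such as "youtube#video"; SearchHit and pickVideoId are hypothetical names, so check the exact shape your version of youtube-search returns.

```typescript
// Assumed shape of one search hit; only these fields are used here.
interface SearchHit {
    id: string;
    kind: string; // e.g. "youtube#video", "youtube#channel", "youtube#playlist"
    title: string;
}

// Return the ID of the first hit that is actually a video.
function pickVideoId(hits: SearchHit[]): string {
    const video = hits.find((hit) => hit.kind === "youtube#video");
    if (!video) throw new Error("No video found among the search results");
    return video.id;
}
```

This is also why asking for a few results instead of one is useful: it leaves room to skip a bad top hit.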

Now we can define the route to download the video from YouTube and return a reference to the file.

Downloading the YouTube video

On the server side, we will use a program called yt-dlp to extract the video from YouTube. It then makes use of ffmpeg to convert the video to audio.

You can download the binary file for yt-dlp here, and you can get ffmpeg here.

Save the binary files into a folder named binaries in your project. The folder should look like this:

[Image: the binaries folder containing the yt-dlp and ffmpeg binaries]

Now install the yt-dlp-wrap library, which runs yt-dlp commands from Node:

npm i yt-dlp-wrap

The following code downloads the audio of a given YouTube video by its ID:

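const { default: YTDlpWrap } = require("yt-dlp-wrap");

// Path to the yt-dlp binary (the .exe suffix applies to Windows)
const ytDlpWrap = new YTDlpWrap("./binaries/yt-dlp.exe");

// lo-fi hip hop music on YouTube
const id = "lTRiuFIWV54";
const filename = `public/songs/${id}.webm`;

const args = [
    `https://www.youtube.com/watch?v=${id}`,
    // Download the best audio-only format
    "-f",
    "ba",
    // ffmpeg sits next to yt-dlp in the binaries folder
    "--ffmpeg-location",
    "./binaries",
    "-o",
    filename,
];

ytDlpWrap
    .exec(args)
    .on("error", (error) => console.error(error))
    .on("close", () => console.log(`Saved ${filename}`));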
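The download above and the streaming upload further down pass nearly the same arguments to yt-dlp, so one option is to build them in one place. A sketch in TypeScript (the server here is plain JavaScript, so drop the type annotations if you use it there); buildYtDlpArgs and YtDlpOptions are hypothetical names. The flags are the ones already used in this article: -f ba selects the best audio-only format, --ffmpeg-location points at the ffmpeg binary or its folder, and -o sets the output file.

```typescript
// Hypothetical options for the yt-dlp calls in this article.
interface YtDlpOptions {
    ffmpegLocation: string; // ffmpeg binary or the folder containing it
    output?: string;        // omit when the caller handles the output (e.g. streaming)
}

// Build the yt-dlp argument list for a single video ID.
function buildYtDlpArgs(videoId: string, { ffmpegLocation, output }: YtDlpOptions): string[] {
    const args = [
        `https://www.youtube.com/watch?v=${videoId}`,
        "-f", "ba",                           // best audio-only format
        "--ffmpeg-location", ffmpegLocation,
    ];
    if (output) args.push("-o", output);      // write to this file
    return args;
}

console.log(buildYtDlpArgs("lTRiuFIWV54", { ffmpegLocation: "./binaries" }));
```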

But saving every song to the server's public folder isn't a great idea. Third-party services are better optimised for file storage. In this tutorial, we will use Web3.Storage, which stores files on the decentralised web (IPFS, backed by Filecoin). There are two reasons I'm using it:

  1. You get 1 TB of storage completely free (yes, free!)
  2. It's quite simple to get started after reading the quick start here.

Bonus - Decentralised storage

Once you've created an API token, add it to the .env file like so:

REACT_APP_WEB3_STORAGE_TOKEN={YOUR_KEY}

Installing Web3.Storage

Now install web3.storage with the following command:

npm i web3.storage

Uploading YT vids to Web3.Storage

We can first define a function to download a video with yt-dlp and stream the output to our storage provider.

// server.js

...

const { Web3Storage } = require("web3.storage");

...

const web3Storage = new Web3Storage({
    token: process.env.REACT_APP_WEB3_STORAGE_TOKEN,
});

const uploadYTToWeb3 = async (ytID) => {
    const ytdlpArgs = [
        `https://www.youtube.com/watch?v=${ytID}`,
        "-f",
        "ba",
        "--ffmpeg-location",
        "./binaries",
    ];

    const filename = `${ytID}.webm`;
    // Stream yt-dlp's output straight into Web3.Storage
    // instead of saving it to disk first
    const cid = await web3Storage.put([
        { name: filename, stream: () => ytDlpWrap.execStream(ytdlpArgs) },
    ]);

    return cid;
};

Now we can add our final API route for this tutorial:

app.post("/yt-download", async function (req, res) {
    try {
        const ytID = await searchYT(req.query.q);
        const cid = await uploadYTToWeb3(ytID);

        res.json({
            cid,
        });
    } catch (error) {
        // Error objects serialise to {} as JSON, so send a string instead
        res.status(500).json({ error: String(error) });
    }
});

Now move over to the front end. Add the following code to retrieve the file in the MusicCommand handler:

// src/handlers/MusicCommand.ts

import { Web3Storage } from "web3.storage";

//@ts-ignore
const web3Storage = new Web3Storage({
    token: process.env.REACT_APP_WEB3_STORAGE_TOKEN as string,
});

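const retrieveFileURL = async (cid: string) => {
    const storageRes = await web3Storage.get(cid);
    if (!storageRes?.ok) {
        throw new Error(`Failed to retrieve ${cid}`);
    }

    const files = await storageRes.files();
    if (!files.length) throw new Error(`No files found for ${cid}`);

    // Turn the downloaded file into a URL the audio player can load
    const audioURL = URL.createObjectURL(files[0]);

    return audioURL;
};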

Remembering previous songs

At this point, you may be wondering how efficient this is. Every call to the route downloads and re-uploads the video, even if we've requested it before. So asking for the same song again takes just as long as the first time.

We can speed this up by keeping a JSON file named song_cids.json that maps YouTube IDs to storage CIDs (Content Identifiers). If the requested video ID is already in the file, we can respond straight away with its CID.
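This is the cache-aside pattern: look the ID up first, and only do the slow download and upload on a miss. A minimal in-memory sketch, where withCIDCache and Upload are hypothetical names and a Map stands in for song_cids.json:

```typescript
// Hypothetical upload function: takes a YouTube ID, resolves to a CID.
type Upload = (ytID: string) => Promise<string>;

// Wrap an upload function so each YouTube ID is uploaded at most once
// (per process); the Map plays the role of song_cids.json.
function withCIDCache(upload: Upload, cache = new Map<string, string>()): Upload {
    return async (ytID) => {
        const cached = cache.get(ytID);
        if (cached) return cached;      // hit: skip download and upload

        const cid = await upload(ytID); // miss: do the slow work
        cache.set(ytID, cid);           // remember it for next time
        return cid;
    };
}
```

One gap remains in both this sketch and the file-based version: two simultaneous requests for the same uncached song will both upload it. Caching the pending promise instead of the finished CID would close that gap.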

Add the following to the server code:

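// server.js

const { existsSync } = require("fs");
const fs = require("fs/promises");

const cidFilename = "song_cids.json";

// Read the YouTube ID -> CID mapping, starting empty if the file doesn't exist yet
const readCIDFile = async () => {
    if (!existsSync(cidFilename)) return {};
    return JSON.parse(await fs.readFile(cidFilename, "utf8"));
};

// Add or update one mapping and write the file back
const saveCIDMapping = async (ytID, cid) => {
    const data = await readCIDFile();
    data[ytID] = cid;
    await fs.writeFile(cidFilename, JSON.stringify(data));
};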

Now, extend the route like so:

app.post("/yt-download", async function (req, res) {
    try {
        const ytID = await searchYT(req.query.q);

        // Check if we uploaded the song before
        let cid = (await readCIDFile())[ytID];
        if (cid) {
            return res.json({ cid });
        }

        cid = await uploadYTToWeb3(ytID);
        // Remember the CID so the next request can skip the upload
        await saveCIDMapping(ytID, cid);

        res.json({
            cid,
        });
    } catch (error) {
        res.status(500).json({ error: String(error) });
    }
});

Toast notifications

If you spin up the dev server now and request a song, you'll wait at least five seconds with no indication of what's going on! To provide some visual feedback, we can use the react-toastify library, which makes it easy to add elegant toast notifications. Install it with this line:

npm i react-toastify

Open src/App.tsx and import the files we need:

import { ToastContainer } from "react-toastify";
import "react-toastify/dist/ReactToastify.css";

Then add the ToastContainer within the component like so:

<div className={styles.root}>
    <h1 className={styles.h1}>Assist310</h1>
    <VoiceInput transcription={transcription} />
    {error && <div className={styles.errorContainer}>{error}</div>}
    <ToastContainer />
</div>

Now open up src/handlers/MusicCommand.ts and alter the handle function:

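import { toast, ToastContentProps } from "react-toastify";

...

export default function handle(result: any) {
    if (!result[SUCCESS_RESULT]) {
        return result;
    }

    switch (result.MusicCommandKind) {
        case "MusicChartsCommand":
        case "MusicSearchCommand": {
            // Show the spoken response of whichever result we end up with
            const onResponse = ({ data }: ToastContentProps<any>) => {
                return data.SpokenResponse;
            };

            // handleMusicCommand catches its own errors, so a failed
            // download also resolves (with AutoPlayFailedResult)
            return toast.promise(
                handleMusicCommand(result),
                {
                    pending: "Downloading...",
                    success: {
                        render: onResponse,
                    },
                    error: {
                        render: onResponse,
                    },
                },
                { type: "info" }
            );
        }
        default:
            return result;
    }
}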
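toast.promise shows the pending message first and then picks the success or error renderer depending on whether the promise resolves or rejects. The sketch below models only that rule (trackStatus and Status are hypothetical names; this is not react-toastify's implementation) and shows a consequence worth knowing: because handleMusicCommand catches its own errors and resolves with AutoPlayFailedResult, a failed download still lands in the success branch. That's why both renderers use onResponse here; to get an error toast instead, the handler would have to rethrow.

```typescript
type Status = "pending" | "success" | "error";

// Report a promise's lifecycle: "pending" first, then "success" if it
// resolves or "error" if it rejects. The original outcome is passed on.
async function trackStatus<T>(promise: Promise<T>, report: (status: Status) => void): Promise<T> {
    report("pending");
    try {
        const value = await promise;
        report("success");
        return value;
    } catch (error) {
        report("error");
        throw error;
    }
}

// Mimics handleMusicCommand: the failure is caught and turned into a value.
const swallowsErrors = Promise.reject(new Error("download failed")).catch(() => "failed result");
trackStatus(swallowsErrors, (status) => console.log(status)); // logs pending, then success
```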

Finally...

And that's all! We used yt-dlp to download videos from YouTube, and Web3.Storage to store the downloaded songs. You can find all the code for this tutorial at this branch of my repo and play around with it. Stay tuned for the next part of the series, where we'll be able to pause the song, so you don't have to sit through the whole thing if it turns out not to be the one for you. Enjoy!

Resources

Storing Data on IPFS using Web3.Storage!

Handling promises | React-Toastify

MusicCommand API Reference | Houndify Docs

Issue #30102 · ytdl-org/youtube-dl

YouTube Data API Overview | Google Developers

Did you find this article valuable?

Support CS310 by becoming a sponsor. Any amount is appreciated!