How to Build an AI Agent for Voice-Enabled Payments with the Tether WDK

8 min read

Voice-enabled payments are revolutionizing how users interact with financial applications, offering a seamless and intuitive way to manage transactions. With the Wallet Development Kit (WDK) by Tether, developers can create powerful, AI-driven solutions for modern blockchain payments.

In this guide, we'll walk through building an AI Agent for voice-enabled payments using the Tether WDK. You'll learn how to set up the development environment, understand the core components of the WDK, and implement a fully functional voice-enabled payment agent. By the end of this tutorial, you'll have a working example and the knowledge to expand its capabilities further.

Let's get right to it!

What is the Tether WDK?

The Tether WDK is a multi-asset cryptocurrency wallet library developed by Tether. It enables businesses and developers to seamlessly integrate advanced wallet-related functionalities and user experiences for Bitcoin and USD₮ into any website, app, or device.

The library is prebuilt with all the necessary components to create a wallet. These include:

  • Wallet seed: This component is a sublibrary within the Tether WDK library that handles BIP39 seed generation and management. It is used to generate BIP39 seed phrases for all assets.

  • Wallet store: This component is a sublibrary within the Tether WDK library that stores wallet data. It supports multiple storage engine implementations, allowing developers to implement the storage engine that best suits their project's needs.

  • Wallet indexer: This component is a remote blockchain data provider. It seamlessly integrates JSON-RPC and WebSocket APIs in the background to fetch and deliver real-time blockchain data efficiently.

  • Wallet test-tools: This component contains tools used for developing and testing the Tether WDK. It supports setting up test environments for both Bitcoin and Ethereum local networks.

In addition to the Tether WDK's prebuilt components, the library also supports the creation of custom components, allowing you to build components to address your use case.

Building the AI Agent

Now that we've learned about the Tether WDK and its components, let's use the library to build an AI agent for voice-enabled payments.

Step 1: Prerequisites

In this section, we'll go through all the steps, dependencies, and installations required for this tutorial. To follow along with the rest of this guide, you will need the following:

  • Fulcrum Electrum server: Fulcrum is a fast and lightweight Simplified Payment Verification (SPV) server for Bitcoin (BTC), Bitcoin Cash, and Litecoin. An active instance is required so the wallet can fetch Bitcoin data.

  • speaches: speaches is an OpenAI API-compatible server supporting streaming transcription, translation, and speech generation. An active speaches instance is required to follow along with this tutorial.

  • Ollama: Ollama is a popular Large Language Model (LLM) backend used to run and serve LLMs offline. An active Ollama instance is required for this tutorial.

  • NodeJS: You can download and install a NodeJS version with Long-Term Support (LTS) from the NodeJS download page. This installation adds NodeJS to your machine and lets you install dependencies with the Node Package Manager (NPM).

  • A code editor of choice.

With the environment set up, you can start setting up the NodeJS project. To begin, create a new folder in a directory of your choice and add an index.js file inside it.

Next, initialize NodeJS in the newly created folder by running the following command at the root of the folder's directory:

npm init -y

This command creates a package.json file that will hold our project's dependencies. Next, we need to download the Tether WDK. To do this, run the following command in the terminal at the root of your project's directory:

npm install github:tetherto/lib-wallet#v0.0.1

This command pulls the Tether WDK library, lib-wallet, directly from the GitHub repository and updates both the node_modules folder and the package.json file with the necessary dependencies.

Step 2: Render the UI

After installing all the dependencies and libraries, we can begin working on the UI. In this step, we'll create an example address book so that we can interact with our final application. We'll also define two JavaScript functions, renderAddressBook and renderAddresses, that render the addresses on the page. The rest of the flow (recording audio, transcribing it, and performing wallet actions based on the transcription) is wired up in later steps; a sketch of the page elements these snippets expect follows the code below.

// Example address book
const book = {
  bob: {
    btc: "bcrt1qrfd2ujntu7la5vjqpjr69u8tc8rl6fxvx6hrzm",
  },
  alice: {
    btc: "bcrt1q7mm7seyccvf4dyc2je97zumh4aes7xhgetwc6m",
    eth: {
      usdt: "0x9ede22b627388b5db43c3488f27480b45d22d238",
    },
  },
};

function renderAddressBook(book) {
  const container = document.createElement("div");
  container.className = "address-book";

  for (const [name, addresses] of Object.entries(book)) {
    const personElement = document.createElement("div");
    personElement.className = "person";

    const nameElement = document.createElement("h3");
    nameElement.textContent = name;
    personElement.appendChild(nameElement);

    const addressList = document.createElement("ul");
    renderAddresses(addresses, addressList);
    personElement.appendChild(addressList);

    container.appendChild(personElement);
  }
  const node = document.getElementById("addr");
  node.appendChild(container);
}

function renderAddresses(addresses, parentElement, prefix = "") {
  for (const [key, value] of Object.entries(addresses)) {
    const listItem = document.createElement("li");

    if (typeof value === "string") {
      listItem.textContent = `${prefix}${key}: ${value}`;
    } else if (typeof value === "object") {
      listItem.textContent = `${prefix}${key}:`;
      const nestedList = document.createElement("ul");
      renderAddresses(value, nestedList, "  ");
      listItem.appendChild(nestedList);
    }

    parentElement.appendChild(listItem);
  }
}

You can add appropriate styling to these elements as required.
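
The snippets in this tutorial also look up a few elements by ID: addr for the address book, seed for the seed phrase, status for status messages, record for the record button, and audio for playback. How you create these elements is up to you. As a quick, hypothetical way to experiment without writing any HTML, you could create them from JavaScript, for example:

// Minimal, optional scaffold that creates the elements the tutorial's code
// expects (#addr, #seed, #status, #record, #audio). In a real project you
// would declare these in your HTML page instead.
function createScaffold() {
  for (const id of ["seed", "status", "addr"]) {
    const div = document.createElement("div");
    div.id = id;
    document.body.appendChild(div);
  }

  const recordButton = document.createElement("button");
  recordButton.id = "record";
  document.body.appendChild(recordButton);

  const audioPlayback = document.createElement("audio");
  audioPlayback.id = "audio";
  audioPlayback.controls = true;
  document.body.appendChild(audioPlayback);
}

If you use a scaffold like this, call createScaffold() before initWallet() in the main function defined at the end of this guide.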

Step 3: Set up the wallet

Now, we can begin setting up the wallet for the project using Tether's WDK library. First, we must import the required libraries into the index.js file. For this tutorial, we will import both the Tether WDK library and the Wallet seed sublibrary. To do this, copy the following snippet and paste it at the top of your index.js file:

const Wallet = require("lib-wallet/src/wallet-lib");
const Bip39Seed = require("lib-wallet-seed-bip39");

Next, let's generate a seed phrase and define the wallet configuration. To generate a new seed phrase, we use the generate method of the Wallet seed library's Bip39Seed class:

// Generate a new phrase on page refresh
const PHRASE = await Bip39Seed.generate();

// Wallet config
const wconfig = {
  network: "regtest", // local Bitcoin regtest network
  electrum_host: "ws://localhost", // Fulcrum/Electrum server WebSocket endpoint
  electrum_port: "8001",
  token_contract: "0x959922bE3CAee4b8Cd9a407cc3ac1C251C2007B1", // ERC-20 (USD₮) token contract on the local network
  web3_indexer_ws: "ws://localhost/eth/hardhat/indexer/ws", // wallet indexer WebSocket API
  web3_indexer: "http://localhost/eth/hardhat/indexer/rpc", // wallet indexer JSON-RPC API
  web3: "ws://localhost/eth/hardhat/indexer/web3", // web3 provider endpoint
  seed: {
    mnemonic: PHRASE, // seed phrase generated above
  },
};

Finally, let's define a function to initialize the wallet as shown:

async function initWallet() {
  renderAddressBook(book);
  const w = await Wallet.wallet(wconfig);

  await w.syncHistory({ all: true });
  document.getElementById("seed").textContent = w.seed.mnemonic;
  // Keep a reference so later wallet actions can reach this wallet instance.
  Wallet.demoWallet = w;

  return w;
}

When run, the initWallet function renders the address book, initializes the wallet using the configuration defined in wconfig, syncs the wallet's transaction history, and then displays the generated seed phrase on the page.

Step 4: Wallet interactions

Now that we have successfully set up the wallet, we can define the interactions a user can perform on it. To do this, we create a function that performs actions on the wallet based on the parsed transcription. The function also includes checks that display error messages if a user attempts to perform an unsupported action or use an unsupported asset.

async function walletAction(msg) {
  const wallet = Wallet.demoWallet;

  const asset = wallet.pay[msg.asset.toLowerCase()];
  if (!asset) return setStatus(`asset: ${msg.asset} is not supported`);
  if (!asset[msg.action])
    return setStatus(`action ${msg.action} not supported by wallet`);

  if (msg.args) {
    // Apply a flat fee for this demo.
    msg.args.fee = 10;
  }

  try {
    const res = await wallet.pay[msg.asset.toLowerCase()][msg.action](
      { token: msg.token?.toLowerCase() },
      msg.args
    );
    setStatus(JSON.stringify(res, null, 1));
  } catch (err) {
    console.log(err);
    setStatus("command failed");
  }
}

function setStatus(txt) {
  const statusDiv = document.getElementById("status");
  statusDiv.textContent = txt;
}
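
To make the expected command shape concrete, here is a hypothetical example of a parsed message that walletAction could receive. The field names follow the template defined in Step 6, but the action name (sendTransaction) and the unit value are assumptions for illustration only; check the WDK's payment API for the exact method names your wallet assets expose.

// Hypothetical parsed command. "sendTransaction" and "main" are assumed
// values for illustration; consult the WDK payment API for real method names.
const exampleMsg = {
  asset: "btc",
  token: "",
  action: "sendTransaction",
  args: {
    amount: "0.01",
    unit: "main",
    address: book.bob.btc,
  },
};
// walletAction(exampleMsg);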

Step 5: Set up voice recording

The first task in this step is to define constants for the active Ollama and speaches instances. To do this, copy the following lines of code and paste them into your index.js file. Each URL must match the address of the corresponding service running on your local machine.

// Edit these to your local instances
const SPEACHES = "http://localhost/whispr/audio/transcriptions";
const OLLAMA = "http://localhost:11434/api/chat";

Next, let's define a function that handles audio interaction in the app. The initMic function initializes the microphone, starts and stops recording, and uploads the recorded audio to the speaches service for transcription.

async function initMic() {
  let mediaRecorder;
  let audioChunks = [];
  let isRecording = false;

  const recordButton = document.getElementById("record");
  const audioPlayback = document.getElementById("audio");

  recordButton.onclick = toggleRecording;

  async function toggleRecording() {
    if (!isRecording) {
      await startRecording();
    } else {
      stopRecording();
    }
  }

  async function startRecording() {
    audioChunks = [];
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    // Assign to the outer mediaRecorder variable so stopRecording() can use it.
    mediaRecorder = new window.MediaRecorder(stream, {
      // Note: browser support for this container varies; a fallback such as
      // "audio/webm; codecs=opus" may be needed.
      mimeType: "audio/ogg; codecs=opus",
    });

    mediaRecorder.ondataavailable = (event) => {
      audioChunks.push(event.data);
    };

    mediaRecorder.onstop = async () => {
      const audioBlob = new Blob(audioChunks, {
        type: "audio/ogg; codecs=opus",
      });
      const audioUrl = URL.createObjectURL(audioBlob);
      audioPlayback.src = audioUrl;

      await uploadAudio(audioBlob);
    };

    mediaRecorder.start();
    isRecording = true;
    recordButton.classList.add("recording");
    setStatus("Recording.... (Press again to stop recording)");
  }

  function stopRecording() {
    mediaRecorder.stop();
    isRecording = false;
    recordButton.classList.remove("recording");
  }

  async function uploadAudio(audioBlob) {
    const formData = new FormData();
    formData.append("file", audioBlob, "recording.ogg");
    formData.append("language", "en");

    setStatus("Uploading...");
    const response = await fetch(SPEACHES, {
      method: "POST",
      body: formData,
    });

    if (response.ok) {
      const result = await response.json();
      setStatus(`transcribed: ${result.text}. processing ....`);
      parseTranscribe(result.text);
    } else {
      setStatus("Upload failed");
    }
  }
}
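
Two caveats worth noting: getUserMedia rejects if the user denies microphone access, and not every browser can record to the audio/ogg container. The following optional sketch (not part of the flow above) shows one possible way to guard against both, using the standard MediaRecorder.isTypeSupported check:

// Optional helper: pick a supported container and handle a denied microphone.
async function createRecorder() {
  const mimeType = MediaRecorder.isTypeSupported("audio/ogg; codecs=opus")
    ? "audio/ogg; codecs=opus"
    : "audio/webm; codecs=opus";

  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    return new MediaRecorder(stream, { mimeType });
  } catch (err) {
    // Typically a NotAllowedError when microphone access is denied.
    setStatus(`Microphone unavailable: ${err.message}`);
    return null;
  }
}

You could call a helper like this from startRecording and bail out when it returns null.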

Step 6: Parse and process audio transcriptions

Once we have the transcribed text, we send it to the Ollama instance for processing, along with a JSON template describing the fields we want back (asset, token, action, and transaction arguments) and the example address book. The model is expected to fill in the template and return structured JSON, which we then parse and hand off to walletAction, as shown:

async function parseTranscribe(txt) {
  const data = {
    model: "llama3.1",
    stream: false,
    // Depending on your model, an explicit instruction (e.g. a system message)
    // telling it to fill in this template and reply with JSON only may be needed.
    messages: [
      {
        role: "user",
        content: `${JSON.stringify({
          asset: "",
          token: "",
          action: "",
          args: {
            amount: "",
            unit: "",
            address: "",
          },
          addressBook: book,
          text: txt,
        })}`,
      },
    ],
  };

  let msg;
  try {
    const response = await fetch(OLLAMA, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify(data),
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const result = await response.json();
    msg = JSON.parse(result.message.content);
  } catch (error) {
    console.error("Error:", error);
    throw error;
  }
  walletAction(msg);
}

Now, we can bring all the above steps together. But first, we need to define an asynchronous function—main—that calls the initWallet and initMic functions. Then, we call the main function when the DOM is fully loaded.

async function main() {
  await initWallet();
  document.getElementById("record").textContent = "Record";
  initMic();
}

document.addEventListener("DOMContentLoaded", function () {
  main();
});
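
One practical note: the code above mixes CommonJS require calls and a top-level await with browser APIs such as document and navigator.mediaDevices, so it has to be bundled before the browser can run it. Any modern bundler should work; as one possible option (the exact tool and flags are up to you), esbuild can produce an ES module bundle that supports top-level await:

npx esbuild index.js --bundle --format=esm --outfile=bundle.js

Load the resulting bundle.js from your page with a module script tag, alongside the elements described in Step 2.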

Conclusion

By following the steps outlined in this tutorial, you've covered the following:

  • Building an AI Agent for voice-enabled payments using the Tether WDK.

  • Setting up the development environment, configuring essential components, and integrating the wallet library to handle blockchain transactions.

  • Implementing voice commands to streamline user interactions using an AI agent.

With this foundation, you can extend the AI Agent to support more advanced use cases and customize features to meet specific requirements.

The possibilities are endless—happy building!