Anime Voice Productivity Enhancer "No Standing App" Powered by ml5

2020 and 2021 saw so many workers shift to working from home, and developers are no exception.

Some say it boosted their productivity, while others suffered from the transition.

I have to admit I'm more inclined to the latter.

There are so many distractions and so much freedom in your house.

For instance, you can dance to music even though you know it will delay the release of a new feature.

Or you can start washing the dishes when you know you have to respond to your boss.

You do all of those, right? You suddenly stand up and start doing your own thing as if you had discovered your calling. I won't judge you for not resisting that urge, because I've been there once (or thousands of times) too. However, we can't always afford to live that way. Sometimes you have to get things done instead of standing up and starting to dance.

That means you need a watcher that keeps you from standing up when you're not supposed to.

So, I created an AI-powered app that will scold you in an anime voice when you stand up.

What I created

Let me introduce the app to you: "No Standing App."

You can see it running at this link:

https://bunhojun.github.io/no-standing-app/

How to use

Recommended browser: Google Chrome

  1. grant the browser permission to use your webcam
  2. wait until the webcam is ready
  3. stand up, and you will hear an anime voice saying "kora!" which means "hey!"

That's it. I implemented a minimal UI so I could keep the code as simple as possible.

Libraries / Machine learning model

These are the libraries and machine learning models I used to create this app.

ml5.js

ml5.js is a JavaScript library for machine learning. Not only can you use pre-trained machine learning models, but you can also train a model with your own images. What's fantastic about this library is that you can use it without any AI knowledge as long as you know how to code in JavaScript.
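
For example, here is a minimal snippet (not part of this app) that classifies an image with the pre-trained MobileNet model; the <img id="cat"> element is an assumption for illustration:

// Minimal ml5 sketch (not part of this app): classify an image with
// the pre-trained MobileNet model. Assumes ml5 is loaded via CDN and
// an <img id="cat"> element exists on the page.
const classifier = ml5.imageClassifier('MobileNet', () => {
  classifier.classify(document.querySelector('#cat'), (err, results) => {
    if (err) return console.error(err);
    // results looks like [{ label, confidence }, ...]
    console.log(results[0].label, results[0].confidence);
  });
});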

PoseNet

PoseNet is a machine learning model that allows for real-time human pose estimation. It locates the coordinates of your body (or parts of your body) in images taken from the webcam.

p5.js

p5.js is a library that helps you render images and shapes with ease. In this app, I used it to display the webcam feed. This library often comes up when dealing with ml5, and you can benefit a lot from it if you use ml5.
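
To give you an idea, a minimal p5.js sketch that just displays the webcam might look like this (the canvas size here is an arbitrary choice):

// Minimal p5.js sketch: display the webcam feed on a canvas.
let video;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO); // start the webcam
  video.hide(); // hide the default <video> element; we draw it ourselves
}

function draw() {
  // draw the current webcam frame onto the canvas every frame
  image(video, 0, 0, width, height);
}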

p5.sound

p5.sound is a library that lets you load and play sound files in p5.js with ease.

Sound source

You can use any sound, but I downloaded an anime voice from this website (Japanese only).

https://soundeffect-lab.info/sound/voice/line-girl1.html

How did I code it?

If you are interested, I'll show you how I created this app. It's actually easy.

Basic algorithm

  • judge whether you are trying to stand up by the coordinate of your nose on the webcam
  • if your nose reaches the top of the screen, it triggers the voice

HTML

<!-- index.html -->
<!DOCTYPE html>
<html>

<head>
  <meta charset="UTF-8">
  <title>No Standing App</title>
  <!-- load libraries (p5.js/p5.sound and ml5) using CDN -->
  <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.0.0/p5.min.js"></script>
  <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.1.9/addons/p5.sound.js"></script>
  <script src="https://unpkg.com/ml5@latest/dist/ml5.min.js"></script>
</head>

<body>
  <h1>立ち上がっちゃだめだよ(Don't stand up)</h1>
  <p id="message">loading</p>
  <script src="sketch.js"></script>
</body>

</html>

The p tag used above shows the loading state of PoseNet (a machine learning model).

JavaScript

// sketch.js

// global variable to store the p5.sound SoundFile instance that plays the "kora" sound
let kora;

// p5.js/p5.sound-specific function that runs before setup(). Lets you load assets such as sound files.
function preload() {
  soundFormats('mp3');
  kora = loadSound('assets/kora');
}

// p5.js-specific function that runs once when the sketch starts
function setup() {
  // createCapture creates a video (webcam) element in the DOM tree. VIDEO is a p5.js constant
  const video = createCapture(VIDEO);
  // load ml5's PoseNet model and create a poseNet instance
  const poseNet = ml5.poseNet(video, modelLoaded);
  // event listener that fires when PoseNet detects human poses. It receives a callback as the 2nd argument.
  poseNet.on('pose', gotPoses);
}

// callback function fired when PoseNet (or another ml5 machine learning model) has finished loading
function modelLoaded() {
  const message = document.querySelector('#message');
  message.innerHTML = '準備OKだよ!(Ready!)';
}

Here are a few essential parts:

const poseNet = ml5.poseNet(video, modelLoaded); 

The code above enables you to use the PoseNet model. ml5.poseNet() takes two arguments: an HTML video element and a callback function fired when the machine learning model has loaded. In this case, I made the p tag I mentioned above show that the model and the app are ready.
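
As a side note, ml5.poseNet() can also take an options object between the video and the callback. A sketch of that (the option names follow the ml5 PoseNet docs; the values are illustrative and not what this app uses):

// Hedged sketch: passing an options object to ml5.poseNet().
// The values below are illustrative, not what this app uses.
const options = {
  flipHorizontal: false,   // mirror the input horizontally
  detectionType: 'single', // track a single person
  minConfidence: 0.5,      // ignore low-confidence detections
};
const poseNet = ml5.poseNet(video, options, modelLoaded);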

And here is another important part:

poseNet.on('pose', gotPoses);

When PoseNet detects human poses (which happens several times per second), it fires a callback function. In this app, I pass a function named gotPoses() as the callback (details below).

When PoseNet detects a human pose

We have to take a look at one thing before explaining gotPoses(). When PoseNet detects a human pose, the listener passes an argument to its callback function. If you inspect this argument, you will see that it is an array.

[
  {
    pose: {
      keypoints: [{position: {x, y}, score, part}, ...],
      leftAnkle: {x, y, confidence},
      leftEar: {x, y, confidence},
      leftElbow: {x, y, confidence},
      // what is important is down here
      nose: {
        confidence: 0.9991235136985779,
        x: 361.67679872030413,
        y: 199.94389418961936
      },
      ...
    },
    // ignore this time
    skeleton: [...]
  }
]

This array contains an object, which has two properties called "pose" and "skeleton." In this app, we will use the "pose" object.

The pose object tells you the coordinates of each part of your body.

Since the algorithm is "if your nose reaches the top of the screen, it triggers the voice," we will extract the Y coordinate of your nose from the nose property in the pose object. Let's define the Y coordinate of "the top of the screen" as 50 (by the way, the higher your nose goes, the smaller the Y coordinate gets).
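
If you want to be extra careful, you could also check the confidence value we saw in the pose object before trusting the coordinate. This guard is my addition and not part of the app:

// Hedged sketch: read the nose Y coordinate defensively. The
// confidence check is my addition; the app reads pose.nose.y directly.
function getNoseY(poses) {
  if (!poses || !poses[0]) return null;
  const { nose } = poses[0].pose;
  return nose && nose.confidence > 0.5 ? nose.y : null;
}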

OK, let's move on to the explanation of gotPoses().

// callback function fired when PoseNet detects a human pose
function gotPoses(poses) {
  if (poses && poses[0]) {
    const pose = poses[0].pose;
    const noseY = pose.nose.y;
    if (noseY < 50) {
      onStandingUp();
    } else {
      onSitting();
    }
  }
}

As you can see from the code above, if the Y coordinate of your nose is smaller than 50, we call onStandingUp(); otherwise, onSitting().

Let's look at those two functions:

// variable to store whether you are standing or not. Necessary to avoid being scolded multiple times while you remain standing.
let isStandingUp = false;

function onSitting() {
  if (isStandingUp) {
    isStandingUp = false;
  }
}

function onStandingUp() {
  if (!isStandingUp) {
    isStandingUp = true;
    // play "kora" sound. play() is a method provided by p5.sound
    kora.play();
  }
}
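
As a small optional refinement (not in the original code), you could also skip retriggering the sound while it is still playing, using p5.sound's isPlaying() method:

// Optional refinement (not in the original app): don't retrigger the
// sound while it is still playing. isPlaying() is a p5.sound
// SoundFile method.
function onStandingUp() {
  if (!isStandingUp && !kora.isPlaying()) {
    isStandingUp = true;
    kora.play();
  }
}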

That's it!! If you want to see the final code, visit the GitHub repo.

Conclusion

Did you find the app helpful? You might find it a little goofy, but it does its job, at least. What's important is that I created this app even though I didn't have much knowledge about AI or machine learning.

I used to think AI and machine learning were only for researchers and people with degrees in the field. But it turns out the AI community is growing, and you can find many inclusive projects and teams, like ml5 and TensorFlow. As is often the case in the tech industry, this culture makes me hopeful about humanity.

I hope this article contributes to the tech industry too.

Thanks for reading.