AI decision making using floating-point states such as HUNGER, FUN and AFFECTION

5 Dec

At work we are currently working on a production that involves implementing a reactive, animalistic AI. The user can interact with the AI, which should, as much as possible, appear to have a consciousness of its own.

We decided to implement an AI that bases its actions on internal state values such as HUNGER, ENERGY, AFFECTION, FUN etc.

I would like to share some knowledge of the process and some of the pitfalls we encountered while developing the AI, in the hope of sparking a discussion and perhaps introducing other developers to different approaches.

Why we did not go with a traditional state machine

Naturally, we discussed a finite state machine, but based on previous experience they tend to become messy and hard to maintain. The primary reason is that each state requires a defined transition to another state. As the number of states increases, the number of transitions between the states grows even faster, which makes it difficult to scale. Depending on the state machine design, it can also become difficult to have the AI be in multiple states simultaneously, e.g. a “run-towards-the-ball” state and a “stop-up-and-sneeze” state.

State machines can become a convoluted mess. Borrowed from “Creating fluid motion transitions with Unity and Mecanim.”

A state machine where transitions are based on a decision score

During our initial investigation, we were intrigued by the way various machine learning algorithms tackled AI. We played around with Unity’s machine learning agents and got inspired to develop a similar approach that didn’t involve machine learning. We avoided machine learning itself because it seemed like ‘BLACK MAGIC’ and we were afraid of ending up in a pit we could not escape.

Instead, we implemented an AI that decides which actions to perform by calculating a decision score that is primarily driven by the internal state values.

The AI we developed was separated into the following components:

  • Observables, allow the AI to see, hear and sense the environment
  • Actions, the potential actions the AI can take, such as “go-to-ball” or “go-eat-that-apple”
  • States, internal floating-point values, ranging from 0 to 1, that represent HUNGER, ENERGY, FUN etc.
  • Controllers, offer control of various systems of the AI, e.g. a head controller, walk controller, tail controller etc.
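These components can be sketched roughly as follows. This is a minimal Python sketch; the class and field names are illustrative placeholders, not the production code from our project:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def clamp01(value: float) -> float:
    """Internal state values are floats kept in the 0..1 range."""
    return max(0.0, min(1.0, value))


@dataclass
class AIState:
    """Named internal states such as HUNGER, ENERGY, FUN, AFFECTION."""
    values: Dict[str, float] = field(default_factory=dict)

    def set(self, name: str, value: float) -> None:
        self.values[name] = clamp01(value)

    def get(self, name: str) -> float:
        return self.values.get(name, 0.0)


@dataclass
class Action:
    """A potential action, its decision score function, and the
    controllers it must occupy in order to run."""
    name: str
    score_fn: Callable[[AIState], float]   # decision score function
    required_controllers: List[str]        # e.g. ["Walk", "Head"]
```

Observables would feed into `AIState` (seeing food raises the chance HUNGER matters), and controllers are represented here by plain string names for brevity.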

I won’t go into details, except for Actions, which are the centerpiece of the AI.

Consider this: our AI has a predefined list of potential Actions it can take:

  • Beg-For-Food
  • Fetch-a-ball
  • Look-at-user
  • Whack-tail-furiously

Each action is then given a decision score function that calculates its decision score.

  • Beg-For-Food — HUNGER * 3
  • Fetch-a-ball — FUN * 2
  • Look-at-user — (1 – AFFECTION) * 2
  • Whack-tail-furiously — FUN * 0.5
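In code, these score functions can be expressed as simple functions of the internal state. A minimal sketch in Python, using the same state values as the worked example later in the post:

```python
# Decision score functions: each maps internal state values (floats in 0..1)
# to a score. The weights (3, 2, 2, 0.5) follow the list above.
score_functions = {
    "Beg-For-Food":         lambda s: s["HUNGER"] * 3,
    "Fetch-a-ball":         lambda s: s["FUN"] * 2,
    "Look-at-user":         lambda s: (1 - s["AFFECTION"]) * 2,
    "Whack-tail-furiously": lambda s: s["FUN"] * 0.5,
}

# Example internal state.
state = {"HUNGER": 0.5, "FUN": 0.5, "AFFECTION": 0.0}

scores = {name: fn(state) for name, fn in score_functions.items()}
# Beg-For-Food: 1.5, Fetch-a-ball: 1.0,
# Look-at-user: 2.0, Whack-tail-furiously: 0.25
```

Because the functions are just callables, tuning a behavior means editing one small expression rather than rewiring state transitions.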

Each action also has a list of controllers that it requires in order to function:

  • Beg-For-Food — Requires: Walk Controller
  • Fetch-a-ball — Requires: Head Controller, Audio Controller
  • Look-at-user — Requires: Walk Controller, Head Controller
  • Whack-tail-furiously — Requires: Tail Controller, Audio Controller

The general outline of the decision making is as follows:

  1. Determine what actions are plausible (You can’t go eat an apple, if there is no apple)
  2. Calculate the decision score for each action
  3. Order all actions based on their decision score
  4. Iterate through the ordered list of actions, giving the highest-scoring actions first priority in occupying controllers
    1. If an action wishes to use a controller that isn’t occupied, the action occupies the controller for itself
    2. If an action wishes to use a controller that is already occupied, the action will not execute
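The outline above can be sketched as a single function. This is a simplified Python sketch, not our production code; the score weights and controller requirements match the worked example that follows:

```python
def choose_actions(actions, state):
    """Greedy decision making: highest-scoring actions claim controllers first.

    `actions` is a list of (name, score_fn, required_controllers) tuples;
    plausibility filtering (step 1) is assumed to have happened already.
    """
    # Steps 2-3: score every action, then order by score, highest first.
    scored = sorted(actions, key=lambda a: a[1](state), reverse=True)

    occupied = set()   # controllers claimed so far
    executed = []      # actions that actually get to run
    for name, _score_fn, required in scored:
        # Step 4: the action runs only if ALL of its controllers are free.
        if not any(c in occupied for c in required):
            occupied.update(required)
            executed.append(name)
    return executed


actions = [
    ("Beg-For-Food",         lambda s: s["HUNGER"] * 3,          ["Walk"]),
    ("Fetch-a-ball",         lambda s: s["FUN"] * 2,             ["Head", "Audio"]),
    ("Look-at-user",         lambda s: (1 - s["AFFECTION"]) * 2, ["Walk", "Head"]),
    ("Whack-tail-furiously", lambda s: s["FUN"] * 0.5,           ["Tail", "Audio"]),
]

state = {"HUNGER": 0.5, "FUN": 0.5, "AFFECTION": 0.0}
print(choose_actions(actions, state))
# ['Look-at-user', 'Whack-tail-furiously']
```

Note that a blocked action does not occupy anything, which is what lets lower-scoring actions further down the list still run in parallel.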

So, if we take an AI with the following values:

  • HUNGER = 0.5
  • FUN = 0.5
  • AFFECTION = 0

These values could represent the AI’s internal state at any given time. In this situation we could interpret it as: the AI is equally hungry and entertained, but it could really use some Love <3.

The following scores would be calculated for the given actions:

  • Beg-For-Food — HUNGER(0.5) * 3 = 1.5
  • Fetch-a-ball — FUN(0.5) * 2 = 1
  • Look-at-user — (1 – AFFECTION(0)) * 2 = 2
  • Whack-tail-furiously — FUN(0.5) * 0.5 = 0.25

Ordered by score, the actions would be:

  1. Look-at-user —  2
  2. Beg-For-Food —  1.5
  3. Fetch-a-ball —  1
  4. Whack-tail-furiously — 0.25

To summarize, each action requires the following controllers:

  1. Look-at-user — Requires: Walk Controller, Head Controller
  2. Beg-For-Food — Requires: Walk Controller
  3. Fetch-a-ball — Requires: Head Controller, Audio Controller
  4. Whack-tail-furiously — Requires: Tail Controller, Audio Controller

If we iterate over all actions, letting the highest-scoring actions occupy their controllers first, the result would be:

  1. Look-at-user — occupies Walk Controller, Head Controller — executes
  2. Beg-For-Food — needs Walk Controller (occupied) — does not execute
  3. Fetch-a-ball — needs Head Controller (occupied) — does not execute
  4. Whack-tail-furiously — occupies Tail Controller, Audio Controller — executes

The reasoning for each action:

  1. Look-at-user — Because it is the first action, no controllers are occupied yet, so this action can execute
  2. Beg-For-Food — Because the Walk Controller is occupied by the “Look-at-user” action, this action can NOT execute
  3. Fetch-a-ball — Because the Head Controller is occupied by the “Look-at-user” action, this action can NOT execute
  4. Whack-tail-furiously — Because “Fetch-a-ball” could not execute, it never occupied the Audio Controller. Both the Tail Controller and the Audio Controller are therefore still available, so this action can execute

In the end, we end up with the following two actions running in parallel:

  1. Look-at-user 
  2. Whack-tail-furiously

So even though “Whack-tail-furiously” received the lowest score among all the actions, it still ends up being executed, because all the controllers it needs to execute were still unoccupied. For us, this means we can run multiple actions in parallel, as long as all the controllers are available for a given action.

Pitfalls using this approach

This approach gives us great flexibility and great power when it comes to transitioning from one state to another, as well as running multiple actions in parallel. However, there are also some serious pitfalls that have to be considered before you dive into this approach.

  • Selecting the correct score function is very difficult, especially if you attempt to tailor a specific experience, such as running a sequence of actions. In some cases, you will have to add or subtract weights in the score functions in order to get the desired behavior. If you are comfortable with machine learning, you could attempt to use it to do the workload of establishing the score functions.
  • You need good debugging tools for this system. It is too easy to make a mistake when designing the decision scores and the controllers they depend on, and it is difficult to narrow down the exact issue, so creating good debugging tools is a life saver. Using Unity, we ended up creating our own editor tools that display all actions, what controllers they depend on, what scores they got and so forth.
Our AI editor tool. Don’t look too closely at the decision score next to the action name; it doesn’t reflect the score function (to the right) because I have added some arbitrary weights in the code. But you probably get the gist of it.

Thank you Alexander and Mads for providing feedback! 🙂 This is one of my very first technical blog posts. I would very much like your insights and critique both regarding our design as well as my writing. I am planning on creating more technical content like this in the future.
