Teaching Machines to Read Emotions with Andreas Refsgaard

06.09.18 · 6 min read

Last weekend, we had the pleasure of participating in CHART, the leading Nordic contemporary art fair, for the third year in a row. Specifically, we were tasked with creating a pavilion for the CHART Architecture competition with few limitations other than working under a specific theme: open-source. This topic isn’t unfamiliar to us, so we saw CHART as a welcome opportunity to engage other creative minds to help us unfold it in a new way. The result of that approach was a pavilion called Sum of Us: conceptualised by architect Sean Lyon, it was a physical cloud which visitors could enter to have their emotions read and interpreted through an installation created by lighting engineer Bo Thorning and sound designer Lasse Munk.

Say what? That’s right—if you entered the cloud, face tracking technology enabled computers to read the expression on your face, judge your mood, and reflect it back to you in real-time with a light and sound installation. But how can you teach computers to read emotions in the first place? And in general, what does that say about the direction in which society is going? We didn’t have the answers, so we asked interaction designer Andreas Refsgaard—a seasoned practitioner of making machine learning playful, and the SPACE10 resident responsible for conceptualising and implementing face tracking as the key to creating an ‘emotionally-aware’ pavilion.

Now that we’ve bid CHART adieu and wrapped up our collaboration with Andreas this time around, we took some time to pick his brains about teaching computers to understand how we feel and the practical and ethical challenges surrounding that very task.

SPACE10: What was the initial idea behind your contribution to the CHART installation?

Andreas Refsgaard: Right now, the ‘cloud’ – meaning, services we interact with all the time and upload our data to – gives us value, but is becoming more integrated into our personal lives. We send stuff to the cloud that has emotional meaning, or upload that stuff to companies who use the cloud to store it. So I thought it could be interesting to make a small system where we’d detect people’s emotions within this concrete cloud, and then have the system react to these emotions.

The cloud also aggregates emotions over time. For instance, it knows in real-time if people are happy, so over the span of an hour, it can tell whether the average mood is happy or not.
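
To make the idea of an hourly average concrete, here is a minimal Python sketch of a rolling mood aggregator. It is not the installation's actual code: the class name `MoodAggregator`, the 0-to-1 smile score and the one-hour window are all assumptions made for illustration.

```python
from collections import deque


class MoodAggregator:
    """Keeps mood scores from the last `window_seconds` and averages them."""

    def __init__(self, window_seconds=3600.0):
        self.window_seconds = window_seconds
        self._readings = deque()  # (timestamp, score) pairs, oldest first

    def add(self, timestamp, score):
        """Record one reading; score is assumed to be in [0, 1], 1 = happy."""
        self._readings.append((timestamp, score))
        self._drop_old(timestamp)

    def _drop_old(self, now):
        # Discard readings that have fallen out of the time window.
        while self._readings and now - self._readings[0][0] > self.window_seconds:
            self._readings.popleft()

    def average(self, now):
        """Mean score over the window, or None if there are no readings."""
        self._drop_old(now)
        if not self._readings:
            return None
        return sum(score for _, score in self._readings) / len(self._readings)


aggregator = MoodAggregator(window_seconds=3600.0)
aggregator.add(0.0, 0.25)     # a neutral face at t = 0 s
aggregator.add(1800.0, 0.75)  # a smile half an hour later
aggregator.add(4000.0, 0.75)  # another smile; the first reading has expired
hourly_mood = aggregator.average(now=4000.0)
```

Because the oldest readings are dropped as new ones arrive, the average always reflects roughly the last hour rather than the whole day.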

How can your average visitor understand that emotions are being aggregated over time?

It’s a question of light and sound design, which wasn’t up to me in the end. I prototyped some ideas, though: if the average mood within the last hour was positive, music in a major chord would play. But if it was more negative, it would play in minor. The same for the lights. My idea was that you could have sets of lights which would change in accordance with the types of moods being read.
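
The major-versus-minor prototype described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `chord_for_mood`, the 0.5 threshold and the use of MIDI note numbers are hypothetical, and the final light and sound design was done by others.

```python
MAJOR_TRIAD = (0, 4, 7)  # semitone offsets: root, major third, perfect fifth
MINOR_TRIAD = (0, 3, 7)  # semitone offsets: root, minor third, perfect fifth


def chord_for_mood(average_mood, root_midi_note=60, threshold=0.5):
    """Return three MIDI notes: a major triad for a positive average mood,
    a minor triad otherwise. average_mood is assumed to lie in [0, 1]."""
    intervals = MAJOR_TRIAD if average_mood >= threshold else MINOR_TRIAD
    return [root_midi_note + offset for offset in intervals]


happy_chord = chord_for_mood(0.8)  # C major triad starting at middle C
sad_chord = chord_for_mood(0.2)    # C minor triad starting at middle C
```

The only difference between the two chords is the middle note, which sits one semitone lower in the minor triad. The same threshold could in principle pick between two sets of lights.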

How do you go about making a computer emotionally-aware in the first place?

I specifically use face tracking to do that. A computer tracks your face by reading your features as points; for instance, there are a number of points that make up your nose, your eyebrows and your mouth. By using the positions of those points, we can train a system to recognise which combinations of point positions correspond to which facial expressions, and thus to which emotions.
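
As a rough illustration of training on point positions, the Python sketch below turns a handful of landmark points into two numbers and classifies a new face by its nearest labelled example average. Everything here is an assumption for the sake of the example: the landmark names, the two features and the nearest-centroid classifier are hypothetical, and real face trackers return many more points.

```python
import math


def mouth_features(landmarks):
    """Turn raw (x, y) landmark points into two scale-independent numbers."""
    left, right = landmarks["mouth_left"], landmarks["mouth_right"]
    top, bottom = landmarks["lip_top"], landmarks["lip_bottom"]
    face_width = landmarks["jaw_right"][0] - landmarks["jaw_left"][0]
    width = math.dist(left, right) / face_width
    # Image y grows downwards, so corners above the lip centre give a positive lift.
    centre_y = (top[1] + bottom[1]) / 2
    lift = (centre_y - (left[1] + right[1]) / 2) / face_width
    return (width, lift)


def train_centroids(examples):
    """examples: list of (features, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(features, centroids):
    """Pick the label whose centroid is closest to the new feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


jaw = {"jaw_left": (0, 50), "jaw_right": (100, 50)}
smiling = {**jaw, "mouth_left": (25, 68), "mouth_right": (75, 68),
           "lip_top": (50, 70), "lip_bottom": (50, 76)}
neutral = {**jaw, "mouth_left": (35, 72), "mouth_right": (65, 72),
           "lip_top": (50, 70), "lip_bottom": (50, 74)}
new_face = {**jaw, "mouth_left": (28, 69), "mouth_right": (72, 69),
            "lip_top": (50, 70), "lip_bottom": (50, 76)}

centroids = train_centroids([(mouth_features(smiling), "smile"),
                             (mouth_features(neutral), "neutral")])
prediction = classify(mouth_features(new_face), centroids)
```

Dividing by the face width keeps the features comparable whether a visitor stands close to the camera or far away from it.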

However, using face tracking to read people’s emotions is just one method. You could also try to read emotions with sound. If you were to try to measure people’s voices, you would have data sets where you’d give a computer examples of people speaking and whether those examples signified happiness, confidence, anger, sadness and so on. Then, when you’d give the computer a new sound input, it would try to classify it as happy, confident, angry or sad based on what it’s learned through previous data sets.

In this case, I only used faces to make things simple, but also create something robust enough for the installation’s quite noisy environment.

Say there’s a bunch of people in the cloud at the same time. Does that impact how accurately a computer can read emotions?

From a technical point of view it does not make a difference: I built the system so it could track an almost infinite number of people at reasonably large distances.

What does it take to enable a system to track an almost infinite number of people?

It’s adding a lot of cameras and making sure they can run, basically. Initially, we wanted them to run through one computer, but you’d need a super powerful computer to enable that—more powerful than what we had available. So then we thought, how do you make sure that you can have all these webcams communicating with each other on one single sketch? How do you make this work in nighttime situations that are potentially difficult to account for? A lot of time has also gone into making sure that I can adjust raw images to find local contrasts, so the computer can pick up faces even in badly lit environments.

Why are local contrasts relevant, in this case?

Say you have a number of pixels next to each other and they are very similar in their colour, but some are ever-so-slightly darker than others. How do you make the contrast high enough, without destroying the original image, for the algorithm to recognise a face, even in poor light conditions? In a way, local contrasts make dark pixels darker and bright pixels brighter. The algorithm doesn’t make everything totally white and totally black, but it makes it easier to detect edges and round stuff and gradients (which are the elements that make up a face from a computer’s standpoint).

It’s a bit like adding filters, similar to when you add filters on Instagram. There you do it for aesthetic purposes, but here you do it to add an algorithm that changes the look of the pixels and makes it easier for the face tracking algorithm to pick up the faces.
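
The interview does not say which algorithm was used, so the following Python sketch is only one simple way to get the effect described: each pixel is pushed away from the average of its neighbourhood, so pixels slightly darker than their surroundings get darker and slightly brighter ones get brighter. The function name `enhance_local_contrast` and the `radius` and `gain` parameters are assumptions; production systems often use more refined methods such as adaptive histogram equalisation.

```python
def enhance_local_contrast(image, radius=1, gain=2.0):
    """image: list of rows of grey values (0-255). Returns a new image and
    leaves the original untouched."""
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average over the neighbourhood, clipped at the image border.
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(neighbours) / len(neighbours)
            # Push the pixel away from its local mean, then clamp to 0-255.
            stretched = local_mean + gain * (image[y][x] - local_mean)
            row.append(max(0, min(255, round(stretched))))
        result.append(row)
    return result


# A dim image with a faint vertical edge between the values 40 and 44.
dim_image = [
    [40, 40, 44, 44],
    [40, 40, 44, 44],
    [40, 40, 44, 44],
]
enhanced = enhance_local_contrast(dim_image)
```

In the enhanced image the step across the faint edge is larger than in the original, while flat regions away from the edge keep their values, which is what makes edges easier for a face tracking algorithm to find.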

What did you find uniquely challenging about this project?

The conceptual question. What we’re doing here is, in a way, a baby step towards computers reading people’s emotions in reality. The interaction we created is very playful, but it also hints at a society where, to an increasing degree, we will track people’s emotions and make systems that respond to a computer’s interpretations of them. It’s a very interesting field, but it’s also a scary field. So in a way, I was making this software for an installation that has fun interactions, but is also hinting at a future of interactions that you could criticise. That was an interesting space to work in.

As an interaction designer working with face tracking, do you feel you have a role in determining the direction the technology takes?

In a way, my role is to say that there’s no way to roll back a technique like face tracking. I want to showcase alternative ways of using these techniques. I want to use them for purposes they weren’t intended for, hopefully in a way that is not harmful. For example, I previously made this project leveraging face tracking to help people with physical disabilities play music. I don’t think the people who created face tracking in the first place ever thought that that would be a use case.

I would like to show that these technologies can be used for a lot of different things, and not only for where the money is at.

How did the environmental constraints of the art fair influence how you worked with face tracking?

The environmental constraints made us simplify things. First of all, we needed to track a lot of people, in real-time, in low light conditions. And then there needed to be an intuitive interaction so that people could understand that they affect the system as they’re in the cloud. To put it into context, some face-tracking-to-emotion systems that are already out there claim to be able to spit out detailed emotions by percentage. But how do you map that to an output? Like, if you get something that is quite detailed and perhaps closer to the truth than what we’re doing – essentially measuring smiles – how do you communicate that in a clear manner? So my task with this installation was not to come close to some scientific truth, because it’s not scientific. It’s an interactive art installation, and I’m not specifically interested in it being an accurate detection of someone’s emotions. It’s a pretty good detection of a simple problem.

You could argue that’s a shallow way of looking at emotions, because does the face of a person necessarily indicate their true emotions? I would say no. But ultimately, this installation is not intended to exemplify the full potential of reading emotions through face tracking. It’s something else: it’s an art piece.

Thanks, Andreas.