Building AR effects for Messenger and Instagram video calls

Popul-AR · Aug 8, 2023

Announced all the way back at the F8 developer conference in June of 2021, Spark AR’s Multipeer API has since gone public, allowing anyone to create Augmented Reality filters for Instagram, Messenger and Portal.

So what is Multipeer? You can roughly translate it as “Multiplayer AR”: effects where multiple people can interact with the same Augmented Reality experience. The capability can be used in a variety of ways:

  • Ambient effects — a cat’s tail on one screen and the body on another, aka Chat Bomb from Spark AR themselves
  • Shared Contexts — watch Northern Lights together or celebrate a Birthday on a call
  • Expressing a vibe — share and communicate emotions via graphic elements, Forecast Emotions and Emotes on our profile are good examples
  • AR Games — actual multiplayer games, like Potato Bag Race or Hot Dog Eating Contest

It’s also technically available on multiple platforms, under different names. While Spark AR has Multipeer, Lens Studio has Connected Lenses and Lightship ARDK has Multiplayer AR.

Whatever its name, the Multipeer capability opens the door to radically different experiences, enabling interaction between call participants, whereas AR filters had until now been confined to single camera-feed situations (selfies and the back camera).

In this article we’ll be taking a look at the capability within Spark AR and breaking down its good and “not-so-good” points.

To do this, we’ll start by clearly defining the capability itself, then move on to the UX paradigm shift it represents. We’ll get a bit more technical with a quick glance at compatibility issues and at how to test such effects, and finally append a note on scripting and whether the Multipeer functionality requires it.

Left to Right: Face Twister, Emotes, Forecast Emotions, available on Popul-AR’s socials

1° Multipeer definition

The Multipeer functionality is made for group effects in video calls (on Instagram, Messenger or Portal). What’s especially fun about it is that one user can influence another user’s environment, which brings the digital group experience closer to a real-life hangout.

It does this by allowing the effect to listen for messages that other instances have sent to a dedicated message channel, enabling data to be passed between them to create coordinated group experiences.

Within Spark AR, this is done in two ways:

1° via the MultipeerModule scripting API, which provides the ability to create message channels that effects can send JSON-formatted messages to.

The Screen Tap is picked up and sent back out via script
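For illustration, here’s a minimal sketch of that flow in script form. The channel topic (“taps”) and the message shape are our own choices, and the exact sendMessage signature may vary between Spark AR versions:

```javascript
const Multipeer = require('Multipeer');
const TouchGestures = require('TouchGestures');
const Diagnostics = require('Diagnostics');

// Open (or join) a message channel shared by every instance of the effect.
const channel = Multipeer.getMessageChannel('taps');

// Broadcast each local screen tap as a JSON-formatted message.
TouchGestures.onTap().subscribe(() => {
  channel.sendMessage({ type: 'tap', at: Date.now() });
});

// React to taps arriving from the other participants' instances.
channel.onMessage.subscribe((msg) => {
  if (msg.type === 'tap') {
    Diagnostics.log('A participant tapped their screen');
  }
});
```

Note that Touch Gestures is its own capability, to be enabled alongside Multipeer.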

2° via the dedicated Multipeer patches, which let you package numerical, boolean, text or Vector2, 3, 4 data from one instance of an effect into a message to send to other instances.

Multipeer Send — send message / Multipeer Receive — receive message

2° UX Paradigm shift

Now that we’ve gone over what the Multipeer feature is and how it translates technically speaking, let’s address the elephant in the room: the UX paradigm shift this represents for AR effects on Spark AR!

Up to this point, effects have been limited to working on a single camera feed, i.e. either in selfie mode or back-camera mode. This greatly limits person-to-person interactivity. At best, we could work with several face trackers to achieve some notion of multiple participants. That approach, however, was severely limited by camera real estate: it’s hard to fit more than three people in back-camera mode, and even harder in selfie mode.

With multiple instances of the same effect being able to communicate with each other, we can bypass the previous restrictions and have a true notion and even count of participants in any given effect.

The breakthrough is two-fold, opening new perspectives in both the psychology and the interactivity of AR filters.

UX Psychology Shift

To properly tackle the psychology topic, we need to bring up the second key difference that Multipeer filters bring: users experience AR in a very different context when they are using filters built on the Multipeer API.

A user applying a filter to publish a story has a very different psychology from people using filters together in a video call.

The goals are very different, and that has a real impact on the user experience.

People using filters for their story usually use them as a creative tool to make their video content stand out; moreover, their story will, in most cases, be public for 24 hours. People on a call, on the other hand, are usually just hanging out with friends in an intimate setting: a much more easy-going context that will probably make users more inclined to try things out.

What do you want to get out of an experience shared with your friends? Most people would answer they simply want to have fun, get a few laughs, share something. Not necessarily be “beautified” or made to be more “share-able”.

It’s here that an interesting difference can surface between Multipeer and non-Multipeer effects. If you don’t need your users to share the filter, but rather just need them to have fun, you end up with far fewer constraints and a new playing field to explore!

UX Interactivity Shift

Now that we’ve gone over the goal of the effects and how that affects the psychology behind the creation process, let’s get back to how this translates into interactivity.

AR has always featured quite a bit of interactivity, from screen taps to facial gestures; the list is quite long and not really what we want to cover here. If anything, we want to highlight that Multipeer actually rules out some of these interactions (see point 3 below for a list).

What we really want to talk about here is rather the new possibilities offered by an interaction type that wasn’t there before : from one person to another. In the creation process of your filter, you can take into consideration how one person’s action could impact other people’s experience, and the ensuing response that it would generate from them.

A first effect we made focuses on this kind of interaction by introducing visual Emotes in response to each user’s facial expressions (seen in the photos at the start of the article). In a call, seeing a bunch of little smile and laugh emotes is bound to cheer you up, in turn making you an emitter of said smile and laugh emotes yourself. These kinds of more “interconnected” interactions are what we strive to explore most with this feature: how can one person’s instance of the filter affect every other instance of it?
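As a sketch of that emote pattern (the channel name and message format are our own, and we use the FaceGestures module’s smile detection as a stand-in for whichever expression you track):

```javascript
const Multipeer = require('Multipeer');
const FaceTracking = require('FaceTracking');
const FaceGestures = require('FaceGestures');

const channel = Multipeer.getMessageChannel('emotes');

// When the local user starts smiling, tell every other instance so it
// can spawn a smile emote over this participant's window.
FaceGestures.isSmiling(FaceTracking.face(0))
  .monitor()
  .subscribe((event) => {
    if (event.newValue) {
      channel.sendMessage({ emote: 'smile' });
    }
  });

// Display the emotes broadcast by the other participants.
channel.onMessage.subscribe((msg) => {
  if (msg.emote === 'smile') {
    // e.g. trigger a particle burst or animate a 2D sprite here
  }
});
```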

3° Feature compatibility

As alluded to in the previous point, the Multipeer capability unfortunately comes with both its own restrictions and indirect ones. What do we mean by that?

Well first of all, the capability is restricted to only Video Calling Experiences. This makes sense, as the whole point of the capability is communication between participants.

What may be less explicit is that some other capabilities may bring along their own incompatibilities. For example, while Face Tracking can, by itself, be used with any experience type, it also comes bundled with the Instructions capability, which isn’t compatible with Messenger for Portal. This means that if you want to create a Multipeer experience for Portal, you won’t be able to use the Face Tracking capability, even though Multipeer itself is compatible.

Here’s a robust Excel spreadsheet that goes over every Spark AR capability and shows its compatibility with the different experience types:

You can find this resource on our LAB: -link to lab post-

On a different note, something to pay close attention to whilst developing Multipeer-enabled effects is that the user window ends up being very small! This will have an impact on the visual feedback and cues you might want to give your users. It also explains why capabilities such as the Native UI are incompatible.

The user window is tiny

4° Testing

With all of these constraints in mind, say you’ve managed to build something that you reckon is going to work… The next obstacle between your effect and virality is testing it. Think about it: the scenarios you want to test involve at least two people. Depending on the facial expressions and gestures you expect of your users, this can become problematic real quick.

Say your game requires the user to open their mouth to eat something: good luck getting Dolapo or any of the other pre-built demo videos to do that for you (personal experience).

Another slight downside is that Spark AR, the app, is only capable of simulating one preview window: the one you’re accustomed to, with options to toggle which view you want to see, but no option to show more than one at a time. In comes a separate app, downloaded and updated independently from the main one!

If you’ve picked up on some sourness in that last paragraph, you’re very perceptive! Congratulations! But in all seriousness, this app had… issues for a long time. Thankfully it’s gotten significantly better in recent months, to the point where you won’t have to close, relaunch, and re-set up the camera feeds every time you change something in your filter.

You can even, get this, launch it automatically from Spark AR itself, through File > Preview in Spark AR Player for Desktop (Ctrl+Shift+P).

5° Scripting mandatory?

In this final section, we’ll answer the question laid out above, as well as briefly go over some hurdles we encountered while developing several different experiences using the capability.

So let’s start by mentioning that if you don’t opt for the scripting approach, what you get are five live patches (at the time of writing):

  • Multipeer Send — Send a signal on a dedicated channel
  • Multipeer Receive — Receive a signal from a dedicated channel
  • Pulse Pack — Pack and send information using a pulse signal
  • Pulse Unpack — Unpack information from a packed pulse
  • Throttle — Throttle a given event to a specified duration

The approach is rather simple and, as such, a bit limiting. Should you want more information, I encourage you to check out the official documentation on the topic here:

We, however, didn’t opt for this approach from the start, favoring instead the script route, which functions in roughly the same way: at the end of the day, we’re also just sending and receiving messages.

In our first implementation, the main problem that became apparent was that people would get out of sync. At any given time, if someone lost their connection for one reason or another, they would immediately be desynced from the others.

There was a clear need for some kind of logic to make the effects, as well as the messages sent and received, more stable. In most peer-to-peer contexts this is achieved by having the notion of a “host” amidst the participants: one instance that functions as the dungeon master, receiving all signals from the others, making the proper decisions, and taking precedence over the individual instances.

This was achieved via the Spark State Library, a magical library that stores a global state shared between everybody and allows every participant to write/output signals. For more information and access to this resource, head on over to:
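As a rough illustration of how the library reads (a sketch based on its published examples; the signal name and tap trigger are our own, and the exact API may differ between versions):

```javascript
const State = require('spark-state');
const TouchGestures = require('TouchGestures');
const Diagnostics = require('Diagnostics');

(async function () {
  // A counter every participant can read and write; the library keeps
  // it in sync across all instances of the effect.
  const score = await State.createGlobalCounterSignal(0, 'score');

  // Any participant can bump the shared value...
  TouchGestures.onTap().subscribe(() => score.increment(1));

  // ...and every instance sees the update.
  score.monitor().subscribe((event) => {
    Diagnostics.log('Shared score: ' + event.newValue);
  });
})();
```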

So, how does our approach work?

Simple: we put all participants in an array and order them alphabetically. We then select the second person in the list (this is to make sure there are two people to begin with) and assign them as host. If the current host leaves the session, we re-assign the host position to somebody else, following the same logic.
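In script form, the election might look roughly like this (a sketch built on the Participants module; a production version should also filter out participants whose isActiveInCall flag has gone false):

```javascript
const Participants = require('Participants');

let hostId = null;

// Deterministic host election: every instance sorts the same list of
// participant IDs, so all instances agree without exchanging messages.
async function electHost() {
  const self = await Participants.self;
  const others = await Participants.getAllOtherParticipants();
  const ids = [self.id, ...others.map((p) => p.id)].sort();
  // Taking the second entry ensures at least two people are present;
  // fall back to the first while the call is still filling up.
  hostId = ids.length > 1 ? ids[1] : ids[0];
}

// Re-elect whenever someone joins, and watch each newcomer's active
// flag so a departing host gets replaced by the same rule.
Participants.onOtherParticipantAdded().subscribe((newcomer) => {
  newcomer.isActiveInCall.monitor().subscribe(electHost);
  electHost();
});
```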

Thankfully, the results speak for themselves, and our experiences have been a lot more stable since! They even work on some older devices that we had almost written off as incompatible with the feature.

In conclusion, while some basic effects can certainly be achieved using only the provided patches, any more serious game will likely require a more robust script.

.

.

.

Are you an AR creator? Join a tech-neutral AR culture.
Get support from our team and like-minded creators while building any kind of AR project (Spark AR, Lens Studio, 8thWall or any AR/VR software) on the Lab.

The Lab is a forum we created on our own initiative to build a tech-neutral community of creators.

Credit
Article: Boris Josz, Laszlo Arnould, Josh Beckwith
