What is Volumetric Video

January 20, 2022
The Digital Human Blog

What is Volumetric Video?

One of the newest technologies that is really starting to lock into the consciousness of new media producers and technology professionals is volumetric video. We talked about this just a little bit in our last blog, What is a Digital Human? Today, the conversation shifts a bit more into what volumetric video is and how it works. And what better way to start than to identify the nature of the volume first?

"The Volume"

"The Volume". It's got a nice sound to it when you say it out loud. It has such weight behind it when it is spoken. But what is it? Technically speaking, the volume can represent any space in XR. This is usually an area where RGB photography and CG meet somewhere in the middle. The term really came into the popular cinematic zeitgeist during its use on Disney's "The Mandalorian", which used virtual production for nearly all of its episodes.

The defining photo of the "Volume" from the set of "The Mandalorian"

In volumetric video, "the volume" is the physical space recorded by multiple cameras. As the sensors and cameras record the physical position, luminance and chrominance of each point in the volume, the points in space that reflect back into each camera are solved into a representation of the performance as a 3D asset in motion, meaning it is recorded at multiple frames per second.

Intel's Volumetric Stage circa 2016

Generally these volumes are a sphere of space, although other shapes, such as cylinder-shaped volumes, are also used regularly. For general purposes, round and cylindrical volumes will be the solution for most capture needs, but custom solutions exist and can be created to resolve all kinds of technical needs. As long as the photographic and technical needs are met, a volume can take nearly any shape and can have different levels of resolution and coverage in different parts of the volume. As long as the point cloud mapping details can be created and a mesh projected onto them, a volume can be created.

Working Within the Volume

Volumetric video is the process of filming a volume in 3D space from multiple angles at the same time. The result is a living 3D model. This “volume” is generally described as a bubble of 3D space, as most rigs are round, but volumes of virtually any shape or size are possible because projection mapping models can be created to fit any space necessary.

Recording within this volume is most rewarding when filming a human subject. Skin tones are relatively easy textures to reproduce, and pairing realistic human movement with a dynamic polygon “mesh” recreates the experience of human interaction when the process is done right.

Speaking technically, any setup with more than two cameras filming simultaneously from fixed positions can produce volumetric video. But with all else equal, the assets yielded with fewer cameras will be of lesser quality. Many professional systems use between 24 and 32 cameras, covering varying sizes of volumetric spaces. This will generally allow excellent coverage for a single human being’s volume of space and can stretch out to a volume diameter of 6’-20’, depending on setup. Larger systems that can record high fidelity volumetric video over a wider space are becoming more popular.
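To make the camera counts above a little more concrete, here is a minimal sketch in Python of how cameras might be spaced around a round volume. It assumes a single ring of cameras at one height, all facing the center, which is a simplification: real rigs usually stagger cameras at several heights. The function names and the 5 foot camera height are hypothetical and chosen only for illustration.

```python
import math

def ring_camera_positions(num_cameras, diameter_ft, height_ft=5.0):
    """Return (x, y, z) positions evenly spaced on a circle around the volume.

    Assumption: one ring at a single height; real stages vary.
    """
    radius = diameter_ft / 2.0
    positions = []
    for i in range(num_cameras):
        # Spread the cameras evenly over the full 360 degrees.
        angle = 2.0 * math.pi * i / num_cameras
        positions.append((radius * math.cos(angle),
                          radius * math.sin(angle),
                          height_ft))
    return positions

def angular_spacing_deg(num_cameras):
    """Angle in degrees between neighboring cameras on the ring."""
    return 360.0 / num_cameras

# Example: 24 cameras around a 10 foot diameter volume.
cams = ring_camera_positions(24, 10.0)
print(len(cams), angular_spacing_deg(24))
```

With 24 cameras on one ring, neighbors sit 15 degrees apart. Going to 32 cameras tightens that to 11.25 degrees, which is one way to picture why more cameras tend to give better coverage.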

Volumetric video can now accurately record multiple performers simultaneously, thus widening the scope of what is possible within the medium. With this, more cameras and extensive care for quality control are necessary.

Digital Humans for Everyone
Volumetric Fashion & CG Cloth on Volumetric

With volumetric video being a chosen tool for the next generation of media (also known as "new media"), digital humans for gaming, VFX, fashion, virtual worlds and future comms have come into high demand. Going forward, this is the technology that supports the metaverse and digital twins in an endless array of immersive media activations. It is turning new ideas into use cases that transform traditional 2D media into immersive 3D experiences, engaging across all platforms.

What would you do with a believably accurate representation of yourself or someone else?

There are many available markets that are beginning to utilize this technology and some that are already maturing. With volumetric video becoming the go-to creators’ choice for high quality 3D engagements, brand new markets are emerging for creators to bring this technology to its peak potential. It is already bringing forth new ecommerce platforms, new VFX processes and toolsets, new fashion marketplaces and group experiences. Digital twins are paving the way to new virtual worlds with social and competitive experiences. Keynote, informative and branded events, product placement opportunities, and educational and preservation markets are coming into their own.

Volumetric Video vs. Photogrammetry

The predecessor of volumetric video is photogrammetry. With photogrammetry, the photographer records still images of an inanimate object, capturing it from tens or sometimes hundreds of angles along all axes of view and usually from a similar distance. Alpha channels are then created from each photo to dispose of the unusable parts of the image, and the results are stitched together to create a 3D model of the target.

Similarly, the user can turn the camera outward to record a space, like any inanimate natural or manmade environment. Photogrammetry has been used in many cases for educational and marketing purposes in 3D spaces, and with the adoption of lidar into this technology, creators can produce extremely accurate renders of real environments as 3D environments in which their digital assets can exist. All systems have their limitations, but overall these concepts are the basics that drive virtual worlds and digital humans in this space.

The main difference between volumetric video and photogrammetry is that volumetric video is designed to capture rapidly, frame after frame, for subjects that are in motion, while photogrammetry is designed to record a volume as a single frame of whatever shape that volume actually takes. Both of these processes utilize projection mapping techniques at their core and are related technologies in this rapidly expanding XR and new media market.
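To put rough numbers on that difference, here is a small sketch in Python that counts source images for each approach. The function names, the 30 frames per second rate and the 10 second clip length are assumptions chosen for illustration, not figures from any particular capture stage.

```python
def photogrammetry_image_count(num_angles):
    """One still image per angle, captured once for a static subject."""
    return num_angles

def volumetric_image_count(num_cameras, fps, duration_s):
    """Every camera records one image per frame for the whole performance."""
    return num_cameras * fps * duration_s

# A static object photographed from 100 angles.
stills = photogrammetry_image_count(100)

# A 10 second performance on a 24 camera rig, assuming 30 frames per second.
frames = volumetric_image_count(24, 30, 10)

print(stills, frames)
```

Under these assumptions the single photogrammetry scan uses 100 images, while the short volumetric clip uses 7,200, each of which has to be solved into the moving model.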

What is a Point Cloud?

As used in volumetric video production, a point cloud is a collection of points that are meant to be drawn together. In volumetric video, point clouds create the layer over which the "mesh" will be laid.
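As a rough illustration, a point cloud can be thought of as a list of points that each carry a position and a color. The sketch below, in Python, is a simplified assumption of that idea and not the format of any specific capture system. The `Point` class and `bounding_box` helper are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Point:
    # Position in the volume (units are arbitrary here, e.g. meters).
    x: float
    y: float
    z: float
    # Color sampled for this point, 0-255 per channel.
    r: int
    g: int
    b: int

def bounding_box(points):
    """Return the (min corner, max corner) that encloses every point."""
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    zs = [p.z for p in points]
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

# A tiny three point cloud; a real capture holds a far larger number of points.
cloud = [
    Point(0.0, 0.0, 0.0, 200, 150, 120),
    Point(0.5, 1.8, 0.2, 210, 160, 130),
    Point(-0.4, 0.9, -0.1, 190, 140, 110),
]

lo, hi = bounding_box(cloud)
print(lo, hi)
```

The bounding box is simply the smallest box that holds every point, which gives a quick sense of how much of the volume a captured subject occupies.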

Facial Mesh from Intel's IMFusion and Intel's RealSense

Projection Mapping

Volumetric video is used for physical installations of 3D technologies, and at the root of all of those processes is the concept of projection mapping. Many would think of projection mapping as the reverse of volumetric video.

Some popular uses of projection mapping include theme parks, art installations and plenty more. But as it relates to volumetric video, projection mapping is how volumetric models are brought together. Each individual polygon that recreates a human performance must have a very small piece of the overall image projected onto it. In motion, as each polygon changes shape and moves within the volume, new images are "projected" onto the corresponding polygons. Combining the point cloud system and the "mesh" (the projected polygon surface images of the model) recreates the human form.
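One common way to decide which piece of a camera image lands on a polygon is the textbook pinhole camera model, sketched below in Python. This is a generic illustration and not necessarily how any particular volumetric pipeline does it. It assumes points are already expressed in the camera's own coordinates with z pointing forward, and the focal length and image center values are made up for the example.

```python
def project_to_pixel(point, focal_px, cx, cy):
    """Project a 3D point (camera coordinates, z forward) to a pixel (u, v).

    Returns None when the point is behind the camera.
    """
    x, y, z = point
    if z <= 0:
        return None
    # Pinhole model: scale by focal length over depth, then shift to the image center.
    u = focal_px * x / z + cx
    v = focal_px * y / z + cy
    return (u, v)

# Hypothetical camera: 1000 px focal length, 1920x1080 image centered at (960, 540).
FOCAL_PX, CX, CY = 1000.0, 960.0, 540.0

# One triangle of the mesh, two units in front of the camera.
triangle = [(0.1, -0.2, 2.0), (0.3, -0.2, 2.0), (0.2, 0.0, 2.0)]

# The pixel found for each corner marks the patch of image that belongs on this polygon.
pixels = [project_to_pixel(p, FOCAL_PX, CX, CY) for p in triangle]
print(pixels)
```

Repeating this for every polygon, every camera and every frame is the sense in which new images are "projected" onto the mesh as it moves.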

The Results

When done at its optimum, the results break the uncanny valley and blend seamlessly into their world. Or, when activated to give the user control, it excites and engages. For more on mobile volumetric streaming, check out UVol right here with Wild Capture.

Published on January 20th 2022


Wild Capture

A technology studio for digital humans, Wild Capture creates the most lifelike digital humans.
