1. Introduction to Gaussian Splatting

When it comes to radiance field rendering, one relatively recent technique stands out for depicting complex scenes with remarkable efficiency and realism: Gaussian Splatting [1]. At its core, Gaussian Splatting rethinks how we project and blend information from three-dimensional points onto a two-dimensional image plane, leveraging the mathematical properties of Gaussian functions. The technique excels at rendering scenes laden with intricate detail, delivering real-time frame rates together with a level of smoothness and visual fidelity that competing methods need far longer training times to approach.

Comparative Visualization of Rendering Techniques
Figure 1: Comparative Visualization of Rendering Techniques and Quality: This figure from the original Gaussian Splatting paper presents a side-by-side comparison of different rendering methods applied to a common scene featuring a bicycle. From left to right, we see the results from Instant-NGP, Plenoxels, Mip-NeRF360, and two variations of the Gaussian Splatting method (denoted as 'Ours'), contrasted against the 'Ground Truth' image. Each rendering is evaluated based on its frames per second (fps), indicating the speed of rendering, and Peak Signal-to-Noise Ratio (PSNR), which measures image quality compared to the ground truth. The training time for each method is also provided, showcasing the efficiency of Gaussian Splatting in achieving high-quality results with significantly less training time.

The genesis of Gaussian Splatting can be traced back to the foundational work on Neural Radiance Fields (NeRFs) [5], a paradigm-shifting framework that revolutionized our ability to create photorealistic images from sparse sets of input photographs. NeRFs employ deep learning to interpolate and render complex scenes by learning the spatial distribution of light and color in 3D space. Gaussian Splatting tackles the same novel-view-synthesis problem, but swaps the neural network for an explicit, point-based representation: a cloud of anisotropic 3D Gaussians that can be rasterized directly. This makes the resulting images not just visually appealing but also computationally cheap to render, even in scenes of daunting complexity.

Anatomy of a Gaussian Splat
Figure 2: Anatomy of a Gaussian Splat: This vibrant splat illustrates a single Gaussian function in 3D space, characterized by its position on a grid, its spread as indicated by the coloring which represents the Gaussian's covariance, and the interplay of light and shadow hinting at its alpha transparency.

The essence of Gaussian Splatting lies in its ability to smooth out the projection of information—such as color and density—from 3D points to the 2D plane using Gaussian functions. This smoothing is crucial for eliminating the harsh edges and artifacts that commonly plague rendered images, thereby ensuring a more natural and cohesive visual experience. By applying this technique, researchers have achieved levels of detail and realism previously unattainable, opening new vistas in the fields of virtual reality, game development, and cinematic visual effects [2][3][4].

Coalescence of Gaussian Splats
Figure 3: Coalescence of Gaussian Splats: Here we see a cluster of splats, each behaving according to Gaussian distribution principles, merging seamlessly to form a cohesive structure. This image demonstrates the basic principle of Gaussian Splatting, where individual entities blend together, with their borders indistinguishable from the whole.

Furthermore, the application of Gaussian Splatting extends beyond mere image enhancement. It represents a significant step forward in computational photography and visualization, enabling the reconstruction of scenes from real-world data with an accuracy and depth that were once out of reach. This has profound implications not just for entertainment and media but also for areas such as architectural visualization, digital heritage preservation, and even medical imaging, where the detailed and realistic representation of complex structures is paramount.

Symphony of Splats
Figure 4: Symphony of Splats: A complex, organic structure emerges from the collective rendering of millions of Gaussian splats. This explosion of color and form captures the essence of Gaussian Splatting used at scale, illustrating how many simple elements can combine to create intricate, detailed visual phenomena.

As we delve deeper into the mechanics and applications of Gaussian Splatting, it’s essential to acknowledge the pioneering studies and papers that laid the groundwork for this technique. The original paper on Neural Radiance Fields by Mildenhall et al. [5] introduced the world to the potential of deep learning in rendering, setting the stage for subsequent innovations like Gaussian Splatting. Further developments and explorations into this technique [8][6][7] have continued to push the boundaries of what’s possible, highlighting the collaborative effort of the scientific community in advancing our understanding and capabilities in computer graphics.

2. Mathematical Foundations of Gaussian Splatting

Gaussian Splatting is predicated on the principle of projecting information from a three-dimensional point cloud onto a two-dimensional plane. This section will elucidate the key mathematical constructs that govern this process.

2.1 Gaussian Distribution in Splatting

At the core of Gaussian Splatting is the Gaussian distribution function, which is instrumental in the spatial blending of points. The Gaussian distribution, defined as follows, ensures a smooth gradient of influence around a point’s location:

\[ G(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x-\mu}{\sigma})^2} \quad (1) \]

Here, \(x\) is the point at which the function is evaluated, \(\mu\) is the mean (the center of the distribution), and \(\sigma\) is the standard deviation that dictates the spread of the distribution.

1D Gaussian Distribution
Figure 5: Standard 1D Gaussian Distribution: This graph illustrates the classic bell curve of a one-dimensional Gaussian distribution, showcasing its symmetrical nature around the mean value on the horizontal axis and the exponential decay as we move away from the center.
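
To make equation (1) concrete, here is a tiny self-contained sketch that evaluates the Gaussian and reproduces the bell curve of Figure 5; the choice of mean 0 and standard deviation 1 is ours, purely for illustration:

import numpy as np
import matplotlib.pyplot as plt

def gaussian_1d(x, mu=0.0, sigma=1.0):
    # Equation (1): normalized 1D Gaussian
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

x = np.linspace(-4, 4, 200)
plt.plot(x, gaussian_1d(x))
plt.xlabel('x')
plt.ylabel('G(x)')
plt.show()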

2.2 3D Gaussian Function

Generalizing to 3D space, the core of the technique is a 3D Gaussian function, represented as follows:

\[ G(x) = e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)} \quad (2) \]

where \(\mu\) is the point of interest (mean) and \(\Sigma\) is the covariance matrix in world space, defining the distribution’s spread.

Three-Dimensional Gaussian Surface
Figure 6: Three-Dimensional Gaussian Surface: Depicted here is a 3D Gaussian function, also known as a Gaussian bell in three dimensions. This representation emphasizes the smooth gradient and symmetry of the distribution along all axes, creating a hill-like surface that falls off gently towards the plane.
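
Equation (2) translates almost line for line into NumPy. A minimal sketch, with an arbitrary diagonal covariance chosen for illustration:

import numpy as np

def gaussian_3d(x, mu, Sigma):
    # Equation (2): unnormalized 3D Gaussian with covariance Sigma
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d)

mu = np.zeros(3)
Sigma = np.diag([0.5, 1.0, 2.0])  # axis-aligned spread along x, y, z
print(gaussian_3d(np.array([0.5, 0.0, 0.0]), mu, Sigma))  # ~0.78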

2.3 Rendering Equation with Gaussian Influence

The rendering of an image via Gaussian Splatting can be described as a sum that accumulates the color contribution of every point, modulated by its Gaussian influence and its visibility:

\[ I(x) = \sum_i G(|x - p_i|)\,L(p_i,\omega)\,V(p_i) \quad (3) \]

In this equation, \(I(x)\) denotes the intensity at the pixel location \(x\), \(G\) is the Gaussian weight, \(p_i\) is the position of the \(i\)-th point, \(L(p_i,\omega)\) is the radiance from \(p_i\) in direction \(\omega\), and \(V(p_i)\) signifies the visibility of \(p_i\).

2.4 Opacity and Color Blending

The interplay of light and material properties is captured through the blending of color and opacity. Gaussian Splatting achieves this by overlapping the contributions from different points, accounting for their respective opacities:

\[ C(x) = \frac{\sum_i G(|x - p_i|)\,C(p_i)\,\tau(p_i)}{\sum_i G(|x - p_i|)\,\tau(p_i)} \quad (4) \]

\(C(x)\) is the resulting color at the pixel location \(x\), \(C(p_i)\) is the color contribution from the \(i\)-th point, and \(\tau(p_i)\) represents the opacity at \(p_i\). The denominator normalizes the weighted color contributions, yielding the correct intensity and hue in the rendered image.
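
The following sketch evaluates the normalized blend of equation (4) at a single pixel for a handful of points. The positions, colors, and opacities are made-up illustration values, and a real splatting renderer composites depth-sorted splats rather than taking this simple weighted average:

import numpy as np

def blend_color(x, points, colors, opacities, sigma=1.0):
    # Equation (4): Gaussian- and opacity-weighted color average
    dist = np.linalg.norm(points - x, axis=1)
    w = np.exp(-0.5 * (dist / sigma) ** 2) * opacities
    return (w[:, None] * colors).sum(axis=0) / w.sum()

points    = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
colors    = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # red and blue
opacities = np.array([0.9, 0.5])
print(blend_color(np.array([0.5, 0.0, 0.0]), points, colors, opacities))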

2.5 Projection to 2D Space

For rendering in 2D, a projection of these 3D Gaussians is required. This is achieved by transforming the 3D covariance matrix \(\Sigma\) to camera coordinates:

\[ \Sigma' = J W \Sigma W^T J^T \quad (5) \]

Here, \(W\) represents the viewing transformation, and \(J\) is the Jacobian of the affine approximation to the projective transformation.
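
As a sketch of equation (5) for a pinhole camera: W below is the rotation part of the viewing transformation, and J is the usual affine (EWA-style) approximation of the perspective projection, evaluated at the camera-space mean t. Using the 2x3 Jacobian directly yields the same 2x2 image-space covariance that a full 3x3 J would give in its upper-left block. The focal lengths and example numbers are assumptions for illustration:

import numpy as np

def project_covariance(Sigma, W, t, fx, fy):
    # Equation (5): Sigma' = J W Sigma W^T J^T
    tx, ty, tz = t
    J = np.array([[fx / tz, 0.0,     -fx * tx / tz**2],
                  [0.0,     fy / tz, -fy * ty / tz**2]])
    return J @ W @ Sigma @ W.T @ J.T  # 2x2 image-space covariance

Sigma = np.diag([0.2, 0.2, 0.2])       # world-space covariance
W = np.eye(3)                          # identity view rotation, for illustration
print(project_covariance(Sigma, W, t=np.array([0.0, 0.0, 4.0]), fx=500.0, fy=500.0))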

2.6 Ellipsoid Configuration

To give \(\Sigma\) an intuitive, ellipsoid-like parameterization that also remains a valid covariance matrix during optimization, it is factored as:

\[ \Sigma = RSS^TR^T \quad (6) \]

where \(S\) is a scaling matrix and \(R\) is a rotation matrix.
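
In the original method, each Gaussian stores a scale vector and a rotation quaternion, and equation (6) assembles them into the covariance. A minimal sketch, assuming (w, x, y, z) quaternion ordering:

import numpy as np

def quat_to_rotmat(q):
    # Standard unit-quaternion to rotation-matrix conversion
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def build_covariance(scale, quat):
    # Equation (6): Sigma = R S S^T R^T
    R = quat_to_rotmat(quat)
    S = np.diag(scale)
    return R @ S @ S.T @ R.T

print(build_covariance(np.array([0.5, 1.0, 2.0]), np.array([1.0, 0.0, 0.0, 0.0])))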

3. Kickstart Your Gaussian Splatting Adventure: Setting the Stage with Python

Hey there! Ready to dive into the world of Gaussian Splatting? Let’s get our hands dirty with some code. First things first, we need to gather our tools. Think of this step like packing your bag before a hike. You wouldn’t want to forget your water bottle, right? Similarly, we can’t start without importing some essential Python libraries that will help us on this journey.

What’s in Our Toolkit?

Here’s a sneak peek at the Python goodies we’re bringing along:

import copy
import ipywidgets
import json
import kaolin
import matplotlib.pyplot as plt
import numpy as np
import os
import torch
import torchvision

# And don't forget about our special gear for Gaussian Splatting
from utils.graphics_utils import focal2fov
from utils.system_utils import searchForMaxIteration
from gaussian_renderer import render, GaussianModel
from scene.cameras import Camera as GSCamera

Breaking it down: We’ve got our classic friends like numpy for all things numbers, matplotlib.pyplot to plot our adventures, and torch because, well, neural networks are cool. Kaolin is a bit of a Swiss Army knife for 3D deep learning, so that’s handy!

And those special tools at the bottom? They’re our secret sauce for making Gaussian Splatting magic happen. We’ll dive deeper into those as we move along. But for now, just know that they’re like our map and compass in the wild world of rendering.

One more thing: Ever felt lost looking at tensor dimensions? Well, fear not! We’ve got a little helper function that’s like our very own guide in the tensor jungle:

def log_tensor(t, name, **kwargs):
    print(kaolin.utils.testing.tensor_info(t, name=name, **kwargs))

This handy function will shout out everything we need to know about our tensors, making sure we’re always on the right path.
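
For instance, a quick sanity check on a random image-shaped tensor might look like this; the exact output format belongs to Kaolin's tensor_info, so treat the comment as illustrative rather than a literal transcript:

t = torch.rand(3, 256, 256)
log_tensor(t, 'example image', print_stats=True)
# Reports the tensor's name, shape, dtype and device, plus min/max/mean stats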

And that’s it for our prep work! With our bag packed full of Python tools, we’re ready to tackle the exciting world of Gaussian Splatting. Stay tuned for our next step, where we’ll start the actual journey and see these tools in action!

4. Loading the Model

Alright, adventurers! Now that we’ve got our tools, it’s time to summon our magic wand—the checkpoint. This is where our previously trained model lies in wait, full of learned wisdom and ready to splat Gaussians like a pro.

def load_checkpoint(model_path, sh_degree=3, iteration=-1):
    # Magical incantations to find and awaken the checkpoint
    checkpt_dir = os.path.join(model_path, "point_cloud")
    if iteration == -1:
        iteration = searchForMaxIteration(checkpt_dir)
    checkpt_path = os.path.join(checkpt_dir, f"iteration_{iteration}", "point_cloud.ply")
    
    # Awaken the Gaussians from their slumber
    gaussians = GaussianModel(sh_degree)
    gaussians.load_ply(checkpt_path)                                                 
    return gaussians

This spell (function, if you’re not into the whole magic thing) awakens our model, readying it for the adventures ahead. It seeks out the most recent training session if you don’t specify one, ensuring we’re always using the freshest of magical essences.
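
Using it is a one-liner. The path below is a hypothetical example; point it at whatever directory your own training run produced:

model_path = './output/my_scene'  # hypothetical path to a trained run
gaussians = load_checkpoint(model_path)
print(f'Loaded {gaussians.get_xyz.shape[0]} Gaussians')  # get_xyz is the model's position accessor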

5. Camera Setup

In any good adventure, knowing where to look is half the battle. That’s why we scry (or load) the perfect camera settings to capture our scene just right.

def try_load_camera(model_path):
    # Attempt to find the mystical eye (camera settings)
    cam_path = os.path.join(model_path, 'cameras.json')
    if not os.path.exists(cam_path):
        print('Could not find saved cameras for the scene; using default.')
        # Default camera spell
        return GSCamera(...)
        
    with open(cam_path) as f:
        data = json.load(f)
        # Camera transformation incantations
        ...
    return GSCamera(...)
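
The elided pieces depend on how the cameras were serialized. As one possible sketch, assuming the cameras.json layout written by the reference gaussian-splatting trainer (a camera-to-world 'rotation' and 'position' plus 'fx', 'fy', 'width', and 'height' per entry), the tail of the function could read roughly like this:

    # ...inside the `with` block from above:
    raw = data[0]                      # use the first saved camera
    rot = np.array(raw['rotation'])    # 3x3 camera-to-world rotation
    pos = np.array(raw['position'])    # camera center in world space
    W, H = raw['width'], raw['height']
    return GSCamera(colmap_id=0,
                    R=rot,             # GS keeps the camera-to-world rotation...
                    T=-rot.T @ pos,    # ...and the world-to-camera translation
                    FoVx=focal2fov(raw['fx'], W),
                    FoVy=focal2fov(raw['fy'], H),
                    image=torch.zeros((3, H, W)),  # placeholder; only its size is used
                    gt_alpha_mask=None, image_name='', uid=0)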

6. Rendering the Scene

Now, let’s pull the curtain back and reveal the magic of our scene. Here’s how we bring a 3D model into the spotlight of our 2D world. We’ll take our camera settings, the mystical Gaussian model, and the background of deep space (just a fancy way of saying a black background) to render our scene.
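
Before casting the spell, note three ingredients used below that we have not conjured on screen yet: test_camera (the camera from Section 5), pipeline (the rasterizer's configuration flags), and background (the clear color). A minimal sketch of them, mirroring the reference renderer's PipelineParams fields without its argparse plumbing:

class PipelineParamsNoparse:
    # Same fields as the reference PipelineParams, minus the argparse wiring
    def __init__(self):
        self.convert_SHs_python = False    # evaluate spherical harmonics in CUDA
        self.compute_cov3D_python = False  # build 3D covariances in CUDA
        self.debug = False

pipeline = PipelineParamsNoparse()
background = torch.tensor([0.0, 0.0, 0.0], device='cuda')  # deep-space black
test_camera = try_load_camera(model_path)                  # camera from Section 5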

# The spell to conjure the image
render_res = render(test_camera, gaussians, pipeline, background)
rendering = render_res["render"]

# Let's see what the oracle (our render function) has to say about our render
for k in render_res.keys():
    log_tensor(render_res[k], k, print_stats=True)

# And for the grand finale, let's reveal the image to the world!
plt.imshow((rendering.permute(1, 2, 0) * 255).to(torch.uint8).detach().cpu().numpy())

First Render Example
Figure 7: The magic of Gaussian Splatting brought to life.

7. The Art of Camera Conversion

7.1 Calculating the Field of View

To understand our digital realm’s extent, we start by crafting a spell for the field of view (FoV):

def compute_cam_fov(intrinsics, axis='x'):
    # Compute FOV from the camera focal length.
    # Note: the width is used for both axes, which is exact only for square
    # images (such as the 512x512 views used later in this walkthrough).
    aspectScale = intrinsics.width / 2.0
    tanHalfAngle = aspectScale / (intrinsics.focal_x if axis == 'x' else intrinsics.focal_y).item()
    fov = np.arctan(tanHalfAngle) * 2
    return fov

7.2 Morphing Cameras Between Realms

The following incantations allow us to morph our view between the Kaolin and GS realms:

def convert_kaolin_camera(kal_camera):
    # Convert a Kaolin camera to a GS camera.
    # Clone first: indexing/squeezing returns views, and negating in place
    # would otherwise corrupt the original camera's extrinsics.
    R = kal_camera.extrinsics.R[0].clone()
    R[1:3] = -R[1:3] # Flip Y and Z (Kaolin and GS camera conventions differ)
    T = kal_camera.extrinsics.t.squeeze().clone()
    T[1:3] = -T[1:3] # Flip Y and Z
    return GSCamera(colmap_id=0,
                    R=R.transpose(1, 0).cpu().numpy(), 
                    T=T.cpu().numpy(), 
                    FoVx=compute_cam_fov(kal_camera.intrinsics, 'x'), 
                    FoVy=compute_cam_fov(kal_camera.intrinsics, 'y'), 
                    image=torch.zeros((3, kal_camera.height, kal_camera.width)), 
                    gt_alpha_mask=None,
                    image_name='fake',
                    uid=0)

def convert_gs_camera(gs_camera):
    # Convert a GS camera to a Kaolin camera.
    # Clone first: transpose() returns a view, and negating rows in place
    # would otherwise modify the GS camera's own world_view_transform.
    view_mat = gs_camera.world_view_transform.transpose(1, 0).clone()
    view_mat[1:3] = -view_mat[1:3] # Flip Y and Z (camera conventions differ)
    res = kaolin.render.camera.Camera.from_args(
        view_matrix=view_mat,
        width=gs_camera.image_width, height=gs_camera.image_height,
        fov=gs_camera.FoVx, device='cpu')
    return res

7.3 Ensuring Consistency in Our Rendered Reality

Finally, we cast a spell to ensure the transformed camera views the world just as our original did:

# Test the camera conversion by rendering an image
kal_cam = convert_gs_camera(test_camera)
test_cam_back = convert_kaolin_camera(kal_cam)
rendering = render(test_cam_back, gaussians, pipeline, background)["render"]
plt.imshow((rendering.permute(1, 2, 0) * 255).to(torch.uint8).detach().cpu().numpy())

Consistent View
Figure 8: The consistent view of our 3D world through different lenses.

8. Crafting the Interactive Viewing Spell

With the scene set and our camera converted, it’s time to craft an interactive spell that lets us explore our Gaussian Splatting scene with the ease of a sorcerer’s touch.

8.1 Rendering for Interactivity

We’ll define a rendering function that can accept a Kaolin camera. This bit of magic lets us switch perspectives at will:

def render_kaolin(kaolin_cam):
    # Convert the Kaolin camera to a GS camera
    cam = convert_kaolin_camera(kaolin_cam)
    # Render the scene using the GS camera
    render_res = render(cam, gaussians, pipeline, background)
    # Prepare the image for visualization
    rendering = render_res["render"]
    return (rendering.permute(1, 2, 0) * 255).to(torch.uint8).detach().cpu()

8.2 Setting the Stage for Exploration

We pinpoint the focus of our virtual camera onto our scene. This will be the center of rotation for our interactive viewer:

# Determine the point in space our camera will focus on
focus_at = (kal_cam.cam_pos() - 4. * kal_cam.extrinsics.cam_forward()).squeeze()
# Set up the interactive visualizer with the Kaolin camera
visualizer = kaolin.visualize.IpyTurntableVisualizer(
    512, 512, copy.deepcopy(kal_cam), render_kaolin, 
    focus_at=focus_at, world_up_axis=2, max_fps=12)
# Display the interactive scene
visualizer.show()

With this incantation complete, you should now see a window to your 3D world, ready to be explored with the drag of a mouse or the swipe of a finger. Rotate it, zoom in, and let the details of your creation come to life. Enjoy the magic at your fingertips!

8.3 A Touch of Control: Interactive Viewer Enhancements

With our scene set, let’s add a touch of interactivity that allows us to filter and inspect our Gaussian splats with precision.

8.3.1 Peering Into the Scales

We’ll start by examining the scales of our Gaussians to understand their influence on our rendered scene:

# Investigate the scales of our Gaussians
log_tensor(gaussians.get_scaling, 'scaling after activation', print_stats=True)
log_tensor(gaussians._scaling, 'scaling, raw', print_stats=True)

8.3.2 Selective Rendering Incantation

Next, we conjure up a selective render function that lets us filter Gaussians based on their scale, which is controlled by a mystical slider:

def selective_render_kaolin(kaolin_cam):
    # Filter the Gaussians by their scale using a slider value
    scaling = gaussians._scaling.max(dim=1)[0]
    mask = scaling < slider.value
    # Create a temporary Gaussian model with the filtered splats
    tmp_gaussians = GaussianModel(gaussians.max_sh_degree)
    # Apply the mask to all Gaussian parameters
    tmp_gaussians._xyz = gaussians._xyz[mask, :]
    tmp_gaussians._features_dc = gaussians._features_dc[mask, ...]
    tmp_gaussians._features_rest = gaussians._features_rest[mask, ...]
    tmp_gaussians._opacity = gaussians._opacity[mask, ...]
    tmp_gaussians._scaling = gaussians._scaling[mask, ...]
    tmp_gaussians._rotation = gaussians._rotation[mask, ...]
    tmp_gaussians.active_sh_degree = gaussians.max_sh_degree
    # Render the scene with the filtered Gaussians
    cam = convert_kaolin_camera(kaolin_cam)
    render_res = render(cam, tmp_gaussians, pipeline, background)
    rendering = render_res["render"]
    return (rendering.permute(1, 2, 0) * 255).to(torch.uint8).detach().cpu()

8.3.3 Slider Charm

The following code creates a slider to dynamically adjust the Gaussian filter, re-rendering the scene as the slider moves:

def handle_slider(e):
    # Clear the current output and update the rendering
    visualizer.out.clear_output()
    with visualizer.out:
        visualizer.render_update()

# The slider that controls the Gaussian scale threshold
scaling = gaussians._scaling.max(dim=1)[0]
slider = ipywidgets.FloatSlider(value=scaling.max().item(),
    min=scaling.min().item(), max=scaling.max().item(),
    step=0.1,
    description='Max scale:',
    disabled=False,
    continuous_update=True,
    orientation='horizontal',
    readout=True,
    readout_format='.3f',
)
slider.observe(handle_slider, names='value')

8.3.4 Conjuring the Visualizer and Slider

Finally, we bring forth the interactive visualizer with our selective render function and attach the slider to it:

# Initialize the visualizer with the selective render function
visualizer = kaolin.visualize.IpyTurntableVisualizer(
    512, 512, copy.deepcopy(kal_cam), selective_render_kaolin, 
    focus_at=focus_at, world_up_axis=2, max_fps=12)
# Update the rendering to apply the initial filter from the slider
visualizer.render_update()
# Display the visualizer and the slider for interactive control
display(visualizer.canvas, visualizer.out, slider)

With the slider in place, you now wield the power to filter and scrutinize your scene. Slide through the scales and watch as your Gaussians obey, revealing the fine-tuned beauty of your creation.

References

[1] Kerbl, B., Kopanas, G., Leimkühler, T., & Drettakis, G. (2023). 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 42(4).

[2] Xiong, H., Muttukuru, S., Upadhyay, R., Chari, P., & Kadambi, A. (2023). SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting. arXiv preprint.

[3] Yan, C., Qu, D., Wang, D., Xu, D., Wang, Z., Zhao, B., & Li, X. (2023). GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting. arXiv preprint.

[4] Tang, J., Chen, Z., Chen, X., Wang, T., Zeng, G., & Liu, Z. (2024). LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation. arXiv preprint arXiv:2402.05054.

[5] Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2020). NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. ECCV.

[6] Yan, Y., Lin, H., Zhou, C., Wang, W., Sun, H., Zhan, K., Lang, X., Zhou, X., & Peng, S. (2024). Street Gaussians for Modeling Dynamic Urban Scenes. arXiv preprint arXiv:2401.01339.

[7] Li, M., Yao, S., Xie, Z., & Chen, K. (2024). GaussianBody: Clothed Human Reconstruction via 3D Gaussian Splatting. arXiv preprint.

[8] Duan, Y., Wei, F., Dai, Q., He, Y., Chen, W., & Chen, B. (2024). 4D Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes. arXiv preprint arXiv:2402.03307.