We present Grendel, a distributed training system that partitions 3D Gaussian Splatting (3D GS) parameters and parallelizes its computation across multiple GPUs. To optimize batched training, we explore different optimization-hyperparameter scaling strategies and identify a simple √(batch_size) scaling rule that is highly effective. Using Grendel, we show that scaling up 3D GS training in terms of parameters and compute leads to significant improvements in visual quality on multiple large-scale scene reconstruction datasets.
3D Gaussian Splatting (3D GS)
We design Grendel to leverage the inherent mixed parallelism of 3D GS. For tasks exhibiting Gaussian-wise parallelism, such as projection, color computation, and parameter storage, Grendel distributes Gaussians across GPUs; for pixel-wise rendering and loss computation, it distributes pixels across GPUs. Grendel then uses sparse all-to-all communication, exploiting spatial locality, to transfer each Gaussian only to the GPUs that need it. Additionally, Grendel employs a dynamic load balancer that uses observations from previous training iterations to partition images so as to minimize workload imbalance.
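The Gaussian-to-pixel exchange can be pictured with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration, not Grendel's actual implementation: each GPU projects its local shard of Gaussians, buckets them by the ranks whose pixel partitions they overlap, and exchanges only those buckets with `torch.distributed.all_to_all`. All function and variable names here are our own.

```python
# Hypothetical sketch of Grendel-style mixed parallelism (not the real code):
# Gaussians are sharded across GPUs; after projection, each Gaussian is sent
# only to the ranks whose pixel partition it overlaps (sparse all-to-all).
import torch
import torch.distributed as dist

def exchange_projected_gaussians(projected, overlapping_ranks, world_size):
    """projected: (N, D) attributes of this GPU's shard of projected Gaussians.
    overlapping_ranks: length-N list; entry i is the set of ranks whose pixel
    partition Gaussian i touches (few ranks per Gaussian, thanks to spatial
    locality)."""
    # Bucket local Gaussians by destination rank.
    buckets = [[] for _ in range(world_size)]
    for i, ranks in enumerate(overlapping_ranks):
        for r in ranks:
            buckets[r].append(projected[i])
    send = [torch.stack(b) if b else projected.new_zeros((0, projected.shape[1]))
            for b in buckets]

    # Exchange bucket sizes first, then the (variable-sized) buckets themselves.
    send_counts = torch.tensor([t.shape[0] for t in send], device=projected.device)
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)
    recv = [projected.new_empty((int(c), projected.shape[1])) for c in recv_counts]
    dist.all_to_all(recv, send)

    # This GPU now holds every Gaussian that touches its pixel partition and can
    # rasterize its pixels and compute the loss locally.
    return torch.cat(recv, dim=0)
```

In practice the destination ranks would be derived from each Gaussian's projected 2D extent and the current pixel partition chosen by the load balancer.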
To scale efficiently to many GPUs, Grendel increases the batch size beyond one, so that training can be partitioned both across the images in a batch and across the pixels within each image.
However, increasing the batch size without tuning the optimization hyperparameters can lead to unstable and inefficient training.
Assuming that the gradients from different images in a batch are independent, we want a single batched update to equal the sum of the individual images' updates. We therefore scale the learning rate to "undo" the Adam optimizer's second-moment normalization, and scale the momentum coefficients to keep the effective per-image momentum similar. Together, these learning-rate and momentum scaling rules make the training trajectory approximately invariant to the batch size, enabling hyperparameter-tuning-free training.

In the experiments below, we first train a 3D GS model on the Rubble scene to iteration 15,000, then reset the Adam optimizer state and continue training with different batch sizes. Because the different parameter groups of 3D GS have vastly different magnitudes, we focus on one specific group, the diffuse color, to make the comparisons meaningful. We find that our proposed scaling rules maintain high cosine similarity and approximately equal update magnitudes regardless of batch size.
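As a concrete illustration, here is a minimal sketch of how a batch-size-1 Adam configuration could be rescaled under these rules. The √(batch_size) learning-rate factor is the rule named above; the exponential momentum rescaling β → β^batch_size is our reading of "keep the effective per-image momentum similar", and the baseline values in the usage lines are placeholders rather than the paper's settings.

```python
# Sketch of batch-size-aware Adam hyperparameter scaling (assumptions noted above).
import math
import torch

def scale_adam_hyperparams(base_lr, base_betas, batch_size):
    """Rescale a batch-size-1 Adam configuration for a larger batch size."""
    # sqrt(batch_size) learning-rate scaling: compensates for Adam's
    # second-moment normalization so that one batched step approximates the
    # sum of the per-image updates.
    lr = base_lr * math.sqrt(batch_size)
    # Raise each momentum coefficient to the power batch_size so that a single
    # batched step decays the running averages roughly as much as batch_size
    # consecutive single-image steps would (our assumed momentum rule).
    betas = tuple(beta ** batch_size for beta in base_betas)
    return lr, betas

# Placeholder usage: the baseline lr and betas here are illustrative only.
params = [torch.zeros(16, requires_grad=True)]
lr, betas = scale_adam_hyperparams(base_lr=1e-3, base_betas=(0.9, 0.999), batch_size=16)
optimizer = torch.optim.Adam(params, lr=lr, betas=betas)
```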
@misc{zhao2024scaling3dgaussiansplatting,
      title={On Scaling Up 3D Gaussian Splatting Training},
      author={Hexu Zhao and Haoyang Weng and Daohan Lu and Ang Li and Jinyang Li and Aurojit Panda and Saining Xie},
      year={2024},
      eprint={2406.18533},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.18533},
}