We present Grendel, a distributed training system that partitions 3D Gaussian Splatting (3D GS) parameters and parallelizes its computation across multiple GPUs. To optimize batched training, we explore different optimization-hyperparameter scaling strategies and identify a simple √(batch_size) scaling rule that is highly effective. Using Grendel, we show that scaling up 3D GS training in terms of parameters and compute leads to significant improvements in visual quality on multiple large-scale scene reconstruction datasets.
3D Gaussian Splatting (3D GS)
We design Grendel to leverage the inherent mixed parallelism of 3D GS. For tasks exhibiting Gaussian-wise parallelism, such as projection, color computation, and parameter storage, Grendel distributes Gaussians across GPUs; for pixel-wise rendering and loss computation, it distributes pixels across GPUs. Grendel then uses sparse all-to-all communication, exploiting spatial locality, to transfer each Gaussian only to the GPUs that need it. Additionally, Grendel employs a dynamic load balancer that uses observations from previous training iterations to partition images so as to minimize workload imbalance.
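The Gaussian-to-pixel exchange can be pictured with a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration, not Grendel's actual implementation: each GPU projects its local shard of Gaussians, buckets them by the ranks whose pixel partitions they overlap, and exchanges only those buckets with `torch.distributed.all_to_all`. All function and variable names here are our own.

```python
# Hypothetical sketch of Grendel-style mixed parallelism (not the real code):
# Gaussians are sharded across GPUs; after projection, each Gaussian is sent
# only to the ranks whose pixel partition it overlaps (sparse all-to-all).
import torch
import torch.distributed as dist

def exchange_projected_gaussians(projected, overlapping_ranks, world_size):
    """projected: (N, D) attributes of this GPU's shard of projected Gaussians.
    overlapping_ranks: length-N list; entry i is the set of ranks whose pixel
    partition Gaussian i touches (few ranks per Gaussian, thanks to spatial
    locality)."""
    # Bucket local Gaussians by destination rank.
    buckets = [[] for _ in range(world_size)]
    for i, ranks in enumerate(overlapping_ranks):
        for r in ranks:
            buckets[r].append(projected[i])
    send = [torch.stack(b) if b else projected.new_zeros((0, projected.shape[1]))
            for b in buckets]

    # Exchange bucket sizes first, then the (variable-sized) buckets themselves.
    send_counts = torch.tensor([t.shape[0] for t in send], device=projected.device)
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)
    recv = [projected.new_empty((int(c), projected.shape[1])) for c in recv_counts]
    dist.all_to_all(recv, send)

    # This GPU now holds every Gaussian that touches its pixel partition and can
    # rasterize its pixels and compute the loss locally.
    return torch.cat(recv, dim=0)
```

In practice the destination ranks would be derived from each Gaussian's projected 2D extent and the current pixel partition chosen by the load balancer.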
To scale efficiently to many GPUs, Grendel increases the batch size beyond one, so that training can be partitioned both across the images in a batch and across the pixels within each image.
However, increasing the batch size without tuning the optimization hyperparameters can lead to unstable and inefficient training.
Assuming that the gradients from different images in a batch are independent, we want a single batched update to equal the sum of the individual images' updates. We therefore scale the learning rate to "undo" the Adam optimizer's second-moment normalization, and scale the momentum coefficients to keep the effective per-image momentum similar. Together, these learning-rate and momentum scaling rules make the training trajectory approximately invariant to the batch size, enabling hyperparameter-tuning-free training.

In the experiments below, we first train a 3D GS model on the Rubble scene to iteration 15,000, then reset the Adam optimizer state and continue training with different batch sizes. Because the different parameter groups of 3D GS have vastly different magnitudes, we focus on one specific group, the diffuse color, to make the comparisons meaningful. We find that our proposed scaling rules maintain high cosine similarity and approximately equal update magnitudes regardless of batch size.
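As a concrete illustration, here is a minimal sketch of how a batch-size-1 Adam configuration could be rescaled under these rules. The √(batch_size) learning-rate factor is the rule named above; the exponential momentum rescaling β → β^batch_size is our reading of "keep the effective per-image momentum similar", and the baseline values in the usage lines are placeholders rather than the paper's settings.

```python
# Sketch of batch-size-aware Adam hyperparameter scaling (assumptions noted above).
import math
import torch

def scale_adam_hyperparams(base_lr, base_betas, batch_size):
    """Rescale a batch-size-1 Adam configuration for a larger batch size."""
    # sqrt(batch_size) learning-rate scaling: compensates for Adam's
    # second-moment normalization so that one batched step approximates the
    # sum of the per-image updates.
    lr = base_lr * math.sqrt(batch_size)
    # Raise each momentum coefficient to the power batch_size so that a single
    # batched step decays the running averages roughly as much as batch_size
    # consecutive single-image steps would (our assumed momentum rule).
    betas = tuple(beta ** batch_size for beta in base_betas)
    return lr, betas

# Placeholder usage: the baseline lr and betas here are illustrative only.
params = [torch.zeros(16, requires_grad=True)]
lr, betas = scale_adam_hyperparams(base_lr=1e-3, base_betas=(0.9, 0.999), batch_size=16)
optimizer = torch.optim.Adam(params, lr=lr, betas=betas)
```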
@misc{zhao2024scaling3dgaussiansplatting,
      title={On Scaling Up 3D Gaussian Splatting Training},
      author={Hexu Zhao and Haoyang Weng and Daohan Lu and Ang Li and Jinyang Li and Aurojit Panda and Saining Xie},
      year={2024},
      eprint={2406.18533},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2406.18533},
}