This is a model checkpoint accompanying the CVPR 2026 paper “LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis” (arxiv.org/abs/2603.20176). Accompanying code is available on GitHub.

The model takes as input a number of 2D images of a scene, optionally with their corresponding camera poses, and processes information about the scene within these images. Given a new camera pose chosen by the user, it then re-renders the observed content from that novel viewpoint, effectively estimating what the scene would look like from there in 3D.

This checkpoint was trained with 1-10 input views, with and without input camera poses, at a resolution of 512 (longer side), on a mix of datasets. It is intended for general use on any static scene, works with or without known source camera poses, and supports aspect ratios within the [0.5, 2.0] range. It is licensed under the FAIR research license.
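The input constraints stated above (longer side of 512, aspect ratio within [0.5, 2.0], 1-10 training views) can be checked before images are passed to the model. The following is a minimal sketch, not part of the released code: the names `plan_resize` and `check_view_count` are hypothetical, and taking the aspect ratio as width divided by height is an assumption (the stated range is symmetric, so the choice does not change the outcome).

```python
# Hypothetical pre-flight helpers; not part of the released LagerNVS code.
MAX_TRAINED_VIEWS = 10      # the checkpoint was trained with 1-10 input views
LONGER_SIDE = 512           # training resolution, measured on the longer side
ASPECT_RANGE = (0.5, 2.0)   # aspect ratios stated as supported


def plan_resize(width, height, longer_side=LONGER_SIDE):
    """Return (new_width, new_height) with the longer side scaled to `longer_side`.

    Raises ValueError if the aspect ratio (assumed here to be width / height)
    falls outside ASPECT_RANGE.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    aspect = width / height
    low, high = ASPECT_RANGE
    if not low <= aspect <= high:
        raise ValueError(f"aspect ratio {aspect:.3f} is outside [{low}, {high}]")
    scale = longer_side / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def check_view_count(num_views):
    """Return True if the view count lies within the range seen during training."""
    return 1 <= num_views <= MAX_TRAINED_VIEWS
```

For example, `plan_resize(1920, 1080)` gives a 512-wide target size, while an image with a 1:3 aspect ratio is rejected. `check_view_count` only reports whether a view count lies in the trained range; it does not forbid larger counts.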

The expected performance on the splits detailed in the paper is:

Dataset	Views	Posed	Split	PSNR ↑	SSIM ↑	LPIPS ↓
Re10k	2	✓	PixelSplat	28.99	0.900	0.149
Re10k	2	✗	PixelSplat	27.88	0.875	0.161
Re10k	2	✓	FLARE	26.36	0.866	0.190
Re10k	2	✗	FLARE	25.11	0.833	0.210
DL3DV	2	✓	DepthSplat	21.66	0.688	0.290
DL3DV	2	✗	DepthSplat	21.27	0.666	0.303
DL3DV	4	✓	DepthSplat	24.90	0.778	0.189
DL3DV	4	✗	DepthSplat	23.89	0.738	0.208
DL3DV	6	✓	DepthSplat	26.09	0.806	0.161
DL3DV	6	✗	DepthSplat	24.86	0.761	0.181
DL3DV	16	✓	Rayzer	25.20	0.776	0.174
DL3DV	16	✗	Rayzer	23.31	0.713	0.214
CO3D	3	✓	ReconFusion	21.18	0.688	0.393
CO3D	3	✗	ReconFusion	19.88	0.660	0.445
CO3D	6	✓	ReconFusion	23.41	0.728	0.326
CO3D	6	✗	ReconFusion	21.37	0.680	0.388
CO3D	9	✓	ReconFusion	24.40	0.740	0.302
CO3D	9	✗	ReconFusion	22.05	0.689	0.365
Mip360	3	✓	ReconFusion	17.74	0.428	0.505
Mip360	3	✗	ReconFusion	17.29	0.409	0.535
Mip360	6	✓	ReconFusion	19.17	0.466	0.444
Mip360	6	✗	ReconFusion	18.81	0.446	0.472
Mip360	9	✓	ReconFusion	20.19	0.487	0.412
Mip360	9	✗	ReconFusion	19.69	0.461	0.440
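
For readers unfamiliar with the PSNR column, the standard definition is 10·log10(MAX² / MSE) between a rendered image and the ground-truth target. The sketch below implements that textbook formula on flat sequences of pixel values in [0, 1]; it is an illustration only, and the exact evaluation protocol behind the numbers above (color space, cropping, averaging) is not specified here. The function name `psnr` is illustrative.

```python
import math


def psnr(rendered, target, max_value=1.0):
    """Peak signal-to-noise ratio (in dB) between two equal-length pixel sequences.

    `rendered` and `target` are flat sequences of floats in [0, max_value].
    Returns math.inf when the two sequences are identical.
    """
    if len(rendered) != len(target) or len(rendered) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher is better: a mean squared error of 0.01 on a [0, 1] scale corresponds to 20 dB. SSIM and LPIPS need dedicated implementations and are not sketched here.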

Known limitations:

The model is trained with deterministic (i.e. not generative) losses, so it cannot generate plausible completions of unobserved regions of the scene.
The model is suited only to static data.
The model expects source images to have the same focal lengths.
The training data did not include humans or animals, nor images with lens distortion (e.g., fish-eye). As a consequence, we do not expect the model to work well in such scenarios.
Rendered videos occasionally show block artifacts and an impression of flicker. This often stems from uncertainty in geometry estimation, unobserved regions, or difficulty in estimating source camera poses.
Regions with high-frequency patterns, such as grass or trees, are systematically poorly represented by our model. Investigating this failure mode is an interesting avenue for future work.
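
The limitation about equal focal lengths can be checked before inference when intrinsics are known. The sketch below is an assumption-laden illustration, not part of the released code: the name `same_focal_length` and the default tolerance are hypothetical, and the focal lengths are assumed to be expressed in the same units (for example, pixels measured after all source images have been resized consistently).

```python
import math


def same_focal_length(focals, rel_tol=1e-3):
    """Return True if all focal lengths agree with the first one within `rel_tol`.

    `focals` is a non-empty sequence of focal lengths in a common unit.
    The tolerance is an illustrative default, not a value from the paper.
    """
    if len(focals) == 0:
        raise ValueError("need at least one focal length")
    reference = focals[0]
    return all(math.isclose(f, reference, rel_tol=rel_tol) for f in focals)
```

A set of source views that fails this check falls outside what the model expects, so results for it may be degraded.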
