LagerNVS

This is a model checkpoint accompanying the CVPR 2026 paper “LagerNVS: Latent Geometry for Fully Neural Real-time Novel View Synthesis” (arxiv.org/abs/2603.20176). The accompanying code is available on GitHub.

The model takes as input a set of 2D images of a scene, optionally with their camera poses, and encodes information about the scene from them. Given a new camera pose chosen by the user, it then re-renders the observed content from that novel viewpoint, effectively estimating what the scene would look like in 3D.
This checkpoint was trained with 1-10 input views, with and without input camera poses, at a resolution of 512 (longer side), on a mix of datasets. It is intended for general use on any static scene, works with or without known source camera poses, and supports aspect ratios in the [0.5, 2.0] range. It is licensed under the FAIR research license.
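
The input constraints stated above (1-10 views, 512 on the longer side, aspect ratio in [0.5, 2.0]) can be checked before running the model. The following is a minimal sketch, not the preprocessing from the released code: the helper names `target_size` and `check_inputs` are hypothetical, aspect ratio is assumed to be width / height (the range is symmetric, so height / width gives the same verdict), and the actual pipeline may additionally crop or round sizes.

```python
from typing import List, Tuple

# Values taken from the model card; the helper names below are illustrative only.
MAX_SIDE = 512                      # longer side used during training
MIN_ASPECT, MAX_ASPECT = 0.5, 2.0   # supported aspect-ratio range
MIN_VIEWS, MAX_VIEWS = 1, 10        # number of input views seen in training


def target_size(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Scale (width, height) so that the longer side equals max_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    longer = max(width, height)
    new_w = max(1, round(width * max_side / longer))
    new_h = max(1, round(height * max_side / longer))
    return new_w, new_h


def check_inputs(sizes: List[Tuple[int, int]]) -> List[str]:
    """Return human-readable warnings for a list of (width, height) source sizes."""
    warnings = []
    if not MIN_VIEWS <= len(sizes) <= MAX_VIEWS:
        warnings.append(
            f"{len(sizes)} views given; checkpoint was trained with "
            f"{MIN_VIEWS}-{MAX_VIEWS} views"
        )
    for i, (w, h) in enumerate(sizes):
        aspect = w / h  # assumed convention: width / height
        if not MIN_ASPECT <= aspect <= MAX_ASPECT:
            warnings.append(
                f"view {i}: aspect ratio {aspect:.2f} is outside "
                f"[{MIN_ASPECT}, {MAX_ASPECT}]"
            )
    return warnings


if __name__ == "__main__":
    print(target_size(1920, 1080))
    print(check_inputs([(1920, 1080), (1000, 300)]))
```

Inputs outside these ranges may still run, but they fall outside the conditions the checkpoint was trained on.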

The expected performance on the splits detailed in the paper is:

Dataset  Views  Posed  Split        PSNR ↑  SSIM ↑  LPIPS ↓
Re10k    2             PixelSplat   28.99   0.900   0.149
Re10k    2             PixelSplat   27.88   0.875   0.161
Re10k    2             FLARE        26.36   0.866   0.190
Re10k    2             FLARE        25.11   0.833   0.210
DL3DV    2             DepthSplat   21.66   0.688   0.290
DL3DV    2             DepthSplat   21.27   0.666   0.303
DL3DV    4             DepthSplat   24.90   0.778   0.189
DL3DV    4             DepthSplat   23.89   0.738   0.208
DL3DV    6             DepthSplat   26.09   0.806   0.161
DL3DV    6             DepthSplat   24.86   0.761   0.181
DL3DV    16            Rayzer       25.20   0.776   0.174
DL3DV    16            Rayzer       23.31   0.713   0.214
CO3D     3             ReconFusion  21.18   0.688   0.393
CO3D     3             ReconFusion  19.88   0.660   0.445
CO3D     6             ReconFusion  23.41   0.728   0.326
CO3D     6             ReconFusion  21.37   0.680   0.388
CO3D     9             ReconFusion  24.40   0.740   0.302
CO3D     9             ReconFusion  22.05   0.689   0.365
Mip360   3             ReconFusion  17.74   0.428   0.505
Mip360   3             ReconFusion  17.29   0.409   0.535
Mip360   6             ReconFusion  19.17   0.466   0.444
Mip360   6             ReconFusion  18.81   0.446   0.472
Mip360   9             ReconFusion  20.19   0.487   0.412
Mip360   9             ReconFusion  19.69   0.461   0.440
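
For reference, PSNR in the table is the standard peak signal-to-noise ratio, PSNR = 10 · log10(MAX² / MSE), where higher is better. The sketch below computes it for two flattened images in pure Python. It assumes pixel values in [0, 1] (`max_val = 1.0`); the exact evaluation protocol used in the paper (value range, cropping, averaging over images) is not stated here and may differ. SSIM and LPIPS need dedicated implementations and are not shown.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)


if __name__ == "__main__":
    rendered = [0.0, 0.5, 1.0, 0.25]
    reference = [0.1, 0.4, 0.9, 0.35]
    print(f"{psnr(rendered, reference):.2f} dB")
```

Because the scale is logarithmic, a gap of about 1 dB between two rows of the table corresponds to roughly a 20% reduction in mean squared error.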

Known limitations:

  • The model is trained with deterministic (i.e. not generative) losses, so it cannot generate plausible completions of unobserved regions of the scene.
  • The model is suited only to static data.
  • The model expects source images to have the same focal lengths.
  • The training data did not include humans or animals, nor images with lens distortion (e.g., fish-eye). As a consequence, we do not expect the model to work well in such scenarios.
  • Occasionally, when rendering a video, one observes block artifacts and an impression of flicker. This often stems from uncertainty in geometry estimation, unobserved regions, or difficulty in estimating source camera poses.
  • Regions with high-frequency patterns, such as grass or trees, are systematically poorly represented by our model. Investigating this failure mode is an interesting avenue for future work.